First things first, before deep diving into Data Science and Machine Learning, I ll explore in this post, the famous Pandas Python library. Let s consider it a cheat sheet.

Pandas

Importing Pandas

import pandas as pd

Reading a file

df = pd.read_csv('source_file.csv', index_col='uuid')

This example is pretty self explanatory. After creating a Data Frame variable “df” we assign it the value of a .csv file through pandas “read_csv” function. This function can take several options, one of them being the .csv column that the Data Frame will use as index. #todo Other intersting options are …

DataFrame options

df.shape # Shows the size of your DataFrame
pd.set_option('display.max_column', 125)
pd.set_option('display.max_row', 125)

Filters & Display

Creating filters

filter_1 = (df['col_a'] == 'value_in_col_a')
filter_2 = (df['col_n'] == 'value_in_col_n')

filters = (
        filter_1
        & filter_2
)

Selecting columns we want to display

columns_to_display = ['col1', 'col2', 'col3', 'col4', 'col5']

Applying filters and columns to the DataFrame

df = df.loc[filters, columns_to_display].sort_values(by=['col3', 'col5'])

#todo : add column, apply function, apply style, export to csv/xlsx