First things first: before diving deep into Data Science and Machine Learning, I'll explore the famous Pandas Python library in this post. Let's consider it a cheat sheet.
Pandas
Importing Pandas
import pandas as pd
Reading a file
df = pd.read_csv('source_file.csv', index_col='uuid')
This example is pretty self-explanatory. We create a DataFrame variable, “df”, and assign it the contents of a .csv file through pandas' “read_csv” function. This function takes several options, one of them being “index_col”, the .csv column that the DataFrame will use as its index. #todo Other interesting options are …
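To see what “index_col” actually does, here is a minimal sketch. The CSV content and the column names “name” and “score” are made up for illustration, and an in-memory string stands in for 'source_file.csv' so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'source_file.csv'.
csv_text = """uuid,name,score
a1,Alice,90
b2,Bob,85
"""

# index_col turns the 'uuid' column into the row index
# instead of keeping it as a regular data column.
df = pd.read_csv(io.StringIO(csv_text), index_col='uuid')

print(df.index.name)         # uuid
print(list(df.columns))      # ['name', 'score']
print(df.loc['b2', 'name'])  # Bob
```

With the index set this way, rows can be looked up by their uuid through “df.loc”.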
DataFrame options
df.shape # Returns a (rows, columns) tuple with the size of your DataFrame
pd.set_option('display.max_columns', 125) # Print up to 125 columns before truncating
pd.set_option('display.max_rows', 125) # Print up to 125 rows before truncating
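A short sketch of these options in action, on a tiny made-up DataFrame. It also shows “pd.option_context”, which is not part of the cheat sheet above but is handy when a display setting should only apply temporarily.

```python
import pandas as pd

# Small illustrative DataFrame: 3 rows, 2 columns.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# Set a display option globally and read it back.
pd.set_option('display.max_columns', 125)
current = pd.get_option('display.max_columns')
print(current)  # 125

# option_context applies a setting only inside the with-block.
with pd.option_context('display.max_rows', 10):
    inside = pd.get_option('display.max_rows')
print(inside)  # 10

# Restore the default value when done.
pd.reset_option('display.max_columns')
```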
Filters & Display
Creating filters
filter_1 = (df['col_a'] == 'value_in_col_a')
filter_2 = (df['col_n'] == 'value_in_col_n')
filters = (
filter_1
& filter_2
)
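Each filter is a boolean Series with one True/False per row, and “&” combines them element by element. Here is a minimal sketch on made-up data, which also shows “|” (or) and “~” (not). The parentheses around each comparison matter, because “&” binds more tightly than “==” in Python.

```python
import pandas as pd

# Illustrative data; the values 'x' and 1 are placeholders.
df = pd.DataFrame({
    'col_a': ['x', 'x', 'y', 'y'],
    'col_n': [1, 2, 1, 2],
})

filter_1 = (df['col_a'] == 'x')
filter_2 = (df['col_n'] == 1)

both = filter_1 & filter_2    # rows where both conditions hold
either = filter_1 | filter_2  # rows where at least one holds
not_first = ~filter_1         # rows where the first does not hold

print(both.tolist())       # [True, False, False, False]
print(either.tolist())     # [True, True, True, False]
print(not_first.tolist())  # [False, False, True, True]
```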
Selecting columns we want to display
columns_to_display = ['col1', 'col2', 'col3', 'col4', 'col5']
Applying filters and columns to the DataFrame
df = df.loc[filters, columns_to_display].sort_values(by=['col3', 'col5'])
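Here is the same pattern end to end on a small made-up DataFrame, as a sketch. The “extra” column and all values are placeholders. The result goes into a new variable, “result”, so the original DataFrame stays available.

```python
import pandas as pd

# Illustrative data; 'extra' is a hypothetical column used only for filtering.
df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'd'],
    'col3': [2, 1, 2, 1],
    'col5': [9, 8, 7, 6],
    'extra': [0, 0, 1, 0],
})

filters = (df['extra'] == 0)
columns_to_display = ['col1', 'col3', 'col5']

# .loc[rows, columns] keeps the matching rows and the selected columns;
# sort_values orders by col3 first, then by col5 to break ties.
result = df.loc[filters, columns_to_display].sort_values(by=['col3', 'col5'])

print(result['col1'].tolist())  # ['d', 'b', 'a']
print(len(df))                  # 4, the original is untouched
```

Assigning back to “df”, as in the line above, overwrites the unfiltered data, which is fine for a quick look but means the file has to be read again to get the dropped rows and columns back.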
#todo : add column, apply function, apply style, export to csv/xlsx