Pandas basics
DataFrame is the fundamental data structure provided by Pandas. It is a class
with a number of attributes and methods. In this article, df is the name of the
dataframe.
Creating a dataframe from a file
import pandas as pd
df = pd.read_csv("filename.csv")
df = pd.read_csv("ES.txt", delimiter=',') #TXT file; the delimiter is passed explicitly
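To try the calls above without a file on disk, here is a minimal sketch that reads the same kind of data from an in-memory text buffer. The column names, the values, and the ';' delimiter are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "filename.csv"
csv_text = "name,score\nalice,3\nbob,5\ncarol,3\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# The same data with a different (assumed) delimiter, as in a TXT file
txt_text = csv_text.replace(",", ";")
df_txt = pd.read_csv(io.StringIO(txt_text), delimiter=";")

print(df.shape)           # (number of rows, number of columns)
print(df.equals(df_txt))  # whether both reads give the same dataframe
```

With a real file, the io.StringIO(...) argument is simply replaced by the file name.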
What columns are there in the dataframe?
df.columns #Output is an Index, a pandas data structure. For a dataframe with many columns the printed output may be abbreviated
df.columns.values #The column headers as an array (not an Index)
list(df.columns.values) #Writes all column headers into a list
The output of df.columns is an ‘Index’, a data structure provided by pandas.
df.columns[1] #Name of the column at position 1 (the second column), not the column itself
df.iloc[:, 1] #Accessing a column by its position
df['col_name'] #Accessing a column by its name
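The different ways of looking at columns can be compared side by side. This is a small sketch on a hypothetical three-column dataframe; the column names are illustrative only.

```python
import pandas as pd

# Hypothetical data; the column names are made up for this example
df = pd.DataFrame({"name": ["alice", "bob"],
                   "score": [3, 5],
                   "city": ["Oslo", "Pune"]})

cols = df.columns              # a pandas Index of column labels
as_array = df.columns.values   # the same labels as an array
as_list = list(df.columns)     # the same labels as a plain Python list

second_label = df.columns[1]   # label of the column at position 1
by_name = df[second_label]     # column selected by its name
by_position = df.iloc[:, 1]    # column selected by its position

print(type(cols).__name__, as_list)
print(second_label, by_name.equals(by_position))
```

Note that df.columns[1] only gives the name of the second column. The column itself comes from df[...] or df.iloc[:, 1].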
Individual columns are of type Series - one more data structure provided by pandas. Printing a Series prints out column values with indices.
col_name = df['col_name']
type(col_name)
col_name[0] #Element with index label 0. This is the first element when the dataframe has the default index
col_name[0:5] #Slicing, e.g. the first five elements. The result is another Series
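Element access and slicing on a Series can be sketched as follows. The data is hypothetical, and the default integer index (0, 1, 2, ...) is assumed, so label 0 and position 0 refer to the same row.

```python
import pandas as pd

# Hypothetical column with the default integer index
df = pd.DataFrame({"col_name": [4, 2, 5, 2, 1]})
col = df["col_name"]

first = col[0]            # lookup by index label
first_pos = col.iloc[0]   # lookup by position, independent of the labels
head = col[0:3]           # slice of the first three rows, itself a Series

print(type(col).__name__, first, first_pos)
print(head.tolist())
```

If the index is not the default one (for example after filtering or sorting), col.iloc[0] is the safer way to get the first element.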
Making sense of information in columns
Suppose col_name has five unique values, say 1 to 5. Since each of them can
appear in multiple rows, it makes sense to know how many times each has
appeared. In other words, we want to know in how many rows 1 appears, in how
many rows 2 appears, and so on. The result is a Series.
df.col_name.value_counts() #Counts the occurrences of each value, most frequent first
df.col_name.value_counts().sort_index() #Sorts the counts by the values of col_name