#2 -> More on Data Frames in Pandas

  • Before we start, I would like you to go through the article below, which will help you get started with Pandas.

    Introduction to Pandas

Let's first create a data frame. We are using the same one as in the above article; in case you want to recreate it, here is the code.

import numpy as np 
import pandas as pd 
from numpy.random import randn
np.random.seed(101)
dataframe = pd.DataFrame(randn(5,4),['A','B','C','D','E'],['W','X','Y','Z'])
print(dataframe)
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
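
The constructor call above passes the row labels and the column labels by position. Here is a minimal sketch of the same construction with the index= and columns= keywords spelled out, which can be easier to read. The variable name labeled is only for illustration.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)  # same seed as above, so the same random values are drawn

# The second positional argument of pd.DataFrame is the row index and the
# third is the column labels; naming them makes the call self-documenting.
labeled = pd.DataFrame(
    data=randn(5, 4),
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['W', 'X', 'Y', 'Z'],
)

print(labeled.shape)          # (5, 4)
print(list(labeled.index))    # ['A', 'B', 'C', 'D', 'E']
print(list(labeled.columns))  # ['W', 'X', 'Y', 'Z']
```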

Conditional Selection

  • Conditional selection in Pandas is similar to what we had in NumPy. If you check for elements greater than 0, you get back a data frame of Boolean values. print(dataframe > 0)
       W      X      Y      Z
A   True   True   True   True
B   True  False  False   True
C  False   True   True  False
D   True  False  False   True
E   True   True   True   True
  • Now, if you want to print the elements corresponding to these Booleans, we can simply do the following. Elements where the condition is False show up as NaN. print(dataframe[dataframe>0])
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118       NaN       NaN  0.605965
C       NaN  0.740122  0.528813       NaN
D  0.188695       NaN       NaN  0.955057
E  0.190794  1.978757  2.605967  0.683509
  • If you don't want the null values, then instead of passing a condition on the whole data frame we can pass a condition on one particular column. This keeps only the rows where that column's condition is True. print(dataframe[dataframe['W']>0])
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
  • What if you just want the X column for the rows where the W column has values greater than 0? print(dataframe[dataframe['W']>0]['X'])
A    0.628133
B   -0.319318
D   -0.758872
E    1.978757
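
The last selection above filters the rows first and then picks the column in a second step. Here is a small sketch of the same idea done in one step with .loc[row_mask, column]. It uses a tiny hand-made frame (the names df, mask, chained and single are assumptions for illustration, not from the article) so the result is easy to check by eye.

```python
import pandas as pd

# Small hand-made frame with illustrative values (not the random data above)
df = pd.DataFrame(
    {'W': [2.0, -1.0, 0.5], 'X': [10, 20, 30]},
    index=['A', 'B', 'C'],
)

mask = df['W'] > 0             # Boolean Series: A True, B False, C True

chained = df[mask]['X']        # filter the rows, then pick the column
single = df.loc[mask, 'X']     # same selection in a single indexing step

print(single)
print(chained.tolist() == single.tolist())   # True
```

Both forms return the same values here. The .loc form is usually the safer habit once you start assigning to a selection, since it works on the original frame directly.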

Multiple conditions

  • Most of the time we need multiple conditions to work in one go. Assume you want the rows with values greater than 0 in the W column and values greater than 1 in the Y column. You could do something like this -> print(dataframe[(dataframe['W'] > 0)&(dataframe['Y']>1)])
          W         X         Y         Z
E  0.190794  1.978757  2.605967  0.683509
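  • When you combine multiple conditions you can't use Python's and / or keywords, because they expect a single True or False rather than a whole column of Booleans; hence we use the & sign. For the or operator we use the pipe | sign. Remember to wrap each condition in parentheses, as above.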
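
To make the difference concrete, here is a short sketch on a hand-made frame (illustrative values and variable names, not the article's data) showing &, |, and what happens when Python's and is used instead.

```python
import pandas as pd

df = pd.DataFrame(
    {'W': [2.0, -1.0, 0.5], 'Y': [3.0, 4.0, -2.0]},
    index=['A', 'B', 'C'],
)

# & keeps the rows where both conditions hold
both = df[(df['W'] > 0) & (df['Y'] > 1)]
print(list(both.index))        # ['A']

# | keeps the rows where at least one condition holds
either = df[(df['W'] > 0) | (df['Y'] > 1)]
print(list(either.index))      # ['A', 'B', 'C']

# Python's `and` asks a whole Series for one True/False answer,
# which pandas refuses to give, so a ValueError is raised.
try:
    df[(df['W'] > 0) and (df['Y'] > 1)]
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```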

Resetting the Index

  • In order to reset the index to the default integer index we can use dataframe.reset_index(inplace=True). The old index is kept as a new column. Remember, if you don't use inplace=True the change is not permanent, and if you print the data frame again you will get the old data frame.

  • Now I will create a new Index list and try to set it to the data frame.

import numpy as np 
import pandas as pd 
from numpy.random import randn
np.random.seed(101)
dataframe = pd.DataFrame(randn(5,4),['A','B','C','D','E'],['W','X','Y','Z'])
new_index = 'MP AP UP TN WB'.split()
print(new_index)
#OUTPUT -> ['MP', 'AP', 'UP', 'TN', 'WB']
  • Now let's try to set this list as the new index of the data frame.
dataframe['States'] = new_index
dataframe.set_index('States',inplace=True)
print(dataframe)
  • The output will be.
               W         X         Y         Z
States
MP      2.706850  0.628133  0.907969  0.503826
AP      0.651118 -0.319318 -0.848077  0.605965
UP     -2.018168  0.740122  0.528813 -0.589001
TN      0.188695 -0.758872 -0.933237  0.955057
WB      0.190794  1.978757  2.605967  0.683509
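
As a side note to the inplace=True remark above, here is a minimal sketch of what reset_index and set_index do when inplace is left out: they return a new data frame and leave the original untouched. The frame and the variable names (df, flat, dropped, by_state) are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'W': [1.0, 2.0, 3.0]}, index=['A', 'B', 'C'])

# Without inplace=True, reset_index returns a new frame and leaves df alone
flat = df.reset_index()
print(list(df.index))          # ['A', 'B', 'C']
print(list(flat.columns))      # ['index', 'W'], old labels became a column

# drop=True throws the old labels away instead of keeping them as a column
dropped = df.reset_index(drop=True)
print(list(dropped.columns))   # ['W']

# set_index also returns a new frame; assigning the result to a name
# is an alternative to passing inplace=True
df['States'] = ['MP', 'AP', 'UP']
by_state = df.set_index('States')
print(list(by_state.index))    # ['MP', 'AP', 'UP']
```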

Multi-Level Indexing

  • To create a multi-level index, we will use a special function available in Pandas, pd.MultiIndex.from_tuples.
import numpy as np 
import pandas as pd 
from numpy.random import randn  # needed for the randn() call further down
outside = 'G1 G1 G1 G2 G2 G2'.split()
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside)) # Pair the labels up into a list of tuples
print(hier_index)
#OUTPUT -> [('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]

hier_index = pd.MultiIndex.from_tuples(hier_index) # Main function that creates the multi-level index
print(hier_index)
'''
Output will be ->
MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 1),
            ('G2', 2),
            ('G2', 3)],
           )
'''

#Now create a dataframe
dataframe = pd.DataFrame(randn(6,2),hier_index,['A','B'])
print(dataframe)
'''
             A         B
G1 1  1.913472  0.444590
   2 -1.013842 -1.064901
   3 -1.102353  0.255780
G2 1 -1.300105 -0.552788
   2  0.704361  0.850760
   3  0.199433  0.864586
'''
  • Now, in order to select items from the data frame, we can use the following code. Note that no seed is set in this section and the data frame is regenerated, so the random values differ from one output to the next.
dataframe = pd.DataFrame(randn(6,2),hier_index,['A','B'])
print(dataframe.loc['G1'].loc[1])  
# The idea here is to first select on the outside index, then keep selecting on the inside index using .loc
'''
OUTPUT is ->
A    1.090103
B   -1.771748  
'''
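
The chained .loc calls above can also be written as one call by passing both levels as a tuple. Here is a small sketch on a hand-made frame with fixed values (the names index, df, chained and direct are illustrative assumptions), so the result can be checked without random numbers.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)]
)
df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [1, 2, 3, 4]}, index=index)

chained = df.loc['G1'].loc[1]   # outside level first, then inside level
direct = df.loc[('G1', 1)]      # both levels at once, passed as a tuple

print(direct)
print(chained.tolist() == direct.tolist())   # True

# A tuple for the row plus a column label picks out a single cell
cell = df.loc[('G1', 2), 'A']
print(cell)                     # 20
```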
  • Now, to name the outside and inside index levels, we can set index.names as follows.
dataframe.index.names = ['Groups','Nums']
print(dataframe)

'''
OUTPUT ->
                    A         B
Groups Nums
G1     1    -1.456893  0.642521
       2    -0.266246 -0.880315
       3    -2.137056  0.451063
G2     1     0.362317 -1.317669
       2     1.165863 -0.823856
       3    -1.090601  0.420701
'''
  • Another way of grabbing a group is the cross-section function, .xs(). Let's see how to do that.
print(dataframe.xs('G1'))
'''
OUTPUT ->
             A         B
Nums
1    -1.035331  2.059875
2     0.731371  2.915778
3     1.312425 -0.988248
'''
  • With .loc() it is a bit tricky to get the 1st sub-row from both G1 and G2, hence we make use of .xs() with the level argument.
print(dataframe.xs(1,level='Nums'))

'''
OUTPUT -> 
               A         B
Groups
G1      0.064494 -0.254242
G2      0.664111  0.722540
'''
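
For comparison, here is a hedged sketch of the same cross-section on a frame with fixed values, next to a .loc version that does the same job with slice(None). The use of pd.MultiIndex.from_product, the drop_level argument and the variable names are additions for illustration and are not part of the article's code.

```python
import pandas as pd

# from_product builds every (group, number) pair and names the levels in one go
index = pd.MultiIndex.from_product(
    [['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Nums']
)
df = pd.DataFrame({'A': range(6)}, index=index)

# Row number 1 from every group; the 'Nums' level is dropped from the result
first_rows = df.xs(1, level='Nums')
print(first_rows)

# drop_level=False keeps both index levels in the result
kept = df.xs(1, level='Nums', drop_level=False)
print(list(kept.index))        # [('G1', 1), ('G2', 1)]

# The .loc version: slice(None) means "every value" of the outside level
via_loc = df.loc[(slice(None), 1), :]
print(list(via_loc.index))     # [('G1', 1), ('G2', 1)]
```

The .loc form works, but it is noticeably harder to read, which is why .xs() is the more comfortable choice for this kind of selection.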

Thank-you!

I am glad you made it to the end of this article. I hope you learned something; if so, please leave a Like, which will encourage me for my upcoming write-ups.