#4 -> Operations in Pandas
- Before we start I would like you to go through the below articles which will help you get started with Pandas.
Unique Operation
- First, let's create a data frame.
import numpy as np
import pandas as pd
dataframe = pd.DataFrame({'col1':[1,2,3,4],
'col2':[444,555,666,444],
'col3':['abc','def','ghi','xyz']})
print(dataframe)
'''
OUTPUT->
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''
- How do we find unique values in a data frame? .unique() is a Series method, so we first select a single column (or a single row) and then call .unique() on it.
print(dataframe['col2'].unique())
'''
OUTPUT ->
[444 555 666]
'''
If you only need the number of unique elements, use .nunique() instead of .unique().
To find how many times each unique value occurs in the column, use .value_counts():
print(dataframe['col2'].value_counts())
'''
OUTPUT ->
444 2
666 1
555 1
'''
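Since .nunique() is only mentioned in passing above, here is a minimal sketch of it on the same data frame. The variable name n_unique is just an illustrative choice.

```python
import pandas as pd

# Same data frame as in the example above
dataframe = pd.DataFrame({'col1': [1, 2, 3, 4],
                          'col2': [444, 555, 666, 444],
                          'col3': ['abc', 'def', 'ghi', 'xyz']})

# Number of distinct values in col2 (444, 555 and 666)
n_unique = dataframe['col2'].nunique()
print(n_unique)
'''
OUTPUT ->
3
'''
```

The result matches the length of the array returned by .unique().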
Apply Operation
- Pandas has many built-in functions, but what if you want to write your own function and apply it to the data? The .apply() method lets you do exactly that.
import numpy as np
import pandas as pd
def times2(x):
    return x*2
dataframe = pd.DataFrame({'col1':[1,2,3,4],
'col2':[444,555,666,444],
'col3':['abc','def','ghi','xyz']})
print(dataframe['col2'].apply(times2))
'''
OUTPUT ->
0 888
1 1110
2 1332
3 888
'''
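As a small variation on the example above, .apply() also accepts a lambda or a built-in function such as len. This is only a sketch; the names doubled and lengths are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame({'col1': [1, 2, 3, 4],
                          'col2': [444, 555, 666, 444],
                          'col3': ['abc', 'def', 'ghi', 'xyz']})

# A lambda does the same job as times2 without defining a named function
doubled = dataframe['col2'].apply(lambda x: x * 2)

# Built-in functions work too: length of each string in col3
lengths = dataframe['col3'].apply(len)

print(doubled.tolist())
print(lengths.tolist())
'''
OUTPUT ->
[888, 1110, 1332, 888]
[3, 3, 3, 3]
'''
```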
Sorting and Ordering
- In order to sort by a column we can make use of the function .sort_values(). It can be called on a single column, or on the whole data frame by passing the column name.
print(dataframe['col2'].sort_values())
print(dataframe.sort_values('col2'))
'''
OUTPUT ->
0 444
3 444
1 555
2 666
   col1  col2 col3
0     1   444  abc
3     4   444  xyz
1     2   555  def
2     3   666  ghi
'''
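The sorts above are ascending, which is the default. A minimal sketch of two common variations, using the ascending parameter of .sort_values() (the variable names are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({'col1': [1, 2, 3, 4],
                          'col2': [444, 555, 666, 444],
                          'col3': ['abc', 'def', 'ghi', 'xyz']})

# Largest col2 values first
sorted_desc = dataframe.sort_values('col2', ascending=False)
print(sorted_desc['col2'].tolist())

# Sort by col2 ascending, and break ties by col1 descending
sorted_multi = dataframe.sort_values(['col2', 'col1'], ascending=[True, False])
print(sorted_multi['col1'].tolist())
'''
OUTPUT ->
[666, 555, 444, 444]
[4, 1, 2, 3]
'''
```

Note that .sort_values() returns a new object; the original data frame keeps its order.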
Missing Data
A very useful way to find out whether a data frame has any null values is the function .isnull(). It returns an object of the same shape filled with booleans, with True wherever a value is missing.
Often, when you use Pandas to read in data that has missing points, Pandas automatically fills them with a null value (NaN). We can replace that with any value we want; let's see how we can do that.
First, we go ahead and create a data frame using a dictionary.
import numpy as np
import pandas as pd
dic = {'A':[1,2,np.nan],'B':[5,np.nan,np.nan],'C':[1,2,3]}
dataframe = pd.DataFrame(dic)
print(dataframe)
'''
OUTPUT ->
     A    B  C
0  1.0  5.0  1
1  2.0  NaN  2
2  NaN  NaN  3
'''
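To make the .isnull() description concrete, here is a short sketch on this data frame. Summing the boolean mask to count missing values per column is a common idiom rather than something the text above prescribes.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'A': [1, 2, np.nan],
                          'B': [5, np.nan, np.nan],
                          'C': [1, 2, 3]})

# True marks a missing value, False a present one
mask = dataframe.isnull()
print(mask)

# True counts as 1, so summing gives the number of missing values per column
missing_per_column = mask.sum()
print(missing_per_column.to_dict())
'''
OUTPUT ->
       A      B      C
0  False  False  False
1  False   True  False
2   True   True  False
{'A': 1, 'B': 2, 'C': 0}
'''
```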
- A lot of the time you might just want to drop the missing values from the data frame. For this we make use of dataframe.dropna(). By default it drops every row that contains a null value; to drop columns instead, pass axis=1, as in dataframe.dropna(axis=1).
- We can also set a threshold: with thresh=n, a row is kept only if it has at least n non-null values.
print(dataframe.dropna(thresh=2))
'''
OUTPUT->
     A    B  C
0  1.0  5.0  1
1  2.0  NaN  2
'''
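For comparison with the thresh example, this sketch shows the two plain calls described earlier, .dropna() and .dropna(axis=1), on the same data frame. The names rows_kept and cols_kept are illustrative.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'A': [1, 2, np.nan],
                          'B': [5, np.nan, np.nan],
                          'C': [1, 2, 3]})

# Default: drop every row that has at least one missing value
rows_kept = dataframe.dropna()
print(rows_kept)

# axis=1: drop every column that has at least one missing value
cols_kept = dataframe.dropna(axis=1)
print(cols_kept)
'''
OUTPUT ->
     A    B  C
0  1.0  5.0  1
   C
0  1
1  2
2  3
'''
```

Only row 0 and only column C are completely free of missing values, so those are what survive.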
- What if you want to fill in a different value in place of the nulls? For that we can make use of dataframe.fillna(value='NON').
print(dataframe.fillna(value='NON'))
'''
OUTPUT->
     A    B  C
0  1.0  5.0  1
1  2.0  NON  2
2  NON  NON  3
'''
- What if we want to fill the gaps with a mean? In the call below, every missing value in every column is replaced by the mean of column A, which is 1.5.
print(dataframe.fillna(value=dataframe['A'].mean()))
'''
OUTPUT ->
     A    B  C
0  1.0  5.0  1
1  2.0  1.5  2
2  1.5  1.5  3
'''
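The example above uses the mean of column A for every gap. If you would rather fill each column with its own mean, one possible sketch is to pass dataframe.mean() to .fillna(), which matches the per-column means to the columns by name. This is a variation on the example, not the only way to do it.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'A': [1, 2, np.nan],
                          'B': [5, np.nan, np.nan],
                          'C': [1, 2, 3]})

# dataframe.mean() holds one mean per column (A: 1.5, B: 5.0, C: 2.0);
# fillna uses each column's own mean for that column's gaps
filled = dataframe.fillna(value=dataframe.mean())
print(filled)
'''
OUTPUT ->
     A    B  C
0  1.0  5.0  1
1  2.0  5.0  2
2  1.5  5.0  3
'''
```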
Pivot Table
- This one is similar to the pivot tables we have in Excel. Don't worry if you are not familiar with them; we will now look at an example to see how it works. Let's start by creating a new data frame.
import numpy as np
import pandas as pd
foo_bar = ('foo foo foo bar bar bar').split()
one_two = ('one one two two one one ').split()
x = ('x y z x y z').split()
data= {'A':foo_bar, 'B':one_two,'C':x , 'D': [1,3,2,5,4,1]}
dataframe = pd.DataFrame(data)
print(dataframe)
'''
OUTPUT ->
     A    B  C  D
0  foo  one  x  1
1  foo  one  y  3
2  foo  two  z  2
3  bar  two  x  5
4  bar  one  y  4
5  bar  one  z  1
'''
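- Let's now create a pivot table. It groups the rows by A and B (giving a multi-level index), spreads the values of C across the columns, and fills each cell with the mean of the matching D values, which is the default aggregation. Combinations with no data show up as NaN.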
print(dataframe.pivot_table(values='D',index=['A','B'],columns=['C']))
'''
OUTPUT->
C          x    y    z
A   B
bar one  NaN  4.0  1.0
    two  5.0  NaN  NaN
foo one  1.0  3.0  NaN
    two  NaN  NaN  2.0
'''
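In the table above every cell comes from a single row, so the aggregation is not visible. The sketch below pivots on A and B only, so that some cells combine two rows, and compares the default mean with aggfunc='sum'. The names by_mean and by_sum are illustrative.

```python
import pandas as pd

data = {'A': 'foo foo foo bar bar bar'.split(),
        'B': 'one one two two one one'.split(),
        'C': 'x y z x y z'.split(),
        'D': [1, 3, 2, 5, 4, 1]}
dataframe = pd.DataFrame(data)

# Default aggregation is the mean: (foo, one) combines D values 1 and 3
by_mean = dataframe.pivot_table(values='D', index='A', columns='B')
print(by_mean)

# aggfunc='sum' adds the matching D values instead
by_sum = dataframe.pivot_table(values='D', index='A', columns='B', aggfunc='sum')
print(by_sum)
'''
OUTPUT ->
B    one  two
A
bar  2.5  5.0
foo  2.0  2.0
B    one  two
A
bar    5    5
foo    4    2
'''
```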
Thank you!
I am glad you made it to the end of this article. I hope you learned something; if so, please leave a Like, which will encourage me in my upcoming write-ups.