- Before we start I would like you to go through the below articles which will help you get started with Pandas.
Types of Data
Pandas in Python has the ability to read data and write data through a wide variety of resource. The four main data sources are below.
- CSV
- Excel
- HTML
- SQL
In order to work with SQL files or HTML data, you need to install the following using pip or conda.
pip install sqlalchemy
pip install lxml
pip install html5lib
pip install BeautifulSoup4
pip install xlrd
CSV
- Now let's read a CSV file and save it also into a new file. We need to keep in mind that it can only read the data, not the images, macros or formulas.
import pandas as pd
dataframe = pd.read_csv('example.csv')
dataframe.to_csv('output',index=False)
Excel
- Reading from an Excel file is also similar, here we do need to specify the sheet name as well and each sheet is treated as a data frame.
import pandas as pd
dataframe = pd.read_excel('new.xlsx',sheet_name='Sheet1')
print(dataframe)
HTML
- Let's now see how to read from an HTML link. For this, we are using the failed bank link -> fdic.gov/resources/resolutions/bank-failure.. . What Pandas tries to do, that it extracts all the data which it thinks is a table. I go through the HTML code, searches for the Table markings and makes a list of them before converting it to the data frame.
import pandas as pd
dataframe = pd.read_html('https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/')
print(dataframe)
SQL Data
- Now let's see the same with SQL data. Now we need to consider the fact that Pandas is not the very best way for getting the SQL data because there are already many flavours of SQL itself. First, we need to create a SQL like a database this can be done as follows.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
- Now just add few values from our excel to the SQL and try to read it.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('sqlite:///:memory:')
dataframe = pd.read_excel('new.xlsx',sheet_name='Sheet1')
dataframe.to_sql('my_table',engine)
sqldf = pd.read_sql('my_table',con=engine)
print(sqldf)
Thank-you!
I am glad you made it to the end of this article. I hope you got to learn something, if so please leave a Like which will encourage me for my upcoming write-ups.
- My GitHub Repos
- Connect with me on Linkedin
- Start your own blogs