#5 -> Data Input and Output in Pandas

#5 -> Data Input and Output in Pandas

  • Before we start I would like you to go through the below articles which will help you get started with Pandas.

Types of Data

image.png Pandas in Python has the ability to read data and write data through a wide variety of resource. The four main data sources are below.

  • CSV
  • Excel
  • HTML
  • SQL

In order to work with SQL files or HTML data, you need to install the following using pip or conda.

pip install sqlalchemy
pip install lxml
pip install html5lib 
pip install BeautifulSoup4
pip install xlrd

CSV

  • Now let's read a CSV file and save it also into a new file. We need to keep in mind that it can only read the data, not the images, macros or formulas.
import pandas as pd 
dataframe = pd.read_csv('example.csv')
dataframe.to_csv('output',index=False)

Excel

  • Reading from an Excel file is also similar, here we do need to specify the sheet name as well and each sheet is treated as a data frame.
import pandas as pd 
dataframe = pd.read_excel('new.xlsx',sheet_name='Sheet1')
print(dataframe)

HTML

  • Let's now see how to read from an HTML link. For this, we are using the failed bank link -> fdic.gov/resources/resolutions/bank-failure.. . What Pandas tries to do, that it extracts all the data which it thinks is a table. I go through the HTML code, searches for the Table markings and makes a list of them before converting it to the data frame.
import pandas as pd 
dataframe = pd.read_html('https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/')
print(dataframe)

SQL Data

  • Now let's see the same with SQL data. Now we need to consider the fact that Pandas is not the very best way for getting the SQL data because there are already many flavours of SQL itself. First, we need to create a SQL like a database this can be done as follows.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
  • Now just add few values from our excel to the SQL and try to read it.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('sqlite:///:memory:')
dataframe = pd.read_excel('new.xlsx',sheet_name='Sheet1')
dataframe.to_sql('my_table',engine)
sqldf = pd.read_sql('my_table',con=engine)
print(sqldf)

Thank-you!

I am glad you made it to the end of this article. I hope you got to learn something, if so please leave a Like which will encourage me for my upcoming write-ups.