Importing and Storing Data#

In this section, we will explore how to import comma-separated values (CSV) and Excel (XLSX) files into a DataFrame, as well as how to export DataFrames back to CSV or Excel formats using the Pandas library. We will also review the various parameters available to customize the import and export process effectively.

Since we will be using methods from the Pandas library, we first need to import it using:

import pandas as pd

read_csv()#

The read_csv() function reads a comma-separated values (CSV) file and loads it into a DataFrame.

  • The file path is provided as a string inside the read_csv() function, including the file name and its extension.

  • In the following code, the CSV file person_info_v1.csv is located in the sample_files folder.

pd.read_csv('sample_files/person_info_v1.csv')
Name Birthdate State Age Weight
0 Amy 10/25/1972 NY 34 150
1 Joe 2/5/2010 NJ 21 175
2 Mia 10/23/2015 AZ 67 130
3 Amy 11/6/1983 FL 19 145
4 Liz 4/19/1994 CA 29 150
5 John 8/29/2023 OH 34 170

index_col#

Specifies which column to use as the index of the DataFrame, using the column’s index number or label.

df = pd.read_csv('sample_files/person_info_v1.csv', index_col=0)
df
Birthdate State Age Weight
Name
Amy 10/25/1972 NY 34 150
Joe 2/5/2010 NJ 21 175
Mia 10/23/2015 AZ 67 130
Amy 11/6/1983 FL 19 145
Liz 4/19/1994 CA 29 150
John 8/29/2023 OH 34 170

parse_dates#

Specifies columns to be parsed as Timestamp objects during import.

The values in the Date column are currently stored as strings.

df.loc['Joe', 'Birthdate']
'2/5/2010'

In the following code, Date column values imported as a Timestamp object.

df = pd.read_csv('sample_files/person_info_v1.csv', index_col=0, parse_dates=['Birthdate'])
df
Birthdate State Age Weight
Name
Amy 1972-10-25 NY 34 150
Joe 2010-02-05 NJ 21 175
Mia 2015-10-23 AZ 67 130
Amy 1983-11-06 FL 19 145
Liz 1994-04-19 CA 29 150
John 2023-08-29 OH 34 170
df.loc['Joe', 'Birthdate']
Timestamp('2010-02-05 00:00:00')
type(df.loc['Joe', 'Birthdate'])
pandas._libs.tslibs.timestamps.Timestamp

usecols#

This parameter is used to specify which columns should be imported from the file.

pd.read_csv('sample_files/person_info_v2.csv',header=5, index_col=0, 
            usecols=['Name', 'Birthdate', 'State', 'Age', 'Weight'])
Birthdate State Age Weight
Name
Amy 10/25/1972 NY 34 150
Joe 2/5/2010 NJ 21 175
Mia 10/23/2015 AZ 67 130
Amy 11/6/1983 FL 19 145
Liz 4/19/1994 CA 29 150
John 8/29/2023 OH 34 170

read_excel()#

It is similar to the read_csv() method but is used to import data from Excel files.

to_csv()#

The to_csv() method is used to write a DataFrame to a comma-separated values (CSV) file.

The file path is passed as a string to the to_csv() function, including the file name and its extension.

In the following code, the DataFrame df is saved as a CSV file named countries.csv inside the sample_files folder.

df = pd.DataFrame( index=['France', 'Germany', 'North America', 'Japan'],
                 columns=['Continent', 'Company'],
                 data = [['Europe', 'Carrefour'], ['Europe', 'Siemens'],
                         ['USA', 'Apple'],
                         ['Asia', 'Sony']])
df
Continent Company
France Europe Carrefour
Germany Europe Siemens
North America USA Apple
Japan Asia Sony
df.to_csv('sample_files/countries.csv')

A screenshot of an Excel file containing index, 2 columns and 4 rows of data.

index#

Set the index parameter False to prevent the DataFrame’s index from being written to the output file.

df.to_csv('sample_files/countries.csv', index=False)

A screenshot of an Excel file containing 2 columns and 4 rows of data.

columns#

Specifies which columns of the DataFrame should be written to the output file.

df.to_csv('sample_files/countries.csv', columns=['Company'])

A screenshot of an Excel file containing 2 columns and 4 rows of data.

header#

Specifies whether to write column labels to the output file. It can also accept a list of custom labels.

df.to_csv('sample_files/countries.csv', header=False)

A screenshot of an Excel file containing 2 columns and 4 rows of data.

df.to_csv('sample_files/countries.csv', header=['CONTINENT', 'COMPANY'])

A screenshot of an Excel file containing 2 columns and 4 rows of data.

to_excel()#

It is similar to the to_csv() method but is used to save data as an Excel files.