In this tutorial, we will see how to read data from a CSV (comma-separated values) file into a pandas data frame and how to save a pandas data frame as a CSV file.
Read CSV file in Pandas as Data Frame
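The pandas read_csv() function reads the data from a comma-separated values (.csv) file into a data frame, and it accepts a number of arguments that give some flexibility according to the requirement.
The signature below is abridged from the official documentation. We will learn the most commonly used of these arguments in the following sections, with examples. The signature reflects an older pandas release: some of its parameters, such as squeeze, prefix, mangle_dupe_cols, error_bad_lines, and warn_bad_lines, were removed in pandas 2.0.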
pandas.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, infer_datetime_format=False, keep_date_col=False, date_parser=None, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
1. Reading a CSV file:
In this example, we will read the following CSV file, using the arguments described below along with the file path.
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
- file-path – The path to the file, given as a string.
- sep – The delimiter, that is, the character used to split each line into fields.
- header – The row number (or list of row numbers) to use as the column names. If several rows are passed, the columns form a multi-level index.
See the code below where we will use these arguments to read the file.
# importing pandas and giving it an alias
import pandas as pd

# location of the data
url = "home/user/kunalgupta2616/datasets/master/Data.csv"

# read the data: row 0 holds the column names and fields are separated by commas
data = pd.read_csv(url, header=0, sep=',')
print(data)
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes
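To experiment with sep and header without a file on disk, the sketch below reads from in-memory text through io.StringIO. The semicolon-delimited sample and the column names passed through names are made up for illustration and are not part of Data.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text with no header row (illustration only)
raw = "France;44;72000;No\nSpain;27;48000;Yes\nGermany;30;54000;No\n"

# sep=';' splits each line on semicolons; header=None says the first line
# is data, so the column names are supplied through names instead
sample = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,
    names=["Country", "Age", "Salary", "Purchased"],
)
print(sample)
```

Without header=None, the first data line (France;44;72000;No) would be consumed as the column names.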
2. Reading a custom number of rows and columns:
- usecols – List of column names from data to be read.
- index_col – The column (or columns) to use as the row labels, given as a column name, a column number, or a list of either. None by default.
- skiprows – A list of row numbers (0-indexed) to skip, or the number of rows to skip from the top of the file.
- skipfooter – The number of rows to skip from the bottom of the file.
- skip_blank_lines – If True, blank lines are skipped instead of being read as NaN values.
- nrows – The number of rows to be read from the file.
Let's look at an example that uses some of these parameters.
import pandas as pd

url = "home/user/kunalgupta2616/datasets/master/Data2.csv"

# keep three columns, skip file rows 1 and 2, read 4 rows, and use Country as the index
data1 = pd.read_csv(
    url,
    usecols=['Country', 'Age', 'Purchased'],
    skiprows=[1, 2],
    nrows=4,
    index_col='Country',
)
print(data1)
         Age Purchased
Country
Germany   30        No
Spain     38        No
Germany   40       Yes
France    35       Yes
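The example above does not exercise skipfooter or skip_blank_lines, so here is a minimal sketch of both, together with skiprows given as a number rather than a list. The in-memory text, with its title line, blank line, and summary line, is hypothetical. The engine="python" argument is passed because the default C parser does not support skipfooter.

```python
import io

import pandas as pd

# Hypothetical text: a title line on top, a blank line in the middle,
# and a summary line at the bottom (illustration only)
raw = (
    "Report generated for demo\n"
    "Country,Age,Purchased\n"
    "France,44,No\n"
    "\n"
    "Spain,27,Yes\n"
    "Germany,30,No\n"
    "Total rows: 3"
)

trimmed = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,             # skip the title line at the top
    skipfooter=1,           # skip the summary line at the bottom
    skip_blank_lines=True,  # drop the empty line instead of reading it as NaN
    engine="python",        # skipfooter is not supported by the C engine
)
print(trimmed)
```

Only the three country rows should remain, with the second line of the text used as the header.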
3. Parsing a column containing dates:
For this example, we will use employee data of an organization, available at https://raw.githubusercontent.com/kunalgupta2616/datasets/master/employees.csv.
- parse_dates – A list of column numbers (0-indexed) or column names whose values should be parsed as dates.
Let us read the first 5 rows of this data and parse a column containing dates using the parse_dates argument. To verify that the column is of DateTime type, we will print the dtypes attribute.
import pandas as pd

url = "https://raw.githubusercontent.com/kunalgupta2616/datasets/master/employees.csv"

# parse column 2 ("Start Date") as dates while reading the first 5 rows
data2 = pd.read_csv(url, nrows=5, parse_dates=[2])
print(data2.dtypes)
First Name                   object
Gender                       object
Start Date           datetime64[ns]
Last Login Time              object
Salary                        int64
Bonus %                     float64
Senior Management              bool
Team                         object
dtype: object
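parse_dates also accepts column names. The sketch below compares reading the same text with and without the argument. The two rows are invented for illustration and only mimic the shape of the employee data.

```python
import io

import pandas as pd

# Hypothetical rows shaped like the employee data (illustration only)
raw = (
    "First Name,Start Date,Salary\n"
    "Asha,2012-03-04,61933\n"
    "Ravi,2005-11-23,130590\n"
)

# Without parse_dates the column is read as plain text
as_text = pd.read_csv(io.StringIO(raw))

# Columns can be selected by name as well as by position
as_dates = pd.read_csv(io.StringIO(raw), parse_dates=["Start Date"])

print(as_text["Start Date"].dtype)
print(as_dates["Start Date"].dtype)

# Once parsed, the .dt accessor exposes date parts such as the year
print(as_dates["Start Date"].dt.year.tolist())
```

Selecting by name is often easier to maintain than a position such as 2, because it keeps working if the column order in the file changes.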
These are the most commonly used arguments when reading a CSV file in pandas. Let us see how we can save a data frame as a CSV file in pandas.
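As a minimal sketch of the saving step, DataFrame.to_csv writes a data frame out as a CSV file. The small data frame and the file name output.csv below are placeholders chosen for illustration, and the file goes to a temporary directory so the example leaves nothing behind in the working directory.

```python
import os
import tempfile

import pandas as pd

# Small placeholder data frame (illustration only)
df = pd.DataFrame(
    {"Country": ["France", "Spain"], "Age": [44, 27], "Purchased": ["No", "Yes"]}
)

# Hypothetical output location inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")

# index=False leaves the 0, 1, ... row labels out of the written file
df.to_csv(out_path, index=False)

# Read the file back to check the round trip
restored = pd.read_csv(out_path)
print(restored)
```

If index=False is left out, the row labels are written as an extra unnamed first column, which then shows up as a column when the file is read back.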
Happy Learning 🙂