Pandas is a versatile and powerful data analysis library in Python, known for its easy-to-use data structures and tools for handling complex datasets. It simplifies data manipulation, cleaning, and transformation, making it a go-to choice for data scientists and analysts.
CSV files are lightweight and can be read by almost any tool, which makes them a convenient format for storing and sharing data. This guide shows you how to use pandas to read a CSV file from your local directory and from an online source. We will also include some extra information on how to make your CSV files available for download with CSV Getter.
If you have Python installed, you can install pandas with pip (or with pip3 if your Python 3 interpreter is invoked as python3). Run one of the following commands in your terminal.
# with pip
pip install pandas
# with pip3
pip3 install pandas
Unsure what pip is? Check out this W3Schools article.
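To confirm the install worked, a quick check is to import pandas and print its version string. This is a minimal sketch; the version you see depends on your installation.

```python
import pandas as pd

# If this import succeeds, pandas is installed for this Python interpreter
print(pd.__version__)
```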
It is best to work in a new folder or directory to keep things organised. If you are using the terminal, navigate to your working directory with a command like cd <path_to_my_folder>. When you are in the right place, create a Python file with the following command.
touch main.py
Alternatively, open your new folder in your favourite code editor and create the file through its interface.
In this example, our data file is called data.csv, and we will put it in the same directory as our Python file. This keeps the filepath in the code as simple as possible.
Paste the following code block into your main.py file.
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
import pandas as pd makes the pandas library available in our script under the alias pd. This means all pandas functions can be accessed with the prefix pd, as in pd.read_csv().
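df = pd.read_csv("data.csv") reads the CSV file into a pandas DataFrame called df. The filepath is simply the filename, data.csv, because we placed the file in the same directory as the script and run the script from that directory (relative paths are resolved from the current working directory). A DataFrame is a table-like data structure with labelled rows and columns, and it is the main object you work with for analysis in pandas.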
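If your CSV file is not in the same folder as your script, you can pass a relative or absolute path instead of the bare filename. The sketch below assumes a hypothetical layout with the file in a data subfolder; it builds that layout in a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a hypothetical project layout: <project>/data/data.csv
project_dir = Path(tempfile.mkdtemp())
csv_path = project_dir / "data" / "data.csv"
csv_path.parent.mkdir()
csv_path.write_text("Name,Age\nJohn Doe,35\nJane Smith,28\n")

# read_csv accepts a path string or a pathlib.Path, not just a bare filename
df = pd.read_csv(csv_path)
print(df)
```

In your own script this would typically be something like pd.read_csv("data/data.csv"), or a full absolute path to the file.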
print(df.head()) prints a sample of the data. df.head() returns the first 5 rows of df (5 is the default, and you can pass a different number). print() is the built-in Python function for displaying output in the terminal.
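The sketch below passes a row count to head(), and also shows tail(), shape and columns for a quick look at a DataFrame. To run without data.csv, it loads the sample data shown later in this guide from an in-memory string via io.StringIO, since read_csv accepts file-like objects as well as paths.

```python
import io

import pandas as pd

# Sample data held in a string instead of a file
csv_text = """Name,Age,Gender,Occupation
John Doe,35,Male,Engineer
Jane Smith,28,Female,Doctor
Michael Johnson,42,Male,Teacher
Emily Brown,31,Female,Software Developer
David Wilson,45,Male,Manager
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first 2 rows instead of the default 5
print(df.tail(2))        # last 2 rows
print(df.shape)          # (5, 4): number of rows, number of columns
print(list(df.columns))  # column names taken from the header row
```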
Now that your CSV data is in a pandas DataFrame, you can use the rest of the library for data analysis and other useful tasks in Python.
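As a small taste of what that looks like, the sketch below computes a few summary figures from the sample data used later in this guide. The column names come from that sample; the particular calculations (a column mean, a grouped mean and a filter) are just illustrative choices.

```python
import io

import pandas as pd

csv_text = """Name,Age,Gender,Occupation
John Doe,35,Male,Engineer
Jane Smith,28,Female,Doctor
Michael Johnson,42,Male,Teacher
Emily Brown,31,Female,Software Developer
David Wilson,45,Male,Manager
"""
df = pd.read_csv(io.StringIO(csv_text))

# Average of a numeric column
mean_age = df["Age"].mean()

# Average age within each value of the Gender column
age_by_gender = df.groupby("Gender")["Age"].mean()

# Boolean filtering: keep only the rows where Age is over 40
over_40 = df[df["Age"] > 40]

print(mean_age)  # 36.2
print(age_by_gender)
print(over_40[["Name", "Age"]])
```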
Below is a URL which will download the following sample data:
https://api.csvgetter.com/yj4iuhzWUJnVSE9U44A1
Name,Age,Gender,Occupation
John Doe,35,Male,Engineer
Jane Smith,28,Female,Doctor
Michael Johnson,42,Male,Teacher
Emily Brown,31,Female,Software Developer
David Wilson,45,Male,Manager
Online URLs like this can be passed to the pandas read_csv() function in the same way as a filepath.
The following code will load the online CSV data into a pandas dataframe. Give it a try by running the code.
import pandas as pd
df = pd.read_csv("https://api.csvgetter.com/yj4iuhzWUJnVSE9U44A1")
print(df.head())
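If you run the script often, you may prefer to download the data once and keep a local copy. The sketch below is one way to do that; load_with_cache and the default cache filename data_cache.csv are hypothetical names, and it relies only on read_csv() and DataFrame.to_csv().

```python
from pathlib import Path

import pandas as pd


def load_with_cache(source, cache_path="data_cache.csv"):
    """Read a CSV from a local cached copy if it exists, otherwise from source."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Reuse the local copy; source is not read at all
        return pd.read_csv(cache_path)
    # First run: read from the URL (or any path) and save a copy for next time
    df = pd.read_csv(source)
    # index=False stops pandas writing the row index as an extra column
    df.to_csv(cache_path, index=False)
    return df
```

For example, load_with_cache("https://api.csvgetter.com/yj4iuhzWUJnVSE9U44A1") reads the sample data from the URL on the first call and from data_cache.csv on later calls. Delete the cache file whenever you want to fetch fresh data.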
You can make a CSV download link like the one above by uploading your CSV to CSV Getter.