How to read a CSV file from a URL with Python?
when I do curl to a API call link http://example.com/passkey=wedsmdjsjmdd
curl 'http://example.com/passkey=wedsmdjsjmdd'
I get the employee output data on a csv file format, like:
"Steve","421","0","421","2","","","","","","","","","421","0","421","2"
how can parse through this using python.
I tried:
import csv
cr = csv.reader(open('http://example.com/passkey=wedsmdjsjmdd',"rb"))
for row in cr:
print row
but it didn't work and I got an error
http://example.com/passkey=wedsmdjsjmdd No such file or directory:
Thanks!
Solution 1:
Using pandas it is very simple to read a csv file directly from a url
import pandas as pd
data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd')
This will read your data in tabular format, which will be very easy to process
Solution 2:
You need to replace open
with urllib.urlopen or urllib2.urlopen.
e.g.
import csv
import urllib2
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)
for row in cr:
print row
This would output the following
Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
...
The original question is tagged "python-2.x", but for a Python 3 implementation (which requires only minor changes) see below.
Solution 3:
You could do it with the requests module as well:
url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')
Solution 4:
To increase performance when downloading a large file, the below may work a bit more efficiently:
import requests
from contextlib import closing
import csv
url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
for row in reader:
# Handle each row here...
print row
By setting stream=True
in the GET request, when we pass r.iter_lines()
to csv.reader(), we are passing a generator to csv.reader(). By doing so, we enable csv.reader() to lazily iterate over each line in the response with for row in reader
.
This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.