Convert URL Request from JSON to Pandas DataFrame
I've tried a few tutorials on requesting and converting data from requests to pandas dataframes, however when it is converted to a pandas dataframe it shows no data. Please see below.
Attempt 1
First, check URL for a 200 status:
import requests
query = "https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON"
response = requests.get(query)
print(response.status_code)
>>> 200
Second, check content of request. Looks good so far as there is multiple rows of data pulled from the url in JSON format.
s=requests.get(query).content
print(s)
>>> b'[{"FACILITY_ID":1000001,"LATITUDE":48.828707,"LONGITUDE":-122.685533,"CITY":"FERNDALE"
Lastly, convert s
request JSON data to Pandas Dataframe.
However, this does not work.
dataframe=pd.read_csv(io.StringIO(s.decode('utf-8')), encoding='utf-8', error_bad_lines=False)
dataframe.head()
>>> 0 rows × 459 columns
Attempt 2
import pandas as pd
import io
import requests
url = 'https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.t
ext))
>>> 0 rows × 459 columns
What am I doing incorrect here? Any input would be appreciated. Thank you.
Solution 1:
Try this
import pandas as pd
df = pd.read_json('https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON')
FACILITY_ID LATITUDE LONGITUDE CITY STATE ZIP COUNTY_FIPS COUNTY ADDRESS1 ADDRESS2 ... REPORTED_INDUSTRY_TYPES FACILITY_TYPES SUBMISSION_ID UU_RD_EXEMPT REPORTING_STATUS PROCESS_STATIONARY_CML COMMENTS RR_MRV_PLAN_URL RR_MONITORING_PLAN RR_MONITORING_PLAN_FILENAME
0 1000001 48.828707 -122.685533 FERNDALE WA 98248 53073 WHATCOM COUNTY 5105 LAKE TERRELL ROAD NaN ... D Direct Emitter 176997 NaN NaN NaN NaN NaN NaN NaN
1 1000001 48.828707 -122.685533 FERNDALE WA 98248 53073 WHATCOM COUNTY 5105 LAKE TERRELL ROAD NaN ... C Direct Emitter 5752 NaN NaN NaN NaN NaN NaN NaN
Solution 2:
The solutions by @Starbucks and @Marcello Chiuminatto work great for this use case. If you have more complex json you're trying to parse into a dataframe, then consider using pd.json_normalize
(docs). For this example, the following is equivalent to using pd.read_json
:
import requests
url = "https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON"
json = requests.get(url).json()
df = pd.json_normalize(json)
When the json you're parsing into a df has many levels, this will give you more control over the level of depth to parse, what level to start parsing at, how to handle nested objects, etc. Check docs for more details. This has been quite useful when dealing with complex json objects that I needed to convert to specific dataframe formats.