Convert URL Request from JSON to Pandas DataFrame

I've tried a few tutorials on requesting data and converting the response to a pandas DataFrame; however, the resulting DataFrame contains no data. Please see below.

Attempt 1

First, check URL for a 200 status:

import requests
query = "https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON"
response = requests.get(query)
print(response.status_code)
>>> 200

Second, check the content of the request. It looks good so far, as there are multiple rows of data pulled from the URL in JSON format.

s = requests.get(query).content
print(s)
>>> b'[{"FACILITY_ID":1000001,"LATITUDE":48.828707,"LONGITUDE":-122.685533,"CITY":"FERNDALE"

Lastly, convert the JSON content in s to a pandas DataFrame.

However, this does not work.

import io
import pandas as pd

dataframe = pd.read_csv(io.StringIO(s.decode('utf-8')), encoding='utf-8', error_bad_lines=False)
dataframe.head()
>>> 0 rows × 459 columns

Attempt 2

import pandas as pd
import io
import requests

url = 'https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))
>>> 0 rows × 459 columns

What am I doing incorrect here? Any input would be appreciated. Thank you.


Solution 1:

Try this:

import pandas as pd

df = pd.read_json('https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON')

FACILITY_ID LATITUDE    LONGITUDE   CITY    STATE   ZIP COUNTY_FIPS COUNTY  ADDRESS1    ADDRESS2    ... REPORTED_INDUSTRY_TYPES FACILITY_TYPES  SUBMISSION_ID   UU_RD_EXEMPT    REPORTING_STATUS    PROCESS_STATIONARY_CML  COMMENTS    RR_MRV_PLAN_URL RR_MONITORING_PLAN  RR_MONITORING_PLAN_FILENAME
0   1000001 48.828707   -122.685533 FERNDALE    WA  98248   53073   WHATCOM COUNTY  5105 LAKE TERRELL ROAD  NaN ... D   Direct Emitter  176997  NaN NaN NaN NaN NaN NaN NaN
1   1000001 48.828707   -122.685533 FERNDALE    WA  98248   53073   WHATCOM COUNTY  5105 LAKE TERRELL ROAD  NaN ... C   Direct Emitter  5752    NaN NaN NaN NaN NaN NaN NaN
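If you already have a requests response in hand (as in the question's attempts), you can also build the DataFrame directly from the parsed JSON instead of round-tripping through a CSV parser. A minimal sketch, assuming the endpoint returns a JSON array of flat objects as shown in the question:

```python
import pandas as pd
import requests

url = "https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON"
response = requests.get(url)
response.raise_for_status()  # fail loudly on a non-200 status

# response.json() parses the body into a list of dicts; a list of dicts
# maps directly onto DataFrame rows, one column per key.
df = pd.DataFrame(response.json())
print(df.shape)
```

The original attempts failed because pd.read_csv was parsing JSON text as CSV, so nothing lined up into rows.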

Solution 2:

The solutions by @Starbucks and @Marcello Chiuminatto work great for this use case. If you have more complex json you're trying to parse into a dataframe, then consider using pd.json_normalize (docs). For this example, the following is equivalent to using pd.read_json:

import pandas as pd
import requests

url = "https://data.epa.gov/efservice/PUB_DIM_FACILITY/ROWS/0:10/JSON"
data = requests.get(url).json()
df = pd.json_normalize(data)

When the JSON you're parsing into a DataFrame has many levels, this gives you more control over the depth to parse, the level to start parsing at, how to handle nested objects, and so on. Check the docs for more details. This has been quite useful when dealing with complex JSON objects that I needed to convert into specific DataFrame formats.