Parse a response text into a JSON object
I am hitting an API endpoint using Python requests to get some data. Here is a sample of my response.text (the actual response contains millions of event objects):
{"event":"Session","properties":{"time":"1642145186", "distinct_id":"ABC-123", "Region":"EU"}}
{"event":"Login","properties":{"time":"1642125126", "distinct_id":"ABC-123", "Region":"EU"}}
{"event":"Register","properties":{"time":"16432125126", "distinct_id":"ABC-123", "Region":"EU"}}
When I try to convert my response.text to a JSON object using response.json() or json.loads(json.dumps(response.text)), it throws the following error:
json.decoder.JSONDecodeError: Extra data:
I understand why it throws that error: response.text as a whole is not a single valid JSON document. I would like to convert that text into a JSON object. Is this something we can do with a regex, for example by adding a comma at the end of each line (I am new to regex)? Or is there another option? I would really appreciate some help here.
Solution 1:
That data is ndjson.
The spec says:
Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A). The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
Since the delimiter is a newline, you should split on newlines and decode each object individually.
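A minimal sketch of that approach, using only the standard json module. The helper name iter_ndjson and the sample string are illustrative assumptions, not part of any library; the sample reuses the three events from the question in place of a real response.text.

```python
import json


def iter_ndjson(lines):
    """Yield one decoded object per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()  # drops a trailing \r if the server sent \r\n
        if not line:         # skip blank lines, e.g. after the final newline
            continue
        yield json.loads(line)  # json.loads accepts both str and bytes


# Stand-in for response.text, copied from the question.
sample = (
    '{"event":"Session","properties":{"time":"1642145186", "distinct_id":"ABC-123", "Region":"EU"}}\n'
    '{"event":"Login","properties":{"time":"1642125126", "distinct_id":"ABC-123", "Region":"EU"}}\n'
    '{"event":"Register","properties":{"time":"16432125126", "distinct_id":"ABC-123", "Region":"EU"}}\n'
)

# Split on "\n" only: str.splitlines() also breaks on characters such as
# U+2028, which may legally appear inside a JSON string.
events = list(iter_ndjson(sample.split("\n")))
print(len(events), events[0]["event"])
```

With millions of events, holding all of response.text in memory may be undesirable. Assuming the request is made with stream=True, the same helper could probably be fed response.iter_lines() from requests, which yields one line at a time as bytes, so each event is decoded as it arrives.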