How to convert a raw JavaScript object to a Python dictionary?
When screen-scraping a website, I extract data from <script> tags. The data I get is not in standard JSON format, so I cannot use json.loads().
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = {'x':1, 'y':2, 'z':3}
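To make the failure concrete, here is a minimal sketch using only the standard library. The helper name try_json_loads is hypothetical and exists only for this illustration; it shows that json.loads() rejects unquoted keys but accepts the same data once the keys are quoted.

```python
import json

js_obj = '{x:1, y:2, z:3}'

def try_json_loads(text):
    """Hypothetical helper: return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Unquoted keys are not valid JSON, so parsing fails.
print(try_json_loads(js_obj))
# The same data with double-quoted keys parses fine.
print(try_json_loads('{"x":1, "y":2, "z":3}'))
```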
Currently, I use regex to transform the raw data into JSON format, but this becomes painful when I encounter complicated data structures.
Is there a better solution?
Solution 1:
demjson.decode()
import demjson
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = demjson.decode(js_obj)
_jsonnet.evaluate_snippet()
import json, _jsonnet
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))
ast.literal_eval()
import ast
# from (the keys are already quoted here; ast.literal_eval() only accepts Python literal syntax)
js_obj = "{'x':1, 'y':2, 'z':3}"
# to
py_obj = ast.literal_eval(js_obj)
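Note that the ast.literal_eval() variant differs from the other two: it only handles input that is already a valid Python literal. The sketch below, with a hypothetical helper try_literal_eval written just for this illustration, suggests where it stops working: bare keys and JavaScript's true/false/null are treated as names rather than literals.

```python
import ast

def try_literal_eval(text):
    """Hypothetical helper: return the evaluated literal, or None if text is not a Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

# Quoted keys form a valid Python dict literal.
print(try_literal_eval("{'x':1, 'y':2, 'z':3}"))
# Bare keys parse as variable names, which literal_eval refuses.
print(try_literal_eval("{x:1, y:2, z:3}"))
# JavaScript's lowercase true is also just a name to Python.
print(try_literal_eval("{'ok': true}"))
```

So this option is probably only worth considering when the scraped object happens to use quoted keys and no JavaScript-only values.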
Solution 2:
I faced the same problem this afternoon and finally found a good solution: JSON5.
The syntax of JSON5 is closer to native JavaScript, so it can parse object literals that are not standard JSON.
You might want to check out pyjson5.
Solution 3:
This will likely not work everywhere, but as a start, here's a simple regex that converts the keys into quoted strings so you can pass the result to json.loads. Or is this what you're already doing?
In[68] : import re
In[69] : js_obj = '{x:1, y:2, z:3}'
In[70] : quote_keys_regex = r'([\{\s,])(\w+)(:)'
In[71] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj)
Out[71]: '{"x":1, "y":2, "z":3}'
In[72] : js_obj_2 = '{x:1, y:2, z:{k:3,j:2}}'
In[73] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj_2)
Out[73]: '{"x":1, "y":2, "z":{"k":3,"j":2}}'
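As a rough sketch of how the regex above could be wrapped for reuse, and of one case where it breaks: the function name loads_js_object is hypothetical, and the failing input is an example chosen for illustration. A string value that itself contains something shaped like ", word:" gets rewritten too, which corrupts the JSON.

```python
import json
import re

# Same pattern as in the session above.
QUOTE_KEYS_REGEX = r'([\{\s,])(\w+)(:)'

def loads_js_object(text):
    """Hypothetical wrapper: quote bare keys with the regex, then parse as JSON."""
    quoted = re.sub(QUOTE_KEYS_REGEX, r'\1"\2"\3', text)
    return json.loads(quoted)

# Nested objects with bare keys are handled.
print(loads_js_object('{x:1, y:2, z:{k:3,j:2}}'))

# A string value containing " b:" is mistaken for a key and gets quoted,
# so the rewritten text is no longer valid JSON.
try:
    loads_js_object('{note: "a, b:c"}')
except json.JSONDecodeError as exc:
    print('regex approach failed:', exc)
```

For data where string values may contain colons, a real parser is likely the safer choice.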
Solution 4:
Use json5
import json5
js_obj = '{x:1, y:2, z:3}'
py_obj = json5.loads(js_obj)
print(py_obj)
# output
# {'x': 1, 'y': 2, 'z': 3}