Functions that help to understand json(dict) structure
Solution 1:
Here is a family of recursive generators that can be used to search through an object composed of dicts and lists. find_key yields a tuple containing a list of the dictionary keys and list indices that lead to the key that you pass in; the tuple also contains the value associated with that key. Because it's a generator, it will find every matching key if the object contains more than one.
def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])
# test
data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}

for t in find_key(data, '3_data'):
    print(t)
output
(['1_data', '4_data', 1, '3_data'], 'hooray2')
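Since find_key is a generator, wrapping it in list() collects every match at once. Here is a small sketch of that; the three generators are repeated only so the block runs on its own, and the sample dict doc and its keys are made up for illustration.

```python
def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

# Hypothetical sample data: the key 'id' occurs at three different depths.
doc = {
    'id': 1,
    'items': [
        {'id': 2, 'name': 'a'},
        {'name': 'b', 'child': {'id': 3}},
    ],
}

# list() drains the generator, giving one (path, value) tuple per match.
matches = list(find_key(doc, 'id'))
for path, value in matches:
    print(path, value)
```

The matches come out in the order the keys are visited, so the top-level 'id' is reported first, followed by the ones nested under 'items'.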
To get a single key list, pass the generator returned by find_key to the built-in next function. And if you want to use a key list to fetch the associated value, you can use a simple for loop.
seq, val = next(find_key(data, '3_data'))
print('seq:', seq, 'val:', val)

obj = data
for k in seq:
    obj = obj[k]
print('obj:', obj, obj == val)
output
seq: ['1_data', '4_data', 1, '3_data'] val: hooray2
obj: hooray2 True
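If you follow key lists often, the for loop can be wrapped in a small helper. This is only a sketch of one way to do it: get_by_path is a name I've made up, and it uses functools.reduce with operator.getitem from the standard library instead of an explicit loop.

```python
from functools import reduce
from operator import getitem

def get_by_path(obj, path):
    """Follow each dict key or list index in path, starting from obj."""
    # reduce applies getitem step by step: obj[path[0]][path[1]]...
    return reduce(getitem, path, obj)

data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}

# The key list that find_key yields for '3_data' in this data.
path = ['1_data', '4_data', 1, '3_data']
print(get_by_path(data, path))
```

An empty path returns the starting object unchanged, and a path that doesn't exist raises KeyError or IndexError, just as the plain loop would.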
If the key may be missing, give next an appropriate default tuple, e.g.:
seq, val = next(find_key(data, '6_data'), ([], None))
print('seq:', seq, 'val:', val)
if seq:
    obj = data
    for k in seq:
        obj = obj[k]
    print('obj:', obj, obj == val)
output
seq: [] val: None
Note that this code is for Python 3. To run it on Python 2 you need to replace all the yield from statements, e.g. replace
yield from iter_dict(obj, key, [])
with
for u in iter_dict(obj, key, []):
    yield u
How it works
To understand how this code works you need to be familiar with recursion and with Python generators. You may also find this page helpful: Understanding Generators in Python; there are also various Python generator tutorials available online.
The Python object returned by json.load or json.loads is generally a dict, but it can also be a list. We pass that object to the find_key generator as the obj arg, along with the key string that we want to locate. find_key then calls either iter_dict or iter_list, as appropriate, passing them the object, the key, and an empty list, indices, which is used to collect the dict keys and list indices that lead to the key we want.
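To see why find_key checks the type of obj before doing anything else, here is a quick sketch of what json.loads returns for a few top-level shapes. The sample strings are made up for illustration. Note that a JSON document can also be a bare string, number or null; for those, neither isinstance branch in find_key matches, so it simply yields nothing.

```python
import json

# Hypothetical JSON documents with different top-level shapes.
samples = ['{"a": 1}', '[{"a": 1}]', '"a"', '42', 'null']

for text in samples:
    obj = json.loads(text)
    # Only dict and list results have anything for find_key to search.
    print(text, '->', type(obj).__name__)
```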
iter_dict iterates over each (k, v) pair at the top level of its d dict arg. If k matches the key we're looking for then the current indices list is yielded with k appended to it, along with the associated value. Because iter_dict is recursive, the yielded (indices list, value) pairs get passed up to the previous level of recursion, eventually making their way up to find_key and then to the code that called find_key. Note that this is the "base case" of our recursion: it's the part of the code that determines whether this recursion path leads to the key we want. If a recursion path never finds a key matching the key we're looking for then that recursion path won't add anything to indices and it will terminate without yielding anything.
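If the way yielded values travel back up through the recursion seems mysterious, this stripped-down sketch may help. The generators leaf, middle, top and top_py2 are hypothetical names used only for this illustration; they aren't part of the search code.

```python
def leaf(path):
    # The innermost generator is the only one that actually yields a value.
    yield path, 'found'

def middle(path):
    # yield from re-yields everything leaf produces to middle's caller.
    yield from leaf(path + ['b'])

def top():
    yield from middle(['a'])

def top_py2():
    # The Python 2 style loop that does the same job as yield from here.
    for u in middle(['a']):
        yield u

results = list(top())
print(results)
print(list(top_py2()) == results)
```

Each level just hands the inner generator's items to its own caller, so a value yielded several levels down arrives at the outermost loop unchanged.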
If the current v is a dict, then we need to examine all the (key, value) pairs it contains. We do that by making a recursive call to iter_dict, passing that v as its starting object, along with the key and the current indices list with k appended. If the current v is a list we instead call iter_list, passing it the same args.
iter_list works similarly to iter_dict except that a list doesn't have any keys, it only contains values, so we don't perform the k == key test, we just recurse into any dicts or lists that the original list contains.
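Here is a small sketch of that behaviour on lists nested inside lists. It calls iter_list directly with an empty indices list, which is what find_key does for a top-level list. The two generators are repeated so the block runs on its own, and the nested sample data is made up.

```python
def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

# Hypothetical data: the bare string 'x' at index 1 is a list value,
# not a dict key, so it must not be reported as a match.
nested = [[{'x': 1}], 'x', [[{'y': {'x': 2}}]]]

results = list(iter_list(nested, 'x', []))
for path, value in results:
    print(path, value)
```

The list positions show up in each path as integers, mixed in with the string dict keys.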
The end result of this process is that when we iterate over find_key we get pairs of (indices, value) where each indices list is the sequence of dict keys and list indices that successfully terminates in a dict item with our desired key, and value is the value associated with that particular key.
If you'd like to see some other examples of this code in use please see how to modify the key of a nested Json and How can I select deeply nested key:values from dictionary in python.
Also take a look at my new, more streamlined show_indices function.