How to completely traverse a complex dictionary of unknown depth?

Importing from JSON can get very complex and nested structures. For example:

{u'body': [{u'declarations': [{u'id': {u'name': u'i',
                                       u'type': u'Identifier'},
                               u'init': {u'type': u'Literal', u'value': 2},
                               u'type': u'VariableDeclarator'}],
            u'kind': u'var',
            u'type': u'VariableDeclaration'},
           {u'declarations': [{u'id': {u'name': u'j',
                                       u'type': u'Identifier'},
                               u'init': {u'type': u'Literal', u'value': 4},
                               u'type': u'VariableDeclarator'}],
            u'kind': u'var',
            u'type': u'VariableDeclaration'},
           {u'declarations': [{u'id': {u'name': u'answer',
                                       u'type': u'Identifier'},
                               u'init': {u'left': {u'name': u'i',
                                                   u'type': u'Identifier'},
                                         u'operator': u'*',
                                         u'right': {u'name': u'j',
                                                    u'type': u'Identifier'},
                                         u'type': u'BinaryExpression'},
                               u'type': u'VariableDeclarator'}],
            u'kind': u'var',
            u'type': u'VariableDeclaration'}],
 u'type': u'Program'}

What is the recommended way to walk complex structures like the above?

Apart of a few list there are mostly dictionaries, the structure can become even more imbricated so I need a general solution.


Solution 1:

You can use a recursive generator for converting your dictionary to flat lists.

def dict_generator(indict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(indict, dict):
        for key, value in indict.items():
            if isinstance(value, dict):
                for d in dict_generator(value, pre + [key]):
                    yield d
            elif isinstance(value, list) or isinstance(value, tuple):
                for v in value:
                    for d in dict_generator(v, pre + [key]):
                        yield d
            else:
                yield pre + [key, value]
    else:
        yield pre + [indict]

It returns

[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'type', u'Literal']
[u'init', u'declarations', u'body', u'value', 2]
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'i']
[u'body', u'type', u'VariableDeclaration']
[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'type', u'Literal']
[u'init', u'declarations', u'body', u'value', 4]
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'j']
[u'body', u'type', u'VariableDeclaration']
[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'operator', u'*']
[u'right', u'init', u'declarations', u'body', u'type', u'Identifier']
[u'right', u'init', u'declarations', u'body', u'name', u'j']
[u'init', u'declarations', u'body', u'type', u'BinaryExpression']
[u'left', u'init', u'declarations', u'body', u'type', u'Identifier']
[u'left', u'init', u'declarations', u'body', u'name', u'i']
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'answer']
[u'body', u'type', u'VariableDeclaration']
[u'type', u'Program']

UPDATE: Fixed keys list from [key] + pre to pre + [key] as mentioned in comments.

Solution 2:

If you only need to walk the dictionary, I'd suggest using a recursive walk function that takes a dictionary and then recursively walks through its elements. Something like this:

def walk(node):
    for key, item in node.items():
        if item is a collection:
            walk(item)
        else:
            It is a leaf, do your thing

If you also want to search for elements, or query several elements that pass certain criteria, have a look at the jsonpath module.

Solution 3:

Instead of writing your own parser, depending on the task, you could extend encoders and decoders from the standard library json module.

I recommend this especially if you need to encode objects belonging to custom classes into the json. If you have to do some operation which could be done also on a string representation of the json, consider also iterating JSONEncoder().iterencode

For both the reference is http://docs.python.org/2/library/json.html#encoders-and-decoders