Creating a nested dictionary from a flattened dictionary
I have a flattened dictionary which I want to make into a nested one, of the form
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
I want to convert it to the form
nested = {'X': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}},
'Y': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}}}
The structure of the flat dictionary is such that there should not be any problems with ambiguities. I want it to work for dictionaries of arbitrary depth, but performance is not really an issue. I've seen lots of methods for flattening a nested dictionary, but basically none for nesting a flattened dictionary. The values stored in the dictionary are either scalars or strings, never iterables.
So far I have got something which can take the input
test_dict = {'X_a_one': '10',
'X_b_one': '10',
'X_c_one': '10'}
to the output
test_out = {'X': {'a_one': '10',
'b_one': '10',
'c_one': '10'}}
using the code
def nest_once(inp_dict):
out = {}
if isinstance(inp_dict, dict):
for key, val in inp_dict.items():
if '_' in key:
head, tail = key.split('_', 1)
if head not in out.keys():
out[head] = {tail: val}
else:
out[head].update({tail: val})
else:
out[key] = val
return out
test_out = nest_once(test_dict)
But I'm having trouble working out how to make this into something which recursively creates all levels of the dictionary.
Any help would be appreciated!
(As for why I want to do this: I have a file whose structure is equivalent to a nested dict, and I want to store this file's contents in the attributes dictionary of a NetCDF file and retrieve it later. However NetCDF only allows you to put flat dictionaries as the attributes, so I want to unflatten the dictionary I previously stored in the NetCDF file.)
Here is my take:
def nest_dict(flat):
result = {}
for k, v in flat.items():
_nest_dict_rec(k, v, result)
return result
def _nest_dict_rec(k, v, out):
k, *rest = k.split('_', 1)
if rest:
_nest_dict_rec(rest[0], v, out.setdefault(k, {}))
else:
out[k] = v
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
nested = {'X': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}},
'Y': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}}}
print(nest_dict(flat) == nested)
# True
output = {}
for k, v in source.items():
# always start at the root.
current = output
# This is the part you're struggling with.
pieces = k.split('_')
# iterate from the beginning until the second to last place
for piece in pieces[:-1]:
if not piece in current:
# if a dict doesn't exist at an index, then create one
current[piece] = {}
# as you walk into the structure, update your current location
current = current[piece]
# The reason you're using the second to last is because the last place
# represents the place you're actually storing the item
current[pieces[-1]] = v
Here's one way using collections.defaultdict
, borrowing heavily from this previous answer. There are 3 steps:
- Create a nested
defaultdict
ofdefaultdict
objects. - Iterate items in
flat
input dictionary. - Build
defaultdict
result according to the structure derived from splitting keys by_
, usinggetFromDict
to iterate the result dictionary.
This is a complete example:
from collections import defaultdict
from functools import reduce
from operator import getitem
def getFromDict(dataDict, mapList):
"""Iterate nested dictionary"""
return reduce(getitem, mapList, dataDict)
# instantiate nested defaultdict of defaultdicts
tree = lambda: defaultdict(tree)
d = tree()
# iterate input dictionary
for k, v in flat.items():
*keys, final_key = k.split('_')
getFromDict(d, keys)[final_key] = v
{'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
As a final step, you can convert your defaultdict
to a regular dict
, though usually this step is not necessary.
def default_to_regular_dict(d):
"""Convert nested defaultdict to regular dict of dicts."""
if isinstance(d, defaultdict):
d = {k: default_to_regular_dict(v) for k, v in d.items()}
return d
# convert back to regular dict
res = default_to_regular_dict(d)
The other answers are cleaner, but since you mentioned recursion we do have other options.
def nest(d):
_ = {}
for k in d:
i = k.find('_')
if i == -1:
_[k] = d[k]
continue
s, t = k[:i], k[i+1:]
if s in _:
_[s][t] = d[k]
else:
_[s] = {t:d[k]}
return {k:(nest(_[k]) if type(_[k])==type(d) else _[k]) for k in _}
You can use itertools.groupby
:
import itertools, json
flat = {'Y_a_two': 20, 'Y_a_one': 10, 'X_b_two': 20, 'X_b_one': 10, 'X_a_one': 10, 'X_a_two': 20, 'Y_b_two': 20, 'Y_b_one': 10}
_flat = [[*a.split('_'), b] for a, b in flat.items()]
def create_dict(d):
_d = {a:list(b) for a, b in itertools.groupby(sorted(d, key=lambda x:x[0]), key=lambda x:x[0])}
return {a:create_dict([i[1:] for i in b]) if len(b) > 1 else b[0][-1] for a, b in _d.items()}
print(json.dumps(create_dict(_flat), indent=3))
Output:
{
"Y": {
"b": {
"two": 20,
"one": 10
},
"a": {
"two": 20,
"one": 10
}
},
"X": {
"b": {
"two": 20,
"one": 10
},
"a": {
"two": 20,
"one": 10
}
}
}