Serialize in JSON a base64 encoded data

I'm writing a script to automate data generation for a demo and I need to serialize in a JSON some data. Part of this data is an image, so I encoded it in base64, but when I try to run my script I get:

Traceback (most recent call last):
  File "lazyAutomationScript.py", line 113, in <module>
    json.dump(out_dict, outfile)
  File "/usr/lib/python3.4/json/__init__.py", line 178, in dump
    for chunk in iterable:
  File "/usr/lib/python3.4/json/encoder.py", line 422, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 429, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
  TypeError: b'iVBORw0KGgoAAAANSUhEUgAADWcAABRACAYAAABf7ZytAAAABGdB...
     ...
   BF2jhLaJNmRwAAAAAElFTkSuQmCC' is not JSON serializable

As far as I know, a base64-encoded-whatever (a PNG image, in this case) is just a string, so it should pose to problem to serializating. What am I missing?


Solution 1:

You must be careful about the datatypes.

If you read a binary image, you get bytes. If you encode these bytes in base64, you get ... bytes again! (see documentation on b64encode)

json can't handle raw bytes, that's why you get the error.

I have just written some example, with comments, I hope it helps:

from base64 import b64encode
from json import dumps

ENCODING = 'utf-8'
IMAGE_NAME = 'spam.jpg'
JSON_NAME = 'output.json'

# first: reading the binary stuff
# note the 'rb' flag
# result: bytes
with open(IMAGE_NAME, 'rb') as open_file:
    byte_content = open_file.read()

# second: base64 encode read data
# result: bytes (again)
base64_bytes = b64encode(byte_content)

# third: decode these bytes to text
# result: string (in utf-8)
base64_string = base64_bytes.decode(ENCODING)

# optional: doing stuff with the data
# result here: some dict
raw_data = {IMAGE_NAME: base64_string}

# now: encoding the data to json
# result: string
json_data = dumps(raw_data, indent=2)

# finally: writing the json string to disk
# note the 'w' flag, no 'b' needed as we deal with text here
with open(JSON_NAME, 'w') as another_open_file:
    another_open_file.write(json_data)

Solution 2:

Alternative solution would be encoding stuff on the fly with a custom encoder:

import json
from base64 import b64encode

class Base64Encoder(json.JSONEncoder):
    # pylint: disable=method-hidden
    def default(self, o):
        if isinstance(o, bytes):
            return b64encode(o).decode()
        return json.JSONEncoder.default(self, o)

Having that defined you can do:

m = {'key': b'\x9c\x13\xff\x00'}
json.dumps(m, cls=Base64Encoder)

It will produce:

'{"key": "nBP/AA=="}'

Solution 3:

What am I missing?

The error is yelling that a binary is not JSON serializable.

from base64 import b64encode

# *binary representation* of the base64 string
assert b64encode(b"binary content")                 == b'YmluYXJ5IGNvbnRlbnQ='

# base64 string
assert b64encode(b"binary content").decode('utf-8') ==  'YmluYXJ5IGNvbnRlbnQ='

The latter is definitely "JSON serializable" because is the base64 string representation of the binary b"binary content".