Python load json file with UTF-8 BOM header

Solution 1:

You can open with codecs:

import json
import codecs

json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))

or decode with utf-8-sig yourself and pass to loads:

json.loads(open('sample.json').read().decode('utf-8-sig'))

Solution 2:

Simple! You don't even need to import codecs.

with open('sample.json', encoding='utf-8-sig') as f:
    data = json.load(f)

Solution 3:

Since json.load(stream) uses json.loads(stream.read()) under the hood, it won't be that bad to write a small hepler function that lstrips the BOM:

from codecs import BOM_UTF8

def lstrip_bom(str_, bom=BOM_UTF8):
    if str_.startswith(bom):
        return str_[len(bom):]
    else:
        return str_

json.loads(lstrip_bom(open('sample.json').read()))    

In other situations where you need to wrap a stream and fix it somehow you may look at inheriting from codecs.StreamReader.

Solution 4:

you can also do it with keyword with

import codecs
with codecs.open('samples.json', 'r', 'utf-8-sig') as json_file:  
    data = json.load(json_file)

or better:

import io
with io.open('samples.json', 'r', encoding='utf-8-sig') as json_file:  
    data = json.load(json_file)