Parsing large JSON file in .NET
I have used the "JsonConvert.DeserializeObject(json)" method of Json.NET so far, which has worked quite well and, to be honest, I haven't needed anything more than this.
I am working on a background (console) application which constantly downloads the JSON content from different URLs, then deserializes the result into a list of .NET objects.
using (WebClient client = new WebClient())
{
    string json = client.DownloadString(stringUrl);
    var result = JsonConvert.DeserializeObject<List<Contact>>(json);
}
The simple code snippet above probably isn't perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject no longer works: that line throws a JsonReaderException.
The downloaded JSON content is an array, and this is what a sample looks like. Contact is a container class for the deserialized JSON object.
[
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
}
]
My initial guess is that it runs out of memory. Just out of curiosity, I tried to parse it as a JArray, which caused the same exception.
I have started to dig into the Json.NET documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.
UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. I noticed that if the array has more than 12,000 elements, the array is closed with "]" after the 12,000th element and another array starts with "[". In other words, the JSON looks exactly like this:
[
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
}
]
[
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
},
{
"firstname": "sometext",
"lastname": "sometext"
}
]
As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.
Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.
This should allow you to process the non-standard JSON successfully and in a memory efficient manner, regardless of how many arrays there are or how many items in each array.
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(stringUrl))
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    // Allow more than one top-level JSON value (here: back-to-back arrays)
    reader.SupportMultipleContent = true;
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        // Deserialize one object at a time; array tokens are simply skipped
        if (reader.TokenType == JsonToken.StartObject)
        {
            Contact c = serializer.Deserialize<Contact>(reader);
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
Full demo here: https://dotnetfiddle.net/2TQa8p
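To try the same loop without a network call, here is a minimal self-contained sketch. It assumes a Contact class with FirstName and LastName properties (the question doesn't show its definition), and ReadContacts is a hypothetical helper name, not part of Json.NET. It feeds two back-to-back arrays from a string and yields the contacts lazily, so the caller decides whether to collect them into a list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Assumed shape of Contact; the question does not show the real class.
public class Contact
{
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}

public static class ContactReader
{
    // Hypothetical helper: yields one Contact per JSON object,
    // across any number of back-to-back arrays in the input.
    public static IEnumerable<Contact> ReadContacts(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            reader.SupportMultipleContent = true;
            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Two arrays back to back, mimicking the malformed download.
        string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}]" +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        foreach (Contact c in ContactReader.ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

If you do need everything in memory at once, wrapping the call in new List<Contact>(...) gives you the same result type as the original DeserializeObject call.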
Json.NET supports deserializing directly from a stream. Here is a way to deserialize your JSON using a StreamReader, reading the JSON one piece at a time instead of loading the entire JSON string into memory.
using (WebClient client = new WebClient())
{
    using (StreamReader sr = new StreamReader(client.OpenRead(stringUrl)))
    {
        using (JsonReader reader = new JsonTextReader(sr))
        {
            JsonSerializer serializer = new JsonSerializer();
            // read the json from a stream
            // json size doesn't matter because only a small piece is read at a time from the HTTP request
            IList<Contact> result = serializer.Deserialize<List<Contact>>(reader);
        }
    }
}
Reference: JSON.NET Performance Tips
I have done a similar thing in Python for a 5 GB file. I downloaded the file to a temporary location and read it line by line to build up the JSON objects, similar to how SAX works.
For C# using Json.NET, you can download the file, use a stream reader to read the file, pass that reader to a JsonTextReader, and parse it to a JObject using JToken.ReadFrom(your JsonTextReader object).
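A rough sketch of that approach might look like the following. The class and method names are hypothetical, the URL is taken from the command line as a placeholder, and the SupportMultipleContent flag is borrowed from the other answer so that the back-to-back arrays don't stop the reader. It calls JToken.ReadFrom once per object rather than once for the whole file, since loading the whole document into a single JToken would bring back the memory problem.

```csharp
using System;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class TempFileParser
{
    // Hypothetical helper: walks the file and loads one JObject at a time.
    public static int CountContacts(string path)
    {
        int count = 0;
        using (StreamReader file = File.OpenText(path))
        using (var reader = new JsonTextReader(file))
        {
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Reads only the current object, leaving the reader at its end.
                    var item = (JObject)JToken.ReadFrom(reader);
                    string firstName = (string)item["firstname"];
                    // Process the item here; this sketch only counts them.
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main(string[] args)
    {
        string url = args[0];                      // placeholder: pass your URL
        string tempPath = Path.GetTempFileName();  // temporary download location
        try
        {
            using (var client = new WebClient())
            {
                client.DownloadFile(url, tempPath);
            }
            Console.WriteLine(CountContacts(tempPath));
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```

Downloading to disk first costs extra I/O, but it lets you retry the parse without hitting the server again.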