How to edit a specific line of a large JSON/text file (~25 GB)?
I have a JSON file with Elasticsearch events which can't be parsed by jq (funnily enough, the JSON comes from jq) due to a missing comma. Here is an extract from the problematic place in the JSON file:
"end",
"protocol"
],
"dataset": "test",
"outcome": "success"
},
"@timestamp": "2020-08-23T04:47:10.000+02:00"
}
{
"agent": {
"hostname": "fb",
"type": "filebeat"
},
"destination": {
My jq command crashes at the closing brace (above "agent") because a comma is missing after that brace (a new event starts there). I know exactly which line it is and would like to add a comma there, but I couldn't find an efficient way to do it. Since the file is around 25 GB, opening it in nano or other editors is not feasible. The error is: parse error: Expected separator between values at line 192388762
Does anyone know if there is an efficient way to add a comma there so it looks like this?
"@timestamp": "2020-08-23T04:47:10.000+02:00"
},
{
"agent": {
Is there a command that I can tell to go to line X, column 1 and add a comma there (after column 1)?
Are there brackets [] surrounding all these objects? If so, it is an array and there are indeed missing commas. But jq wouldn't have failed to produce them unless the previous filter was deliberately designed to behave that way. If there aren't surrounding brackets (which I presume from the indentation of the sample), then it is a stream of objects that do not need commas in between. In fact, putting commas in between without the surrounding brackets would render the file unprocessable, as it would no longer be valid JSON.
If it is a faulty array (the former case), you may be better off not using jq but a text stream editor such as sed or awk, since you seem to know exactly where the comma is missing ("Is there a command which I can tell to go to line X, column 1 and add a comma there?").
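For instance, here is a minimal sed sketch (GNU sed). The filename events.json is just a placeholder, and the address is the line holding the bare closing brace, i.e. one line above the reported parse error; verify that number against your actual file:

# Append a comma to the lone "}" on line 192388761 (one above the error
# reported at line 192388762). sed -i rewrites the file in a single
# streaming pass, so nothing is loaded into memory, but it needs roughly
# another 25 GB of free disk space for the temporary copy it creates.
sed -i '192388761s/^}$/},/' events.json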
If it is in fact a stream of objects (the latter case), then you could use jq --slurp '…' or jq -n '[inputs] | …' to turn it into an array (surrounded by brackets and with commas in between), but then the file (25 GB) has to fit entirely into your memory. If it doesn't, you need to use jq --stream '…' and handle the document (which then has a different format) according to the documentation on processing streams.
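As a rough sketch of both approaches, assuming the file is called events.json and that, purely for illustration, you only want to extract the @timestamp of every event (filename and filter are placeholders, not part of the original question):

# Slurp the whole stream into a single array; needs enough RAM for the full file.
jq -n '[inputs]' events.json > events-array.json

# Streaming alternative: fromstream(inputs) reassembles one top-level object
# at a time from the --stream events, so the whole document never has to fit
# into memory at once.
jq -cn --stream 'fromstream(inputs) | .["@timestamp"]' events.json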
Illustrations:
This is an array of objects:
[
{"a": 1},
{"b": 2},
{"c": 3}
]
This is a stream of objects:
{"a": 1}
{"b": 2}
{"c": 3}
This is not valid JSON:
{"a": 1},
{"b": 2},
{"c": 3}