How can I import a JSON file into PostgreSQL?
For example, I have a file customers.json, which is an array of objects (strictly formed) and pretty plain (no nested objects), like this (importantly, it already includes ids):
[
  {
    "id": 23635,
    "name": "Jerry Green",
    "comment": "Imported from facebook."
  },
  {
    "id": 23636,
    "name": "John Wayne",
    "comment": "Imported from facebook."
  }
]
And I want to import them all into my postgres db into a table customers.
I found some pretty complicated approaches where I would first import the file as a json-typed column into a table like imported_json with a column named data containing the objects, and then use SQL to extract those values and insert them into the real table.
But is there a simple way of importing JSON into Postgres without touching SQL?
You can feed the JSON into a SQL statement that extracts the information and inserts it into the table. If the JSON attributes have exactly the same names as the table columns, you can do something like this:
with customer_json (doc) as (
  values
    ('[
       {
         "id": 23635,
         "name": "Jerry Green",
         "comment": "Imported from facebook."
       },
       {
         "id": 23636,
         "name": "John Wayne",
         "comment": "Imported from facebook."
       }
     ]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
  set name = excluded.name,
      comment = excluded.comment;
New customers will be inserted, existing ones will be updated. The "magic" part is the json_populate_recordset(null::customer, doc)
which generates a relational representation of the JSON objects.
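To see what that function produces on its own, you can run it directly (a quick illustration against the customer table defined below; each JSON key is matched to the column of the same name):

select *
from json_populate_recordset(null::customer,
       '[{"id": 1, "name": "Test", "comment": null}]'::json);

-- returns a single row with id = 1, name = 'Test' and comment = null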
The above assumes a table definition like this:
create table customer
(
  id      integer primary key,
  name    text not null,
  comment text
);
If the data is provided as a file, you need to first put that file into some table in the database. Something like this:
create unlogged table customer_import (doc json);
Then upload the file into a single row of that table, e.g. using the \copy command in psql (or whatever your SQL client offers):
\copy customer_import from 'customers.json' ....
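If filling the staging table via \copy is awkward for a multi-line JSON file, one alternative sketch (assuming you run psql on the machine where customers.json lives) is to read the file into a psql variable and insert it:

\set content `cat customers.json`
insert into customer_import (doc) values (:'content');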
Then you can use the above statement, just remove the CTE and use the staging table:
insert into customer (id, name, comment)
select p.*
from customer_import l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
  set name = excluded.name,
      comment = excluded.comment;
It turns out there's an easy way to import a multi-line JSON object into a JSON column in a postgres database using the command line psql tool, without needing to explicitly embed the JSON into the SQL statement. The technique is documented in the postgresql docs, but it's a bit hidden.
The trick is to load the JSON into a psql variable using backticks. For example, given a multi-line JSON file in /tmp/test.json such as:
{
  "dog": "cat",
  "frog": "frat"
}
We can use the following SQL to load it into a temporary table:
sql> \set content `cat /tmp/test.json`
sql> create temp table t ( j jsonb );
sql> insert into t values (:'content');
sql> select * from t;
which gives the result:
j
────────────────────────────────
{"dog": "cat", "frog": "frat"}
(1 row)
You can also perform operations on the data directly:
sql> select :'content'::jsonb -> 'dog';
?column?
──────────
"cat"
(1 row)
Under the covers this is just embedding the JSON in the SQL, but it's a lot neater to let psql perform the interpolation itself.
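Combined with the json_populate_recordset() approach from the first answer, importing the customers.json from the question could look roughly like this (a sketch, assuming the customer table defined above):

\set content `cat customers.json`
insert into customer (id, name, comment)
select p.*
from json_populate_recordset(null::customer, :'content'::json) as p
on conflict (id) do update
  set name = excluded.name,
      comment = excluded.comment;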
For near-big-data cases, the most efficient way to import JSON from a file appears to be not importing a single JSON document, but rather a single-column CSV: a list of one-line JSON objects:
data.json.csv:
{"id": 23635,"name": "Jerry Green","comment": "Imported from facebook."}
{"id": 23636,"name": "John Wayne","comment": "Imported from facebook."}
then, under psql:
create table t ( j jsonb );
\copy t from 'd:\path\data.json.csv'
One record per json (line) will be added into t table.
"\copy from" import was made for csv, and as such loads data line by line. As a result reading one json per line rather than a single json array to be later splited, will not use any intermediate table.
More of that you will less likely hit the max input line-size limitation that will arise if your input json file is too big.
I would thus first convert your input into a single column csv to then import it using the copy command.
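As a sketch of that conversion (assuming jq is available and the customer table from the first answer exists), you could flatten the array into JSON lines and load them one object per row; COPY's default text format treats tabs and backslashes specially, so this assumes the JSON contains neither (as in the example):

$ jq -c '.[]' customers.json > data.json.csv

and then in psql:

create table t ( j jsonb );
\copy t from 'data.json.csv'
insert into customer (id, name, comment)
select p.*
from t
  cross join lateral jsonb_populate_record(null::customer, j) as p;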
If you want to do it from a command line ...
NOTE: This isn't a direct answer to your question, as this will require you to convert your JSON to SQL. You will probably have to deal with JSON 'null' when converting anyway. You could use a view or materialized view to make that problem invisible-ish, though.
Here is a script I've used for importing JSON into PostgreSQL (WSL Ubuntu), which basically requires that you mix psql meta commands and SQL in the same command line. Note use of the somewhat obscure script command, which allocates a pseudo-tty:
$ more update.sh
#!/bin/bash
wget <filename>.json
echo '\set content `cat $(ls -t <redacted>.json.* | head -1)` \\ delete from <table>; insert into <table> values(:'"'content'); refresh materialized view <view>; " | PGPASSWORD=<passwd> psql -h <host> -U <user> -d <database>
$
(Copied from my answer at Shell script to execute pgsql commands in files)
You can use spyql. Running the following command would generate INSERT statements that you can pipe into psql:
$ jq -c .[] customers.json | spyql -Otable=customer "SELECT json->id, json->name, json->comment FROM json TO sql"
INSERT INTO "customer"("id","name","comment") VALUES (23635,'Jerry Green','Imported from facebook.'),(23636,'John Wayne','Imported from facebook.');
jq is used to transform the json array into json lines (1 json object per line) and then spyql takes care of converting json lines into INSERT statements.
To import the data into PostgreSQL:
$ jq -c .[] customers.json | spyql -Otable=customer "SELECT json->id, json->name, json->comment FROM json TO sql" | psql -U your_user_name -h your_host your_database
Disclaimer: I am the author of spyql.