Copy a few of the columns of a csv file into a table

Solution 1:

If it is an ad hoc task

Create a temporary table with all the columns in the input file

create temporary table t (x1 integer, ... , x10 text)

Copy from the file into it:

copy t (x1, ... , x10)
from '/path/to/my_file'
with (format csv)

Now insert into the definitive table from the temp:

insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t

And drop it:

drop table t

If it is a frequent task

Use the file_fdw extension. As superuser:

create extension file_fdw;

create server my_csv foreign data wrapper file_fdw;

create foreign table my_csv (
    x1 integer,
    x2 text,
    x3 text
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv' )
;

Grant select permission on the table to the user who will read it:

grant select on table my_csv to the_read_user;

Then whenever necessary read directly from the csv file as if it were a table:

insert into my_table (x2)
select x2
from my_csv
where x1 = 2

Solution 2:

You can provide the columns your want to fill with the COPY command. Like so:

\copy your_table (x2,x5,x7,x10) FROM '/path/to/your-file.csv' DELIMITER ',' CSV;

Here's the doc for the COPY command.

Solution 3:

As other answers have pointed out, it's been possible to specify columns to copy into the PG table. However, without the option to reference column names in the CSV, this had little utility apart from loading into a table where columns had a different order.

Fortunately, as of Postgres 9.3, it's possible to copy columns not only from a file or from standard input, but also from a shell command using PROGRAM:

PROGRAM

A command to execute. In COPY FROM, the input is read from standard output of the command, and in COPY TO, the output is written to the standard input of the command.

Note that the command is invoked by the shell, so if you need to pass any arguments to shell command that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.

This was the missing piece that we needed for such an eagerly awaited functionality. For example, we could use this option in combination with cut (in a UNIX-based system) to select certain columns by order:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'cut -d "," -f 2,5,7,10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)

However, cut has several limitations when manipulating CSV's: it can't adequately manipulate strings with commas (or other delimeters) inside them and doesn't allow to select columns by name.

There are several other open source command-line tools that are better at manipulating CSV files, such as csvkit or miller. Here's an example using miller to select columns by name:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'mlr --csv lf cut -f x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)