How to estimate the size of one column in a Postgres table?

There is a column of type text in a table in Postgres 9.1. I'd like to know the impact of just that column on the disk space needed. It doesn't need to be precise, but I'd like to get an idea if that column is responsible for 20%/30%/... of the disk space consumed by the database.

I know pg_relation_size, but it only operates at table level.

I have many databases with this same schema. I dumped a smaller one and cut out the column with grep and cut and compared the size of the plain text dumps. But this is not necessarily a good indicator of space requirements in the live db, and it's also more difficult to do that for large databases.


Solution 1:

select
    sum(pg_column_size(the_text_column)) as total_size,
    avg(pg_column_size(the_text_column)) as average_size,
    sum(pg_column_size(the_text_column)) * 100.0 / pg_relation_size('t') as percentage
from t;

Solution 2:

Slight improvement on the accepted answer: pretty print the size and use pg_total_relation_size to be more accurate.

select
    pg_size_pretty(sum(pg_column_size(column_name))) as total_size,
    pg_size_pretty(avg(pg_column_size(column_name))) as average_size,
    sum(pg_column_size(column_name)) * 100.0 / pg_total_relation_size('table_name') as percentage
from table_name;

Solution 3:

If you want a report for all the columns in a database sorted by size then here is the way

BEGIN;
CREATE FUNCTION tc_column_size(table_name text, column_name text)
    RETURNS BIGINT AS
$$
    declare response BIGINT;
BEGIN
    EXECUTE 'select sum(pg_column_size(t."' || column_name || '")) from ' || table_name || ' t ' into response;
    return response;
END;
$$
    LANGUAGE plpgsql;

SELECT
    z.table_name,
    z.column_name,
    pg_size_pretty(z.size)
FROM (
    SELECT
        table_name,
        column_name,
        tc_column_size(table_name, column_name) size
    FROM
        information_schema.columns
    WHERE
        table_schema = 'public') AS z
WHERE
    size IS NOT NULL
    -- and z.table_name = 'my_table' -- <--- uncomment to filter a table
ORDER BY
    z.size DESC;

ROLLBACK; -- <--- You may not want to keep that function