Replace empty strings with null values
I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL
, and typecast some columns as well. I read through some of the posts and I could not find a query, which would let me do it across all the columns in a single query, without using multiple statements.
Let me know if it is possible for me to iterate across all columns and replace cells with empty strings with null.
Ref: How to convert empty spaces into null values, using SQL Server?
Solution 1:
To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.
The following function replaces empty strings in all basic character-type columns of a given table with NULL
. You can then cast to integer
if the remaining strings are valid number literals.
CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- ARRAY of all basic character types
_sql text;
BEGIN
SELECT INTO _sql -- build SQL command
'UPDATE ' || _tbl
|| E'\nSET ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n ,')
|| E'\nWHERE ' || string_agg(col || ' = ''''', ' OR ')
FROM (
SELECT quote_ident(attname) AS col
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND NOT attnotnull -- exclude columns defined NOT NULL!
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) sub;
-- RAISE NOTICE '%', _sql; -- test?
-- Execute
IF _sql IS NULL THEN
updated_rows := 0; -- nothing to update
ELSE
EXECUTE _sql;
GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
END IF;
END
$func$;
Call:
SELECT f_empty2null('mytable');
SELECT f_empty2null('myschema.mytable');
To also get the column name updated_rows
:
SELECT * FROM f_empty2null('mytable');
db<>fiddle here
Old sqlfiddle
Major points
-
Table name has to be valid and visible and the calling user must have all necessary privileges. If any of these conditions are not met, the function will do nothing - i.e. nothing can be destroyed, either. I cast to the object identifier type
regclass
to make sure of it.
The table name can be supplied as is ('mytable'
), then thesearch_path
decides. Or schema-qualified to pick a certain schema ('myschema.mytable'
). -
Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types:
text
,bpchar
,varchar
,"char"
. Only relevant columns are processed. -
Use
quote_ident()
orformat()
to sanitize column names and safeguard against SQLi. -
The updated version uses the basic SQL aggregate function
string_agg()
to build the command string without looping, which is simpler and faster. And more elegant. :) -
Has to use dynamic SQL with
EXECUTE
. -
The updated version excludes columns defined
NOT NULL
and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns. -
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.