Replace empty strings with null values

I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL, and typecast some columns as well. I read through some of the posts and I could not find a query, which would let me do it across all the columns in a single query, without using multiple statements.

Let me know if it is possible for me to iterate across all columns and replace cells with empty strings with null.

Ref: How to convert empty spaces into null values, using SQL Server?


Solution 1:

To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.

The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals.

CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _typ  CONSTANT regtype[] := '{text, bpchar, varchar}';  -- ARRAY of all basic character types
   _sql  text;
BEGIN
   SELECT INTO _sql                       -- build SQL command
          'UPDATE ' || _tbl
          || E'\nSET    ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n      ,')
          || E'\nWHERE  ' || string_agg(col || ' = ''''', ' OR ')
   FROM  (
      SELECT quote_ident(attname) AS col
      FROM   pg_attribute
      WHERE  attrelid = _tbl              -- valid, visible, legal table name 
      AND    attnum >= 1                  -- exclude tableoid & friends
      AND    NOT attisdropped             -- exclude dropped columns
      AND    NOT attnotnull               -- exclude columns defined NOT NULL!
      AND    atttypid = ANY(_typ)         -- only character types
      ORDER  BY attnum
      ) sub;

   -- RAISE NOTICE '%', _sql;  -- test?

   -- Execute
   IF _sql IS NULL THEN
      updated_rows := 0;                        -- nothing to update
   ELSE
      EXECUTE _sql;
      GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
   END IF;
END
$func$;

Call:

SELECT f_empty2null('mytable');
SELECT f_empty2null('myschema.mytable');

To also get the column name updated_rows:

SELECT * FROM f_empty2null('mytable');

db<>fiddle here
Old sqlfiddle

Major points

  • Table name has to be valid and visible and the calling user must have all necessary privileges. If any of these conditions are not met, the function will do nothing - i.e. nothing can be destroyed, either. I cast to the object identifier type regclass to make sure of it.
    The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').

  • Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types: text, bpchar, varchar, "char". Only relevant columns are processed.

  • Use quote_ident() or format() to sanitize column names and safeguard against SQLi.

  • The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)

  • Has to use dynamic SQL with EXECUTE.

  • The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.

  • Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.