Replace empty strings with null values

I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL, and typecast some columns as well. I read through some of the posts and I could not find a query, which would let me do it across all the columns in a single query, without using multiple statements.

Let me know if it is possible for me to iterate across all columns and replace cells with empty strings with null.

Ref: How to convert empty spaces into null values, using SQL Server?

Solution 1:

To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.

The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals.

CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _typ  CONSTANT regtype[] := '{text, bpchar, varchar}';  -- ARRAY of all basic character types
   _sql  text;
BEGIN
   SELECT INTO _sql                       -- build SQL command
          'UPDATE ' || _tbl
          || E'\nSET    ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n      ,')
          || E'\nWHERE  ' || string_agg(col || ' = ''''', ' OR ')
   FROM  (
      SELECT quote_ident(attname) AS col
      FROM   pg_attribute
      WHERE  attrelid = _tbl              -- valid, visible, legal table name 
      AND    attnum >= 1                  -- exclude tableoid & friends
      AND    NOT attisdropped             -- exclude dropped columns
      AND    NOT attnotnull               -- exclude columns defined NOT NULL!
      AND    atttypid = ANY(_typ)         -- only character types
      ORDER  BY attnum
      ) sub;

   -- RAISE NOTICE '%', _sql;  -- test?

   -- Execute
   IF _sql IS NULL THEN
      updated_rows := 0;                        -- nothing to update
   ELSE
      EXECUTE _sql;
      GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
   END IF;
END
$func$;

Call:

SELECT f_empty2null('mytable');
SELECT f_empty2null('myschema.mytable');

To also get the column name updated_rows:

SELECT * FROM f_empty2null('mytable');

db<>fiddle here
_{Old sqlfiddle}

Major points

Table name has to be valid and visible and the calling user must have all necessary privileges. If any of these conditions are not met, the function will do nothing - i.e. nothing can be destroyed, either. I cast to the object identifier type regclass to make sure of it.
The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').
Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types: text, bpchar, varchar, "char". Only relevant columns are processed.
Use quote_ident() or format() to sanitize column names and safeguard against SQLi.
The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)
Has to use dynamic SQL with EXECUTE.
The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.