SQL QUERY replace NULL value in a row with a value from the previous known value

Solution 1:

If you are using Sql Server this should work

DECLARE @Table TABLE(
        ID INT,
        Val INT
)

INSERT INTO @Table (ID,Val) SELECT 1, 3
INSERT INTO @Table (ID,Val) SELECT 2, NULL
INSERT INTO @Table (ID,Val) SELECT 3, 5
INSERT INTO @Table (ID,Val) SELECT 4, NULL
INSERT INTO @Table (ID,Val) SELECT 5, NULL
INSERT INTO @Table (ID,Val) SELECT 6, 2


SELECT  *,
        ISNULL(Val, (SELECT TOP 1 Val FROM @Table WHERE ID < t.ID AND Val IS NOT NULL ORDER BY ID DESC))
FROM    @Table t

Solution 2:

Here's a MySQL solution:

UPDATE mytable
SET number = (@n := COALESCE(number, @n))
ORDER BY date;

This is concise, but won't necessary work in other brands of RDBMS. For other brands, there might be a brand-specific solution that is more relevant. That's why it's important to tell us the brand you're using.

It's nice to be vendor-independent, as @Pax commented, but failing that, it's also nice to use your chosen brand of database to its fullest advantage.


Explanation of the above query:

@n is a MySQL user variable. It starts out NULL, and is assigned a value on each row as the UPDATE runs through rows. Where number is non-NULL, @n is assigned the value of number. Where number is NULL, the COALESCE() defaults to the previous value of @n. In either case, this becomes the new value of the number column and the UPDATE proceeds to the next row. The @n variable retains its value from row to row, so subsequent rows get values that come from the prior row(s). The order of the UPDATE is predictable, because of MySQL's special use of ORDER BY with UPDATE (this is not standard SQL).

Solution 3:

The best solution is the one offered by Bill Karwin. I recently had to solve this in a relatively large resultset (1000 rows with 12 columns each needing this type of "show me last non-null value if this value is null on the current row") and using the update method with a top 1 select for the previous known value (or subquery with a top 1 ) ran super slow.

I am using SQL 2005 and the syntax for a variable replacement is slightly different than mysql:

UPDATE mytable 
SET 
    @n = COALESCE(number, @n),
    number = COALESCE(number, @n)
ORDER BY date

The first set statement updates the value of the variable @n to the current row's value of 'number' if the 'number' is not null (COALESCE returns the first non-null argument you pass into it) The second set statement updates the actual column value for 'number' to itself (if not null) or the variable @n (which always contains the last non NULL value encountered).

The beauty of this approach is that there are no additional resources expended on scanning the temporary table over and over again... The in-row update of @n takes care of tracking the last non-null value.

I don't have enough rep to vote his answer up, but someone should. It's the most elegant and best performant.