Select last nontrivial value from each column, by group

How might I write in SQL a query that accumulates a table of rows representing selective updates into a single "latest and greatest" row per BY group?

For instance from the following table:

index	date	ssn	first	last	title	car	shoe
1	Apr 1	100-00-0001	Joseph	Schmoe	Analyst	Honda	Adidas
2	May 1	100-00-0001	Joe
3	May 16	100-00-0001					Nike
4	June 20	100-00-0001			Sr Analyst
5	Jul 13	200-22-2222	Jane	Doe	Architect
6	Aug 4	100-00-0001				Tesla
7	Sep 9	100-00-0001			VP		All birds
8	Sep 9	200-22-2222					Hoka
9	Oct 4	100-00-0001			''		Timberlands
10	Nov 9	100-00-0001				Jeep
11	Dec 4	200-22-2222			Principal

I'd like a query that returns something like this:

index	date	ssn	first	last	title	car	shoe
10	Nov 9	100-00-0001	Joe	Schmoe	''	Jeep	Timberlands
11	Dec 4	200-22-2222	Jane	Doe	Principal		Hoka

My first instinct is there might be some aggregate function analogous to MAX() that works on an ordered set of rows, e.g. LAST(), e.g. something like:

  SELECT 
      LATEST(index) OVER (PARTITION BY ssn ORDER BY date) AS index,
      LATEST(date) OVER (PARTITION BY ssn ORDER BY date) AS date,
      LATEST(ssn) OVER (PARTITION BY ssn ORDER BY date) AS ssn,
      LATEST(first) OVER (PARTITION BY ssn ORDER BY date) AS first,
      LATEST(last) OVER (PARTITION BY ssn ORDER BY date) AS last,
      LATEST(title) OVER (PARTITION BY ssn ORDER BY date) AS title,
      LATEST(car) OVER (PARTITION BY ssn ORDER BY date) AS car,
      LATEST(shoe) OVER (PARTITION BY ssn ORDER BY date) AS shoe
  FROM myschema.updates
  GROUP BY ssn

Apologies if this is trivial, I just don't yet know what term to search for.

If you want the most recent not-null, not-blank value of each column, for each ssn, reported against the last date, then you can use something like this:


 select 
     s1.ssn
  , max(s1.date) as date
  , max(case when s1.first_rn=1 then first else null end) as first
  , max(case when s1.last_rn=1 then last else null end) as last
from (  
   select t.ssn,
         t.date,
         t.first,
         t.last,
         row_number() over (partition by ssn 
                            order by  case when coalesce(first,'')='' 
                                           then 2 else 1 
                                      end asc
                                     ,  date desc
                           ) as first_rn,
         row_number() over (partition by ssn 
                            order by  case when coalesce(last,'')='' 
                                           then 2 else 1 
                                       end asc
                                     , date desc
                           ) as last_rn

    from Tbl t
) s1
group by s1.ssn

We first find the most recent row (marking it with xxx_rn=1) with non-empty column (by making sure that nulls/blanks come last in numbering), then in the outer level we summarise, picking up values for each ssn by inspecting the rows with xxx_rn=1 (you could use max or min, it wouldn't matter, because there will only be one xxx_rn=1 record for each ssn in the subquery; min/max will ignore all the others). I have only demonstrated on first and last, you need to repeat for the other columns.

Select last nontrivial value from each column, by group

Related

Recent Posts