How to compare the current row with next and previous row in PostgreSQL?
I want to know how to retrieve results in a SQL query doing some logic comparison with the next or previous rows. I'm using PostgreSQL.
Example
Supposing I have a table in my database with two attributes (ordered position and random numbers), I want to retrieve the odd numbers that are between even numbers. How can I do this?
The real usage
I want to find words that are between two another words which have the category NAME (and the word is not a name). The ordering is provided by sentence and position.
Edit I want to know if the Window function of PostgreSQL are best solution for this kind of problem than doing queries. I heard about them, but never used.
This is my solution using WINDOW functions
. I used the lag
and lead
functions. Both returns a value from a column from a row in offset from the current row. lag
goes back and lead
goes next in the offset.
SELECT tokcat.text
FROM (
SELECT text, category, chartype, lag(category,1) OVER w as previousCategory, lead(category,1) OVER w as nextCategory
FROM token t, textBlockHasToken tb
WHERE tb.tokenId = t.id
WINDOW w AS (
PARTITION BY textBlockId, sentence
ORDER BY textBlockId, sentence, position
)
) tokcat
WHERE 'NAME' = ANY(previousCategory)
AND 'NAME' = ANY(nextCategory)
AND 'NAME' <> ANY(category)
Simplified version:
SELECT text
FROM (
SELECT text
,category
,lag(category) OVER w as previous_cat
,lead(category) OVER w as next_cat
FROM token t
JOIN textblockhastoken tb ON tb.tokenid = t.id
WINDOW w AS (PARTITION BY textblockid, sentence ORDER BY position)
) tokcat
WHERE category <> 'NAME'
AND previous_cat = 'NAME'
AND next_cat = 'NAME';
Major points
-
= ANY()
is not needed, the window function returns a single value - some redundant fields in the subquery
- no need to order by columns, that you
PARTITION BY
- the ORDER BY applies within partitions - Don't use mixed case identifiers without quoting, it only leads to confusion. (Better yet: don't use mixed case identifiers in PostgreSQL ever)
You can find the best solution in this address:
http://blog.sqlauthority.com/2013/09/25/sql-server-how-to-access-the-previous-row-and-next-row-value-in-select-statement-part-4/
Query 1 for SQL Server 2012 and later version:
SELECT
LAG(p.FirstName) OVER(ORDER BY p.BusinessEntityID) PreviousValue,
p.FirstName,
LEAD(p.FirstName) OVER(ORDER BY p.BusinessEntityID) NextValue
FROM Person.Person p
GO
Query 2 for SQL Server 2005+ and later version:
WITH CTE AS(
SELECT rownum = ROW_NUMBER() OVER(ORDER BY p.BusinessEntityID),
p.FirstName FROM Person.Person p
)
SELECT
prev.FirstName PreviousValue,
CTE.FirstName,
nex.FirstName NextValue
FROM CTE
LEFT JOIN CTE prev ON prev.rownum = CTE.rownum - 1
LEFT JOIN CTE nex ON nex.rownum = CTE.rownum + 1
GO
This should work:
SELECT w1.word AS word_before, w.word, w2.word AS word_after
FROM word w
JOIN word w1 USING (sentence)
JOIN word w2 USING (sentence)
WHERE w.category <> 'name'
AND w1.pos = (w.pos - 1)
AND w1.category = 'name'
AND w2.pos = (w.pos + 1)
AND w2.category = 'name'
- Use two self-joins
- All words must be in the same sentence (?) and in order.
- Word before and word after have to be of category 'name'. Word itself not 'name'
- This assumes that category
IS NOT NULL
To answer your additional question: no, a window function would not be particularly useful in this case, self-join is the magic word here.
Edit:
I stand corrected. Renato demonstrates a cool solution with the window functions lag() and lead().
Note the subtle differences:
- the self joins operate on absolute values: if the row with
pos -1
is missing, then the row withpos
does not qualify. - Renatos version with
lag()
andlead()
operates on the relative position of rows created byORDER BY
.
In many cases (like probably in the one at hand?) both versions lead to identical results. With gaps in the id space there will be different results.