PostgreSQL ORDER BY issue - natural sort
I've got a Postgres ORDER BY
issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
- I select the last record with
SELECT * FROM employees ORDER BY em_code DESC
- Strip alphabets from em_code usiging reg exp and store in
ec_alpha
- Cast the remating part to integer
ec_num
- Increment by one
ec_num++
- Pad with sufficient zeors and prefix
ec_alpha
again
When em_code
reaches EM1000, the above algorithm fails.
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code
, breaking the unique key constraint.
Any idea how to select EM1000?
Solution 1:
One approach you can take is to create a naturalsort
function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
Solution 2:
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1
sorts before 9
.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code
- if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D
is the regular expression class-shorthand for "non-digits".'g'
as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.