REGEX to select nth value from a list, allowing for nulls
I am using REGEXP_SUBSTR()
to return the nth value from a comma-separated list. This works fine when all values are present, but fails if an item is null. Here is an example that works where all values are present and I am selecting the 2nd occurrence of 1 or more characters that are not a comma:
SQL> select REGEXP_SUBSTR('1,2,3,4,5,6', '[^,]+', 1, 2) data
2 from dual;
D
-
2
But when the second value is null, I am really getting the third item in the list, which of course really is the 2nd occurrence of 1 or more characters that are not a comma. However, I need it to return NULL as the 2nd item is empty:
SQL> select REGEXP_SUBSTR('1,,3,4,5,6', '[^,]+', 1, 2) data
2 from dual;
D
-
3
If I change the regex to allow for zero or more characters instead of 1 or more, it also fails for numbers past the null:
SQL> select REGEXP_SUBSTR('1,,3,4,5,6', '[^,]*', 1, 4) data
2 from dual;
D
-
3
I need to allow for the null but can't seem to get the syntax right. Logically I need to return what is before the nth occurrence of a comma whether data is present or not (and allow for the last value also). Any ideas?
Solution 1:
Thanks to those who replied. After perusing your answers and the answers in the link supplied, I arrived at this solution:
SQL> select REGEXP_SUBSTR('1,,3,4,5', '(.*?)(,|$)', 1, 2, NULL, 1) data
2 from dual;
Data
----
Which can be described as "look at the 2nd occurrence of an optional set of zero or more characters that are followed by a comma or the end of the line, and return the 1st subgroup (which is the data less the comma or end of the line).
I forgot to mention I tested with the null in various positions, multiple nulls, selecting various positions, etc.
The only caveat I could find is if the field you look for is greater than the number available, it just returns NULL so you need to be aware of that. Not a problem for my case.
EDIT: I am updating the accepted answer for the benefit of future searchers that may stumble upon this.
The next step is to encapsulate the code so it can be made into a simpler, reusable function. Here is the function source:
FUNCTION GET_LIST_ELEMENT(string_in VARCHAR2, element_in NUMBER, delimiter_in VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
BEGIN
RETURN REGEXP_SUBSTR(string_in, '(.*?)(\'||delimiter_in||'|$)', 1, element_in, NULL, 1);
END GET_LIST_ELEMENT;
This hides the regex complexities from developers who may not be so comfortable with it and makes the code cleaner anyway when in use. Call it like this to get the 4th element:
select get_list_element('123,222,,432,555', 4) from dual;