Can you split/explode a field in a MySQL query?

Solution 1:

Until now, I wanted to keep those comma separated lists in my SQL db - well aware of all warnings!

I kept thinking that they have benefits over lookup tables (which provide a way to a normalized data base). After some days of refusing, I've seen the light:

  • Using lookup tables is NOT causing more code than those ugly string operations when using comma separated values in one field.
  • The lookup table allows for native number formats and is thus NOT bigger than those csv fields. It is SMALLER though.
  • The involved string operations are slim in high level language code (SQL and PHP), but expensive compared to using arrays of integers.
  • Databases are not meant to be human readable, and it is mostly stupid to try to stick to structures due to their readability / direct editability, as I did.

In short, there is a reason why there is no native SPLIT() function in MySQL.

Solution 2:

Seeing that it's a fairly popular question - the answer is YES.

For a column column in table table containing all of your coma separated values:

CREATE TEMPORARY TABLE temp (val CHAR(255));
SET @S1 = CONCAT("INSERT INTO temp (val) VALUES ('",REPLACE((SELECT GROUP_CONCAT( DISTINCT  `column`) AS data FROM `table`), ",", "'),('"),"');");
PREPARE stmt1 FROM @s1;
EXECUTE stmt1;
SELECT DISTINCT(val) FROM temp;

Please remember however to not store CSV in your DB


Per @Mark Amery - as this translates coma separated values into an INSERT statement, be careful when running it on unsanitised data


Just to reiterate, please don't store CSV in your DB; this function is meant to translate CSV into sensible DB structure and not to be used anywhere in your code. If you have to use it in production, please rethink your DB structure

Solution 3:

You can create a function for this:

/**
* Split a string by string (Similar to the php function explode())
*
* @param VARCHAR(12) delim The boundary string (delimiter).
* @param VARCHAR(255) str The input string.
* @param INT pos The index of the string to return
* @return VARCHAR(255) The (pos)th substring
* @return VARCHAR(255) Returns the [pos]th string created by splitting the str parameter on boundaries formed by the delimiter.
* @{@example
*     SELECT SPLIT_STRING('|', 'one|two|three|four', 1);
*     This query
* }
*/
DROP FUNCTION IF EXISTS SPLIT_STRING;
CREATE FUNCTION SPLIT_STRING(delim VARCHAR(12), str VARCHAR(255), pos INT)
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
    REPLACE(
        SUBSTRING(
            SUBSTRING_INDEX(str, delim, pos),
            LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1
        ),
        delim, ''
    );

Converting the magical pseudocode to use this, you would have:

SELECT e.`studentId`, SPLIT_STRING(',', c.`courseNames`, e.`courseId`)
FROM...

Solution 4:

MySQL's only string-splitting function is SUBSTRING_INDEX(str, delim, count). You can use this, to, for example:

  • Return the item before the first separator in a string:

    mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1);
    +--------------------------------------------+
    | SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1) |
    +--------------------------------------------+
    | foo                                        |
    +--------------------------------------------+
    1 row in set (0.00 sec)
    
  • Return the item after the last separator in a string:

    mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1);
    +---------------------------------------------+
    | SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1) |
    +---------------------------------------------+
    | qux                                         |
    +---------------------------------------------+
    1 row in set (0.00 sec)
    
  • Return everything before the third separator in a string:

    mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3);
    +--------------------------------------------+
    | SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3) |
    +--------------------------------------------+
    | foo#bar#baz                                |
    +--------------------------------------------+
    1 row in set (0.00 sec)
    
  • Return the second item in a string, by chaining two calls:

    mysql> SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1);
    +----------------------------------------------------------------------+
    | SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1) |
    +----------------------------------------------------------------------+
    | bar                                                                  |
    +----------------------------------------------------------------------+
    1 row in set (0.00 sec)
    

In general, a simple way to get the nth element of a #-separated string (assuming that you know it definitely has at least n elements) is to do:

SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1);

The inner SUBSTRING_INDEX call discards the nth separator and everything after it, and then the outer SUBSTRING_INDEX call discards everything except the final element that remains.

If you want a more robust solution that returns NULL if you ask for an element that doesn't exist (for instance, asking for the 5th element of 'a#b#c#d'), then you can count the delimiters using REPLACE and then conditionally return NULL using IF():

IF(
    LENGTH(your_string) - LENGTH(REPLACE(your_string, '#', '')) / LENGTH('#') < n - 1,
    NULL,
    SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1)
)

Of course, this is pretty ugly and hard to understand! So you might want to wrap it in a function:

CREATE FUNCTION split(string TEXT, delimiter TEXT, n INT)
RETURNS TEXT DETERMINISTIC
RETURN IF(
    (LENGTH(string) - LENGTH(REPLACE(string, delimiter, ''))) / LENGTH(delimiter) < n - 1,
    NULL,
    SUBSTRING_INDEX(SUBSTRING_INDEX(string, delimiter, n), delimiter, -1)
);

You can then use the function like this:

mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 3);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 3) |
+----------------------------------+
| baz                              |
+----------------------------------+
1 row in set (0.00 sec)

mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 5);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 5) |
+----------------------------------+
| NULL                             |
+----------------------------------+
1 row in set (0.00 sec)

mysql> SELECT SPLIT('foo###bar###baz###qux', '###', 2);
+------------------------------------------+
| SPLIT('foo###bar###baz###qux', '###', 2) |
+------------------------------------------+
| bar                                      |
+------------------------------------------+
1 row in set (0.00 sec)

Solution 5:

Based on Alex answer above (https://stackoverflow.com/a/11022431/1466341) I came up with even better solution. Solution which doesn't contain exact one record ID.

Assuming that the comma separated list is in table data.list, and it contains listing of codes from other table classification.code, you can do something like:

SELECT 
    d.id, d.list, c.code
FROM 
    classification c
    JOIN data d
        ON d.list REGEXP CONCAT('[[:<:]]', c.code, '[[:>:]]');

So if you have tables and data like this:

CLASSIFICATION (code varchar(4) unique): ('A'), ('B'), ('C'), ('D')
MY_DATA (id int, list varchar(255)): (100, 'C,A,B'), (150, 'B,A,D'), (200,'B')

above SELECT will return

(100, 'C,A,B', 'A'),
(100, 'C,A,B', 'B'),
(100, 'C,A,B', 'C'),
(150, 'B,A,D', 'A'),
(150, 'B,A,D', 'B'),
(150, 'B,A,D', 'D'),
(200, 'B', 'B'),