Comma separated values in a database field

I have a products table. Each row in that table corresponds to a single product and is identified by a unique Id. Each product can have multiple "codes" associated with it. For example:

Id     |    Code
----------------------
0001   |   IN,ON,ME,OH
0002   |   ON,VI,AC,ZO
0003   |   QA,PS,OO,ME

What I'm trying to do is create a stored procedure to which I can pass codes like "ON,ME" and have it return every product that contains the "ON" or "ME" code. Since the codes are comma separated, I don't know how to split them and search them. Is this possible using only T-SQL?

Edit: It's a mission critical table. I don't have the authority to change it.

You should be storing the codes in a separate table, since you have a many-to-many relationship. If you separate them, you will easily be able to check for a given code.

It is possible with the design you have now, but it requires text searching of the column, with one search per requested code on every row, which will cause serious performance problems as your data grows.
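A minimal sketch of the separate-table design, assuming product Ids are 4-character strings and codes are at most 10 characters; the table name `ProductCode`, the index name and the column types are hypothetical. `STRING_SPLIT` requires SQL Server 2016 or later (database compatibility level 130+).

```sql
-- Hypothetical junction table: one row per (product, code) pair.
create table dbo.ProductCode
(
    ProductId char(4)     not null,
    Code      varchar(10) not null,
    constraint PK_ProductCode primary key (ProductId, Code)
);

-- Lets the engine seek to the rows for a given code instead of scanning.
create index IX_ProductCode_Code on dbo.ProductCode (Code);

-- The sample data from the question, one code per row.
insert into dbo.ProductCode (ProductId, Code)
values ('0001', 'IN'), ('0001', 'ON'), ('0001', 'ME'), ('0001', 'OH'),
       ('0002', 'ON'), ('0002', 'VI'), ('0002', 'AC'), ('0002', 'ZO'),
       ('0003', 'QA'), ('0003', 'PS'), ('0003', 'OO'), ('0003', 'ME');

-- Products having at least one of the requested codes.
-- STRING_SPLIT turns the parameter into one row per requested code.
declare @Codes varchar(4000) = 'ON,ME';

select distinct pc.ProductId
from dbo.ProductCode as pc
join string_split(@Codes, ',') as s
    on pc.Code = s.value;
```

With the sample rows this returns 0001, 0002 and 0003, since every product has either ON or ME.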

If you go down your current path, you will have to break apart your input string, because nothing guarantees that the codes on each record are in the same order as (or contiguous with) the input parameter. Then you would have to write a

Code LIKE '%IN%'
OR Code LIKE '%QA%'

query, with an additional condition for every code you are checking for. That is very inefficient, because every row has to be searched once per code.

The UDF approach suggested below is also worth considering. However, depending on the size of your data and how often it is queried and updated, you may have performance issues there as well.

Would it be possible to create an additional, normalized table that is synchronized on a schedule (or by a trigger) for you to query against?
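A sketch of the scheduled variant, which only reads the original table and rebuilds a separate one. It assumes the products table is called `dbo.Products` with columns `Id` and `Code`, that Ids fit in `char(4)` and codes in `varchar(10)`; the names `ProductCodeList` and `RefreshProductCodeList` are hypothetical, and `STRING_SPLIT` needs SQL Server 2016+. You still need permission to create these objects.

```sql
-- Hypothetical side table, one row per (product, code) pair.
if object_id(N'dbo.ProductCodeList', N'U') is null
    create table dbo.ProductCodeList
    (
        ProductId char(4)     not null,
        Code      varchar(10) not null,
        primary key (ProductId, Code)
    );
go

-- Rebuilds the side table from the comma-separated column of the
-- (assumed) dbo.Products table. Schedule it, for example as a
-- SQL Server Agent job step, as often as your data may be stale.
create procedure dbo.RefreshProductCodeList
as
begin
    set nocount on;
    set xact_abort on;   -- any error rolls back the whole rebuild

    begin transaction;

    delete from dbo.ProductCodeList;

    insert into dbo.ProductCodeList (ProductId, Code)
    select distinct p.Id, ltrim(rtrim(s.value))
    from dbo.Products as p
    cross apply string_split(p.Code, ',') as s   -- one row per stored code
    where ltrim(rtrim(s.value)) <> '';           -- skip empty items

    commit transaction;
end
go
```

Queries then join `dbo.ProductCodeList` on `Code`, which can be indexed, while the mission-critical table itself is left untouched.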

First, let's reshape the original table so that it looks like this:


Id   | Value
-----+------
0001 | IN
0001 | ME
0001 | OH
0001 | ON
0002 | AC
0002 | ON
0002 | VI
0002 | ZO
0003 | ME
0003 | OO
0003 | PS
0003 | QA

This is done by parsing the comma-separated values into rows. CROSS APPLY runs the split function once for each row of the original table, so every parsed value keeps its row's Id. The last step is simply to query this CTE.


create function FnSplitToTable
(
    @Param nvarchar(4000)
)
returns table as
return
    with
    Num(Pos) as -- list of positions, numbered from 1 to 4000, the largest non-max nvarchar length
    (
        select cast(1 as int)
        union all 
        select cast(Pos + 1 as int) from Num where Pos < 4000
    )
    -- a value starts at position 1 and right after every comma,
    -- and runs up to the next comma (or the end of the string)
    select substring(@Param, Pos, 
        charindex(',', @Param + ',', Pos) - Pos) as Value
        from Num where Pos <= convert(int, len(@Param)) 
        and substring(',' + @Param, Pos, 1) = ','
go
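To try the function on its own, assuming it has been created as above: the recursive `Num` CTE counts up to 4000, well past SQL Server's default recursion limit of 100, and an inline function cannot carry the hint itself, so the calling query has to raise the limit.

```sql
-- Splits a literal list; without the hint the recursion limit of 100 is hit.
select Value
from FnSplitToTable(N'ON,ME')
option (maxrecursion 4000);
```

This returns two rows, ON and ME (row order is not guaranteed).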


create proc ProcGetProductId
(
    @Codes nvarchar(4000)
)
as
with
Src
(
    Id,
    Code
)
as
(
    -- sample rows standing in for the products table; in practice, select Id, Code from that table
    select '0001', 'IN,ON,ME,OH'
    union all
    select '0002', 'ON,VI,AC,ZO'
    union all
    select '0003', 'QA,PS,OO,ME'
),
Parse as
(
    select 
        s.Id, 
        f.Value
    from 
        Src as s
    cross apply
        FnSplitToTable(s.Code) as f 
)
select distinct 
    p.Id
from 
    Parse as p
join
    FnSplitToTable(@Codes) as f
on
    p.Value = f.Value
option (maxrecursion 4000) -- the recursive CTE in FnSplitToTable needs more than the default 100 levels
go

exec ProcGetProductId 'IN,ME' -- returns 0001 & 0003
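On SQL Server 2016 or later (compatibility level 130+), the built-in `STRING_SPLIT` can stand in for the recursive-CTE splitter, which removes the need for the `maxrecursion` hint. A sketch with the same sample rows; the procedure name `ProcGetProductIdSplit` is hypothetical.

```sql
create proc ProcGetProductIdSplit
(
    @Codes nvarchar(4000)
)
as
with
Src (Id, Code) as
(
    -- sample rows; replace with a select from the real products table
    select '0001', 'IN,ON,ME,OH'
    union all
    select '0002', 'ON,VI,AC,ZO'
    union all
    select '0003', 'QA,PS,OO,ME'
)
select distinct
    s.Id
from
    Src as s
cross apply
    string_split(s.Code, ',') as c      -- one row per stored code
join
    string_split(@Codes, ',') as w      -- one row per requested code
on
    c.value = w.value
go

exec ProcGetProductIdSplit N'IN,ME' -- returns 0001 & 0003
```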

Everybody else seems very eager to tell you that you should not do this, although I don't see any explicit explanation for why not.

Apart from breaking the normalization rules, the reason is that you'll do a table-scan through all rows, since you can't have an index on the individual "values" in that column.

Simply put, there's no way for the database engine to keep some kind of quick list of which rows contain the code 'AC', unless you either break the codes out into a separate table or put each one in a column by itself.

Now, if you have other criteria in your SELECT statements that will limit the number of rows to some manageable number, then perhaps this will be OK. Otherwise, if you can, I would avoid this solution and do what others have already told you: split the codes out into a separate table.

Now, if you're stuck with this design, you can do a search using the following type of query:

...
WHERE ',' + Code + ',' LIKE '%,AC,%'

This will:

Match 'ON,VI,AC,ZO'
Not match 'ON,VI,TAC,ZO'

I don't know whether codes longer than two letters, like 'TAC', can occur in your case. If you only have 2-letter codes, then you can use just this:

...
WHERE Code LIKE '%AC%'

But again, this will perform horribly unless you limit the number of rows using other criteria.
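For the "ON or ME" search in the question, the comma-wrapped LIKE can be combined with a split of the input list. A sketch assuming SQL Server 2016+ for `STRING_SPLIT` and codes that contain no LIKE wildcards (`%`, `_`, `[`); the table variable only holds the question's sample rows. It still scans every row.

```sql
-- Sample rows from the question; stands in for the real products table.
declare @Products table (Id char(4), Code varchar(100));
insert into @Products (Id, Code)
values ('0001', 'IN,ON,ME,OH'),
       ('0002', 'ON,VI,AC,ZO'),
       ('0003', 'QA,PS,OO,ME');

declare @Codes varchar(4000) = 'IN,ME';

-- A product matches if any requested code appears as a whole item in Code;
-- wrapping both sides in commas stops 'AC' from matching inside 'TAC'.
select p.Id
from @Products as p
where exists
(
    select 1
    from string_split(@Codes, ',') as s
    where ',' + p.Code + ',' like '%,' + s.value + ',%'
);
```

EXISTS returns each product once even when several codes match, so no DISTINCT is needed; here the result is 0001 and 0003.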

Although all the previous posters are correct about the normalization of your db schema, you can do what you want using a "Table-Valued UDF" that takes a delimited string and returns a table with one row per value in the string. You can use this table as you would any other table in your stored proc, joining to it and so on. This will solve your immediate issue.

Here's a link to such a UDF: FN_Split UDF

Although the article talks about using it to pass a delimited list of data values into a stored proc, you can use the same UDF to operate on a delimited string stored in a column of an existing table.
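Since the linked UDF isn't reproduced here, this is a hypothetical loop-based equivalent (`dbo.SplitCodes` is an illustrative name, not the linked FN_Split). It needs SQL Server 2008+ for the inline variable initialization and the VALUES row constructor, but neither `STRING_SPLIT` nor a recursion hint; the last query shows it applied to a stored column with CROSS APPLY.

```sql
create function dbo.SplitCodes
(
    @List      nvarchar(4000),
    @Delimiter nchar(1)
)
returns @Items table (Value nvarchar(4000))
as
begin
    declare @Start int = 1;
    declare @Next  int;

    if @List is null
        return;

    set @Next = charindex(@Delimiter, @List, @Start);

    -- every item that is followed by a delimiter
    while @Next > 0
    begin
        insert into @Items (Value)
        values (ltrim(rtrim(substring(@List, @Start, @Next - @Start))));

        set @Start = @Next + 1;
        set @Next  = charindex(@Delimiter, @List, @Start);
    end

    -- the item after the last delimiter (an empty string if the list ends with one)
    insert into @Items (Value)
    values (ltrim(rtrim(substring(@List, @Start, 4000))));

    return;
end
go

-- Applied to a stored column: one output row per (product, code).
select p.Id, c.Value
from (values ('0001', N'IN,ON,ME,OH'),
             ('0002', N'ON,VI,AC,ZO')) as p (Id, Code)
cross apply dbo.SplitCodes(p.Code, N',') as c;
```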

The way you are storing data breaks normalization rules. Only a single atomic value should be stored in each field. You should store each item in a single row.
