Is it possible to write efficient queries in SQL Server that use the complete regular expression feature set?

If not, Microsoft really should consider adding that feature.


Solution 1:

For SQL Server 2000 (and any other 32-bit edition of SQL Server), there is xp_pcre, which introduces Perl-compatible regular expressions as a set of extended stored procedures. I've used it, and it works.

More recent versions (SQL Server 2005 and later) give you access to the .NET regular expression classes through CLR integration (the original link seems to be dead; here is another one: MSDN: How to: Work with CLR Database Objects).

Solution 2:

The answer is no, not in the general case, although it depends on what you mean by efficient. For these purposes, I'll use the following definition: 'makes effective use of indexes and performs joins in a sensible order', which is probably as good as any.

In this case, 'efficient' queries are SARGable (from 'search argument'), which means they can use index lookups to narrow down search predicates. Equalities (including equi-joins) and simple inequalities can do this, and so can 'AND' combinations of such predicates. Beyond that, we get into table, index and range scanning, i.e. operations that have to do record-by-record (or index-key-by-index-key) comparisons.
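To make the seek-versus-scan distinction concrete, here is a minimal Python sketch (Python is used purely for illustration; this is not SQL Server code). A sorted list stands in for a B-tree index, and the helpers `prefix_seek` and `regex_scan` are hypothetical names. A prefix predicate (like `LIKE 'al%'`) can binary-search to its starting point and stop as soon as keys stop matching, while an arbitrary regex has to be tested against every key.

```python
import bisect
import re


def prefix_seek(index: list[str], prefix: str) -> tuple[list[str], int]:
    """Seek: binary-search to the first key >= prefix, then read forward
    only while keys still start with the prefix (like LIKE 'al%').
    Returns the matching keys and how many keys were examined."""
    start = bisect.bisect_left(index, prefix)
    matches: list[str] = []
    examined = 0
    for key in index[start:]:
        examined += 1
        if not key.startswith(prefix):
            break  # sorted order: no later key can match either
        matches.append(key)
    return matches, examined


def regex_scan(index: list[str], pattern: str) -> tuple[list[str], int]:
    """Scan: an arbitrary regex gives no usable sort-order guarantee,
    so every key has to be tested."""
    compiled = re.compile(pattern)
    matches = [key for key in index if compiled.search(key)]
    return matches, len(index)


# A sorted list standing in for a B-tree index on a name column (toy data).
names = sorted(["alice", "albert", "alfred", "bob", "carol", "dave", "alma", "zoe"])
print(prefix_seek(names, "al"))   # (['albert', 'alfred', 'alice', 'alma'], 5)
print(regex_scan(names, r"^al"))  # (['albert', 'alfred', 'alice', 'alma'], 8)
print(regex_scan(names, r"ro"))   # (['carol'], 8)
```

The `^al` regex happens to be equivalent to a literal prefix, so an optimiser could in principle rewrite it as a seek; a pattern such as `ro` (anywhere in the string) gives no starting point in sort order, so every key must be examined.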

Sontek's answer describes a method of inlining regex functionality into a query, but the operations still have to do comparisons on a record-by-record basis. Wrapping the regex in a function would allow a function-based index, where the result of the calculation is materialised in the index (Oracle supports this, and you can get equivalent functionality in SQL Server by using the sort of tricks discussed in this article). However, this only works for a pattern fixed when the index is defined; you could not do it for an arbitrary regex supplied at query time.
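A rough Python model of the materialised-predicate idea, assuming a toy table held in a dict and a hypothetical `PredicateIndex` class standing in for an index on a computed column. The pattern is evaluated once per row when the index is built, so a query for that exact pattern just reads the precomputed bucket, while any other pattern falls back to a row-by-row scan.

```python
from __future__ import annotations

import re


class PredicateIndex:
    """Hypothetical stand-in for an index on a computed column: the pattern
    is fixed when the index is built and evaluated once per row."""

    def __init__(self, table: dict[int, str], pattern: re.Pattern) -> None:
        self.pattern = pattern
        self.buckets: dict[bool, list[int]] = {True: [], False: []}
        for row_id, value in table.items():
            self.buckets[bool(pattern.search(value))].append(row_id)


def find_matching(
    table: dict[int, str], pattern: re.Pattern, index: PredicateIndex | None = None
) -> tuple[list[int], str]:
    """Use the precomputed bucket only when the query asks for exactly the
    indexed pattern; any other pattern needs a full row-by-row scan."""
    same_pattern = (
        index is not None
        and index.pattern.pattern == pattern.pattern
        and index.pattern.flags == pattern.flags
    )
    if same_pattern:
        return list(index.buckets[True]), "index"
    return [row_id for row_id, value in table.items() if pattern.search(value)], "scan"


# Toy data: row id -> column value.
EMAIL_LIKE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]+$")
table = {1: "ann@example.com", 2: "not-an-email", 3: "bob@example.org", 4: "x@y"}
idx = PredicateIndex(table, EMAIL_LIKE)
print(find_matching(table, EMAIL_LIKE, idx))               # ([1, 3], 'index')
print(find_matching(table, re.compile(r"\.org$"), idx))    # ([3], 'scan')
```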

In the general case, the semantics of a regular expression do not lend themselves to pruning match sets in the way that an index does, so integrating regex support into the query optimiser is probably not possible.

Solution 3:

Check out this and this. They are great posts on how to do it.

Solution 4:

I would love to have the ability to natively call regular expressions in SQL Server for ad hoc queries and in stored procedures. Our DBAs won't allow us to create CLR functions, so I have been using LINQPad as a kind of poor man's query editor for the ad hoc stuff. It is especially useful when working with structured data such as JSON or XML that has been saved to the database.

And I agree that the lack of regular expression support seems like an oversight; it is an obvious feature for a query language. Hopefully we will see it in a future version, but people have been asking for it for a long time and it hasn't made its way into the product yet.

The most frequent argument I have seen against it is that a poorly formed expression can cause catastrophic backtracking, which in .NET will not abort and almost always requires the machine to be restarted. Maybe once they address that in the framework, we will see it included in a future version of SQL Server.
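To show why backtracking can blow up, here is a toy backtracking matcher in Python for the classic nested-quantifier pattern `(a+)+b`. It is a simplified model, not how the .NET engine is implemented (real engines add optimisations), and `count_backtracking_steps` is a hypothetical helper. On a run of n `a` characters followed by something other than `b`, it tries every way of splitting the run between the inner and outer `+`, visiting 2^n states before giving up.

```python
def count_backtracking_steps(subject: str) -> tuple[bool, int]:
    """Naive backtracking matcher for the anchored pattern (a+)+b.
    Returns (matched, number of states visited)."""
    steps = 0

    def try_group(pos: int, iterations: int) -> bool:
        nonlocal steps
        steps += 1
        # Greedy inner a+: find the longest run of 'a' starting at pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Outer +: try one more pass through (a+) with every possible length,
        # longest first, backtracking to shorter lengths on failure.
        for stop in range(end, pos, -1):
            if try_group(stop, iterations + 1):
                return True
        # Otherwise, after at least one pass, try to match the trailing 'b'.
        return iterations > 0 and pos < len(subject) and subject[pos] == "b"

    matched = try_group(0, 0)
    return matched, steps


for n in (5, 10, 15):
    print(n, count_backtracking_steps("a" * n + "!"))
# 5 (False, 32)
# 10 (False, 1024)
# 15 (False, 32768)
print(count_backtracking_steps("a" * 15 + "b"))  # (True, 2)
```

Each extra `a` doubles the work on a failing input, while a matching input succeeds almost immediately; at a few dozen characters, a matcher like this would effectively never finish.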