Regular expression to match common SQL syntax?

Last week I was writing some unit tests for a piece of code that generates SQL statements.

I was trying to figure out a regex to match SELECT, INSERT and UPDATE syntax so I could verify that my methods were generating valid SQL, and after 3-4 hours of searching and messing around with various regex editors I gave up.

I managed to get partial matches, but because a section in quotes can contain any characters, the pattern quickly expands to match the whole statement.

Any help would be appreciated. I'm not very good with regular expressions, but I'd like to learn more about them.

By the way, it's a C# Regex that I'm after.

Clarification

I don't want to need access to a database, as this is part of a unit test, and I don't want to have to maintain a database just to test my code, which may live longer than the project.


Regular expressions can only match languages that a finite state automaton can recognize (the regular languages), which is a very limited class, whereas SQL has a nested, recursive syntax. It can be shown that you can't validate SQL with a regex, so you can stop trying.
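To make the limitation concrete, here is a minimal sketch of the simplest thing a finite automaton cannot do: keep track of arbitrarily deep nesting, such as parentheses around subqueries. The class and method names (`ParenCheck`, `IsBalanced`) are made up for illustration, and the check deliberately ignores parentheses inside string literals and comments. It needs a counter, which is exactly the unbounded memory a formal regular expression lacks.

```csharp
using System;

public static class ParenCheck
{
    // Hypothetical helper: returns true when every '(' is closed by a later ')'.
    // Simplification: parentheses inside quoted strings or comments are counted too.
    public static bool IsBalanced(string sql)
    {
        int depth = 0; // unbounded counter: this is what a finite automaton cannot keep
        foreach (char c in sql)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')' && --depth < 0)
            {
                return false; // a ')' appeared before its '('
            }
        }
        return depth == 0; // anything left open means unbalanced
    }

    public static void Main()
    {
        // Nested subqueries, all closed.
        Console.WriteLine(IsBalanced(
            "SELECT a FROM t WHERE id IN (SELECT id FROM (SELECT 1 AS id) x)")); // True
        // Missing closing parenthesis.
        Console.WriteLine(IsBalanced(
            "SELECT a FROM t WHERE id IN (SELECT id FROM u")); // False
    }
}
```

.NET's regex engine does offer balancing groups, which go beyond formal regular expressions and can handle this particular nesting, but that only covers bracket matching, not the rest of the SQL grammar.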


SQL is described by a type-2 (context-free) grammar, which is too powerful to be expressed with regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. A database engine is, in general, too complex to be easily stubbed.

That said, you may try ANTLR's SQL grammars.


As far as I know, this is beyond regex, and you're getting close to the dark arts of BNF and compilers.

http://savage.net.au/SQL/

The same thing happens to people who want to do correct syntax highlighting. You start cramming things into a regex and then you end up writing a compiler...
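For what it's worth, the token level (which is what a highlighter mostly works on) is still within reach of a regex, and that includes the quoted-section problem from the question. The sketch below assumes standard SQL string literals, where a single quote inside a literal is escaped by doubling it. The class and field names are made up for illustration. It compares a greedy pattern with one that only allows non-quote characters or doubled quotes between the delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SqlLiterals
{
    // Greedy: '.*' runs from the first quote to the LAST quote in the statement.
    public static readonly Regex Greedy = new Regex("'.*'");

    // Quote-aware: between the delimiters allow either a non-quote character
    // or a doubled quote (''), assumed here to be the only escape mechanism.
    public static readonly Regex Literal = new Regex("'(?:[^']|'')*'");

    public static void Main()
    {
        string sql = "UPDATE t SET a = 'it''s', b = 'x' WHERE c = 1";

        // Swallows everything between the first and last quote.
        Console.WriteLine(Greedy.Match(sql).Value); // 'it''s', b = 'x'

        // Finds each literal separately.
        foreach (Match m in Literal.Matches(sql))
        {
            Console.WriteLine(m.Value); // 'it''s' and then 'x'
        }
    }
}
```

This only recognizes individual tokens. Checking that the tokens appear in a valid order is the part that needs a real parser.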