Using a regular expression, I need to match everything except a certain date format
Solution 1:
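Let's work with this expression, which matches a date written as two digits, a dot, two digits, a dot and four digits (for example 25.12.2020):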
\d\d[.]\d\d[.]\d\d\d\d
If you are going to use the date value later, you're going to want to capture the matching part. You can do this by putting round brackets around it, like this:
(\d\d[.]\d\d[.]\d\d\d\d)
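As a rough illustration of capturing the date, here is a minimal sketch in Python (the question does not say which language is in use, so Python is an assumption, and find_date is a made-up helper name):

```python
import re

# The capturing expression from above; group 1 is the date itself.
DATE = re.compile(r"(\d\d[.]\d\d[.]\d\d\d\d)")

def find_date(line):
    """Return the first date found on the line, or None if there is none."""
    m = DATE.search(line)
    return m.group(1) if m else None

print(find_date("Invoice sent 12.03.2021 by post"))  # 12.03.2021
print(find_date("no date here"))                     # None
```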
Now, let's try to match anything on a line, and then the date. "Any character" in regular expressions is . (a dot), and "any number of these" is .* (a dot followed by a star). So we now have:
(.*)(\d\d[.]\d\d[.]\d\d\d\d)
This will match anything and then the date. You will find that your "anything" is captured in group 1, and the date in group 2. If it doesn't match, there is no date on the line.
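A small sketch of how the two groups come out, again assuming Python (split_line is a hypothetical helper, not something from the question):

```python
import re

# Group 1: anything before the date. Group 2: the date.
LINE = re.compile(r"(.*)(\d\d[.]\d\d[.]\d\d\d\d)")

def split_line(line):
    """Return (text_before_date, date), or None when the line has no date."""
    m = LINE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_line("Meeting moved to 05.11.2019 at noon"))
# ('Meeting moved to ', '05.11.2019')
print(split_line("Meeting cancelled"))
# None
```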
The problem comes when you have more than one date on the line. By default .* is greedy: it matches as much as possible, so if there are two dates, you'll find the first one as part of group 1 (the "anything") and the second in group 2. If this isn't what you want, you can put a ? after the * to make it non-greedy, and you get this:
(.*?)(\d\d[.]\d\d[.]\d\d\d\d)
and then, if it matches, group 2 will be the first date available, and group 1 will be the stuff on the line before it.
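To see the difference between the greedy and non-greedy versions on a line with two dates, here is a short comparison, assuming Python (the sample line is invented for illustration):

```python
import re

GREEDY = re.compile(r"(.*)(\d\d[.]\d\d[.]\d\d\d\d)")
LAZY = re.compile(r"(.*?)(\d\d[.]\d\d[.]\d\d\d\d)")

line = "from 01.02.2020 until 03.04.2021"

# Greedy: group 1 swallows the first date, group 2 is the last date.
greedy_groups = GREEDY.match(line).groups()
# Non-greedy: group 1 stops as early as it can, group 2 is the first date.
lazy_groups = LAZY.match(line).groups()

print(greedy_groups)  # ('from 01.02.2020 until ', '03.04.2021')
print(lazy_groups)    # ('from ', '01.02.2020')
```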
Lastly, in whichever language you are using, you can apply this repeatedly to the rest of the line until it no longer matches: each time you will get "the stuff before the date" in group 1, and the date in group 2. Whatever is left over when the match finally fails is the text after the last date.
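The repeated application could look something like this in Python. This is only a sketch under the assumption that Python is acceptable; split_on_dates is a hypothetical name, and other languages will have their own way of restarting a match at a given position.

```python
import re

PIECE = re.compile(r"(.*?)(\d\d[.]\d\d[.]\d\d\d\d)")

def split_on_dates(line):
    """Return ([(text_before, date), ...], text_after_last_date)."""
    pieces = []
    pos = 0
    while True:
        # Try to match again, starting where the previous match ended.
        m = PIECE.match(line, pos)
        if m is None:
            break
        pieces.append((m.group(1), m.group(2)))
        pos = m.end()
    return pieces, line[pos:]

pieces, rest = split_on_dates("from 01.02.2020 until 03.04.2021 inclusive")
print(pieces)  # [('from ', '01.02.2020'), (' until ', '03.04.2021')]
print(rest)    # ' inclusive'
```

Each successful match consumes at least the ten characters of the date, so the loop always terminates.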