Why does order matter in this regex with alternation?

The requirements for a TextBox control were to accept the following as valid input:

  1. A sequence of digits.
  2. The literal string 'Number of rooms'.
  3. No value at all (left blank). An empty TextBox should pass the RegularExpressionValidator.

The following regex yielded the desired results (it successfully validated all three types of input):

"Number of rooms|[0-9]*"

However, when a colleague asked why the following regex fails to validate the string 'Number of rooms' (requirement #2), I couldn't come up with an explanation:

"[0-9]*|Number of rooms"

An explanation of why the order of the alternatives matters in this case would be very insightful.

UPDATE:

The second regex does successfully match the target string "Number of rooms" in a console app. However, the identical expression in the aspx markup doesn't match when the input is "Number of rooms". Here's the relevant aspx markup:

<asp:TextBox runat="server" ID="textbox1" >
</asp:TextBox>

<asp:RegularExpressionValidator ID="RegularExpressionValidator1" 
EnableClientScript="false" runat="server" ControlToValidate="textbox1" 
ValidationExpression="[0-9]*|Number of rooms" 
ErrorMessage="RegularExpressionValidator"></asp:RegularExpressionValidator>

<asp:Button ID="Button1" runat="server" Text="Button" />

Solution 1:

The order matters because that is the order in which the regex engine tries the alternatives: left to right, stopping at the first one that matches.

Case 1: Number of rooms|[0-9]*

In this case the regex engine first tries to match the text "Number of rooms". Only if that fails does it try to match digits or nothing (the empty string).

Case 2: [0-9]*|Number of rooms

In this case the engine first tries to match digits or nothing. But matching nothing always succeeds: at the very start of "Number of rooms", [0-9]* matches the empty string, so the engine never needs to try "Number of rooms".

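This is similar to the || operator in C#: once the left side succeeds, the right side is never evaluated. (Unlike ||, a regex engine will backtrack into the right alternative if something later in the pattern fails, but here nothing follows the alternation.)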
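A minimal sketch (a C# 9+ top-level program; the helper name Describe is hypothetical) that prints what Regex.Match returns for each ordering. Both report Success, but only the first one consumes the input:

```csharp
using System;
using System.Text.RegularExpressions;

// The literal from requirement #2.
const string input = "Number of rooms";

// Hypothetical helper: run one pattern against the input and print the result.
Match Describe(string pattern)
{
    Match m = Regex.Match(input, pattern);
    Console.WriteLine($"{pattern,-25} Success={m.Success} Index={m.Index} Length={m.Length}");
    return m;
}

// Case 1: the literal is tried first and consumes the whole input (Length=15).
Match case1 = Describe("Number of rooms|[0-9]*");

// Case 2: [0-9]* is tried first and succeeds at once with an empty match at
// index 0 (Length=0), so the "Number of rooms" alternative is never attempted.
Match case2 = Describe("[0-9]*|Number of rooms");
```

Note that case 2 still reports Success=True, which is why the same pattern appears to "work" in a plain console test.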

Update: to answer your second question, it behaves differently with the RegularExpressionValidator because the validator does more than just check for a match:

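// ..... (excerpt from the validator's source)
Match m = Regex.Match(controlValue, ValidationExpression);
// Valid only if there is a match that starts at index 0 and spans the entire input.
return (m.Success && m.Index == 0 && m.Length == controlValue.Length);
// .....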

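It checks that there is a match, that the match starts at index 0, and that its length equals the length of the whole input. This rules out partial and empty matches, so the zero-length match that [0-9]* produces at the start of "Number of rooms" makes validation fail.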
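A minimal sketch (C# 9+ top-level program) that mirrors the quoted check in a hypothetical helper, IsValid, and runs it against both orderings. The up-front blank check is an assumption about how the validator treats an empty TextBox (requirement #3); it is not part of the excerpt above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the quoted check: the match must start at
// index 0 and cover the entire input.
bool IsValid(string value, string pattern)
{
    // Assumption: blank input is accepted without evaluating the expression.
    if (string.IsNullOrWhiteSpace(value))
        return true;

    Match m = Regex.Match(value, pattern);
    return m.Success && m.Index == 0 && m.Length == value.Length;
}

string[] inputs = { "", "123", "Number of rooms", "abc" };
foreach (string p in new[] { "Number of rooms|[0-9]*", "[0-9]*|Number of rooms" })
{
    // With the second pattern, "Number of rooms" yields an empty match
    // (Length 0 != 15), so IsValid returns false for it.
    foreach (string s in inputs)
        Console.WriteLine($"{p,-25} \"{s}\" -> {IsValid(s, p)}");
}
```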

Solution 2:

The point is that when [0-9]* comes first, it matches the empty string.
If you require the whole string to be digits, it works:

^[0-9]*$|Number of rooms

Unless you use ^ and $ to require that the whole string match, [0-9]* will match an empty string at the beginning of "Number of rooms", and at that point the second alternative is never tried. With the anchors, the empty match fails the $ check, so the engine backtracks and tries "Number of rooms".
I hope this answers the question in your comment; I'm not sure it's clear...
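A minimal sketch (C# 9+ top-level program) showing the anchored pattern from this answer, plus a grouped variant, ^(?:[0-9]*|Number of rooms)$, which is my own suggestion rather than part of the answer. Both match the full 15 characters of "Number of rooms":

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "Number of rooms";

// The answer's pattern: ^ and $ force the digits alternative to span the whole
// string. At index 0 the empty match fails $, so the engine backtracks into
// the "Number of rooms" alternative.
Match anchored = Regex.Match(input, "^[0-9]*$|Number of rooms");

// Suggested variant (not from the answer): group the alternatives so that both
// are anchored at the start and the end.
Match grouped = Regex.Match(input, "^(?:[0-9]*|Number of rooms)$");

Console.WriteLine($"anchored: Index={anchored.Index} Length={anchored.Length}");
Console.WriteLine($"grouped:  Index={grouped.Index} Length={grouped.Length}");
```

The grouped form also rejects input with trailing text, such as "Number of rooms 2", on its own, without relying on the validator's length check.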

Solution 3:

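You probably wanted the regex Number of rooms|[0-9]+ (or, equivalently here, [0-9]+|Number of rooms), because the pattern [0-9]* (with a star) always matches at least the empty string: * means {0,}, i.e. "zero or more". With +, which means {1,}, the digits alternative cannot succeed with an empty match, so the order of the alternatives no longer matters.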
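A minimal sketch (C# 9+ top-level program) comparing + and * on the same input. With +, both orderings find the whole literal. One caveat: [0-9]+ no longer matches an empty string, so requirement #3 relies on the validator accepting blank input without evaluating the expression, which is my understanding of how it behaves but is an assumption here.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "Number of rooms";

// + ("one or more", {1,}): at index 0 the digits alternative fails on "N",
// so the engine moves on to the literal. Both orderings match all 15 chars.
foreach (string pattern in new[] { "Number of rooms|[0-9]+", "[0-9]+|Number of rooms" })
{
    Match m = Regex.Match(input, pattern);
    Console.WriteLine($"{pattern,-25} Index={m.Index} Length={m.Length}");
}

// * ("zero or more", {0,}): succeeds on the same input with an empty match.
Match star = Regex.Match(input, "[0-9]*");
Console.WriteLine($"[0-9]* alone: Success={star.Success} Length={star.Length}");
```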