How to detect a floating point number using a regular expression

Just make both the decimal dot and the E-then-exponent part optional:

[1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)?

I don't see why you don't want a leading [+-]? to capture a possible sign too, but, whatever!-)

Edit: there might in fact be no digits left of the decimal point (in which case I imagine there must be the decimal point and 1+ digits after it!), so a vertical-bar (alternative) is clearly needed:

(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?
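A quick way to sanity-check the pattern above is to run it over a few sample strings. The following is a minimal Python sketch; the sample strings and the helper name `is_float` are my own assumptions, not part of the original question.

```python
import re

# The alternation pattern from the edit above, unchanged.
FLOAT_RE = re.compile(r"(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?")

def is_float(text):
    """Return True if the whole string matches the pattern."""
    return FLOAT_RE.fullmatch(text) is not None

# Hypothetical sample inputs chosen for illustration.
for sample in ["3.14", ".5", "5.", "1e10", "12", ".", "0.5", "-1.5"]:
    print(f"{sample!r:8} -> {is_float(sample)}")
```

Run this way, the pattern accepts `3.14`, `.5`, `5.`, `1e10` and also the plain integer `12` (because the dot is optional). It rejects `.`, `0.5` (the integer part must start with 1-9) and `-1.5` (no sign is allowed).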

[This is the answer from the professor]

Define:

N = [1-9]
D = 0 | N
E = [eE] [+-]? D+
L = 0 | ( N D* )

Then floating point numbers can be matched with (here . stands for a literal decimal point, not "any character"):

( ( L . D* | . D+ ) E? ) | ( L E )

It was also acceptable to use D+ rather than L, and to prepend [+-]?.

A common mistake was to write D* . D*, but this can match just '.'.
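The definitions above translate almost mechanically into a pattern for a real regex engine. Here is a minimal Python sketch, assuming the `.` in the professor's notation is a literal decimal point. The variables `N`, `D`, `E`, `L` mirror the definitions; `FLOAT`, `MISTAKE` and `matches` are hypothetical names of mine, and the sample strings are chosen only for illustration.

```python
import re

N = r"[1-9]"
D = r"[0-9]"                # D = 0 | N
E = rf"[eE][+-]?{D}+"       # exponent part
L = rf"(?:0|{N}{D}*)"       # integer part without leading zeros

# ( ( L . D* | . D+ ) E? ) | ( L E )
FLOAT = re.compile(rf"(?:(?:{L}\.{D}*|\.{D}+)(?:{E})?|{L}{E})")

# The "common mistake" D* . D*, kept for comparison.
MISTAKE = re.compile(rf"{D}*\.{D}*")

def matches(pattern, text):
    """Return True if the whole string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None

for sample in ["0.5", "1.", ".5", "1e5", "10.25e-3", "1", ".", "01.5"]:
    print(f"{sample!r:10} float={matches(FLOAT, sample)} "
          f"mistake={matches(MISTAKE, sample)}")
```

With these inputs, `FLOAT` accepts `0.5`, `1.`, `.5`, `1e5` and `10.25e-3`, and rejects the plain integer `1`, the lone `.` and `01.5` (leading zero). `MISTAKE` accepts the lone `.`, which is exactly the problem described above.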

[Edit]
Someone asked about a leading sign. I should have asked him why it was excluded, but never got the chance. Since this was part of the lecture on grammars, my guess is that either it made the problem easier (not likely), or it reflects a small detail of parsing: the problem is divided so that the floating point value itself, regardless of sign, is the focus (possible).

If you are parsing through an expression, e.g.

-5.04e-10 + 3.14159E10

the sign of the floating point value is part of the operation to be applied to the value and not an attribute of the number itself. In other words,

subtract (5.04e-10)
add (3.14159E10)

to form the result of the expression. While I'm sure mathematicians may argue the point, remember this was from a lecture on parsing.
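One way to picture this split is a tiny tokenizer that only ever recognizes unsigned floating point values and hands `+` and `-` to the caller as separate operator tokens. This is only a sketch under my own assumptions: the names `tokenize` and `evaluate` are hypothetical, the unsigned pattern is the professor's definition written out for Python, and the evaluator handles nothing beyond chains of `+` and `-`.

```python
import re

# Unsigned float: ( ( L . D* | . D+ ) E? ) | ( L E )
UNSIGNED = (
    r"(?:(?:0|[1-9][0-9]*)\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    r"|(?:0|[1-9][0-9]*)[eE][+-]?[0-9]+"
)
# A token is an unsigned number or a sign, after optional whitespace.
TOKEN = re.compile(rf"\s*(?:(?P<num>{UNSIGNED})|(?P<op>[+-]))")

def tokenize(text):
    """Split text into ("num", ...) and ("op", ...) tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        kind = m.lastgroup          # "num" or "op"
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

def evaluate(tokens):
    """Apply each sign as an operation on the number that follows it."""
    total, sign = 0.0, 1.0
    for kind, text in tokens:
        if kind == "op":
            sign = -1.0 if text == "-" else 1.0
        else:
            total += sign * float(text)
            sign = 1.0
    return total

print(tokenize("-5.04e-10 + 3.14159E10"))
print(evaluate(tokenize("-5.04e-10 + 3.14159E10")))
```

Note that the sign inside the exponent (`e-10`) still belongs to the number token, because the number alternative is tried first and consumes it; only a sign in front of the value comes out as an operator.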


http://www.regular-expressions.info/floatingpoint.html


Here is what I turned in.

(([1-9]+\.[0-9]*)|([1-9]*\.[0-9]+)|([1-9]+))([eE][-+]?[0-9]+)?

To make it easier to discuss, I'll label the sections

( ([1-9]+ \. [0-9]* ) | ( [1-9]* \. [0-9]+ ) | ([1-9]+))  ( [eE] [-+]? [0-9]+ )?     
--------------------------------------------------------  ----------------------    
                           A                                       B

A: matches everything up to the 'e/E'
B: matches the scientific notation

Breaking down A we get three parts

 ( ([1-9]+ \. [0-9]* ) | ( [1-9]* \. [0-9]+ ) | ([1-9]+) )
   ---------1---------   ---------2----------   ---3----

Part 1: Allows 1 or more digits from 1-9, a decimal point, then 0 or more digits (0-9) after it (target 1)
Part 2: Allows 0 or more digits from 1-9, a decimal point, then 1 or more digits (0-9) after it (target 2)
Part 3: Allows 1 or more digits from 1-9 with no decimal point (see #4 in target list)


Breaking down B we get 4 basic parts

 ( [eE] [-+]? [0-9]+  )?
   -1-- --2-- --3---  -4

Part 1: requires either upper or lowercase 'e' for scientific notation (e.g. targets 8 & 9)
Part 2: allows an optional positive or negative sign for the exponent (e.g. targets 4, 5, & 6)
Part 3: allows 1 or more digits for the exponent (target 8)
Part 4: allows the scientific notation to be optional as a group (target 3)
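To see how the breakdown above behaves in practice, the submitted pattern can be run over a few strings. This is a minimal Python sketch; the sample strings are my own and are not the numbered targets from the assignment, and `accepts` is a hypothetical helper name.

```python
import re

# The pattern exactly as turned in.
SUBMITTED = re.compile(
    r"(([1-9]+\.[0-9]*)|([1-9]*\.[0-9]+)|([1-9]+))([eE][-+]?[0-9]+)?"
)

def accepts(text):
    """Return True if the whole string matches the submitted pattern."""
    return SUBMITTED.fullmatch(text) is not None

# Hypothetical samples chosen for illustration.
for sample in ["3.14", ".5", "5.", "12", "1.5e-3", "10.5", "0.5", ".", "1e"]:
    print(f"{sample!r:9} -> {accepts(sample)}")
```

The first five samples are accepted. `10.5` and `0.5` are rejected because the digits before the decimal point come only from `[1-9]`, so a `0` can never appear there; using `[0-9]` (or the `0 | N D*` form from the professor's answer) in that position would be one way to allow it. `.` and `1e` are rejected as expected.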