What is a word boundary in regex?
I'm trying to use regexes to match space-separated numbers.
I can't find a precise definition of \b ("word boundary").
I had assumed that -12 would be an "integer word" (matched by \b\-?\d+\b), but it appears that this does not work. I'd be grateful to know of ways of matching such numbers.
[I am using Java regexes in Java 1.6]
Example:
import java.util.regex.Pattern;

// Pattern with \b placed before the optional minus sign
Pattern pattern = Pattern.compile("\\s*\\b\\-?\\d+\\s*");
String plus = " 12 ";
System.out.println(pattern.matcher(plus).matches());
String minus = " -12 ";
System.out.println(pattern.matcher(minus).matches());
// Same pattern without \b
pattern = Pattern.compile("\\s*\\-?\\d+\\s*");
System.out.println(pattern.matcher(minus).matches());
This returns:
true
false
true
Solution 1:
A word boundary, in most regex dialects, is a position between \w and \W (non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]).
So, in the string "-12", it would match before the 1 or after the 2. The dash is not a word character.
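To see this concretely, here is a minimal Java sketch that lists every position where \b matches. The class and method names (BoundaryPositions, boundaries) are made up for illustration, and the inputs are plain ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    // Collects the index of every zero-width \b match in the input.
    static List<Integer> boundaries(String input) {
        List<Integer> positions = new ArrayList<Integer>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("-12"));   // [1, 3]
        System.out.println(boundaries(" -12 ")); // [2, 4]
    }
}
```

In " -12 " there is no boundary at index 1, between the space and the dash, because neither is a word character. That is why the pattern from the question, which requires \b right before the optional minus sign, fails on " -12 " but succeeds on " 12 ".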
Solution 2:
While learning regular expressions, I was really stuck on the metacharacter \b. I didn't understand its meaning and kept asking myself "what is it, what is it?". After some attempts using the website, I noticed the pink vertical dashes at the beginning and at the end of every word. That is when its meaning clicked: it is exactly a word (\w) boundary.
My view is purely about building intuition. The logic behind it is better explained in the other answers.
Solution 3:
A word boundary can occur in one of three positions:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
Word characters are alphanumeric (plus the underscore); a minus sign is not. Taken from Regex Tutorial.
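The three cases above can be sketched as a hand-written check. This is only an illustration, not how any regex engine implements \b: the names WordBoundaryRules, isWordChar and isBoundary are hypothetical, and "word character" is assumed to mean ASCII [0-9A-Za-z_].

```java
class WordBoundaryRules {
    // Assumption: a word character is ASCII [0-9A-Za-z_].
    static boolean isWordChar(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || c == '_';
    }

    // pos is a position between characters: 0 is before the first
    // character and s.length() is after the last one.
    static boolean isBoundary(String s, int pos) {
        boolean wordBefore = pos > 0 && isWordChar(s.charAt(pos - 1));
        boolean wordAfter = pos < s.length() && isWordChar(s.charAt(pos));
        // All three cases reduce to: exactly one side is a word character.
        return wordBefore != wordAfter;
    }

    public static void main(String[] args) {
        String s = " -12 ";
        for (int pos = 0; pos <= s.length(); pos++) {
            System.out.println(pos + ": " + isBoundary(s, pos));
        }
    }
}
```

For " -12 " only positions 2 and 4 report a boundary, on either side of "12". The start and end of the string are handled by treating "no character" as a non-word character.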
Solution 4:
I would like to explain Alan Moore's answer:
A word boundary is a position that is either preceded by a word character and not followed by one or followed by a word character and not preceded by one.
Suppose I have the string "This is a cat, and she's awesome", and I want to replace every occurrence of the letter 'a' only if it sits at the boundary of a word (here, the start of a word).
In other words, the letter a inside 'cat' should not be replaced.
So I'll run the regex (in Python) as
import re
re.sub(r"\ba", "e", myString.strip())  # replace a with e
so the output will be
This is a cat, and she's awesome ->
This is e cat, end she's ewesome  (result)
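Since the question is about Java, here is a rough Java counterpart of the same replacement. The class and method names (ReplaceAtBoundary, replaceLeadingA) are invented for this sketch; String.replaceAll and String.trim are standard library calls.

```java
class ReplaceAtBoundary {
    // Replaces 'a' with 'e' only where a word boundary sits directly before it,
    // that is, where 'a' starts a word.
    static String replaceLeadingA(String input) {
        return input.trim().replaceAll("\\ba", "e");
    }

    public static void main(String[] args) {
        String myString = "This is a cat, and she's awesome";
        System.out.println(replaceLeadingA(myString));
        // This is e cat, end she's ewesome
    }
}
```

The 'a' in "cat" is left alone because it is preceded by the word character 'c', so there is no boundary in front of it.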
Solution 5:
A word boundary is a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.
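That definition can be written out with lookarounds, which also suggests one way to handle the numbers from the question. The sketch below is an assumption-laden illustration: the class name, the constant names and the INTEGER pattern (an integer delimited by whitespace or the string ends) are mine, not part of any answer above, and the inputs are ASCII only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundBoundary {
    // The definition above, spelled out: preceded by a word character and not
    // followed by one, or followed by a word character and not preceded by one.
    static final Pattern EXPLICIT =
        Pattern.compile("(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w)");
    static final Pattern BUILT_IN = Pattern.compile("\\b");

    // Hypothetical custom "boundary": the number must not touch any
    // non-whitespace character on either side.
    static final Pattern INTEGER = Pattern.compile("(?<!\\S)-?\\d+(?!\\S)");

    // Start index of every match of p in the input.
    static List<Integer> positions(Pattern p, String input) {
        List<Integer> result = new ArrayList<Integer>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    // All whitespace-delimited integers in the input, as strings.
    static List<String> integers(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = INTEGER.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = " -12 ";
        System.out.println(positions(EXPLICIT, s).equals(positions(BUILT_IN, s))); // true
        System.out.println(integers("7 -12 x-3 40")); // [7, -12, 40]
    }
}
```

Because the minus sign is not a word character, \b cannot mark the start of "-12"; replacing \b with whitespace-based lookarounds sidesteps that, at the cost of rejecting numbers glued to punctuation such as "12,".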