How do I get the match data for all occurrences of a Ruby regular expression in a string?
I need the MatchData for each occurrence of a regular expression in a string. This is different from the scan method suggested in Match All Occurrences of a Regex, since that only gives me an array of strings (I need the full MatchData, to get begin and end information, etc.).
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
numbers.match input # #<MatchData "12"> (only the first match)
input.scan numbers # ["12", "34", "567"] (all matches, but only the strings)
I suspect there is some method that I've overlooked. Suggestions?
Solution 1:
You want
"abc12def34ghijklmno567pqrs".to_enum(:scan, /\d+/).map { Regexp.last_match }
which gives you
[#<MatchData "12">, #<MatchData "34">, #<MatchData "567">]
The "trick" is, as you see, to build an enumerator over scan so that Regexp.last_match can be read once for each match.
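Since the question asks for begin and end information, here is a minimal sketch of pulling those out of the collected MatchData objects. The variable names (`matches`, `spans`) are only illustrative.

```ruby
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/

# to_enum(:scan, ...) yields once per match; inside the block,
# Regexp.last_match is the MatchData for the match just found.
matches = input.to_enum(:scan, numbers).map { Regexp.last_match }

# Collect [matched text, begin offset, end offset] for each match.
spans = matches.map { |m| [m[0], m.begin(0), m.end(0)] }
p spans
# => [["12", 3, 5], ["34", 8, 10], ["567", 19, 22]]
```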
Solution 2:
My current solution is to add an each_match method to Regexp:
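class Regexp
  # Yields a MatchData for each successive match in str.
  def each_match(str)
    start = 0
    while matchdata = match(str, start)
      yield matchdata
      # Step one character past an empty match so the loop cannot stall.
      start = matchdata.end(0) + (matchdata[0].empty? ? 1 : 0)
    end
  end
end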
Now I can do:
numbers.each_match input do |match|
  puts "Found #{match[0]} at #{match.begin(0)} until #{match.end(0)}"
end
Tell me there is a better way.
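If reopening Regexp is a concern, the same loop can live in a standalone helper. The sketch below is one possible shape, not an established API: the module name `MatchWalker` and its `each_match` method are hypothetical. It returns an Enumerator when no block is given and steps past empty matches so that a pattern such as /\d*/ cannot loop forever.

```ruby
# Hypothetical helper module; not part of Ruby's standard library.
module MatchWalker
  # Yields a MatchData for each successive match of regexp in str.
  # Without a block, returns an Enumerator over the same matches.
  def self.each_match(str, regexp)
    return to_enum(:each_match, str, regexp) unless block_given?

    start = 0
    while start <= str.length && (m = regexp.match(str, start))
      yield m
      # Step one character past an empty match so the loop always advances.
      start = m[0].empty? ? m.end(0) + 1 : m.end(0)
    end
  end
end

walker_input = "abc12def34ghijklmno567pqrs"
MatchWalker.each_match(walker_input, /\d+/) do |m|
  puts "Found #{m[0]} at #{m.begin(0)} until #{m.end(0)}"
end
```

Because the helper returns an Enumerator, it can also be chained, for example `MatchWalker.each_match(walker_input, /\d+/).map { |m| m.begin(0) }`.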
Solution 3:
I’ll put it here to make the code available via a search:
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
input.gsub(numbers) { |m| p $~ }
The result is as requested:
⇒ #<MatchData "12">
⇒ #<MatchData "34">
⇒ #<MatchData "567">
See "input.gsub(numbers) { |m| p $~ } Matching data in Ruby for all occurrences in a string" for more information.
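A variation on the same idea, offered as a sketch: when gsub is called with only a pattern (no replacement and no block) it returns an Enumerator, so the MatchData objects can be collected into an array rather than printed. The variable names here are illustrative.

```ruby
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/

# gsub with only a pattern returns an Enumerator; each iteration
# sets the last-match data, which the block captures.
all_matches = input.gsub(numbers).map { Regexp.last_match }

# MatchData#offset(0) gives [begin, end] for the whole match.
offsets = all_matches.map { |m| m.offset(0) }
p offsets
# => [[3, 5], [8, 10], [19, 22]]
```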