How do I create a Stream of regex matches?
In Java 8, there is Pattern.splitAsStream, which provides a stream of items split by a delimiter pattern, but unfortunately there is no corresponding method for getting a stream of matches.
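To make the difference concrete, here is a minimal sketch of what splitAsStream returns. The names SplitAsStreamDemo and splitOnDigits are made up for illustration; the point is only that the stream holds the pieces between the delimiter matches, not the matches themselves.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SplitAsStreamDemo {
    // Splits the input around runs of digits; the digits themselves are dropped.
    static List<String> splitOnDigits(String input) {
        return Pattern.compile("\\d+")
                .splitAsStream(input)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitOnDigits("a1b22c")); // prints [a, b, c]
    }
}
```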
If you are going to implement such a Stream, I recommend implementing Spliterator directly rather than implementing and wrapping an Iterator. You may be more familiar with Iterator, but implementing a simple Spliterator is straightforward:
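import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;

final class MatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;

    MatchItr(Matcher m) {
        // The length of the matcher's region serves as the size estimate.
        super(m.regionEnd()-m.regionStart(), ORDERED|NONNULL);
        matcher=m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if(!matcher.find()) return false; // no further match: end of stream
        action.accept(matcher.group());
        return true;
    }
}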
You may consider overriding forEachRemaining with a straightforward loop, though.
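A sketch of that override could look like the following. LoopingMatchItr is a hypothetical variant of the class above, and the static matches helper exists only to make the example runnable; neither name comes from any library.

```java
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class LoopingMatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;

    LoopingMatchItr(Matcher m) {
        super(m.regionEnd() - m.regionStart(), ORDERED | NONNULL);
        matcher = m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }

    // Bulk traversal: one plain loop instead of repeated tryAdvance calls.
    @Override
    public void forEachRemaining(Consumer<? super String> action) {
        while (matcher.find()) action.accept(matcher.group());
    }

    // Illustrative helper (an assumption, not part of the original class).
    static List<String> matches(Pattern pattern, CharSequence input) {
        return StreamSupport.stream(new LoopingMatchItr(pattern.matcher(input)), false)
                .collect(Collectors.toList());
    }
}
```

Terminal operations that consume the whole stream can then go through the loop, while short-circuiting operations still use tryAdvance.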
If I understand your attempt correctly, the solution should look more like:
Pattern pattern = Pattern.compile(
    "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");
try(BufferedReader br=new BufferedReader(System.console().reader())) {
    br.lines()
      .flatMap(line -> StreamSupport.stream(new MatchItr(pattern.matcher(line)), false))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
Java 9 provides a method Stream<MatchResult> results() directly on the Matcher. But for finding matches within a stream, there’s an even more convenient method on Scanner. With that, the implementation simplifies to:
try(Scanner s = new Scanner(System.console().reader())) {
    s.findAll(pattern)
     .collect(Collectors.groupingBy(MatchResult::group,TreeMap::new,Collectors.counting()))
     .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
This answer contains a back-port of Scanner.findAll that can be used with Java 8.
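When the input is already available as a CharSequence rather than a Reader, the Matcher.results() method mentioned above is enough on its own. A minimal sketch, assuming Java 9 or newer; ResultsDemo and countMatches are illustrative names, not part of any library:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ResultsDemo {
    // Counts how often each distinct match occurs, sorted by match text.
    static Map<String, Long> countMatches(Pattern pattern, CharSequence input) {
        return pattern.matcher(input)
                .results() // Stream<MatchResult>, available since Java 9
                .collect(Collectors.groupingBy(
                        MatchResult::group, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countMatches(Pattern.compile("\\w+"), "b a b")); // prints {a=1, b=2}
    }
}
```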
Going off of Holger's solution, we can support arbitrary match operations (such as getting the nth group) by returning a Stream<MatchResult> that callers can map as they see fit. We can also hide the Spliterator as an implementation detail, so that callers can just work with the Stream directly. As a rule of thumb, StreamSupport should be used by library code rather than by users.
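import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class MatcherStream {
    private MatcherStream() {}

    // Convenience overload: streams the full text of each match.
    public static Stream<String> find(Pattern pattern, CharSequence input) {
        return findMatches(pattern, input).map(MatchResult::group);
    }

    public static Stream<MatchResult> findMatches(
            Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        Spliterator<MatchResult> spliterator = new Spliterators.AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED|Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if(!matcher.find()) return false;
                // toMatchResult() takes a snapshot that later find() calls do not change.
                action.accept(matcher.toMatchResult());
                return true;
            }
        };
        return StreamSupport.stream(spliterator, false);
    }
}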
You can then use it like so:
MatcherStream.find(Pattern.compile("\\w+"), "foo bar baz").forEach(System.out::println);
Or for your specific task (borrowing again from Holger):
try(BufferedReader br = new BufferedReader(System.console().reader())) {
    br.lines()
      .flatMap(line -> MatcherStream.find(pattern, line))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n", k, v));
}
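For the "nth group" case, one option is to let the caller pass a mapping function that is applied to each match. The following is only a sketch under that assumption: GroupStream and its find overload are hypothetical names, and the NONNULL characteristic is left out on purpose because group(n) returns null for a group that did not participate in the match.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class GroupStream {
    private GroupStream() {}

    // Hypothetical helper: applies 'mapper' to a snapshot of every match.
    static <T> Stream<T> find(Pattern pattern, CharSequence input,
                              Function<? super MatchResult, ? extends T> mapper) {
        Matcher matcher = pattern.matcher(input);
        Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, Spliterator.ORDERED) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                if (!matcher.find()) return false;
                action.accept(mapper.apply(matcher.toMatchResult()));
                return true;
            }
        };
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        List<String> values = find(Pattern.compile("(\\w+)=(\\d+)"), "a=1, b=22", r -> r.group(2))
                .collect(Collectors.toList());
        System.out.println(values); // prints [1, 22]
    }
}
```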
If you want to use a Scanner together with regular expressions, its findWithinHorizon method can also turn the matches of a regular expression into a stream of strings. Here we use a stream builder, which is convenient to fill from a conventional while loop.
Here is an example:
private Stream<String> extractRulesFrom(String text, Pattern pattern, int group) {
    Stream.Builder<String> builder = Stream.builder();
    try(Scanner scanner = new Scanner(text)) {
        // A horizon of 0 means the search is not bounded.
        while (scanner.findWithinHorizon(pattern, 0) != null) {
            // match() exposes the groups of the match that was just found
            builder.accept(scanner.match().group(group));
        }
    }
    return builder.build();
}
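A short usage sketch follows, with the method copied into a static helper so that it runs on its own. ScannerMatches and extract are illustrative names, and the "rule: ..." input is invented. Note that the builder collects every match before the stream is returned, so this variant is eager, unlike the Spliterator-based approaches.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ScannerMatches {
    // Static copy of the method above, for illustration only.
    static Stream<String> extract(String text, Pattern pattern, int group) {
        Stream.Builder<String> builder = Stream.builder();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.findWithinHorizon(pattern, 0) != null) {
                builder.accept(scanner.match().group(group));
            }
        }
        return builder.build();
    }

    public static void main(String[] args) {
        List<String> rules = extract("rule: a; rule: b", Pattern.compile("rule: (\\w+)"), 1)
                .collect(Collectors.toList());
        System.out.println(rules); // prints [a, b]
    }
}
```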