How do I create a Stream of regex matches?

In Java 8, there is Pattern.splitAsStream, which provides a stream of items split by a delimiter pattern, but unfortunately there is no built-in method for getting a stream of matches.

If you are going to implement such a Stream, I recommend implementing Spliterator directly rather than implementing and wrapping an Iterator. You may be more familiar with Iterator, but implementing a simple Spliterator is straightforward:

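import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;

final class MatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;
    MatchItr(Matcher m) {
        // The region length serves only as a rough size estimate; SIZED is not reported.
        super(m.regionEnd()-m.regionStart(), ORDERED|NONNULL);
        matcher=m;
    }
    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        // Pass the next match to the consumer, or signal that there are no more.
        if(!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }
}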

You may consider overriding forEachRemaining with a straightforward loop, though.
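A minimal sketch of what such an override could look like. The class name LoopingMatchItr and the helper method matches are made up for this illustration; the rest mirrors MatchItr. Non-short-circuiting terminal operations such as collect typically traverse via forEachRemaining, so the loop avoids one tryAdvance call per element.

```java
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical variant of MatchItr that also overrides forEachRemaining.
final class LoopingMatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;

    LoopingMatchItr(Matcher m) {
        super(m.regionEnd() - m.regionStart(), ORDERED | NONNULL);
        matcher = m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }

    @Override
    public void forEachRemaining(Consumer<? super String> action) {
        // Plain loop over all remaining matches.
        while (matcher.find()) action.accept(matcher.group());
    }

    // Hypothetical convenience method, not part of the answer above.
    static Stream<String> matches(Pattern pattern, CharSequence input) {
        return StreamSupport.stream(new LoopingMatchItr(pattern.matcher(input)), false);
    }

    public static void main(String[] args) {
        List<String> words = matches(Pattern.compile("\\w+"), "foo bar baz")
            .collect(Collectors.toList());
        System.out.println(words); // prints [foo, bar, baz]
    }
}
```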


If I understand your attempt correctly, the solution should look more like:

Pattern pattern = Pattern.compile(
                 "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");

try(BufferedReader br=new BufferedReader(System.console().reader())) {

    br.lines()
      .flatMap(line -> StreamSupport.stream(new MatchItr(pattern.matcher(line)), false))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}

Java 9 provides a method Stream<MatchResult> results() directly on the Matcher. But for finding matches within a stream, there’s an even more convenient method on Scanner. With that, the implementation simplifies to

try(Scanner s = new Scanner(System.console().reader())) {
    s.findAll(pattern)
     .collect(Collectors.groupingBy(MatchResult::group,TreeMap::new,Collectors.counting()))
     .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}

This answer contains a back-port of Scanner.findAll that can be used with Java 8.
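When the input is already a String or other CharSequence rather than a reader, the Matcher.results() method mentioned above can be used on its own. A small sketch, assuming Java 9 or later; the class name ResultsDemo, the method countMatches, and the sample pattern are made up for illustration:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ResultsDemo {
    // Counts how often each distinct match occurs, sorted by match text.
    static Map<String, Long> countMatches(Pattern pattern, CharSequence input) {
        return pattern.matcher(input)
            .results() // Stream<MatchResult>, available since Java 9
            .collect(Collectors.groupingBy(MatchResult::group, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countMatches(Pattern.compile("\\w+"), "foo bar foo")); // prints {bar=1, foo=2}
    }
}
```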


Going off of Holger's solution, we can support arbitrary match operations (such as getting the nth group) by returning a Stream<MatchResult> and letting the caller map each result as needed. We can also hide the Spliterator as an implementation detail, so that callers can just work with the Stream directly. As a rule of thumb, StreamSupport should be used by library code rather than by users.

public class MatcherStream {
  private MatcherStream() {}

  public static Stream<String> find(Pattern pattern, CharSequence input) {
    return findMatches(pattern, input).map(MatchResult::group);
  }

  public static Stream<MatchResult> findMatches(
      Pattern pattern, CharSequence input) {
    Matcher matcher = pattern.matcher(input);

    Spliterator<MatchResult> spliterator = new Spliterators.AbstractSpliterator<MatchResult>(
        Long.MAX_VALUE, Spliterator.ORDERED|Spliterator.NONNULL) {
      @Override
      public boolean tryAdvance(Consumer<? super MatchResult> action) {
        if(!matcher.find()) return false;
        action.accept(matcher.toMatchResult());
        return true;
      }
    };

    return StreamSupport.stream(spliterator, false);
  }
}

You can then use it like so:

MatcherStream.find(Pattern.compile("\\w+"), "foo bar baz").forEach(System.out::println);
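To get at a specific capture group, the MatchResult stream can be mapped by the caller. The sketch below repeats the findMatches logic so that it compiles on its own; the class name GroupDemo, the method values, and the key=value pattern are assumptions made for this illustration.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class GroupDemo {
    // Same idea as MatcherStream.findMatches, copied here to keep the sketch self-contained.
    static Stream<MatchResult> findMatches(Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        return StreamSupport.stream(
            new Spliterators.AbstractSpliterator<MatchResult>(
                    Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    if (!matcher.find()) return false;
                    // Snapshot the match so later find() calls do not affect it.
                    action.accept(matcher.toMatchResult());
                    return true;
                }
            }, false);
    }

    // Extracts the second capture group (the value) of every key=value pair.
    static List<String> values(String input) {
        Pattern keyValue = Pattern.compile("(\\w+)=(\\w+)"); // hypothetical pattern
        return findMatches(keyValue, input)
            .map(result -> result.group(2))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(values("a=1 b=2 c=3")); // prints [1, 2, 3]
    }
}
```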

Or for your specific task (borrowing again from Holger):

try(BufferedReader br = new BufferedReader(System.console().reader())) {
  br.lines()
    .flatMap(line -> MatcherStream.find(pattern, line))
    .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
    .forEach((k, v) -> System.out.printf("%s\t%s\n", k, v));
}

If you want to use a Scanner, its findWithinHorizon method also lets you turn the matches of a regular expression into a stream of strings. Here we use a Stream.Builder, which is convenient to fill from a conventional while loop. Note that the builder collects all matches before the stream is returned, so the resulting stream is not lazy.

Here is an example:

private Stream<String> extractRulesFrom(String text, Pattern pattern, int group) {
    Stream.Builder<String> builder = Stream.builder();
    try(Scanner scanner = new Scanner(text)) {
        while (scanner.findWithinHorizon(pattern, 0) != null) {
            builder.accept(scanner.match().group(group));
        }
    }
    return builder.build();
}
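A possible way to call such a method, shown as a self-contained sketch. The method is repeated as a static member so the example compiles alone; the class name ScannerMatchDemo, the sample text, and the "rule: (\\w+)" pattern are invented for illustration.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ScannerMatchDemo {
    // Same logic as extractRulesFrom, made static for this sketch.
    static Stream<String> extractRulesFrom(String text, Pattern pattern, int group) {
        Stream.Builder<String> builder = Stream.builder();
        try (Scanner scanner = new Scanner(text)) {
            // A horizon of 0 means the search is not bounded.
            while (scanner.findWithinHorizon(pattern, 0) != null) {
                builder.accept(scanner.match().group(group));
            }
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Pattern rule = Pattern.compile("rule: (\\w+)"); // hypothetical pattern
        List<String> names = extractRulesFrom("rule: alpha; rule: beta", rule, 1)
            .collect(Collectors.toList());
        System.out.println(names); // prints [alpha, beta]
    }
}
```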