When should I use streams?

I just came across a question when using a List and its stream() method. While I know how to use streams, I'm not quite sure when to use them.

For example, I have a list containing various paths to different locations. Now I'd like to check whether a single, given path contains any of the paths specified in the list, and return a boolean indicating whether that condition was met.

This, of course, is not a hard task per se. But I wonder whether I should use streams or a for(-each) loop.

The List

private static final List<String> EXCLUDE_PATHS = Arrays.asList(
    "my/path/one",
    "my/path/two"
);

Example using Stream:

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream()
                        .map(String::toLowerCase)
                        .filter(path::contains)
                        .collect(Collectors.toList())
                        .size() > 0;
}

Example using for-each loop:

private boolean isExcluded(String path){
    for (String excludePath : EXCLUDE_PATHS) {
        if (path.contains(excludePath.toLowerCase())) {
            return true;
        }
    }
    return false;
}

Note that the path parameter is always lowercase.

My first guess is that the for-each approach is faster, because the loop returns immediately once the condition is met, whereas the stream still iterates over all list entries in order to complete the filtering.

Is my assumption correct? If so, why (or rather when) would I use stream() then?


Solution 1:

Your assumption is correct. Your stream implementation is slower than the for-loop.

This stream usage should perform about as well as the for-loop, though:

EXCLUDE_PATHS.stream()  
    .map(String::toLowerCase)
    .anyMatch(path::contains);

This iterates through the items one by one, applying String::toLowerCase and then the path::contains predicate to each, and terminates at the first item that matches.

Both collect() and anyMatch() are terminal operations, but anyMatch() is short-circuiting: it stops at the first matching item, whereas collect() must process all items.
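To make the difference visible, here is a small sketch that counts how many list elements actually enter the pipeline in each version. The class name ShortCircuitDemo and the third list entry are made up for illustration, and peek() is used here only as a counting/debugging aid, not as something you would leave in production code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class ShortCircuitDemo {

    // Hypothetical list with a third entry so the difference is easier to see
    static final List<String> EXCLUDE_PATHS =
        Arrays.asList("my/path/one", "my/path/two", "my/path/three");

    // Counts how many elements are pulled through the pipeline by collect(...).size() > 0
    static int countVisitedWithCollect(String path) {
        AtomicInteger visited = new AtomicInteger();
        boolean found = EXCLUDE_PATHS.stream()
            .peek(s -> visited.incrementAndGet())
            .map(String::toLowerCase)
            .filter(path::contains)
            .collect(Collectors.toList())
            .size() > 0;
        return visited.get();
    }

    // Counts how many elements are pulled through the pipeline by anyMatch(...)
    static int countVisitedWithAnyMatch(String path) {
        AtomicInteger visited = new AtomicInteger();
        boolean found = EXCLUDE_PATHS.stream()
            .peek(s -> visited.incrementAndGet())
            .map(String::toLowerCase)
            .anyMatch(path::contains);
        return visited.get();
    }

    public static void main(String[] args) {
        String path = "some/prefix/my/path/one"; // matches the first entry
        System.out.println("collect visited:  " + countVisitedWithCollect(path));  // 3
        System.out.println("anyMatch visited: " + countVisitedWithAnyMatch(path)); // 1
    }
}
```

Because the first entry already matches, anyMatch() stops after one element, while the collect() version still pushes all three through the pipeline before the size check.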

Solution 2:

The decision whether or not to use Streams should not be driven by performance considerations, but rather by readability. When it really comes to performance, there are other things to consider.

With your .filter(path::contains).collect(Collectors.toList()).size() > 0 approach, you are processing all elements and collecting them into a temporary List before comparing the size. Still, this hardly ever matters for a Stream consisting of two elements.

Using .map(String::toLowerCase).anyMatch(path::contains) can save CPU cycles and memory if you have a substantially larger number of elements. Still, on every call this converts each String to its lowercase representation until a match is found. Obviously, it makes sense to use

private static final List<String> EXCLUDE_PATHS =
    Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
          .collect(Collectors.toList());

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream().anyMatch(path::contains);
}

instead, so that you don’t have to repeat the conversion to lowercase in every invocation of isExcluded. If the number of elements in EXCLUDE_PATHS or the lengths of the strings become really large, you may consider using

private static final List<Predicate<String>> EXCLUDE_PATHS =
    Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
          .map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
          .collect(Collectors.toList());

private boolean isExcluded(String path){
    return EXCLUDE_PATHS.stream().anyMatch(p -> p.test(path));
}

Compiling a string as a regex pattern with the LITERAL flag makes it behave just like ordinary string operations, but allows the engine to spend some time in preparation, e.g. using the Boyer-Moore algorithm, to be more efficient when it comes to the actual comparison.
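A quick way to convince yourself that a LITERAL pattern behaves like String.contains is to test it with a string that contains a regex metacharacter. The sketch below is an illustration only; the class name LiteralPatternDemo and the helper containsLiteral are hypothetical. It relies on Pattern.asPredicate(), which uses find(), so the predicate is true when the literal occurs anywhere in the input.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class LiteralPatternDemo {

    // Hypothetical helper: true when the input contains 'literal' as plain text
    static Predicate<String> containsLiteral(String literal) {
        return Pattern.compile(literal, Pattern.LITERAL).asPredicate();
    }

    public static void main(String[] args) {
        Predicate<String> p = containsLiteral("my/path/one");
        System.out.println(p.test("some/prefix/my/path/one")); // true
        System.out.println(p.test("my/path/two"));             // false

        // With LITERAL, '.' is an ordinary character, just like in String.contains
        Predicate<String> dot = containsLiteral("a.b");
        System.out.println(dot.test("a.b")); // true
        System.out.println(dot.test("axb")); // false, and "axb".contains("a.b") is false too
    }
}
```

Without the LITERAL flag, "a.b" would also match "axb", because '.' would then mean "any character".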

Of course, this only pays off if there are enough subsequent tests to compensate for the time spent in preparation. Determining whether this will be the case is one of the actual performance considerations, besides the first question of whether this operation will ever be performance-critical at all. The question of whether to use Streams or for loops is not one of them.

By the way, the code examples above keep the logic of your original code, which looks questionable to me. Your isExcluded method returns true if the specified path contains any of the elements in the list, so it returns true for /some/prefix/to/my/path/one, as well as for my/path/one/and/some/suffix, or even /some/prefix/to/my/path/one/and/some/suffix.

Even dummy/path/onerous is considered to fulfill the criterion, as it contains the string my/path/one.
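If what you actually want is "the path lies inside one of the excluded directories", one possible stricter rule is a prefix check on whole path elements instead of a substring check. This is only a sketch under that assumption about your intent; the class name PathPrefixDemo is hypothetical. It uses java.nio.file.Path.startsWith(Path), which compares name elements rather than characters, so my/path/onerous no longer counts as being under my/path/one.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PathPrefixDemo {

    // Assumed rule: a path is excluded only if it starts with one of these paths,
    // compared element by element (so "one" does not match "onerous")
    private static final List<Path> EXCLUDE_PATHS =
        Stream.of("my/path/one", "my/path/two")
              .map(String::toLowerCase)
              .map(Paths::get)
              .collect(Collectors.toList());

    static boolean isExcluded(String path) {
        Path p = Paths.get(path);
        return EXCLUDE_PATHS.stream().anyMatch(p::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("my/path/one/and/some/suffix")); // true
        System.out.println(isExcluded("my/path/onerous"));             // false
        System.out.println(isExcluded("dummy/path/onerous"));          // false
        System.out.println(isExcluded("/some/prefix/to/my/path/one")); // false (absolute path)
    }
}
```

Whether a leading prefix such as /some/prefix/to/ should still be allowed depends on how your paths are produced; if it should, you would need to normalize or relativize the input path first.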

Solution 3:

Yes, you are right: your stream approach has some overhead. But you can use a construction like this:

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream().map(String::toLowerCase).anyMatch(path::contains);
}

The main reason to use streams is that they make your code simpler and easier to read.