Java 8 Stream, getting head and tail

Java 8 introduced a Stream class that resembles Scala's Stream, a powerful lazy construct that makes it possible to write something like this very concisely:

def from(n: Int): Stream[Int] = n #:: from(n+1)

def sieve(s: Stream[Int]): Stream[Int] = {
  s.head #:: sieve(s.tail filter (_ % s.head != 0))
}

val primes = sieve(from(2))

primes takeWhile(_ < 1000) print  // prints all primes less than 1000

I wondered if it is possible to do this in Java 8, so I wrote something like this:

IntStream from(int n) {
    return IntStream.iterate(n, m -> m + 1);
}

IntStream sieve(IntStream s) {
    int head = s.findFirst().getAsInt();
    return IntStream.concat(IntStream.of(head), sieve(s.skip(1).filter(n -> n % head != 0)));
}

IntStream primes = sieve(from(2));

Fairly simple, but it produces java.lang.IllegalStateException: stream has already been operated upon or closed. findFirst() is a terminal operation that consumes the stream, so the subsequent skip() call on the same stream is rejected; a stream can be operated upon only once.
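The exception can be reproduced in isolation. The following is a minimal sketch, not part of the original code; the class name StreamReuseDemo and the method reuseFails are made up for illustration:

```java
import java.util.stream.IntStream;

class StreamReuseDemo {
    // Returns true if touching a stream after a terminal operation
    // throws IllegalStateException.
    static boolean reuseFails() {
        IntStream s = IntStream.iterate(2, m -> m + 1);
        s.findFirst();          // terminal operation: the stream is now consumed
        try {
            s.skip(1);          // any further operation on s is rejected
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reuseFails());
    }
}
```

Note that the failure happens as soon as skip() is called, before any element of the tail is requested.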

I don't really have to use up the stream twice, since all I need is the first number in the stream and the rest as another stream, i.e. the equivalent of Scala's Stream.head and Stream.tail. Is there a method in Java 8 Stream that I can use to achieve this?

Thanks.


Solution 1:

Even if you didn’t have the problem that you can’t split an IntStream, your code wouldn’t work, because you are invoking your sieve method recursively instead of lazily. So you would get an infinite recursion before you could query the resulting stream for its first value.

Splitting an IntStream s into a head and a tail IntStream (which has not yet been consumed) is possible:

PrimitiveIterator.OfInt it = s.iterator();
int head = it.nextInt();
IntStream tail = IntStream.generate(it::nextInt).filter(i -> i % head != 0);
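As a self-contained illustration of this split, here is a small sketch applied to the infinite stream 2, 3, 4, ... The class name HeadTailDemo and the method firstOfTail are hypothetical. Because generate never stops asking the iterator for values, this approach assumes an infinite source; on a finite one the iterator would eventually throw NoSuchElementException.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

class HeadTailDemo {
    // Splits 2, 3, 4, ... into head and tail, filters the tail by the head,
    // and returns the first `count` elements of the filtered tail.
    static int[] firstOfTail(int count) {
        IntStream s = IntStream.iterate(2, m -> m + 1);
        PrimitiveIterator.OfInt it = s.iterator();
        int head = it.nextInt();                        // 2
        IntStream tail = IntStream.generate(it::nextInt) // continues from 3
                                  .filter(i -> i % head != 0);
        return tail.limit(count).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstOfTail(3))); // prints [3, 5, 7]
    }
}
```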

After the split, you need a construct that invokes sieve on the tail lazily. Stream does not provide one: concat expects existing stream instances as arguments, and you can’t construct a stream that invokes sieve lazily with a lambda expression, because lazy creation requires mutable state, which lambda expressions do not support. Without a library implementation that hides the mutable state, you have to use a mutable object. But once you accept the requirement of mutable state, the solution can be even simpler than your first approach:

// p is a mutable instance field; it is declared before primes, which refers to it
IntPredicate p = x -> true;

IntStream primes = from(2).filter(i -> p.test(i)).peek(i -> p = p.and(v -> v % i != 0));

IntStream from(int n)
{
  return IntStream.iterate(n, m -> m + 1);
}

This will recursively create a filter, but in the end it doesn’t matter whether you create a tree of IntPredicates or a tree of IntStreams (as your IntStream.concat approach would if it worked). If you don’t like the mutable instance field for the filter, you can hide it in an inner class (but not in a lambda expression…).
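For reference, one way to package the mutable predicate in its own class is sketched below. The names MutableSieve and notDivisible are invented for this example. It assumes a sequential stream and a fresh MutableSieve instance for each stream, since the predicate accumulates state as elements pass through.

```java
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

class MutableSieve {
    // Mutable state: accepts a candidate only if no prime seen so far divides it.
    private IntPredicate notDivisible = x -> true;

    IntStream primes() {
        return IntStream.iterate(2, m -> m + 1)
                .filter(i -> notDivisible.test(i))
                // every element that passes the filter is prime; extend the predicate with it
                .peek(i -> notDivisible = notDivisible.and(v -> v % i != 0));
    }

    public static void main(String[] args) {
        // prints the first ten primes: 2 3 5 7 11 13 17 19 23 29 (one per line)
        new MutableSieve().primes().limit(10).forEach(System.out::println);
    }
}
```

Relying on side effects in peek is generally discouraged; the sketch works here only because the stream is sequential and each element is fully processed before the next one is tested.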

Solution 2:

My StreamEx library now has a headTail() operation which solves the problem:

public static StreamEx<Integer> sieve(StreamEx<Integer> input) {
    return input.headTail((head, tail) -> 
        sieve(tail.filter(n -> n % head != 0)).prepend(head));
}

The headTail method takes a BiFunction which will be executed at most once during the execution of the stream's terminal operation. So this implementation is lazy: it does not compute anything until traversal starts, and it computes only as many prime numbers as requested. The BiFunction receives the first stream element, head, and the stream of the remaining elements, tail, and can modify the tail in any way it wants. You may use it with predefined input:

sieve(IntStreamEx.range(2, 1000).boxed()).forEach(System.out::println);

But infinite streams work as well:

sieve(StreamEx.iterate(2, x -> x+1)).takeWhile(x -> x < 1000)
     .forEach(System.out::println);
// Not the primes below 1000, but the first 1000 primes
sieve(StreamEx.iterate(2, x -> x+1)).limit(1000).forEach(System.out::println);

There's also an alternative solution using headTail and predicate concatenation:

public static StreamEx<Integer> sieve(StreamEx<Integer> input, IntPredicate isPrime) {
    return input.headTail((head, tail) -> isPrime.test(head) 
            ? sieve(tail, isPrime.and(n -> n % head != 0)).prepend(head)
            : sieve(tail, isPrime));
}

sieve(StreamEx.iterate(2, x -> x+1), i -> true).limit(1000).forEach(System.out::println);

It is interesting to compare the recursive solutions by how many primes each is capable of generating.

@John McClean solution (StreamUtils)

John McClean's solutions are not lazy: you cannot feed them an infinite stream. So I found the maximal allowed upper bound (17793) by trial and error; above it, a StackOverflowError occurs:

public void sieveTest(){
    sieve(IntStream.range(2, 17793).boxed()).forEach(System.out::println);
}

@John McClean solution (Streamable)

public void sieveTest2(){
    sieve(Streamable.range(2, 39990)).forEach(System.out::println);
}

Increasing the upper limit above 39990 results in a StackOverflowError.

@frhack solution (LazySeq)

LazySeq<Integer> ints = integers(2);
LazySeq primes = sieve(ints); // sieve method from @frhack answer
primes.forEach(p -> System.out.println(p));

Result: stuck after prime number = 53327 with enormous heap allocation and garbage collection taking more than 90%. It took several minutes to advance from 53323 to 53327, so waiting more seems impractical.

@vidi solution

Prime.stream().forEach(System.out::println);

Result: StackOverflowError after prime number = 134417.

My solution (StreamEx)

sieve(StreamEx.iterate(2, x -> x+1)).forEach(System.out::println);

Result: StackOverflowError after prime number = 236167.

@frhack solution (rxjava)

Observable<Integer> primes = Observable.from(()->primesStream.iterator());
primes.forEach((x) -> System.out.println(x.toString()));            

Result: StackOverflowError after prime number = 367663.

@Holger solution

IntStream primes=from(2).filter(i->p.test(i)).peek(i->p=p.and(v->v%i!=0));
primes.forEach(System.out::println);

Result: StackOverflowError after prime number = 368089.

My solution (StreamEx with predicate concatenation)

sieve(StreamEx.iterate(2, x -> x+1), i -> true).forEach(System.out::println);

Result: StackOverflowError after prime number = 368287.


So the three solutions involving predicate concatenation win, because each new condition adds only 2 more stack frames. I think the difference between them is marginal and should not be used to define a winner. However, I like my first StreamEx solution more, as it is more similar to the Scala code.

Solution 3:

The solution below does not do state mutations, except for the head/tail deconstruction of the stream.

The laziness is obtained using IntStream.iterate. The class Prime is used to keep the generator state:

    import java.util.PrimitiveIterator;
    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    public class Prime {
        private final IntStream candidates;
        private final int current;

        private Prime(int current, IntStream candidates)
        {
            this.current = current;
            this.candidates = candidates;
        }

        private Prime next()
        {
            PrimitiveIterator.OfInt it = candidates.filter(n -> n % current != 0).iterator();

            int head = it.nextInt();
            IntStream tail = IntStream.generate(it::nextInt);

            return new Prime(head, tail);
        }

        public static Stream<Integer> stream() {
            IntStream possiblePrimes = IntStream.iterate(3, i -> i + 1);

            return Stream.iterate(new Prime(2, possiblePrimes), Prime::next)
                         .map(p -> p.current);
        }
    }

The usage would be this:

Stream<Integer> first10Primes = Prime.stream().limit(10);

Solution 4:

You can essentially implement it like this:

static <T> Tuple2<Optional<T>, Seq<T>> splitAtHead(Stream<T> stream) {
    Iterator<T> it = stream.iterator();
    return tuple(it.hasNext() ? Optional.of(it.next()) : Optional.empty(), seq(it));
}

In the above example, Tuple2 and Seq are types borrowed from jOOλ, a library that we developed for jOOQ integration tests. If you don't want any additional dependencies, you might as well implement them yourself:

class Tuple2<T1, T2> {
    final T1 v1;
    final T2 v2;

    Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
        return new Tuple2<>(v1, v2);
    }
}

static <T> Tuple2<Optional<T>, Stream<T>> splitAtHead(Stream<T> stream) {
    Iterator<T> it = stream.iterator();
    return tuple(
        it.hasNext() ? Optional.of(it.next()) : Optional.empty(),
        StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            it, Spliterator.ORDERED
        ), false)
    );
}
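For a version that compiles with nothing but the JDK, the sketch below substitutes Map.Entry for the tuple type. That substitution, along with the class name SplitAtHeadDemo, is an assumption made for this example and not part of jOOλ. Note that Optional.of rejects null, so a stream whose first element is null would fail here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class SplitAtHeadDemo {
    // Returns (head, tail); head is empty if the stream has no elements.
    static <T> Map.Entry<Optional<T>, Stream<T>> splitAtHead(Stream<T> stream) {
        Iterator<T> it = stream.iterator();
        Optional<T> head = it.hasNext() ? Optional.of(it.next()) : Optional.empty();
        // Wrap the partially consumed iterator in a new sequential stream.
        Stream<T> tail = StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
        return new SimpleImmutableEntry<>(head, tail);
    }

    public static void main(String[] args) {
        Map.Entry<Optional<Integer>, Stream<Integer>> parts = splitAtHead(Stream.of(1, 2, 3));
        System.out.println(parts.getKey());                                // prints Optional[1]
        System.out.println(parts.getValue().collect(Collectors.toList())); // prints [2, 3]
    }
}
```

The original stream must not be used after the split, since its iterator now backs the tail.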