Understanding Spliterator, Collector and Stream in Java 8
I am having trouble understanding the Stream
interface in Java 8, especially where it has to do with the Spliterator
and Collector
interfaces. My problem is that I simply can't understand Spliterator
and the Collector
interfaces yet, and as a result, the Stream
interface is still somewhat obscure to me.
What exactly is a Spliterator
and a Collector
, and how can I use them? If I am willing to write my own Spliterator
or Collector
(and probably my own Stream
in that process), what should I do and not do?
I read some examples scattered around the web, but since everything here is still new and subject to changes, examples and tutorials are still very sparse.
You should almost certainly never have to deal with Spliterator
as a user; it should only be necessary if you're writing Collection
types yourself and also intending to optimize parallelized operations on them.
For what it's worth, a Spliterator
is a way of operating over the elements of a collection in a way that it's easy to split off part of the collection, e.g. because you're parallelizing and want one thread to work on one part of the collection, one thread to work on another part, etc.
You should essentially never be saving values of type Stream
to a variable, either. Stream
is sort of like an Iterator
, in that it's a one-time-use object that you'll almost always use in a fluent chain, as in the Javadoc example:
int sum = widgets.stream()
.filter(w -> w.getColor() == RED)
.mapToInt(w -> w.getWeight())
.sum();
Collector
is the most generalized, abstract possible version of a "reduce" operation a la map/reduce; in particular, it needs to support parallelization and finalization steps. Examples of Collector
s include:
- summing, e.g.
Collectors.reducing(0, (x, y) -> x + y)
- StringBuilder appending, e.g.
Collector.of(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString)
Spliterator
basically means "splittable Iterator".
Single thread can traverse/process the entire Spliterator itself, but the Spliterator also has a method trySplit()
which will "split off" a section for someone else (typically, another thread) to process -- leaving the current spliterator with less work.
Collector
combines the specification of a reduce
function (of map-reduce fame), with an initial value, and a function to combine two results (thus enabling results from Spliterated streams of work, to be combined.)
For example, the most basic Collector would have an initial vaue of 0, add an integer onto an existing result, and would 'combine' two results by adding them. Thus summing a spliterated stream of integers.
See:
Spliterator.trySplit()
Collector<T,A,R>
The following are examples of using the predefined collectors to perform common mutable reduction tasks:
// Accumulate names into a List
List<String> list = people.stream().map(Person::getName).collect(Collectors.toList());
// Accumulate names into a TreeSet
Set<String> set = people.stream().map(Person::getName).collect(Collectors.toCollection(TreeSet::new));
// Convert elements to strings and concatenate them, separated by commas
String joined = things.stream()
.map(Object::toString)
.collect(Collectors.joining(", "));
// Compute sum of salaries of employee
int total = employees.stream()
.collect(Collectors.summingInt(Employee::getSalary)));
// Group employees by department
Map<Department, List<Employee>> byDept
= employees.stream()
.collect(Collectors.groupingBy(Employee::getDepartment));
// Compute sum of salaries by department
Map<Department, Integer> totalByDept
= employees.stream()
.collect(Collectors.groupingBy(Employee::getDepartment,
Collectors.summingInt(Employee::getSalary)));
// Partition students into passing and failing
Map<Boolean, List<Student>> passingFailing =
students.stream()
.collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD));
Interface Spliterator
- is a core feature of Streams.
The stream()
and parallelStream()
default methods are presented in the Collection
interface. These methods use the Spliterator through the call to the spliterator()
:
...
default Stream<E> stream() {
return StreamSupport.stream(spliterator(), false);
}
default Stream<E> parallelStream() {
return StreamSupport.stream(spliterator(), true);
}
...
Spliterator is an internal iterator that breaks the stream into the smaller parts. These smaller parts can be processed in parallel.
Among other methods, there are two most important to understand the Spliterator:
boolean tryAdvance(Consumer<? super T> action)
Unlike theIterator
, it tries to perform the operation with the next element. If operation executed successfully, the method returnstrue
. Otherwise, returnsfalse
- that means that there is absence of element or end of the stream.Spliterator<T> trySplit()
This method allows to split a set of data into a many smaller sets according to one or another criteria (file size, number of lines, etc).