Grouping by object value, counting and then setting group key by maximum object attribute

I have managed to write a solution using Java 8 Streams API that first groups a list of object Route by its value and then counts the number of objects in each group. It returns a mapping Route -> Long. Here is the code:

Map<Route, Long> routesCounted = routes.stream()
                .collect(Collectors.groupingBy(gr -> gr, Collectors.counting()));

And the Route class:

public class Route implements Comparable<Route> {
    private long lastUpdated;
    private Cell startCell;
    private Cell endCell;
    private int dropOffSize;

    public Route(Cell startCell, Cell endCell, long lastUpdated) {
        this.startCell = startCell;
        this.endCell = endCell;
        this.lastUpdated = lastUpdated;
    }

    public long getLastUpdated() {
        return this.lastUpdated;
    }

    public void setLastUpdated(long lastUpdated) {
        this.lastUpdated = lastUpdated;
    }

    public Cell getStartCell() {
        return startCell;
    }

    public void setStartCell(Cell startCell) {
        this.startCell = startCell;
    }

    public Cell getEndCell() {
        return endCell;
    }

    public void setEndCell(Cell endCell) {
        this.endCell = endCell;
    }

    public int getDropOffSize() {
        return this.dropOffSize;
    }

    public void setDropOffSize(int dropOffSize) {
        this.dropOffSize = dropOffSize;
    }

    @Override
    /**
     * Compute hash code by using Apache Commons Lang HashCodeBuilder.
     */
    public int hashCode() {
        return new HashCodeBuilder(43, 59)
                .append(this.startCell)
                .append(this.endCell)
                .toHashCode();
    }

    @Override
    /**
     * Compute equals by using Apache Commons Lang EqualsBuilder.
     */
    public boolean equals(Object obj) {
        if (!(obj instanceof Route))
            return false;
        if (obj == this)
            return true;

        Route route = (Route) obj;
        return new EqualsBuilder()
                .append(this.startCell, route.startCell)
                .append(this.endCell, route.endCell)
                .isEquals();
    }

    @Override
    public int compareTo(Route route) {
        if (this.dropOffSize < route.dropOffSize)
            return -1;
        else if (this.dropOffSize > route.dropOffSize)
            return 1;
        else {
                // if contains drop off timestamps, order by last timestamp in drop off
                // the highest timestamp has preceding
            if (this.lastUpdated < route.lastUpdated)
                return -1;
            else if (this.lastUpdated > route.lastUpdated)
                return 1;
            else
                return 0;
        }
    }
}

What I would like to additionally achieve is that the key for each group would be the one with the largest lastUpdated value. I was already looking at this solution but I do not know how to combine the counting and grouping by value and Route maximum lastUpdated value. Here is the example data of what I want to achieve:

EXAMPLE:

List<Route> routes = new ArrayList<>();
routes.add(new Route(new Cell(1, 2), new Cell(2, 1), 1200L));
routes.add(new Route(new Cell(3, 2), new Cell(2, 5), 1800L));
routes.add(new Route(new Cell(1, 2), new Cell(2, 1), 1700L));

SHOULD BE CONVERTED TO:

Map<Route, Long> routesCounted = new HashMap<>();
routesCounted.put(new Route(new Cell(1, 2), new Cell(2, 1), 1700L), 2);
routesCounted.put(new Route(new Cell(3, 2), new Cell(2, 5), 1800L), 1);

Notice that the key for mapping, which counted 2 Routes is the one with the largest lastUpdated value.


Here's one approach. First group into lists and then process the lists into the values you actually want:

import static java.util.Comparator.comparingLong;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toMap;


Map<Route,Integer> routeCounts = routes.stream()
        .collect(groupingBy(x -> x))
        .values().stream()
        .collect(toMap(
            lst -> lst.stream().max(comparingLong(Route::getLastUpdated)).get(),
            List::size
        ));

You can define an abstract "library" method which combines two collectors into one:

static <T, A1, A2, R1, R2, R> Collector<T, ?, R> pairing(Collector<T, A1, R1> c1, 
        Collector<T, A2, R2> c2, BiFunction<R1, R2, R> finisher) {
    EnumSet<Characteristics> c = EnumSet.noneOf(Characteristics.class);
    c.addAll(c1.characteristics());
    c.retainAll(c2.characteristics());
    c.remove(Characteristics.IDENTITY_FINISH);
    return Collector.of(() -> new Object[] {c1.supplier().get(), c2.supplier().get()},
            (acc, v) -> {
                c1.accumulator().accept((A1)acc[0], v);
                c2.accumulator().accept((A2)acc[1], v);
            },
            (acc1, acc2) -> {
                acc1[0] = c1.combiner().apply((A1)acc1[0], (A1)acc2[0]);
                acc1[1] = c2.combiner().apply((A2)acc1[1], (A2)acc2[1]);
                return acc1;
            },
            acc -> {
                R1 r1 = c1.finisher().apply((A1)acc[0]);
                R2 r2 = c2.finisher().apply((A2)acc[1]);
                return finisher.apply(r1, r2);
            }, c.toArray(new Characteristics[c.size()]));
}

After that the actual operation may look like this:

Map<Route, Long> result = routes.stream()
        .collect(Collectors.groupingBy(Function.identity(),
            pairing(Collectors.maxBy(Comparator.comparingLong(Route::getLastUpdated)), 
                    Collectors.counting(), 
                    (route, count) -> new AbstractMap.SimpleEntry<>(route.get(), count))
            ))
        .values().stream().collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));

Update: such collector is available in my StreamEx library: MoreCollectors.pairing(). Also similar collector is implemented in jOOL library, so you can use Tuple.collectors instead of pairing.


Changed equals and hashcode to be dependent only on start cell and end cell.

@Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Cell cell = (Cell) o;

        if (a != cell.a) return false;
        if (b != cell.b) return false;

        return true;
    }

    @Override
    public int hashCode() {
        int result = a;
        result = 31 * result + b;
        return result;
    }

My solution looks like this:

Map<Route, Long> routesCounted = routes.stream()
            .sorted((r1,r2)-> (int)(r2.lastUpdated - r1.lastUpdated))
            .collect(Collectors.groupingBy(gr -> gr, Collectors.counting()));

Of course casting to int should be replaced with something more appropriated.


In principle it seems like this ought to be doable in one pass. The usual wrinkle is that this requires an ad-hoc tuple or pair, in this case with a Route and a count. Since Java lacks these, we end up using an Object array of length 2 (as shown in Tagir Valeev's answer), or AbstractMap.SimpleImmutableEntry, or a hypothetical Pair<A,B> class.

The alternative is to write a little value class that holds a Route and a count. Of course there's some pain in doing this, but in this case I think it pays off because it provides a place to put the combining logic. That in turn simplifies the stream operation.

Here's the value class containing a Route and a count:

class RouteCount {
    final Route route;
    final long count;

    private RouteCount(Route r, long c) {
        this.route = r;
        count = c;
    }

    public static RouteCount fromRoute(Route r) {
        return new RouteCount(r, 1L);
    }

    public static RouteCount combine(RouteCount rc1, RouteCount rc2) {
        Route recent;
        if (rc1.route.getLastUpdated() > rc2.route.getLastUpdated()) {
            recent = rc1.route;
        } else {
            recent = rc2.route;
        }
        return new RouteCount(recent, rc1.count + rc2.count);
    }
}

Pretty straightforward, but notice the combine method. It combines two RouteCount values by choosing the Route that's been updated more recently and using the sum of the counts. Now that we have this value class, we can write a one-pass stream to get the result we want:

    Map<Route, RouteCount> counted = routes.stream()
        .collect(groupingBy(route -> route,
                    collectingAndThen(
                        mapping(RouteCount::fromRoute, reducing(RouteCount::combine)),
                        Optional::get)));

Like other answers, this groups the routes into equivalence classes based on the starting and ending cell. The actual Route instance used as the key isn't significant; it's just a representative of its class. The value will be a single RouteCount that contains the Route instance that has been updated most recently, along with the count of equivalent Route instances.

The way this works is that each Route instance that has the same start and end cells is then fed into the downstream collector of groupingBy. This mapping collector maps the Route instance into a RouteCount instance, then passes it to a reducing collector that reduces the instances using the combining logic described above. The and-then portion of collectingAndThen extracts the value from the Optional<RouteCount> that the reducing collector produces.

(Normally a bare get is dangerous, but we don't get to this collector at all unless there's at least one value available. So get is safe in this case.)