Different generic behaviour when using lambda instead of explicit anonymous inner class

The context

I'm working on a project that is heavily dependent on generic types. One of its key components is the so-called TypeToken, which provides a way of representing generic types at runtime and applying some utility functions to them. To work around Java's type erasure, I'm using the curly-bracket notation ({}) to create an anonymous subclass, since the subclass records its generic supertype and the type argument can then be read back via reflection.

What TypeToken basically does

This is a strongly simplified version of TypeToken which is way more lenient than the original implementation. However, I'm using this approach so I can make sure that the real problem doesn't lie in one of those utility functions.

public class TypeToken<T> {

    private final Type type;
    private final Class<T> rawType;

    private final int hashCode;


    /* ==== Constructor ==== */

    @SuppressWarnings("unchecked")
    protected TypeToken() {
        ParameterizedType paramType = (ParameterizedType) this.getClass().getGenericSuperclass();
        this.type = paramType.getActualTypeArguments()[0];

        // ...
    }

    // ...
}

When it works

This implementation works in almost every situation and has no problem handling most types. The following examples work as expected:

TypeToken<List<String>> token = new TypeToken<List<String>>() {};
TypeToken<List<? extends CharSequence>> token = new TypeToken<List<? extends CharSequence>>() {};

As it doesn't check the types, the implementation above allows every type that the compiler permits, including TypeVariables.

<T> void test() {
    TypeToken<T[]> token = new TypeToken<T[]>() {};
}

In this case, type is a GenericArrayType holding a TypeVariable as its component type. This is perfectly fine.
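
To make the working cases easy to reproduce, here is a minimal, self-contained sketch. The nested TypeToken is a stripped-down stand-in for the class above (it only captures the type argument, and it is abstract here, which is an assumption of this sketch), and CaptureDemo and its method names are made up for illustration.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.List;

class CaptureDemo {

    /** Simplified stand-in for TypeToken: it only captures the type argument. */
    abstract static class TypeToken<T> {
        final Type type;

        protected TypeToken() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }
    }

    /** Captures List<String>, a parameterized type. */
    static Type listOfString() {
        return new TypeToken<List<String>>() {}.type;
    }

    /** Captures T[] directly inside a generic method (no lambda involved). */
    static <T> Type arrayOfTypeVariable() {
        return new TypeToken<T[]>() {}.type;
    }

    /** Name of the type variable that forms the component type of T[]. */
    static String componentName() {
        GenericArrayType array = (GenericArrayType) arrayOfTypeVariable();
        Type component = array.getGenericComponentType();
        return ((TypeVariable<?>) component).getName();
    }

    public static void main(String[] args) {
        System.out.println(listOfString() instanceof ParameterizedType);
        System.out.println(arrayOfTypeVariable() instanceof GenericArrayType);
        System.out.println(componentName());
    }
}
```

Both casts succeed here because the anonymous subclasses are created directly in the method body, so the type variable can be found on the enclosing method.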

The weird situation when using lambdas

However, when you initialize a TypeToken inside a lambda expression, the behavior changes. (The type variable T comes from the test method above.)

Supplier<TypeToken<T[]>> sup = () -> new TypeToken<T[]>() {};

In this case, type is still a GenericArrayType, but it holds null as its component type.

But if you write the Supplier as an explicit anonymous inner class instead, the behavior changes again:

Supplier<TypeToken<T[]>> sup = new Supplier<TypeToken<T[]>>() {
    @Override
    public TypeToken<T[]> get() {
        return new TypeToken<T[]>() {};
    }
};

In this case, the component type again holds the correct value (a TypeVariable).
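
The two variants can be compared side by side with a small probe like the one below. This is only a sketch: the nested TypeToken is a simplified stand-in, the class and method names are hypothetical, and what the lambda variant prints depends on the compiler and JDK in use, so no output is claimed for it. The component is printed via String.valueOf so that a null component does not cause a NullPointerException.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Supplier;

class LambdaVsAnonymous {

    /** Simplified stand-in for TypeToken: it only captures the type argument. */
    abstract static class TypeToken<T> {
        final Type type;

        protected TypeToken() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }
    }

    /** Component type of T[] captured inside a lambda; may be null on affected JDKs. */
    static <T> Type componentViaLambda() {
        Supplier<TypeToken<T[]>> sup = () -> new TypeToken<T[]>() {};
        return ((GenericArrayType) sup.get().type).getGenericComponentType();
    }

    /** Component type of T[] captured inside an explicit anonymous Supplier. */
    static <T> Type componentViaAnonymousClass() {
        Supplier<TypeToken<T[]>> sup = new Supplier<TypeToken<T[]>>() {
            @Override
            public TypeToken<T[]> get() {
                return new TypeToken<T[]>() {};
            }
        };
        return ((GenericArrayType) sup.get().type).getGenericComponentType();
    }

    public static void main(String[] args) {
        // String.valueOf tolerates null, unlike calling toString() on the result.
        System.out.println("lambda:    " + String.valueOf(componentViaLambda()));
        System.out.println("anonymous: " + String.valueOf(componentViaAnonymousClass()));
    }
}
```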

The resulting questions

  1. What happens to the TypeVariable in the lambda-example? Why does the type inference not respect the generic type?
  2. What is the difference between the explicitly-declared and the implicitly-declared example? Is type inference the only difference?
  3. How can I fix this without using the boilerplate explicit declaration? This becomes especially important in unit testing since I want to check whether the constructor throws exceptions or not.

To clarify it a bit: This is not a problem that's "relevant" for the program since I do NOT allow non-resolvable types at all, but it's still an interesting phenomenon I'd like to understand.

My research

Update 1

Meanwhile, I've done some research on this topic. In the Java Language Specification §15.12.2.2 I've found an expression that might have something to do with it - "pertinent to applicability", mentioning "implicitly typed lambda expression" as an exception. Obviously, it's the incorrect chapter, but the expression is used in other places, including the chapter about type inference.

But to be honest: I haven't really figured out yet what all of those operators like := or Fi0 mean, which makes it really hard to understand in detail. I'd be glad if someone could clarify this a bit and tell me whether this might be the explanation for the weird behavior.

Update 2

I've thought about that approach again and came to the conclusion that even if the compiler removed the type because it's not "pertinent to applicability", that wouldn't justify setting the component type to null instead of the most general type, Object. I cannot think of a single reason why the language designers would decide to do so.

Update 3

I've just retested the same code with the latest version of Java (I used 8u191 before). To my regret, this hasn't changed anything, although Java's type inference has been improved...

Update 4

I requested an entry in the official Java Bug Database/Tracker a few days ago, and it has just been accepted. Since the developers who reviewed my report assigned priority P4 to the bug, it might take a while until it is fixed. You can find the report here.

A huge shoutout to Tom Hawtin - tackline for mentioning that this might be an essential bug in Java SE itself. A report by Mike Strobel would probably be far more detailed than mine given his impressive background knowledge, but his answer wasn't available yet when I wrote the report.


tldr:

  1. There is a bug in javac that records the wrong enclosing method for lambda-embedded inner classes. As a result, type variables on the actual enclosing method cannot be resolved by those inner classes.
  2. There are arguably two sets of bugs in the java.lang.reflect API implementation:
    • Some methods are documented as throwing exceptions when nonexistent types are encountered, but they never do. Instead, they allow null references to propagate.
    • The various Type::toString() overrides currently throw or propagate a NullPointerException when a type cannot be resolved.

The answer has to do with the generic signatures that usually get emitted in class files that make use of generics.

Typically, when you write a class that has one or more generic supertypes, the Java compiler will emit a Signature attribute containing the fully parameterized generic signature(s) of the class's supertype(s). I've written about these before, but the short explanation is this: without them, it would not be possible to consume generic types as generic types unless you happened to have the source code. Due to type erasure, information about type variables gets lost at compilation time. If that information were not included as extra metadata, neither the IDE nor your compiler would know that a type was generic, and you could not use it as such. Nor could the compiler emit the necessary runtime checks to enforce type safety.

javac will emit generic signature metadata for any type or method whose signature contains type variables or a parameterized type, which is why you are able to obtain the original generic supertype information for your anonymous types. For example, the anonymous type created here:

TypeToken<?> token = new TypeToken<List<? extends CharSequence>>() {};

...contains this Signature:

LTypeToken<Ljava/util/List<+Ljava/lang/CharSequence;>;>;

From this, the java.lang.reflect APIs can parse the generic supertype information about your (anonymous) class.

But we already know that this works just fine when the TypeToken is parameterized with concrete types. Let's look at a more relevant example, where its type parameter includes a type variable:

static <F> void test() {
    TypeToken<F[]> sup = new TypeToken<F[]>() {};
}

Here, we get the following signature:

LTypeToken<[TF;>;

Makes sense, right? Now, let's look at how the java.lang.reflect APIs are able to extract generic supertype information from these signatures. If we peer into Class::getGenericSuperclass(), we see that the first thing it does is call getGenericInfo(). If we haven't called into this method before, a ClassRepository gets instantiated:

private ClassRepository getGenericInfo() {
    ClassRepository genericInfo = this.genericInfo;
    if (genericInfo == null) {
        String signature = getGenericSignature0();
        if (signature == null) {
            genericInfo = ClassRepository.NONE;
        } else {
            // !!!  RELEVANT LINE HERE:  !!!
            genericInfo = ClassRepository.make(signature, getFactory());
        }
        this.genericInfo = genericInfo;
    }
    return (genericInfo != ClassRepository.NONE) ? genericInfo : null;
}

The critical piece here is the call to getFactory(), which expands to:

CoreReflectionFactory.make(this, ClassScope.make(this))

ClassScope is the bit we care about: this provides a resolution scope for type variables. Given a type variable name, the scope gets searched for a matching type variable. If one is not found, the 'outer' or enclosing scope is searched:

public TypeVariable<?> lookup(String name) {
    TypeVariable<?>[] tas = getRecvr().getTypeParameters();
    for (TypeVariable<?> tv : tas) {
        if (tv.getName().equals(name)) {return tv;}
    }
    return getEnclosingScope().lookup(name);
}

And, finally, the key to it all (from ClassScope):

protected Scope computeEnclosingScope() {
    Class<?> receiver = getRecvr();

    Method m = receiver.getEnclosingMethod();
    if (m != null)
        // Receiver is a local or anonymous class enclosed in a method.
        return MethodScope.make(m);

    // ...
}

If a type variable (e.g., F) is not found on the class itself (e.g., the anonymous TypeToken<F[]>), then the next step is to search the enclosing method. If we look at the disassembled anonymous class, we see this attribute:

EnclosingMethod: LambdaTest.test()V

The presence of this attribute means that computeEnclosingScope will produce a MethodScope for the generic method static <F> void test(). Since test declares the type variable F, we find it when we search the enclosing scope.
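
The scope chain described above can be approximated with public reflection calls only. The sketch below is not the JDK's internal Scope implementation; ScopeLookupSketch, lookup and enclosing are hypothetical names, and the walk order (class, then enclosing method or constructor, then enclosing class) is modeled on the excerpts shown above.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.GenericDeclaration;
import java.lang.reflect.Method;
import java.lang.reflect.TypeVariable;

class ScopeLookupSketch {

    /** Searches the given class and then each enclosing scope for a type variable. */
    static TypeVariable<?> lookup(Class<?> start, String name) {
        GenericDeclaration scope = start;
        while (scope != null) {
            for (TypeVariable<?> tv : scope.getTypeParameters()) {
                if (tv.getName().equals(name)) {
                    return tv;
                }
            }
            scope = enclosing(scope);
        }
        return null; // the name is not visible from the starting class
    }

    /** Next scope outward: enclosing method, constructor or class; declaring class for members. */
    private static GenericDeclaration enclosing(GenericDeclaration scope) {
        if (scope instanceof Class) {
            Class<?> c = (Class<?>) scope;
            Method m = c.getEnclosingMethod();
            if (m != null) {
                return m;
            }
            Constructor<?> ctor = c.getEnclosingConstructor();
            if (ctor != null) {
                return ctor;
            }
            return c.getEnclosingClass(); // null for a top-level class
        }
        if (scope instanceof Method) {
            return ((Method) scope).getDeclaringClass();
        }
        if (scope instanceof Constructor) {
            return ((Constructor<?>) scope).getDeclaringClass();
        }
        return null;
    }

    /** Returns the class of an anonymous type declared directly in a generic method. */
    static <F> Class<?> test() {
        return new Object() {}.getClass();
    }

    public static void main(String[] args) {
        TypeVariable<?> f = lookup(test(), "F");
        System.out.println(f == null ? "unresolved" : f.getGenericDeclaration().toString());
    }
}
```

Because the anonymous class here is created directly in test(), the lookup for F ends at that method. Starting the same walk from a class whose recorded enclosing method does not declare F, and whose outer classes do not either, returns null.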

So, why doesn't it work inside a lambda?

To answer this, we must understand how lambdas get compiled. The body of the lambda gets moved into a synthetic static method. At the point where we declare our lambda, an invokedynamic instruction gets emitted, which causes an implementation class for the functional interface (here, Supplier) to be generated the first time we hit that instruction.

In this example, the static method generated for the lambda body would look something like this (if decompiled):

private static /* synthetic */ Object lambda$test$0() {
    return new LambdaTest$1();
}

...where LambdaTest$1 is your anonymous class. Let's disassemble that and inspect our attributes:

Signature: LTypeToken<TW;>;
EnclosingMethod: LambdaTest.lambda$test$0()Ljava/lang/Object;

Just like the case where we instantiated an anonymous type outside of a lambda, the signature contains the type variable W. But EnclosingMethod refers to the synthetic method.

The synthetic method lambda$test$0() does not declare type variable W. Moreover, lambda$test$0() is not enclosed by test(), so the declaration of W is not visible inside it. Your anonymous class has a supertype containing a type variable that the class doesn't know about because it's out of scope.

When we call getGenericSuperclass(), the scope hierarchy for LambdaTest$1 does not contain W, so the parser cannot resolve it. Due to how the code is written, this unresolved type variable results in null getting placed in the type parameters of the generic supertype.
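
You can observe which method gets recorded without a disassembler by asking the class for its enclosing method at runtime. The following is a rough probe; EnclosingMethodProbe and its methods are invented for this illustration, and the line printed for the lambda-embedded class will vary with the compiler version, so no output is claimed for it.

```java
import java.lang.reflect.Method;
import java.util.function.Supplier;

class EnclosingMethodProbe {

    /** Anonymous class created directly in a generic method. */
    static <W> Method enclosingOfDirect() {
        Object o = new Object() {};
        return o.getClass().getEnclosingMethod();
    }

    /** Anonymous class created inside a lambda in a generic method. */
    static <W> Method enclosingOfLambdaEmbedded() {
        Supplier<Object> sup = () -> new Object() {};
        return sup.get().getClass().getEnclosingMethod();
    }

    /** Summarizes the properties that matter for type variable resolution. */
    static String describe(Method m) {
        return m.getName()
                + " synthetic=" + m.isSynthetic()
                + " typeParameters=" + m.getTypeParameters().length;
    }

    public static void main(String[] args) {
        System.out.println(describe(enclosingOfDirect()));
        System.out.println(describe(enclosingOfLambdaEmbedded()));
    }
}
```

If the second line names a synthetic method with zero type parameters, the class was compiled by a javac that exhibits the behavior described here.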

Note that, had your lambda instantiated a type that did not refer to any type variables (e.g., TypeToken<String>), you would not run into this problem.

Conclusions

(i) There is a bug in javac. The Java Virtual Machine Specification §4.7.7 ("The EnclosingMethod Attribute") states:

It is the responsibility of a Java compiler to ensure that the method identified via the method_index is indeed the closest lexically enclosing method of the class that contains this EnclosingMethod attribute. (emphasis mine)

Currently, javac seems to determine the enclosing method after the lambda rewriter runs its course, and as a result, the EnclosingMethod attribute refers to a method that never even existed in the lexical scope. If EnclosingMethod reported the actual lexically enclosing method, the type variables on that method could be resolved by the lambda-embedded classes, and your code would produce the expected results.

It is arguably also a bug that the signature parser/reifier silently allows a null type argument to be propagated into a ParameterizedType (which, as @tom-hawtin-tackline points out, has ancillary effects like toString() throwing an NPE).

My bug report for the EnclosingMethod issue is now online.

(ii) There are arguably multiple bugs in java.lang.reflect and its supporting APIs.

The method ParameterizedType::getActualTypeArguments() is documented as throwing a TypeNotPresentException when "any of the actual type arguments refers to a non-existent type declaration". That description arguably covers the case where a type variable is not in scope. GenericArrayType::getGenericComponentType() should throw a similar exception when "the underlying array type's type refers to a non-existent type declaration". Currently, neither appears to throw a TypeNotPresentException under any circumstances.

I would also argue that the various Type::toString overrides should merely fill in the canonical name of any unresolved types rather than throwing an NPE or any other exception.

I have submitted a bug report for these reflection-related issues, and I will post the link once it is publicly visible.
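
Until the reflection APIs report such cases themselves, a library can defend against silently propagated nulls by walking a Type before using it. The helper below is a hypothetical sketch, not part of any JDK API: isFullyResolved and arrayWithMissingComponent are made-up names, and the hand-built GenericArrayType only imitates the broken type described above.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;

class TypeValidation {

    /** Returns false if the type, or any type nested inside it, is null. */
    static boolean isFullyResolved(Type type) {
        if (type == null) {
            return false;
        }
        if (type instanceof GenericArrayType) {
            return isFullyResolved(((GenericArrayType) type).getGenericComponentType());
        }
        if (type instanceof ParameterizedType) {
            // The owner type is not checked: it is legitimately null for top-level types.
            return allResolved(((ParameterizedType) type).getActualTypeArguments());
        }
        if (type instanceof WildcardType) {
            WildcardType wildcard = (WildcardType) type;
            return allResolved(wildcard.getUpperBounds())
                    && allResolved(wildcard.getLowerBounds());
        }
        // Class and TypeVariable: bounds are not followed, to avoid endless
        // recursion on self-referential bounds such as T extends Comparable<T>.
        return true;
    }

    private static boolean allResolved(Type[] types) {
        for (Type t : types) {
            if (!isFullyResolved(t)) {
                return false;
            }
        }
        return true;
    }

    /** Hand-built imitation of a GenericArrayType whose component could not be resolved. */
    static GenericArrayType arrayWithMissingComponent() {
        return new GenericArrayType() {
            @Override
            public Type getGenericComponentType() {
                return null;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(isFullyResolved(String.class));
        System.out.println(isFullyResolved(arrayWithMissingComponent()));
    }
}
```

A TypeToken constructor could run such a check and throw its own exception, which would also make the failure testable in unit tests.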

Workarounds?

If you need to be able to reference a type variable declared by the enclosing method, then you can't do that with a lambda; you'll have to fall back to the longer anonymous type syntax. However, the lambda version should work in most other cases. You should even be able to reference type variables declared by the enclosing class. For example, these should always work:

class Test<X> {
    void test() {
        Supplier<TypeToken<X>> s1 = () -> new TypeToken<X>() {};
        Supplier<TypeToken<String>> s2 = () -> new TypeToken<String>() {};
        Supplier<TypeToken<List<String>>> s3 = () -> new TypeToken<List<String>>() {};
    }
}
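
A runnable variant of the class-level case might look like the sketch below. The nested TypeToken is again a simplified stand-in, and ClassLevelVariableDemo and captureInLambda are hypothetical names. The type variable resolves because the search continues from the recorded enclosing method to its declaring class, which declares X.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Supplier;

class ClassLevelVariableDemo<X> {

    /** Simplified stand-in for TypeToken: it only captures the type argument. */
    abstract static class TypeToken<T> {
        final Type type;

        protected TypeToken() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }
    }

    /** Captures the class-level type variable X from inside a lambda. */
    Type captureInLambda() {
        Supplier<TypeToken<X>> s = () -> new TypeToken<X>() {};
        return s.get().type;
    }

    public static void main(String[] args) {
        Type captured = new ClassLevelVariableDemo<String>().captureInLambda();
        System.out.println(String.valueOf(captured));
    }
}
```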

Unfortunately, given that this bug has apparently existed since lambdas were first introduced, and it has not been fixed in the most recent LTS release, you may have to assume the bug remains in your clients’ JDKs long after it gets fixed, assuming it gets fixed at all.


As a workaround, you can move the creation of the TypeToken out of the lambda into a separate method, and still use a lambda instead of a fully declared class:

static <T> TypeToken<T[]> createTypeToken() {
    return new TypeToken<T[]>() {};
}

Supplier<TypeToken<T[]>> sup = () -> createTypeToken();
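
For completeness, here is a self-contained sketch of that workaround. The nested TypeToken is a simplified stand-in, and WorkaroundDemo, componentViaHelper and declaringMethodName are names invented for this example.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.function.Supplier;

class WorkaroundDemo {

    /** Simplified stand-in for TypeToken: it only captures the type argument. */
    abstract static class TypeToken<T> {
        final Type type;

        protected TypeToken() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }
    }

    /** The anonymous class lives here, so its enclosing method is a real generic method. */
    static <T> TypeToken<T[]> createTypeToken() {
        return new TypeToken<T[]>() {};
    }

    /** The lambda only delegates; no anonymous class is declared inside it. */
    static <T> Type componentViaHelper() {
        Supplier<TypeToken<T[]>> sup = () -> createTypeToken();
        return ((GenericArrayType) sup.get().type).getGenericComponentType();
    }

    /** Name of the method that declares the captured type variable. */
    static String declaringMethodName() {
        TypeVariable<?> tv = (TypeVariable<?>) componentViaHelper();
        return ((Method) tv.getGenericDeclaration()).getName();
    }

    public static void main(String[] args) {
        System.out.println(componentViaHelper().getTypeName());
        System.out.println(declaringMethodName());
    }
}
```

One consequence worth knowing: the captured type variable is the T declared by createTypeToken(), not the T of the method that contains the lambda, so code that compares TypeVariable instances by their declaring method will see a difference.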