Why can't your switch statement data type be long, Java?

Here's an excerpt from Sun's Java tutorials:

A switch works with the byte, short, char, and int primitive data types. It also works with enumerated types (discussed in Classes and Inheritance) and a few special classes that "wrap" certain primitive types: Character, Byte, Short, and Integer (discussed in Simple Data Objects).

There must be a good reason why the long primitive data type is not allowed. Anyone know what it is?


I think to some extent it was probably an arbitrary decision based on typical use of switch.

A switch can essentially be implemented in two ways (or, in principle, a combination of the two). For a small number of cases, or cases whose values are widely dispersed, a switch becomes the equivalent of a series of ifs on a temporary variable (the value being switched on must only be evaluated once); in JVM bytecode this is the LOOKUPSWITCH instruction, which searches a list of key/target pairs. For a moderate number of cases that are more or less consecutive in value, a jump table is used (the TABLESWITCH instruction in JVM bytecode), whereby the location to jump to is effectively looked up in a table indexed by the value.
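To make the two shapes concrete, here is a minimal sketch (the class and method names are made up for illustration). With a typical javac, the first method usually compiles to a tableswitch and the second to a lookupswitch, but the choice is the compiler's, so treat this as something to check with `javap -c` rather than a guarantee.

```java
// Hypothetical example class; inspect the bytecode with `javap -c SwitchShapes`.
class SwitchShapes {
    // Consecutive case values: javac typically emits a tableswitch (jump table).
    static String dense(int code) {
        switch (code) {
            case 0: return "zero";
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            default: return "other";
        }
    }

    // Widely scattered case values: javac typically emits a lookupswitch
    // (a sorted list of key/target pairs that is searched for a match).
    static String sparse(int code) {
        switch (code) {
            case 1: return "one";
            case 1000: return "thousand";
            case 1000000: return "million";
            default: return "other";
        }
    }
}
```

Both methods behave identically from the caller's point of view; only the generated bytecode differs.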

Either of these methods could in principle use a long value rather than an int. But I think it was probably just a practical decision to balance the complexity of the instruction set and compiler against actual need: the cases where you really need to switch over a long are rare enough that it's acceptable to have to rewrite them as a series of if statements, or work around it in some other way (if the long values in question are close together, you can switch in your Java code over the int result of subtracting the lowest value).
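The subtraction workaround might look something like the sketch below. The class name, the `BASE` constant and the three cases are assumptions made up for the example; the point is only the range check followed by the narrowing cast.

```java
// Hypothetical example: switching over a small, contiguous range of long values.
class LongRangeSwitch {
    // Assumed lowest value of the range we care about (does not fit in an int).
    static final long BASE = 5_000_000_000L;

    static String describe(long value) {
        long offset = value - BASE;
        // Check the range before casting, so the narrowing cast cannot map
        // an unrelated long onto one of the case labels.
        if (offset < 0 || offset > 2) {
            return "unknown";
        }
        switch ((int) offset) {
            case 0: return "first";
            case 1: return "second";
            case 2: return "third";
            default: return "unknown";
        }
    }
}
```

Without the range check, a value such as `BASE + (1L << 32)` would be truncated to offset 0 by the cast and wrongly hit the first case.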


Because they didn't implement the necessary instructions in the bytecode and you really don't want to write that many cases, no matter how "production ready" your code is...

[EDIT: Extracted from comments on this answer, with some additions on background]

To be exact, 2³² is a lot of cases and any program with a method long enough to hold more than that is going to be utterly horrendous! In any language. (The longest function I know of in any code in any language is a little over 6k SLOC – yes, it's a big switch – and it's really unmanageable.) If you're really stuck with having a long where you should have only an int or less, then you've got two real alternatives.

  1. Use some variant on the theme of hash functions to compress the long into an int. The simplest one, only for use when you've got the type wrong, is to just cast! More useful would be to do this:

    (int) ((x & 0xFFFFFFFFL) ^ (x >>> 32))
    

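    before switching on the result. You'll have to work out how to transform the cases that you're testing against too, and because distinct long values can fold to the same int, each case should re-check the original value. But really, that's still horrible since it doesn't address the real problem of lots of cases.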

  2. A much better solution if you're working with very large numbers of cases is to change your design to using a Map<Long,Runnable> or something similar so that you're looking up how to dispatch a particular value. This allows you to separate the cases into multiple files, which is much easier to manage when the case-count gets large, though it does get more complex to organize the registration of the host of implementation classes involved (annotations might help by allowing you to build the registration code automatically).

    FWIW, I did this many years ago (we switched to the newly-released J2SE 1.2 part way through the project) when building a custom bytecode engine for simulating massively parallel hardware (no, reusing the JVM would not have been suitable due to the radically different value and execution models involved) and it enormously simplified the code relative to the big switch that the C version of the code was using.
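As a rough sketch of the map-based design from option 2 (the class name, the keys and the handlers are invented for the example, and a real system would probably register handlers from separate classes rather than inline lambdas):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher: looks up the action for a long key instead of switching on it.
class LongDispatch {
    private final Map<Long, Runnable> handlers = new HashMap<>();

    // Records what ran, so the example has something observable.
    final StringBuilder log = new StringBuilder();

    LongDispatch() {
        // Assumed keys, chosen only because they do not fit in an int.
        register(10_000_000_000L, () -> log.append("start;"));
        register(20_000_000_000L, () -> log.append("stop;"));
    }

    // Handlers can be registered from anywhere, e.g. one class per case.
    void register(long key, Runnable handler) {
        handlers.put(key, handler);
    }

    // The fallback handler plays the role of the switch's default branch.
    void dispatch(long key) {
        handlers.getOrDefault(key, () -> log.append("unknown;")).run();
    }
}
```

Calling `dispatch(10_000_000_000L)` runs the first handler, and any unregistered key falls through to the fallback. Adding a case becomes a `register` call instead of another branch in one huge method.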

To reiterate the take-home message: wanting to switch on a long is an indication that either you've got the types wrong in your program, or you're building a system with so much variation that you should be using classes. Time for a rethink in either case.


Because the lookup table index must be 32 bits: both switch bytecodes (TABLESWITCH and LOOKUPSWITCH) take an int operand.