What is boxing and unboxing and what are the trade offs?

I'm looking for a clear, concise and accurate answer.

Ideally as the actual answer, although links to good explanations welcome.


Solution 1:

Boxed values are data structures that are minimal wrappers around primitive types*. Boxed values are typically stored as pointers to objects on the heap.

Thus, boxed values use more memory and take at minimum two memory lookups to access: once to get the pointer, and another to follow that pointer to the primitive. Obviously this isn't the kind of thing you want in your inner loops. On the other hand, boxed values typically play better with other types in the system. Since they are first-class data structures in the language, they have the expected metadata and structure that other data structures have.

In Java and Haskell generic collections can't contain unboxed values. Generic collections in .NET can hold unboxed values with no penalties. Where Java's generics are only used for compile-time type checking, .NET will generate specific classes for each generic type instantiated at run time.

Java and Haskell have unboxed arrays, but they're distinctly less convenient than the other collections. However, when peak performance is needed it's worth a little inconvenience to avoid the overhead of boxing and unboxing.

* For this discussion, a primitive value is any that can be stored on the call stack, rather than stored as a pointer to a value on the heap. Frequently that's just the machine types (ints, floats, etc), structs, and sometimes static sized arrays. .NET-land calls them value types (as opposed to reference types). Java folks call them primitive types. Haskellions just call them unboxed.

** I'm also focusing on Java, Haskell, and C# in this answer, because that's what I know. For what it's worth, Python, Ruby, and Javascript all have exclusively boxed values. This is also known as the "Everything is an object" approach***.

*** Caveat: A sufficiently advanced compiler / JIT can in some cases actually detect that a value which is semantically boxed when looking at the source, can safely be an unboxed value at runtime. In essence, thanks to brilliant language implementors your boxes are sometimes free.

Solution 2:

from C# 3.0 In a Nutshell:

Boxing is the act of casting a value type into a reference type:

int x = 9; 
object o = x; // boxing the int

unboxing is... the reverse:

// unboxing o
object o = 9; 
int x = (int)o; 

Solution 3:

Boxing & unboxing is the process of converting a primitive value into an object oriented wrapper class (boxing), or converting a value from an object oriented wrapper class back to the primitive value (unboxing).

For example, in java, you may need to convert an int value into an Integer (boxing) if you want to store it in a Collection because primitives can't be stored in a Collection, only objects. But when you want to get it back out of the Collection you may want to get the value as an int and not an Integer so you would unbox it.

Boxing and unboxing is not inherently bad, but it is a tradeoff. Depending on the language implementation, it can be slower and more memory intensive than just using primitives. However, it may also allow you to use higher level data structures and achieve greater flexibility in your code.

These days, it is most commonly discussed in the context of Java's (and other language's) "autoboxing/autounboxing" feature. Here is a java centric explanation of autoboxing.

Solution 4:

In .Net:

Often you can't rely on what the type of variable a function will consume, so you need to use an object variable which extends from the lowest common denominator - in .Net this is object.

However object is a class and stores its contents as a reference.

List<int> notBoxed = new List<int> { 1, 2, 3 };
int i = notBoxed[1]; // this is the actual value

List<object> boxed = new List<object> { 1, 2, 3 };
int j = (int) boxed[1]; // this is an object that can be 'unboxed' to an int

While both these hold the same information the second list is larger and slower. Each value in the second list is actually a reference to an object that holds the int.

This is called boxed because the int is wrapped by the object. When its cast back the int is unboxed - converted back to it's value.

For value types (i.e. all structs) this is slow, and potentially uses a lot more space.

For reference types (i.e. all classes) this is far less of a problem, as they are stored as a reference anyway.

A further problem with a boxed value type is that it's not obvious that you're dealing with the box, rather than the value. When you compare two structs then you're comparing values, but when you compare two classes then (by default) you're comparing the reference - i.e. are these the same instance?

This can be confusing when dealing with boxed value types:

int a = 7;
int b = 7;

if(a == b) // Evaluates to true, because a and b have the same value

object c = (object) 7;
object d = (object) 7;

if(c == d) // Evaluates to false, because c and d are different instances

It's easy to work around:

if(c.Equals(d)) // Evaluates to true because it calls the underlying int's equals

if(((int) c) == ((int) d)) // Evaluates to true once the values are cast

However it is another thing to be careful of when dealing with boxed values.

Solution 5:

Boxing is the process of conversion of a value type into a reference type. Whereas Unboxing is the conversion of a reference type into a value type.

EX: int i = 123;
    object o = i;// Boxing
    int j = (int)o;// UnBoxing

Value Types are: int, char and structures, enumerations. Reference Types are: Classes,interfaces,arrays,strings and objects