Why do we need serialization if everything is already stored as bits?

From https://docs.oracle.com/javase/tutorial/jndi/objects/serial.html

To serialize an object means to convert its state to a byte stream so that the byte stream can be reverted back into a copy of the object.

Since everything is stored in memory as 0s and 1s, why is there an additional need to deconstruct an object to a form that can be transmitted over a stream? Why is the existing state not good enough for transmission?


Solution 1:

If you find 01000001, how do you know if it is the number 65 or ASCII for A? Or perhaps a sample value in a WAV file? It might also be color information in a bitmap.
There are so many ways information can be interpreted. You must give the receiver a way of interpreting the information. My first example is perhaps a little silly. Instead, compare a CSV file with a JSON file. Try recreating a JSON structure in a CSV. It won't be pretty.
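To make the 01000001 example concrete, here is a minimal Java sketch. The class and method names are made up for illustration; the only point is that one and the same byte yields different values depending on how the reader decides to interpret it.

```java
final class ByteInterpretation {
    // Interpret the byte as a plain integer.
    static int asNumber(byte raw) {
        return raw;
    }

    // Interpret the same byte as an ASCII character.
    static char asAscii(byte raw) {
        return (char) (raw & 0xFF);
    }

    public static void main(String[] args) {
        byte raw = 0b01000001;
        System.out.println(asNumber(raw)); // prints 65
        System.out.println(asAscii(raw));  // prints A
    }
}
```

Nothing in the byte itself says which of the two readings is the intended one; that knowledge has to come from somewhere else.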
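The same logic lies behind Java's serialization. How would the receiver know what the class definition looks like? The raw bytes alone don't say which class they belong to or how its fields are laid out, so the serialized stream has to carry that information too.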
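As a rough illustration of how Java's built-in serialization tells the receiver what it is looking at, the sketch below serializes a small object and inspects the resulting bytes. `StreamHeaderDemo` and `Point` are hypothetical names chosen for this example; `ObjectOutputStream` and `ObjectStreamConstants` are the standard `java.io` types.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

final class StreamHeaderDemo {
    // Hypothetical example class with a single int field.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 65;
    }

    // Serialize an object into an in-memory byte array.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // The first two bytes of every serialization stream form a fixed magic number.
    static boolean startsWithMagic(byte[] bytes) {
        int magic = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF);
    }

    // The class name is written into the stream as text, so a plain search finds it.
    static boolean containsClassName(byte[] bytes, Class<?> type) {
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        return raw.contains(type.getName());
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Point());
        System.out.println(startsWithMagic(bytes));                // prints true
        System.out.println(containsClassName(bytes, Point.class)); // prints true
    }
}
```

So the stream is more than the value 65: it also records that the value is the field of a particular class, which is what lets the receiver interpret it.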

Solution 2:

Main reason: the data representing one instance isn't contiguous.

In

To serialize an object means to convert its state to a byte stream so that the byte stream can be reverted back into a copy of the object.

the key phrase is "byte stream", meaning a sequence of bytes.

Your reasoning fits languages like Pascal and C, where most data structures end up being represented as a contiguous block of bytes.

With languages like Java, instance fields very often don't contain the field values, but reference other instances containing the values or referencing further instances, and so on. Even a simple class having just one String field ends up in (at least) three different instances:

  • the main instance, holding a reference to the String,
  • a String instance, having a reference to the array of characters,
  • a char[] array (a byte[] since Java 9) holding the contents of the string.

And there's no guarantee whatsoever about where these instances are located in memory; most probably they are not in consecutive locations. So, the in-memory representation indeed consists of bytes, but not in a sequential arrangement.
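The fact that a field holds a reference rather than the value itself can be observed directly. In this small sketch (`ReferenceDemo` and `Person` are names invented for the example), two objects share one String instance, which would be impossible if each object stored the characters inline.

```java
final class ReferenceDemo {
    // Hypothetical class with a single String field.
    static final class Person {
        final String name; // holds a reference, not the characters themselves

        Person(String name) {
            this.name = name;
        }
    }

    // True when both fields point at the very same String object.
    static boolean sameStringInstance(Person a, Person b) {
        return a.name == b.name;
    }

    public static void main(String[] args) {
        String shared = new String("Alice");
        Person a = new Person(shared);
        Person b = new Person(shared);
        Person c = new Person(new String("Alice"));

        System.out.println(sameStringInstance(a, b)); // prints true
        System.out.println(sameStringInstance(a, c)); // prints false
        System.out.println(a.name.equals(c.name));    // prints true
    }
}
```

Copying only the bytes of `a` would copy a reference that means nothing outside the current JVM; the String it points to lives elsewhere on the heap.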

That's why we talk about "serialization": the process creates a serial stream of bytes out of something that more closely resembles a web of interlinked elements.
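A minimal round trip with the standard `ObjectOutputStream` and `ObjectInputStream` classes shows the idea. This is only a sketch: `RoundTripDemo`, `Person`, `toBytes` and `fromBytes` are illustrative names, and an in-memory byte array stands in for whatever file or network connection would carry the stream in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class RoundTripDemo {
    // Hypothetical class: one instance plus the String it references.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Walk the object graph and write it out as one sequential run of bytes.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild a fresh object graph from the byte sequence.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person("Alice");
        Person copy = (Person) fromBytes(toBytes(original));

        System.out.println(copy != original);                // prints true: a new instance
        System.out.println(copy.name.equals(original.name)); // prints true: same state
    }
}
```

The result is a copy with the same state rather than the original object, which matches the wording of the quoted definition.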