Protocol buffers - unique numbered tag - clarification?

I'm using protocol buffers and everything is working fine, except that I don't understand why I need the numbered tags in the proto file:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

Sure, I've read the docs:

As you can see, each field in the message definition has a unique numbered tag. These tags are used to identify your fields in the message binary format, and should not be changed once your message type is in use.

I don't understand what difference it makes if I change a tag. (I will create a new proto and recompile it, so why does it care?)

Another article states that:

Numbered fields in proto definitions obviate the need for version checks which is one of the explicitly stated motivations for the design and implementation of Protocol Buffers. As the developer documentation states, the protocol was designed in part to avoid “ugly code” like this for checking protocol versions:

if (version == 3) {
  ...
} else if (version > 4) {
  if (version == 5) {
    ...
  }
  ...
}

Question

Is it just me, or is this completely unclear?

Let me ask it in a different way:

If I have a proto file like the one above, and then I change it to:

message SearchRequest {
  required string query = 3; //reversed order
  optional int32 page_number = 2;
  optional int32 result_per_page = 1;
}

Why does it care? I recompile and add the file (I've done it multiple times in the last week).

What am I missing? Can you please supply a human-to-human explanation of these numbered tags?


Solution 1:

The numbered tags are used to match fields when serializing and deserializing the data.

Obviously, if you change the numbering scheme, and apply this change to both serializer and deserializer, there is no issue.

Consider, though, what happens if you saved data with the first numbering scheme and loaded it with the second one: the deserializer would try to load the old query (written as field 1) into result_per_page (now field 1), and deserialization would likely fail.

Now, why is this useful? Let's say you need to add another field to your data, long after the schema is already in use:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
  optional int32 new_data = 4;
}

Because every field carries an explicit number, the updated deserializer is still able to load data serialized with the old schema: the old data simply contains no field 4, so new_data is left unset.
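
To make this concrete, here is a toy sketch in Python of a number-keyed wire format. It is not the real protobuf library: the helper names (encode, decode) and the schema dictionaries are made up for illustration, it only handles non-negative integers and strings, and unlike real protobuf it does not check that the wire type matches the declared field type. It only shows that the bytes carry field numbers, never field names.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Read one varint from buf at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode(fields):
    """fields: list of (field_number, value). str -> wire type 2, int -> wire type 0."""
    out = bytearray()
    for number, value in fields:
        if isinstance(value, str):
            data = value.encode("utf-8")
            out += encode_varint((number << 3) | 2) + encode_varint(len(data)) + data
        else:
            out += encode_varint((number << 3) | 0) + encode_varint(value)
    return bytes(out)


def decode(buf, schema):
    """schema: {field_number: field_name}. Numbers not in the schema are skipped."""
    pos, msg = 0, {}
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0b111
        if wire_type == 2:  # length-delimited (string here)
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:  # wire type 0 (varint); the only other type this toy supports
            value, pos = decode_varint(buf, pos)
        if number in schema:
            msg[schema[number]] = value
    return msg


# Hypothetical schemas mirroring the .proto files in this thread.
V1 = {1: "query", 2: "page_number", 3: "result_per_page"}
V2 = {1: "query", 2: "page_number", 3: "result_per_page", 4: "new_data"}
REVERSED = {3: "query", 2: "page_number", 1: "result_per_page"}

old_bytes = encode([(1, "protobuf"), (2, 1), (3, 10)])
new_bytes = encode([(1, "protobuf"), (2, 1), (3, 10), (4, 99)])

print(decode(old_bytes, V2))        # {'query': 'protobuf', 'page_number': 1, 'result_per_page': 10}
print(decode(new_bytes, V1))        # {'query': 'protobuf', 'page_number': 1, 'result_per_page': 10}
print(decode(old_bytes, REVERSED))  # {'result_per_page': 'protobuf', 'page_number': 1, 'query': 10}
```

The first two calls show why adding field 4 is harmless in both directions: a new reader finds no field 4 in old data, and an old reader skips the field 4 it does not know. The last call shows the renumbering problem: old bytes read with swapped numbers put the string into result_per_page. The real library would notice the wire type mismatch instead of silently storing the wrong value.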

Solution 2:

These field numbers are used by protobuf while encoding and decoding. See here for more details.

Every field has a wire type. int32 has wire type 0, so if your field number is, say, 2, the field's key is encoded as (2 << 3) | 0 = 0001 0000, i.e. 10 in hex.

Later, when the key is decoded, its most significant bit (the varint continuation bit) is dropped, which leaves 001 0000. The last three bits give the wire type (000, i.e. varint, which is what int32 uses), and the remaining bits give the field number: 0010 is 2. So this is field 2 with wire type 0 (varint).
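
The key arithmetic can be checked with a few lines of Python. This is a minimal sketch, not protobuf library code: make_tag and split_tag are hypothetical helper names, and it assumes a field number from 1 to 15 so that the whole key fits in a single byte.

```python
def make_tag(field_number, wire_type):
    """Pack a field number and a 3-bit wire type into one key value."""
    return (field_number << 3) | wire_type


def split_tag(tag):
    """Recover (field_number, wire_type) from a single-byte key."""
    return tag >> 3, tag & 0b111


tag = make_tag(2, 0)             # field 2, wire type 0 (varint, used by int32)
print(f"{tag:08b} 0x{tag:02x}")  # 00010000 0x10
print(split_tag(tag))            # (2, 0)
```

For field numbers above 15 the key no longer fits in 7 bits, so it is itself written as a multi-byte varint, and the continuation bits have to be removed before splitting.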