Using Protobuf-net, I suddenly got an exception about an unknown wire-type

Solution 1:

First thing to check:

IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.


What is a wire-type?

It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.

Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents, and what type of data is coming next; this "what type of data" is essential to support the case where unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as it lets the serializer know how to read past that data (or store it for round-trip if required).

What are the different wire-type values and their description?

  • 0: variant-length integer (up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
  • 1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
  • 2: length-prefixed - first read an integer using variant-length encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child objects properties / lists)
  • 3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
  • 4: "end group" - twinned with 3
  • 5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)

I suspect a field is causing the problem, how to debug this?

Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:

using(var file = new FileStream(path, FileMode.Truncate)) {
    // write
}

or alternatively by SetLength after writing your data:

file.SetLength(file.Position);

Other possible cause

You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.

Solution 2:

Since the stack trace references this StackOverflow question, I thought I'd point out that you can also receive this exception if you (accidentally) deserialize a stream into a different type than what was serialized. So it's worth double-checking both sides of the conversation to ensure this is not happening.

Solution 3:

This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.


Why this happens:

The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc data-types). Additionally, this is not 1:1, nor is is 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the scema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", where-as when reading a String data-type, the only supported wire-type is "string".

If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.

Now let's look at why this occurs in the scenario here:

[ProtoContract]
public class Data1
{
    [ProtoMember(1, IsRequired=true)]
    public int A { get; set; }
}

[ProtoContract]
public class Data2
{
    [ProtoMember(1, IsRequired = true)]
    public string B { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var d1 = new Data1 { A = 1};
        var d2 = new Data2 { B = "Hello" };
        var ms = new MemoryStream();
        Serializer.Serialize(ms, d1); 
        Serializer.Serialize(ms, d2);
        ms.Position = 0;
        var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
        var d4 = Serializer.Deserialize<Data2>(ms);
        Console.WriteLine("{0} {1}", d3, d4);
    }
}

In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.

The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and Data2 using the *WithLengthPrefix, then deserialize a Data1 and a Data2 using the *WithLengthPrefix methods, then it correctly splits the incoming data between the two instances, only reading the right value into the right object.

Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:

// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;

var lookup = new Dictionary<int,Type> { {1, typeof(Data1)}, {2,typeof(Data2)}};
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
    PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
    Console.WriteLine(obj); // writes Data1 on the first iteration,
                            // and Data2 on the second iteration
}