How does the Minecraft NBT tag format work?

I'm having trouble understanding the NBT tag format.

I know about the TAG_INT and TAG_CHAR. In fact, I have done some basic programming.

I want to create an inventory editor for Minecraft, but the world data file is in a binary format and I can't find a program (with C source code) that can help me translate all this data.

How does the NBT format work?


To make an inventory editor you don't really need to know the binary structure of NBT; you could just use one of the existing NBT libraries.

For example, check the following website that lists some libraries for 11 different popular programming languages.

That website also pretty much sums up the basics of the NBT system.

If, however, you want to build it from scratch, keep in mind that you can encounter NBT data in different containers: uncompressed, or compressed with either gzip or zlib.
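
One way to tell the three containers apart is to look at the first two bytes of the file. The sketch below is a heuristic, not part of the NBT format itself; the class and method names (`NbtCompression`, `detect`) are made up for illustration. It relies on the gzip magic number `1F 8B` and on the zlib header rule that the first two bytes, read as a big-endian 16-bit number, are a multiple of 31.

```java
final class NbtCompression {
    enum Kind { GZIP, ZLIB, UNCOMPRESSED }

    // Guess the container from the first two bytes of the file (heuristic).
    static Kind detect(byte[] head) {
        if (head.length < 2) return Kind.UNCOMPRESSED;
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        // gzip streams always start with the magic number 1F 8B.
        if (b0 == 0x1F && b1 == 0x8B) return Kind.GZIP;
        // zlib header: low nibble of byte 0 is 8 (deflate) and the
        // big-endian 16-bit header value is divisible by 31.
        if ((b0 & 0x0F) == 8 && ((b0 << 8) | b1) % 31 == 0) return Kind.ZLIB;
        // Otherwise assume raw NBT (which usually starts with 0A, TAG_Compound).
        return Kind.UNCOMPRESSED;
    }
}
```

Depending on the result you would wrap the file stream in `java.util.zip.GZIPInputStream`, `java.util.zip.InflaterInputStream`, or nothing at all before parsing.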

As far as I am aware, an NBT file normally starts with a single Compound element; however, this is not required by the specifications.

Each element starts with 1 byte, which specifies the type (also called the tag type).

id    name
0     TAG_End
1     TAG_Byte
2     TAG_Short
3     TAG_Int
4     TAG_Long
5     TAG_Float
6     TAG_Double
7     TAG_Byte_Array
8     TAG_String
9     TAG_List
10    TAG_Compound
11    TAG_Int_Array
12    TAG_Long_Array (added in Minecraft 1.12; laid out like TAG_Int_Array with 8-byte elements)

Every tag/element except for the End tag has a name (elements inside a TAG_List are the exception, see below). The tag id byte is followed by 2 bytes holding the length of the name (unsigned, big endian, e.g. 00 0A (hex) means length 10). This length is then followed by N bytes, which are the bytes of the name string.
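
As a minimal sketch of that header layout, the following reads the tag id and the name from a `java.nio.ByteBuffer` (which is big-endian by default). The `NbtHeader` class is a hypothetical helper, and decoding the name as plain UTF-8 is an assumption that holds for typical ASCII names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NbtHeader {
    final int tagId;
    final String name;   // null for TAG_End, which has no name

    NbtHeader(int tagId, String name) {
        this.tagId = tagId;
        this.name = name;
    }

    // Reads [TAG_ID] and, unless it is TAG_End, the 2-byte length plus name bytes.
    static NbtHeader read(ByteBuffer buf) {
        int tagId = buf.get() & 0xFF;
        if (tagId == 0) return new NbtHeader(0, null);
        int nameLength = buf.getShort() & 0xFFFF;   // unsigned 16-bit length
        byte[] nameBytes = new byte[nameLength];
        buf.get(nameBytes);
        return new NbtHeader(tagId, new String(nameBytes, StandardCharsets.UTF_8));
    }
}
```

After `read` returns, the buffer is positioned at the first byte of the tag's data.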

These name bytes are then followed by the actual data of the tag. The data of TAG_Byte, TAG_Short, TAG_Int and TAG_Long are big-endian numbers stored in 1, 2, 4 and 8 bytes respectively. Note: Java has no unsigned data types for these, so assume they are signed.
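
For illustration, here is a small sketch of decoding these integer payloads. `NbtIntegers` is a made-up helper name; `ByteBuffer` already reads big-endian signed values, and `readIntManually` shows the same computation spelled out for languages without such a helper.

```java
import java.nio.ByteBuffer;

final class NbtIntegers {
    // ByteBuffer defaults to big-endian byte order, which matches NBT.
    static byte readByte(byte[] payload)   { return ByteBuffer.wrap(payload).get(); }
    static short readShort(byte[] payload) { return ByteBuffer.wrap(payload).getShort(); }
    static int readInt(byte[] payload)     { return ByteBuffer.wrap(payload).getInt(); }
    static long readLong(byte[] payload)   { return ByteBuffer.wrap(payload).getLong(); }

    // The same decoding done by hand for a 4-byte TAG_Int payload:
    // the most significant byte comes first.
    static int readIntManually(byte[] p) {
        return ((p[0] & 0xFF) << 24)
             | ((p[1] & 0xFF) << 16)
             | ((p[2] & 0xFF) << 8)
             |  (p[3] & 0xFF);
    }
}
```

Because the values are signed, a TAG_Byte payload of FF decodes to -1 rather than 255.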

The data of TAG_Float and TAG_Double is 4 and 8 bytes long. According to [1], they are stored as big-endian IEEE 754 single- and double-precision floating-point numbers. How to parse these might depend on your programming language of choice.
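
In Java, one possible approach is to assemble the raw bits as a big-endian integer and reinterpret them with `Float.intBitsToFloat` or `Double.longBitsToDouble`. This is only a sketch; `NbtFloats` is a hypothetical helper name.

```java
final class NbtFloats {
    // Interprets 4 big-endian bytes as an IEEE 754 single-precision value.
    static float readFloat(byte[] p) {
        int bits = ((p[0] & 0xFF) << 24)
                 | ((p[1] & 0xFF) << 16)
                 | ((p[2] & 0xFF) << 8)
                 |  (p[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    // Interprets 8 big-endian bytes as an IEEE 754 double-precision value.
    static double readDouble(byte[] p) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (p[i] & 0xFF);
        }
        return Double.longBitsToDouble(bits);
    }
}
```

For example, the bytes 3F 80 00 00 are the single-precision encoding of 1.0.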

The data of the array tags (TAG_Byte_Array/TAG_Int_Array) starts with a 4-byte (32-bit) big-endian integer which indicates the length of the array. After the length come C*N bytes, where N is the length read and C the number of bytes needed per element (so 1 for bytes, 4 for integers).
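
A sketch of reading both array kinds from a `ByteBuffer` positioned at the start of the tag data could look like this (`NbtArrays` is an illustrative name; real code should also reject negative or absurdly large lengths):

```java
import java.nio.ByteBuffer;

final class NbtArrays {
    // TAG_Byte_Array data: 4-byte element count, then that many single bytes.
    static byte[] readByteArray(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] values = new byte[length];
        buf.get(values);
        return values;
    }

    // TAG_Int_Array data: 4-byte element count, then that many 4-byte integers.
    static int[] readIntArray(ByteBuffer buf) {
        int length = buf.getInt();
        int[] values = new int[length];
        for (int i = 0; i < length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }
}
```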

The data of TAG_String is 2 bytes (an unsigned short) indicating the length in bytes, followed by that many bytes holding the string characters. This is the same layout as a tag name.
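
In Java this layout is what `DataInput.readUTF` expects: an unsigned 2-byte length followed by the string in Java's "modified UTF-8", which is identical to normal UTF-8 for typical text but encodes U+0000 as C0 80. The sketch below assumes the string was written that way (as Java's `DataOutput.writeUTF` does); `NbtStrings` is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class NbtStrings {
    // Decodes a TAG_String data section: 2-byte length + modified UTF-8 bytes.
    static String readStringPayload(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            return in.readUTF();
        } catch (IOException e) {
            // Truncated or malformed input ends up here.
            throw new UncheckedIOException(e);
        }
    }
}
```

In other languages you would read the length yourself and decode the bytes as UTF-8, which gives the same result for ordinary text.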

TAG_Compound is essentially a container for multiple nodes. Its data consists of other tags, and all following tags are children of this tag until a TAG_End is read.
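
The "read children until TAG_End" loop can be sketched as follows. To keep it short, this hypothetical `NbtCompoundSketch` only handles TAG_Int, TAG_String and nested TAG_Compound children and throws for anything else; names are decoded as plain UTF-8, which is an assumption that holds for ordinary names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class NbtCompoundSketch {
    // Reads a 2-byte length followed by that many string bytes.
    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getShort() & 0xFFFF];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads the data of a TAG_Compound: named child tags until TAG_End (id 0).
    static Map<String, Object> readCompoundPayload(ByteBuffer buf) {
        Map<String, Object> children = new LinkedHashMap<>();
        while (true) {
            int tagId = buf.get() & 0xFF;
            if (tagId == 0) return children;      // TAG_End closes this compound
            String name = readString(buf);
            switch (tagId) {
                case 3:  children.put(name, buf.getInt()); break;
                case 8:  children.put(name, readString(buf)); break;
                case 10: children.put(name, readCompoundPayload(buf)); break;
                default:
                    throw new IllegalArgumentException("tag id " + tagId + " is not handled in this sketch");
            }
        }
    }
}
```

Nested compounds fall out of the recursion: each call consumes exactly one TAG_End, so the outer loop continues where the inner compound stopped.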

The TAG_List tag is a list of values of one specific type. Its data consists of 1 byte indicating the element type (refer to the tag ids listed above) followed by 4 bytes specifying the number of elements. Each element is read by reading only the data section of the associated tag (so, excluding the tag id byte, the name length and the name characters).
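
A sketch of that, limited to lists of the fixed-size numeric types (the hypothetical `NbtListSketch` throws for any other element type), might look like this. Note how no tag id or name is read per element.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class NbtListSketch {
    // Reads the data of a TAG_List: element type, element count, then the bare elements.
    static List<Object> readListPayload(ByteBuffer buf) {
        int elementType = buf.get() & 0xFF;
        int count = buf.getInt();
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            switch (elementType) {
                case 1: values.add(buf.get()); break;        // TAG_Byte
                case 2: values.add(buf.getShort()); break;   // TAG_Short
                case 3: values.add(buf.getInt()); break;     // TAG_Int
                case 4: values.add(buf.getLong()); break;    // TAG_Long
                case 5: values.add(buf.getFloat()); break;   // TAG_Float
                case 6: values.add(buf.getDouble()); break;  // TAG_Double
                default:
                    throw new IllegalArgumentException("element type " + elementType + " is not handled in this sketch");
            }
        }
        return values;
    }
}
```

An empty list never enters the loop, so its element type byte is never inspected.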

To summarize: let's define [NAME_BLOCK] as the 2 bytes containing the name length followed by that many bytes containing the name characters.

TAG_ID          FORMAT                                                      total length (bytes)                data length
TAG_End         [TAG_ID]                                                    1                                   0
TAG_Byte        [TAG_ID] [NAME_BLOCK] [VALUE]                               4 + name.length                     1
TAG_Short       [TAG_ID] [NAME_BLOCK] [VALUE]                               5 + name.length                     2
TAG_Int         [TAG_ID] [NAME_BLOCK] [VALUE]                               7 + name.length                     4
TAG_Long        [TAG_ID] [NAME_BLOCK] [VALUE]                               11 + name.length                    8
TAG_Float       [TAG_ID] [NAME_BLOCK] [VALUE]                               7 + name.length                     4
TAG_Double      [TAG_ID] [NAME_BLOCK] [VALUE]                               11 + name.length                    8
TAG_String      [TAG_ID] [NAME_BLOCK] [VALUE_LENGTH] [VALUE]                5 + name.length + value.length      2 + value.length
TAG_Byte_Array  [TAG_ID] [NAME_BLOCK] [NUM_ELEMENTS] [ELEMENTS]             7 + name.length + num_elements      4 + (1 * num_elements)
TAG_Int_Array   [TAG_ID] [NAME_BLOCK] [NUM_ELEMENTS] [ELEMENTS]             7 + name.length + 4 * num_elements  4 + (4 * num_elements)
TAG_List        [TAG_ID] [NAME_BLOCK] [TYPE] [NUM_ELEMENTS] [ELEMENTS]      8 + name.length + elements.bytes()  5 + elements.bytes()
TAG_Compound    [TAG_ID] [NAME_BLOCK] [TAGS.....] [TAG_End]                 4 + name.length + tags.bytes()      tags.bytes() + 1
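
For the fixed-size tags, the "total length" column is simply 1 (tag id) + 2 (name length) + the name bytes + the data length. As a quick sanity check, here is a small sketch of that arithmetic; `NbtSizes` and its methods are made-up names and it assumes the name is encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class NbtSizes {
    // Data length in bytes for the fixed-size tag types (ids 1 to 6).
    static int fixedDataLength(int tagId) {
        switch (tagId) {
            case 1: return 1;  // TAG_Byte
            case 2: return 2;  // TAG_Short
            case 3: return 4;  // TAG_Int
            case 4: return 8;  // TAG_Long
            case 5: return 4;  // TAG_Float
            case 6: return 8;  // TAG_Double
            default:
                throw new IllegalArgumentException("tag id " + tagId + " has no fixed size");
        }
    }

    // Total bytes on disk: tag id + name length field + name bytes + data.
    static int totalLength(int tagId, String name) {
        int nameBytes = name.getBytes(StandardCharsets.UTF_8).length;
        return 1 + 2 + nameBytes + fixedDataLength(tagId);
    }
}
```

A TAG_Int named "Hello" therefore takes 1 + 2 + 5 + 4 = 12 bytes.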

A few examples:

bytes               What it is
03                  TAG -> TAG_Int
00 05               Length of name tag => 5-characters
48 65 6C 6C 6F      The characters spelling Hello
00 00 01 02         The value of the Integer tag, 4 bytes (value = 258)

bytes               What it is
0A                  TAG -> TAG_Compound
00 05               Length of name tag => 5-characters
77 6F 72 6C 64      The characters spelling world
03                  TAG -> TAG_Int
00 05               Length of name tag => 5-characters
48 65 6C 6C 6F      The characters spelling Hello
00 00 01 02         The value of the Integer tag, 4 bytes (value = 258)
00                  TAG -> TAG_End, the end of the compound tag body/value. (The int is part of the value of the compound.)
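
To double-check byte sequences like these, you can go the other way and generate them. The sketch below (the `NbtExampleWriter` name and its methods are hypothetical) writes a TAG_Compound named "world" that contains a TAG_Int named "Hello" with the value 258, using a big-endian `ByteBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NbtExampleWriter {
    // Writes [TAG_ID] followed by the name block (2-byte length + name bytes).
    static void putHeader(ByteBuffer out, int tagId, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.put((byte) tagId);
        out.putShort((short) nameBytes.length);
        out.put(nameBytes);
    }

    static byte[] buildExample() {
        ByteBuffer out = ByteBuffer.allocate(64);
        putHeader(out, 10, "world");   // TAG_Compound named "world"
        putHeader(out, 3, "Hello");    // child TAG_Int named "Hello"
        out.putInt(258);               // the int value, 4 bytes big-endian
        out.put((byte) 0);             // TAG_End closes the compound
        byte[] result = new byte[out.position()];
        out.flip();
        out.get(result);
        return result;
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }
}
```

`NbtExampleWriter.toHex(NbtExampleWriter.buildExample())` gives `0A 00 05 77 6F 72 6C 64 03 00 05 48 65 6C 6C 6F 00 00 01 02 00`, 21 bytes in total.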

According to this website, the player files are compressed with GZip. You might also find this page useful.

In case you need an example, here is a Java one (it can only read, and outputs JSON including the tag types; requires Java 1.8+ and Gson).

import com.google.gson.*;
import java.io.*;
import java.util.*;
import java.util.zip.GZIPInputStream;

public class SimpleNBTReader {
    interface Helper{
        Object apply(DataInput t) throws Exception;
    }

    static class Node{
        TagType type;
        String name;
        Object value;

        public Node(TagType type,String name, Object value){
            this.type = type;
            this.name = name;
            this.value = value;
        }
    }

    enum TagType {
        TAG_End(s -> null),
        TAG_Byte(DataInput::readByte),
        TAG_Short(DataInput::readShort),
        TAG_Int(DataInput::readInt),
        TAG_Long(DataInput::readLong),
        TAG_Float(DataInput::readFloat),
        TAG_Double(DataInput::readDouble),
        TAG_Byte_Array(in -> {
            int len = in.readInt();
            byte[] bytes = new byte[len];
            in.readFully(bytes);
            return bytes;
        }),
        // readUTF reads an unsigned 2-byte length followed by that many bytes of
        // (modified) UTF-8, which is how Java writes NBT strings.
        TAG_String(DataInput::readUTF),
        TAG_List(in ->{
            TagType type = TagType.values()[in.readByte()];
            int len = in.readInt();
            Object[] values = new Object[len];
            for(int i=0; i < len; i++)
                values[i] = type.read(in);
            return values;
        }),
        TAG_Compound(in -> {
            List<Object> values = new LinkedList<>();
            while(true){
                TagType type = TagType.values()[in.readByte()];

                if(type == TagType.TAG_End)
                    break;

                values.add(readTag(type,in));
            }
            return values;
        }),
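        TAG_Int_Array(in -> {
            int len = in.readInt();
            int[] values = new int[len];
            for(int i=0; i < len; i++)
                values[i] = in.readInt();
            return values;
        }),
        // Id 12, added in Minecraft 1.12: same layout as TAG_Int_Array with 8-byte elements.
        TAG_Long_Array(in -> {
            int len = in.readInt();
            long[] values = new long[len];
            for(int i=0; i < len; i++)
                values[i] = in.readLong();
            return values;
        })
        ;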
        private Helper body;

        TagType(Helper body) {
            this.body = body;
        }

        public Object read(DataInput in) throws Exception {
            return body.apply(in);
        }
    }

    // Reads the name and data of a tag whose type byte has already been consumed.
    private static Node readTag(TagType type, DataInput in) throws Exception{
        if(type == TagType.TAG_End)
            throw new Exception("TAG_End has no name data.");

        // The name block is an unsigned 2-byte length plus the name bytes,
        // which is exactly what readUTF parses (a signed readShort would
        // break on names longer than 32767 bytes).
        String name = in.readUTF();

        return new Node(type,name,type.read(in));
    }

    public static Object read(DataInput in) throws Exception{
        TagType type = TagType.values()[in.readByte()];
        return readTag(type,in);
    }

    public static void main(String[] args) throws Exception {
        Gson gs = (new GsonBuilder()).setPrettyPrinting().create();

        // Player files are gzip-compressed; try-with-resources closes the stream when done.
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(new FileInputStream("player_save_file.dat")))) {
            Object v = read(in);
            System.out.println(gs.toJson(v));
        }
    }
}