What does a zlib header look like?
Solution 1:
zlib magic headers
78 01 - No Compression/low
78 9C - Default Compression
78 DA - Best Compression
Solution 2:
From the zlib RFC (RFC 1950):
  0   1
+---+---+
|CMF|FLG|
+---+---+
CMF (Compression Method and flags)
This byte is divided into a 4-bit compression method and a 4-bit information field depending on the compression method.
bits 0 to 3 CM Compression method
bits 4 to 7 CINFO Compression info
CM (Compression method)
This identifies the compression method used in the file. CM = 8
denotes the "deflate" compression method with a window size up
to 32K. This is the method used by gzip and PNG and almost everything else.
CM = 15 is reserved.
CINFO (Compression info) For CM = 8, CINFO is the base-2 logarithm of the LZ77 window size, minus eight (CINFO=7 indicates a 32K window size). Values of CINFO above 7 are not allowed in this version of the specification. CINFO is not defined in this specification for CM not equal to 8.
In practice, this means the first byte is almost always 78 (hex): CM = 8 in the low nibble and CINFO = 7 in the high nibble.
FLG (FLaGs) This flag byte is divided as follows:
bits 0 to 4 FCHECK (check bits for CMF and FLG)
bit 5 FDICT (preset dictionary)
bits 6 to 7 FLEVEL (compression level)
The FCHECK value must be such that CMF and FLG, when viewed as a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG), is a multiple of 31.
FLEVEL (Compression level)
These flags are available for use by specific compression methods. The "deflate" method (CM = 8) sets these flags as follows:
0 - compressor used fastest algorithm
1 - compressor used fast algorithm
2 - compressor used default algorithm
3 - compressor used maximum compression, slowest algorithm
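To make the bit layout quoted above concrete, here is a minimal Python sketch that splits the two header bytes into their fields and applies the multiple-of-31 rule. The names `ZlibHeader` and `parse_zlib_header` are made up for this illustration and are not part of any library.

```python
from typing import NamedTuple


class ZlibHeader(NamedTuple):
    cm: int       # compression method (bits 0-3 of CMF)
    cinfo: int    # compression info (bits 4-7 of CMF)
    fcheck: int   # check bits (bits 0-4 of FLG)
    fdict: int    # preset dictionary flag (bit 5 of FLG)
    flevel: int   # compression level hint (bits 6-7 of FLG)
    valid: bool   # True if the header satisfies the RFC 1950 rules checked below


def parse_zlib_header(data: bytes) -> ZlibHeader:
    """Decode the first two bytes of a (possible) zlib stream."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    cmf, flg = data[0], data[1]
    cm = cmf & 0x0F
    cinfo = cmf >> 4
    fcheck = flg & 0x1F
    fdict = (flg >> 5) & 1
    flevel = flg >> 6
    # Valid means: deflate method, window no larger than 32K,
    # and CMF*256 + FLG is a multiple of 31.
    valid = cm == 8 and cinfo <= 7 and (cmf * 256 + flg) % 31 == 0
    return ZlibHeader(cm, cinfo, fcheck, fdict, flevel, valid)


print(parse_zlib_header(bytes([0x78, 0x9C])))
```

For `78 9C` this should report CM = 8, CINFO = 7, FLEVEL = 2 and no preset dictionary. A passing check only means the two bytes are a plausible zlib header; it does not prove the rest of the data is a zlib stream.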
Solution 3:
ZLIB/GZIP headers
Level | ZLIB | GZIP
1 | 78 01 | 1F 8B
2 | 78 5E | 1F 8B
3 | 78 5E | 1F 8B
4 | 78 5E | 1F 8B
5 | 78 5E | 1F 8B
6 | 78 9C | 1F 8B
7 | 78 DA | 1F 8B
8 | 78 DA | 1F 8B
9 | 78 DA | 1F 8B
Raw Deflate streams have no header at all, so there are no common magic bytes to look for.
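If you want to check the table against your own setup, a short Python sketch like the one below prints the first two bytes produced at each level. It assumes Python 3.8+ (for `bytes.hex` with a separator) and the default 32K window; the helper name `header_table` is invented for this example, and the exact FLG byte can depend on which zlib build Python is linked against.

```python
import gzip
import zlib


def header_table(data: bytes) -> dict:
    """Map compression level -> (first 2 zlib bytes, first 2 gzip bytes)."""
    table = {}
    for level in range(1, 10):
        z = zlib.compress(data, level)[:2]
        g = gzip.compress(data, compresslevel=level)[:2]
        table[level] = (z, g)
    return table


payload = b"hello zlib " * 100
print("Level | ZLIB | GZIP")
for level, (z, g) in header_table(payload).items():
    print(level, "|", z.hex(" ").upper(), "|", g.hex(" ").upper())
```

The gzip column should always read `1F 8B`, since the gzip magic number does not encode the level.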
Solution 4:
The zlib compressed data format is laid out as follows.
+---+---+
|CMF|FLG| (2 bytes - Defines the compression mode - More details below)
+---+---+
+---+---+---+---+
| DICTID | (4 bytes. Present only when FLG.FDICT is set.) - Mostly not set
+---+---+---+---+
+=====================+
|...compressed data...| (variable size of data)
+=====================+
+---+---+---+---+
| ADLER32 | (4 bytes of checksum)
+---+---+---+---+
Mostly, FLG.FDICT (the dictionary flag) is not set. In such cases the DICTID is simply not present, so the total header is just 2 bytes.
The most common header values (CMF and FLG) with no dictionary are as follows.
CMF | FLG
0x78 | 0x01 - No Compression/low
0x78 | 0x9C - Default Compression
0x78 | 0xDA - Best Compression
More at ZLIB RFC
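As a rough illustration of the layout above, the following Python sketch cuts a complete zlib stream into header, optional DICTID, raw Deflate data and the trailing ADLER32. The function name `split_zlib_stream` is hypothetical, and the sketch assumes the input is exactly one whole zlib stream with no trailing bytes.

```python
import struct
import zlib


def split_zlib_stream(stream: bytes):
    """Return ((CMF, FLG), DICTID or None, raw deflate bytes, ADLER32)."""
    if len(stream) < 6:
        raise ValueError("too short to be a zlib stream")
    cmf, flg = stream[0], stream[1]
    pos = 2
    dictid = None
    if flg & 0x20:  # FDICT set: a 4-byte big-endian DICTID follows the header
        (dictid,) = struct.unpack(">I", stream[pos:pos + 4])
        pos += 4
    body = stream[pos:-4]                       # raw deflate data
    (adler,) = struct.unpack(">I", stream[-4:])  # big-endian checksum
    return (cmf, flg), dictid, body, adler


original = b"example payload " * 20
header, dictid, body, adler = split_zlib_stream(zlib.compress(original))

# The middle part is plain deflate, so it decompresses with a negative wbits.
restored = zlib.decompress(body, wbits=-15)
print(header, dictid, restored == original, adler == zlib.adler32(original))
```

The ADLER32 is computed over the uncompressed data, which is why it is compared against `zlib.adler32(original)` rather than against the compressed bytes.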
Solution 5:
The ZLIB header (as defined in RFC1950) is a 16-bit, big-endian value. It contains these fields from most to least significant:
- CINFO (bits 12-15): Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.
- CM (bits 8-11): The compression method. Only Deflate (8) is allowed.
- FLEVEL (bits 6-7): Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high).
- FDICT (bit 5): Indicates whether a preset dictionary is used. This is usually 0. (1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.)
- FCHECK (bits 0-4): A checksum (5 bits, 0..31), whose value is calculated such that the entire 16-bit value is divisible by 31 with no remainder.*
Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value. Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:
CINFO | FLEVEL 0 | FLEVEL 1 | FLEVEL 2 | FLEVEL 3
0 | 08 1D | 08 5B | 08 99 | 08 D7
1 | 18 19 | 18 57 | 18 95 | 18 D3
2 | 28 15 | 28 53 | 28 91 | 28 CF
3 | 38 11 | 38 4F | 38 8D | 38 CB
4 | 48 0D | 48 4B | 48 89 | 48 C7
5 | 58 09 | 58 47 | 58 85 | 58 C3
6 | 68 05 | 68 43 | 68 81 | 68 DE
7 | 78 01 | 78 5E | 78 9C | 78 DA
The CINFO field is rarely, if ever, set by compressors to be anything other than 7 (indicating the maximum 32KB window), so the only values you are likely to see in the wild are the four in the bottom row (beginning with 78).
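The 32 headers listed above can be regenerated from the field definitions. Here is a small Python sketch of that calculation; `make_header` is a name chosen for this example, and it always picks the smallest FCHECK that makes the 16-bit value a multiple of 31.

```python
def make_header(cinfo: int, flevel: int, fdict: int = 0) -> bytes:
    """Build the 2-byte zlib header for the given fields (CM is fixed at 8)."""
    cmf = (cinfo << 4) | 8
    flg = (flevel << 6) | (fdict << 5)
    value = cmf * 256 + flg
    # Smallest FCHECK (0..30) that rounds the value up to a multiple of 31.
    fcheck = (31 - value % 31) % 31
    return bytes([cmf, flg | fcheck])


print("CINFO | FLEVEL 0 | FLEVEL 1 | FLEVEL 2 | FLEVEL 3")
for cinfo in range(8):
    row = [make_header(cinfo, flevel).hex(" ").upper() for flevel in range(4)]
    print(cinfo, "|", " | ".join(row))
```

Note the `68 DE` entry: it breaks the visual pattern of its column because the check bits wrap around there (FCHECK = 30), not because of a typo.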
* (You might wonder if there's a small amount of leeway on the value of FCHECK: could it be set to either 0 or 31 if both pass the checksum? In practice though, this can only occur if FDICT=1, so it doesn't feature in the above table.)
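The footnote's claim can be checked by brute force. The Python sketch below (with the made-up helper name `ambiguous_headers`) looks for field combinations where the header is already a multiple of 31 with FCHECK = 0, which is exactly the case where FCHECK = 31 would also pass.

```python
def ambiguous_headers():
    """List (CINFO, FLEVEL, FDICT) where both FCHECK=0 and FCHECK=31 are valid."""
    found = []
    for cinfo in range(8):
        for flevel in range(4):
            for fdict in (0, 1):
                cmf = (cinfo << 4) | 8
                # 16-bit header value with the FCHECK bits left at zero.
                base = (cmf << 8) | (flevel << 6) | (fdict << 5)
                if base % 31 == 0:
                    found.append((cinfo, flevel, fdict))
    return found


for cinfo, flevel, fdict in ambiguous_headers():
    print(f"CINFO={cinfo} FLEVEL={flevel} FDICT={fdict}")
```

Every combination this search reports has FDICT=1, which agrees with the footnote.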