LZ4 Frame Format Description
============================

### Notices

Copyright (c) 2013-2015 Yann Collet

Permission is granted to copy and distribute this document
for any purpose and without charge,
including translations into other languages
and incorporation into compilations,
provided that the copyright notice and this notice are preserved,
and that any substantive changes or deletions from the original
are clearly marked.
Distribution of this document is unlimited.

### Version

1.5.1 (31/03/2015)

Introduction
------------

The purpose of this document is to define a lossless compressed data format
that is independent of CPU type, operating system,
file system and character set, and suitable for
file compression, pipe and streaming compression,
using the [LZ4 algorithm](http://www.lz4.info).

The data can be produced or consumed,
even for an arbitrarily long sequentially presented input data stream,
using only an a priori bounded amount of intermediate storage,
and hence can be used in data communications.
The format uses the LZ4 compression method,
and an optional [xxHash-32 checksum method](https://github.com/Cyan4973/xxHash)
for detection of data corruption.

The data format defined by this specification
does not attempt to allow random access to compressed data.

This specification is intended for use by implementers of software
to compress data into LZ4 format and/or decompress data from LZ4 format.
The text of the specification assumes a basic background in programming
at the level of bits and other primitive data representations.

Unless otherwise indicated below,
a compliant compressor must produce data sets
that conform to the specifications presented here.
It doesn’t need to support all options, though.

A compliant decompressor must be able to decompress
at least one working set of parameters
that conforms to the specifications presented here.
It may also ignore checksums.
Whenever it does not support a specific parameter within the compressed stream,
it must produce a non-ambiguous error code
and an associated error message explaining which parameter is unsupported.

General Structure of LZ4 Frame format
-------------------------------------

| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
|:-------:|:-------------:| ----- | ----- | ------- | ----------- |
| 4 bytes |  3-11 bytes   |       |       | 4 bytes |  0-4 bytes  |

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2204
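
As an illustration, a decoder can dispatch on the first 4 bytes of a frame. The following minimal Python sketch reads the magic number as a little endian 32-bit value. It also recognizes the skippable and legacy magic numbers described later in this document. The function name `classify_frame` and its return labels are hypothetical.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204
LZ4_LEGACY_MAGIC = 0x184C2102            # see "Legacy frame" below
SKIPPABLE_MAGIC_MIN = 0x184D2A50         # see "Skippable Frames" below
SKIPPABLE_MAGIC_MAX = 0x184D2A5F


def classify_frame(header: bytes) -> str:
    """Classify a frame from its first 4 bytes (little endian magic number)."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes to read a magic number")
    (magic,) = struct.unpack_from("<I", header, 0)
    if magic == LZ4_FRAME_MAGIC:
        return "lz4"
    if SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        return "skippable"
    if magic == LZ4_LEGACY_MAGIC:
        return "legacy"
    return "unknown"


# Stored little endian, 0x184D2204 appears on disk as the bytes 04 22 4D 18.
print(classify_frame(bytes([0x04, 0x22, 0x4D, 0x18])))  # lz4
```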

__Frame Descriptor__

3 to 11 Bytes, to be detailed in the next part.
This is the most important part of the spec.

__Data Blocks__

To be detailed later on.
That’s where compressed data is stored.

__EndMark__

The flow of blocks ends when a block size of “0” is read:
the EndMark is a 32-bits Block Size field whose value is zero,
with no data following it.

__Content Checksum__

Content Checksum verifies that the full content has been decoded correctly.
The content checksum is the result
of the [xxh32() hash function](https://github.com/Cyan4973/xxHash)
digesting the original (decoded) data as input, with a seed of zero.
It is stored as a 4-bytes little endian value.
Content checksum is only present when its associated flag
is set in the frame descriptor.
Content Checksum validates the result:
that all blocks were fully transmitted in the correct order and without error,
and also that the encoding/decoding process itself generated no distortion.
Its usage is recommended.
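
Python's standard library has no xxHash. The sketch below therefore carries a small pure-Python XXH32, transcribed from the public xxHash algorithm description. It is slow and meant for illustration only. From it, the sketch derives the 4-bytes little endian Content Checksum with seed zero. `xxh32`, `content_checksum` and `verify_content` are hypothetical helper names, not part of any LZ4 library.

```python
import struct

# xxHash-32 primes, as defined by the xxHash algorithm.
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 13) * P1) & MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    """Pure-Python XXH32 (slow, for illustration only)."""
    n, i = len(data), 0
    if n >= 16:
        # Four accumulators consume 16-byte stripes.
        v1 = (seed + P1 + P2) & MASK
        v2 = (seed + P2) & MASK
        v3 = seed & MASK
        v4 = (seed - P1) & MASK
        while i + 16 <= n:
            a, b, c, d = struct.unpack_from("<4I", data, i)
            v1, v2, v3, v4 = _round(v1, a), _round(v2, b), _round(v3, c), _round(v4, d)
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 4 <= n:                     # remaining 4-byte lanes
        (lane,) = struct.unpack_from("<I", data, i)
        h = (h + lane * P3) & MASK
        h = (_rotl(h, 17) * P4) & MASK
        i += 4
    while i < n:                          # remaining single bytes
        h = (h + data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    # Final avalanche.
    h ^= h >> 15
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def content_checksum(decoded: bytes) -> bytes:
    """XXH32 (seed 0) of the decoded content, as 4 little endian bytes."""
    return struct.pack("<I", xxh32(decoded, 0))


def verify_content(decoded: bytes, stored: bytes) -> bool:
    return content_checksum(decoded) == stored
```

An encoder appends `content_checksum(all_decoded_bytes)` right after the EndMark when the flag is set. A decoder compares it with the 4 bytes it finds there.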

__Frame Concatenation__

In some circumstances, it may be preferable to append multiple frames,
for example in order to add new data to an existing compressed file
without re-framing it.

In such a case, each frame has its own set of descriptor flags.
Each frame is considered independent.
The only relation between frames is their sequential order.

The ability to decode multiple concatenated frames
within a single stream or file
is left outside of this specification.
As an example, the reference lz4 command line utility behavior is
to decode all concatenated frames in their sequential order.

Frame Descriptor
----------------

| FLG     | BD      | (Content Size) | HC      |
| ------- | ------- |:--------------:| ------- |
| 1 byte  | 1 byte  |  0 - 8 bytes   | 1 byte  |

The descriptor uses a minimum of 3 bytes,
and up to 11 bytes depending on optional parameters.

__FLG byte__

|  BitNb  |   7-6   |    5    |     4     |   3     |     2     |    1-0   |
| ------- | ------- | ------- | --------- | ------- | --------- | -------- |
|FieldName| Version | B.Indep | B.Checksum| C.Size  | C.Checksum|*Reserved*|

__BD byte__

|  BitNb  |     7    |     6-5-4    |  3-2-1-0 |
| ------- | -------- | ------------ | -------- |
|FieldName|*Reserved*| Block MaxSize|*Reserved*|

In the tables, bit 7 is the highest bit, while bit 0 is the lowest.
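
Here is a minimal Python sketch of splitting FLG and BD into the fields above. The `FrameFlags` dataclass and `parse_flg_bd` are hypothetical names. The sketch also rejects non-zero reserved bits and any version other than “01”, as the sections below require.

```python
from dataclasses import dataclass


@dataclass
class FrameFlags:
    version: int
    block_independent: bool
    block_checksum: bool
    content_size: bool
    content_checksum: bool
    block_max_size_code: int


def parse_flg_bd(flg: int, bd: int) -> FrameFlags:
    """Split the FLG and BD bytes into their fields (bit 7 = highest bit)."""
    # Reserved: FLG bits 1-0, BD bit 7 and BD bits 3-0.
    if flg & 0x03 or bd & 0x8F:
        raise ValueError("reserved bits must be zero")
    flags = FrameFlags(
        version=(flg >> 6) & 0x03,          # bits 7-6
        block_independent=bool(flg & 0x20),  # bit 5
        block_checksum=bool(flg & 0x10),     # bit 4
        content_size=bool(flg & 0x08),       # bit 3
        content_checksum=bool(flg & 0x04),   # bit 2
        block_max_size_code=(bd >> 4) & 0x07,  # BD bits 6-5-4
    )
    if flags.version != 0b01:
        raise ValueError(f"unsupported version {flags.version:02b}")
    return flags
```

For instance, FLG `0x64` with BD `0x40` decodes as version “01”, independent blocks, no block checksum, no content size, content checksum present, and block maximum size code 4.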

__Version Number__

A 2-bits field (FLG bits 7-6), which must be set to “01”.
Any other value cannot be decoded by this version of the specification.
Other version numbers will use different flag layouts.

__Block Independence flag__

If this flag is set to “1”, blocks are independent.
If this flag is set to “0”, each block depends on previous ones
(up to LZ4 window size, which is 64 KB).
In such a case, it’s necessary to decode all blocks in sequence.

Block dependency improves compression ratio, especially for small blocks.
On the other hand, it makes direct jumps or multi-threaded decoding impossible.

__Block checksum flag__

If this flag is set, each data block will be followed by a 4-bytes checksum,
calculated by using the xxHash-32 algorithm on the raw (compressed) data block.
The intention is to detect data corruption (storage or transmission errors)
immediately, before decoding.
Block checksum usage is optional.

__Content Size flag__

If this flag is set, the uncompressed size of data included within the frame
will be present as an 8 bytes unsigned little endian value,
right after the FLG and BD bytes.
Content Size usage is optional.

__Content checksum flag__

If this flag is set, a content checksum will be appended after the EndMark.

Recommended value : “1” (content checksum is present)

__Block Maximum Size__

This information is intended to help the decoder allocate memory.
Size here refers to the original (uncompressed) data size.
The Block Maximum Size field (BD bits 6-5-4) takes one of the values in the following table :

| Value (bits 6-4) |  0  |  1  |  2  |  3  |   4   |   5    |  6   |  7   |
| ---------------- | --- | --- | --- | --- | ----- | ------ | ---- | ---- |
| Block MaxSize    | N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |

The decoder may refuse to allocate block sizes above a (system-specific) size.
Unused values may be used in a future revision of the spec.
A decoder conformant to the current version of the spec
is only able to decode block sizes defined in this spec.
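
Each step of the table multiplies the size by 4. The defined values can therefore be computed as `1 << (8 + 2 * value)` bytes, with KB meaning 1024 bytes. This formula is only a compact restatement of the table, and the function name `block_max_size` is hypothetical.

```python
def block_max_size(code: int) -> int:
    """Block Maximum Size in bytes for BD bits 6-5-4.

    Values 4..7 map to 64 KB, 256 KB, 1 MB and 4 MB, i.e. 1 << (8 + 2 * code).
    Values 0..3 are not defined by this version of the spec.
    """
    if not 4 <= code <= 7:
        raise ValueError(f"block maximum size value {code} is not defined")
    return 1 << (8 + 2 * code)


for code in range(4, 8):
    print(code, block_max_size(code) // 1024, "KB")
```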

__Reserved bits__

Value of reserved bits **must** be 0 (zero).
Reserved bits might be used in a future version of the specification,
typically enabling new optional features.
If this happens, a decoder respecting the current version of the specification
shall not be able to decode such a frame.

__Content Size__

This is the original (uncompressed) size.
This information is optional, and only present if the associated flag is set.
Content size is provided using unsigned 8 Bytes, for a maximum of 2^64-1 bytes (just under 16 EiB).
Format is Little endian.
This value is informational, typically for display or memory allocation.
It can be skipped by a decoder, or used to validate content correctness.
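
Because Content Size is optional, the position of HC depends on the FLG byte. The Python sketch below reads the field when it is present and returns the offset of the HC byte. It assumes the buffer starts at the magic number. `read_content_size` is a hypothetical name.

```python
import struct
from typing import Optional, Tuple


def read_content_size(frame: bytes) -> Tuple[Optional[int], int]:
    """Return (content size or None, offset of the HC byte).

    `frame` must start with the 4-byte magic number, followed by FLG and BD.
    """
    if len(frame) < 7:
        raise ValueError("truncated frame descriptor")
    flg = frame[4]                 # FLG comes right after the magic number
    pos = 6                        # skip magic (4 bytes), FLG and BD
    content_size = None
    if flg & 0x08:                 # Content Size flag is FLG bit 3
        if len(frame) < pos + 8 + 1:
            raise ValueError("truncated frame descriptor")
        # Unsigned 64-bit, little endian.
        (content_size,) = struct.unpack_from("<Q", frame, pos)
        pos += 8
    return content_size, pos
```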

__Header Checksum__

One-byte checksum of combined descriptor fields, including optional ones.
The value is the second byte of xxh32() : ` (xxh32()>>8) & 0xFF `
using zero as a seed,
and the Frame Descriptor bytes preceding HC as an input
(FLG, BD, and Content Size when it is present; the HC byte itself is excluded).
A wrong checksum indicates an error in the descriptor.
Header checksum is informational and can be skipped.
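
The hashed input is only 2 bytes (FLG, BD) or 10 bytes (with Content Size). Either way it is shorter than 16 bytes, so XXH32 reduces to its short-input path. The following Python sketch relies on that observation. The hash is transcribed from the public xxHash algorithm, not taken from an LZ4 library. `xxh32_short`, `header_checksum` and `check_descriptor` are hypothetical helper names.

```python
import struct

P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def xxh32_short(data: bytes, seed: int = 0) -> int:
    """XXH32 restricted to inputs shorter than 16 bytes (no stripe loop)."""
    if len(data) >= 16:
        raise ValueError("this sketch only handles inputs shorter than 16 bytes")
    h = (seed + P5 + len(data)) & MASK
    i = 0
    while i + 4 <= len(data):
        (lane,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + lane * P3) & MASK, 17) * P4) & MASK
        i += 4
    for byte in data[i:]:
        h = (_rotl((h + byte * P5) & MASK, 11) * P1) & MASK
    # Final avalanche.
    h ^= h >> 15
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def header_checksum(descriptor_without_hc: bytes) -> int:
    """HC = second byte of XXH32(FLG, BD[, Content Size]) with seed 0."""
    return (xxh32_short(descriptor_without_hc, 0) >> 8) & 0xFF


def check_descriptor(descriptor: bytes) -> bool:
    """`descriptor` is FLG, BD, optional Content Size, then HC as last byte."""
    return header_checksum(descriptor[:-1]) == descriptor[-1]
```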

Data Blocks
-----------

| Block Size |  data  | (Block Checksum) |
|:----------:| ------ |:----------------:|
|  4 bytes   |        |   0 - 4 bytes    |

__Block Size__

This field uses 4 bytes, format is little-endian.

The highest bit is “1” if data in the block is uncompressed.

The highest bit is “0” if data in the block is compressed by LZ4.

All other bits give the size, in bytes, of the following data block
(the size does not include the block checksum if present).

Block Size shall never be larger than Block Maximum Size.
Such a thing could happen for incompressible source data,
whose compressed form may be larger than the original.
In such a case, the data block shall be passed in uncompressed format.
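
Here is a Python sketch of reading and writing the Block Size field. `choose_block` assumes the caller has already produced an LZ4-compressed version of the block with some compressor, which is not shown. It falls back to raw bytes whenever compression does not shrink the data. That is one simple way to honor the rule above, not the only one. All names are hypothetical.

```python
import struct
from typing import Tuple

UNCOMPRESSED_FLAG = 0x80000000   # highest bit of the 32-bit Block Size field
SIZE_MASK = 0x7FFFFFFF           # all other bits carry the size in bytes


def encode_block_size(size: int, compressed: bool) -> bytes:
    """Build the 4-byte little endian Block Size field."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("block size must fit in 31 bits")
    value = size if compressed else size | UNCOMPRESSED_FLAG
    return struct.pack("<I", value)


def decode_block_size(field: bytes) -> Tuple[int, bool]:
    """Return (size in bytes, is_compressed) from a 4-byte Block Size field."""
    (value,) = struct.unpack("<I", field)
    return value & SIZE_MASK, not (value & UNCOMPRESSED_FLAG)


def choose_block(raw: bytes, compressed: bytes) -> bytes:
    """Emit Block Size + data, storing `raw` when compression does not shrink it.

    `compressed` is assumed to be the LZ4 block-compressed form of `raw`.
    """
    if len(compressed) < len(raw):
        return encode_block_size(len(compressed), True) + compressed
    return encode_block_size(len(raw), False) + raw
```

With this policy, a stored block is never larger than its raw input. As long as raw inputs respect Block Maximum Size, the Block Size field does too.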

__Data__

This is where the actual data to decode is stored.
It might be compressed or not, depending on the highest bit of the Block Size field.
Uncompressed size of Data can be any size, up to “block maximum size”.
Note that a data block is not necessarily full:
an arbitrary “flush” may happen anytime. Any block can be “partially filled”.

__Block checksum__

Only present if the associated flag is set.
This is a 4-bytes checksum value, in little endian format,
calculated by using the xxHash-32 algorithm on the raw (undecoded) data block,
and a seed of zero.
The intention is to detect data corruption (storage or transmission errors)
before decoding.

Block checksum and Content checksum are independent:
either, both or neither of them can be enabled.
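
Putting the Block Size, Data and Block checksum fields together, a decoder can split a frame into blocks and stop at the EndMark. This Python sketch neither decompresses payloads nor verifies checksums. It only returns them, along with the offset right after the EndMark where an optional Content Checksum would start. `Block` and `split_blocks` are hypothetical names.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    compressed: bool
    data: bytes                 # raw payload: LZ4-compressed or stored as-is
    checksum: Optional[int]     # Block Checksum if the flag is set (not verified)


def split_blocks(buf: bytes, pos: int, has_block_checksum: bool) -> Tuple[List[Block], int]:
    """Split data blocks starting at `pos` (just after HC) up to the EndMark.

    Returns the blocks and the offset right after the EndMark.
    """
    blocks = []
    while True:
        if pos + 4 > len(buf):
            raise ValueError("missing EndMark")
        (value,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if value == 0:                         # EndMark: a 32-bit zero Block Size
            return blocks, pos
        size = value & 0x7FFFFFFF
        data = buf[pos:pos + size]
        if len(data) != size:
            raise ValueError("truncated data block")
        pos += size
        checksum = None
        if has_block_checksum:
            if pos + 4 > len(buf):
                raise ValueError("truncated block checksum")
            (checksum,) = struct.unpack_from("<I", buf, pos)
            pos += 4
        blocks.append(Block(not (value & 0x80000000), data, checksum))
```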

Skippable Frames
----------------

| Magic Number | Frame Size | User Data |
|:------------:|:----------:| --------- |
|   4 bytes    |  4 bytes   |           |

Skippable frames allow the integration of user-defined data
into a flow of concatenated frames.
Their design is straightforward,
with the sole objective of allowing the decoder to quickly skip
over user-defined data and continue decoding.

For the purpose of facilitating identification,
it is discouraged to start a flow of concatenated frames with a skippable frame.
If there is a need to start such a flow with some user data
encapsulated into a skippable frame,
it’s recommended to start with a zero-byte LZ4 frame
followed by a skippable frame.
This will make it easier for file type identifiers.

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
All 16 values are valid to identify a skippable frame.

__Frame Size__

This is the size, in bytes, of the following User Data
(not including the magic number or the size field itself).
4 Bytes, Little endian format, unsigned 32-bits.
This means User Data can’t be bigger than (2^32-1) Bytes.
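
Here is a minimal Python sketch of writing and skipping a skippable frame. `make_skippable_frame` and `skip_skippable_frame` are hypothetical names. The `nibble` parameter, also hypothetical, selects which of the 16 valid magic values is used.

```python
import struct


def make_skippable_frame(user_data: bytes, nibble: int = 0) -> bytes:
    """Wrap user data in a skippable frame (magic 0x184D2A50 + nibble)."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("nibble must be in 0..15")
    if len(user_data) > 0xFFFFFFFF:
        raise ValueError("user data too large for a 32-bit Frame Size")
    # Magic number and Frame Size, both 4 bytes little endian.
    return struct.pack("<II", 0x184D2A50 + nibble, len(user_data)) + user_data


def skip_skippable_frame(buf: bytes, pos: int) -> int:
    """Return the offset just past the skippable frame starting at `pos`."""
    magic, size = struct.unpack_from("<II", buf, pos)
    if magic & 0xFFFFFFF0 != 0x184D2A50:
        raise ValueError("not a skippable frame")
    end = pos + 8 + size          # Frame Size excludes magic and size fields
    if end > len(buf):
        raise ValueError("truncated skippable frame")
    return end
```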

__User Data__

User Data can be anything. Data will just be skipped by the decoder.

Legacy frame
------------

The Legacy frame format was defined in the initial versions of “LZ4Demo”.
Newer compressors should not use this format anymore, as it is too restrictive.

Main characteristics of the legacy format :

- Fixed block size : 8 MB.
- All blocks must be completely filled, except the last one.
- All blocks are always compressed, even when compression is detrimental.
- The last block is detected either because
  it is followed by the “EOF” (End of File) mark,
  or because it is followed by a known Frame Magic Number.
- No checksum
- Convention is Little endian

| MagicNb | B.CSize | CData | B.CSize | CData |  (...)  | EndMark |
| ------- | ------- | ----- | ------- | ----- | ------- | ------- |
| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times |   EOF   |

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184C2102

__Block Compressed Size__

This is the size, in bytes, of the following compressed data block.
4 Bytes, Little endian format.

__Data__

This is where the actual compressed data is stored.
Data is always compressed, even when compression is detrimental.

__EndMark__

End of legacy frame is implicit only.
It must be followed by a standard EOF (End Of File) signal,
whether it is a file or a stream.

Alternatively, if the frame is followed by a valid Frame Magic Number,
it is considered completed.
This makes legacy frames compatible with frame concatenation.

Any other 4-bytes value found after a block is interpreted as the next block's compressed size,
and triggers an error if it does not fit within the acceptable range.
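
The legacy block loop can be sketched as follows in Python. Two points are assumptions of this sketch rather than statements of the spec. The first is the set of magic numbers accepted as “a valid Frame Magic Number”. The second is the acceptable block size range, taken here as the `LZ4_COMPRESSBOUND` formula from `lz4.h` applied to 8 MB (size + size/255 + 16). `is_frame_magic` and `legacy_blocks` are hypothetical names.

```python
import struct
from typing import List, Tuple

LEGACY_MAGIC = 0x184C2102
LEGACY_BLOCK_SIZE = 8 * 1024 * 1024                        # fixed block size: 8 MB
# Assumed acceptable range: LZ4_COMPRESSBOUND(8 MB) = size + size // 255 + 16.
MAX_COMPRESSED_SIZE = LEGACY_BLOCK_SIZE + LEGACY_BLOCK_SIZE // 255 + 16


def is_frame_magic(value: int) -> bool:
    """Assumed set of magic numbers that may follow a legacy frame."""
    return value in (0x184D2204, LEGACY_MAGIC) or (value & 0xFFFFFFF0) == 0x184D2A50


def legacy_blocks(buf: bytes, pos: int = 0) -> Tuple[List[bytes], int]:
    """Return (compressed blocks, offset where the legacy frame ends)."""
    (magic,) = struct.unpack_from("<I", buf, pos)
    if magic != LEGACY_MAGIC:
        raise ValueError("not a legacy frame")
    pos += 4
    blocks = []
    while pos < len(buf):                      # reaching EOF ends the frame
        if pos + 4 > len(buf):
            raise ValueError("truncated block size field")
        (value,) = struct.unpack_from("<I", buf, pos)
        if is_frame_magic(value):              # a new frame starts here
            break
        pos += 4
        if value > MAX_COMPRESSED_SIZE or pos + value > len(buf):
            raise ValueError(f"invalid legacy block size {value}")
        blocks.append(buf[pos:pos + value])
        pos += value
    return blocks, pos
```

All recognized magic numbers are far above `MAX_COMPRESSED_SIZE`, so in this sketch they can never be confused with a valid block size.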

Version changes
---------------

1.5.1 : changed format to MarkDown compatible

1.5 : removed Dictionary ID from specification

1.4.1 : changed wording from “stream” to “frame”

1.4 : added skippable streams, re-added stream checksum

1.3 : modified header checksum

1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.

1.1 : optional fields are now part of the descriptor

1.0 : changed “block size” specification, adding a compressed/uncompressed flag

0.9 : reduced scale of “block maximum size” table

0.8 : removed : high compression flag

0.7 : removed : stream checksum

0.6 : settled : stream size uses 8 bytes, endian convention is little endian

0.5: added copyright notice

0.4 : changed format to Google Doc compatible OpenDocument