kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Network Working Group                                         P. Deutsch
kusano 7d535a
Request for Comments: 1951                           Aladdin Enterprises
kusano 7d535a
Category: Informational                                         May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
        DEFLATE Compressed Data Format Specification version 1.3
kusano 7d535a
kusano 7d535a
Status of This Memo
kusano 7d535a
kusano 7d535a
   This memo provides information for the Internet community.  This memo
kusano 7d535a
   does not specify an Internet standard of any kind.  Distribution of
kusano 7d535a
   this memo is unlimited.
kusano 7d535a
kusano 7d535a
IESG Note:
kusano 7d535a
kusano 7d535a
   The IESG takes no position on the validity of any Intellectual
kusano 7d535a
   Property Rights statements contained in this document.
kusano 7d535a
kusano 7d535a
Notices
kusano 7d535a
kusano 7d535a
   Copyright (c) 1996 L. Peter Deutsch
kusano 7d535a
kusano 7d535a
   Permission is granted to copy and distribute this document for any
kusano 7d535a
   purpose and without charge, including translations into other
kusano 7d535a
   languages and incorporation into compilations, provided that the
kusano 7d535a
   copyright notice and this notice are preserved, and that any
kusano 7d535a
   substantive changes or deletions from the original are clearly
kusano 7d535a
   marked.
kusano 7d535a
kusano 7d535a
   A pointer to the latest version of this and related documentation in
kusano 7d535a
   HTML format can be found at the URL
kusano 7d535a
   <ftp: documents="" ftp.uu.net="" graphics="" png="" zdoc-index.html="" zlib="">.</ftp:>
kusano 7d535a
kusano 7d535a
Abstract
kusano 7d535a
kusano 7d535a
   This specification defines a lossless compressed data format that
kusano 7d535a
   compresses data using a combination of the LZ77 algorithm and Huffman
kusano 7d535a
   coding, with efficiency comparable to the best currently available
kusano 7d535a
   general-purpose compression methods.  The data can be produced or
kusano 7d535a
   consumed, even for an arbitrarily long sequentially presented input
kusano 7d535a
   data stream, using only an a priori bounded amount of intermediate
kusano 7d535a
   storage.  The format can be implemented readily in a manner not
kusano 7d535a
   covered by patents.
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 1]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
Table of Contents
kusano 7d535a
kusano 7d535a
   1. Introduction ................................................... 2
kusano 7d535a
      1.1. Purpose ................................................... 2
kusano 7d535a
      1.2. Intended audience ......................................... 3
kusano 7d535a
      1.3. Scope ..................................................... 3
kusano 7d535a
      1.4. Compliance ................................................ 3
kusano 7d535a
      1.5.  Definitions of terms and conventions used ................ 3
kusano 7d535a
      1.6. Changes from previous versions ............................ 4
kusano 7d535a
   2. Compressed representation overview ............................. 4
kusano 7d535a
   3. Detailed specification ......................................... 5
kusano 7d535a
      3.1. Overall conventions ....................................... 5
kusano 7d535a
          3.1.1. Packing into bytes .................................. 5
kusano 7d535a
      3.2. Compressed block format ................................... 6
kusano 7d535a
          3.2.1. Synopsis of prefix and Huffman coding ............... 6
kusano 7d535a
          3.2.2. Use of Huffman coding in the "deflate" format ....... 7
kusano 7d535a
          3.2.3. Details of block format ............................. 9
kusano 7d535a
          3.2.4. Non-compressed blocks (BTYPE=00) ................... 11
kusano 7d535a
          3.2.5. Compressed blocks (length and distance codes) ...... 11
kusano 7d535a
          3.2.6. Compression with fixed Huffman codes (BTYPE=01) .... 12
kusano 7d535a
          3.2.7. Compression with dynamic Huffman codes (BTYPE=10) .. 13
kusano 7d535a
      3.3. Compliance ............................................... 14
kusano 7d535a
   4. Compression algorithm details ................................. 14
kusano 7d535a
   5. References .................................................... 16
kusano 7d535a
   6. Security Considerations ....................................... 16
kusano 7d535a
   7. Source code ................................................... 16
kusano 7d535a
   8. Acknowledgements .............................................. 16
kusano 7d535a
   9. Author's Address .............................................. 17
kusano 7d535a
kusano 7d535a
1. Introduction
kusano 7d535a
kusano 7d535a
   1.1. Purpose
kusano 7d535a
kusano 7d535a
      The purpose of this specification is to define a lossless
kusano 7d535a
      compressed data format that:
kusano 7d535a
          * Is independent of CPU type, operating system, file system,
kusano 7d535a
            and character set, and hence can be used for interchange;
kusano 7d535a
          * Can be produced or consumed, even for an arbitrarily long
kusano 7d535a
            sequentially presented input data stream, using only an a
kusano 7d535a
            priori bounded amount of intermediate storage, and hence
kusano 7d535a
            can be used in data communications or similar structures
kusano 7d535a
            such as Unix filters;
kusano 7d535a
          * Compresses data with efficiency comparable to the best
kusano 7d535a
            currently available general-purpose compression methods,
kusano 7d535a
            and in particular considerably better than the "compress"
kusano 7d535a
            program;
kusano 7d535a
          * Can be implemented readily in a manner not covered by
kusano 7d535a
            patents, and hence can be practiced freely;
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 2]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
          * Is compatible with the file format produced by the current
kusano 7d535a
            widely used gzip utility, in that conforming decompressors
kusano 7d535a
            will be able to read data produced by the existing gzip
kusano 7d535a
            compressor.
kusano 7d535a
kusano 7d535a
      The data format defined by this specification does not attempt to:
kusano 7d535a
kusano 7d535a
          * Allow random access to compressed data;
kusano 7d535a
          * Compress specialized data (e.g., raster graphics) as well
kusano 7d535a
            as the best currently available specialized algorithms.
kusano 7d535a
kusano 7d535a
      A simple counting argument shows that no lossless compression
kusano 7d535a
      algorithm can compress every possible input data set.  For the
kusano 7d535a
      format defined here, the worst case expansion is 5 bytes per 32K-
kusano 7d535a
      byte block, i.e., a size increase of 0.015% for large data sets.
kusano 7d535a
      English text usually compresses by a factor of 2.5 to 3;
kusano 7d535a
      executable files usually compress somewhat less; graphical data
kusano 7d535a
      such as raster images may compress much more.
kusano 7d535a
kusano 7d535a
   1.2. Intended audience
kusano 7d535a
kusano 7d535a
      This specification is intended for use by implementors of software
kusano 7d535a
      to compress data into "deflate" format and/or decompress data from
kusano 7d535a
      "deflate" format.
kusano 7d535a
kusano 7d535a
      The text of the specification assumes a basic background in
kusano 7d535a
      programming at the level of bits and other primitive data
kusano 7d535a
      representations.  Familiarity with the technique of Huffman coding
kusano 7d535a
      is helpful but not required.
kusano 7d535a
kusano 7d535a
   1.3. Scope
kusano 7d535a
kusano 7d535a
      The specification specifies a method for representing a sequence
kusano 7d535a
      of bytes as a (usually shorter) sequence of bits, and a method for
kusano 7d535a
      packing the latter bit sequence into bytes.
kusano 7d535a
kusano 7d535a
   1.4. Compliance
kusano 7d535a
kusano 7d535a
      Unless otherwise indicated below, a compliant decompressor must be
kusano 7d535a
      able to accept and decompress any data set that conforms to all
kusano 7d535a
      the specifications presented here; a compliant compressor must
kusano 7d535a
      produce data sets that conform to all the specifications presented
kusano 7d535a
      here.
kusano 7d535a
kusano 7d535a
   1.5.  Definitions of terms and conventions used
kusano 7d535a
kusano 7d535a
      Byte: 8 bits stored or transmitted as a unit (same as an octet).
kusano 7d535a
      For this specification, a byte is exactly 8 bits, even on machines
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 3]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
      which store a character on a number of bits different from eight.
kusano 7d535a
      See below, for the numbering of bits within a byte.
kusano 7d535a
kusano 7d535a
      String: a sequence of arbitrary bytes.
kusano 7d535a
kusano 7d535a
   1.6. Changes from previous versions
kusano 7d535a
kusano 7d535a
      There have been no technical changes to the deflate format since
kusano 7d535a
      version 1.1 of this specification.  In version 1.2, some
kusano 7d535a
      terminology was changed.  Version 1.3 is a conversion of the
kusano 7d535a
      specification to RFC style.
kusano 7d535a
kusano 7d535a
2. Compressed representation overview
kusano 7d535a
kusano 7d535a
   A compressed data set consists of a series of blocks, corresponding
kusano 7d535a
   to successive blocks of input data.  The block sizes are arbitrary,
kusano 7d535a
   except that non-compressible blocks are limited to 65,535 bytes.
kusano 7d535a
kusano 7d535a
   Each block is compressed using a combination of the LZ77 algorithm
kusano 7d535a
   and Huffman coding. The Huffman trees for each block are independent
kusano 7d535a
   of those for previous or subsequent blocks; the LZ77 algorithm may
kusano 7d535a
   use a reference to a duplicated string occurring in a previous block,
kusano 7d535a
   up to 32K input bytes before.
kusano 7d535a
kusano 7d535a
   Each block consists of two parts: a pair of Huffman code trees that
kusano 7d535a
   describe the representation of the compressed data part, and a
kusano 7d535a
   compressed data part.  (The Huffman trees themselves are compressed
kusano 7d535a
   using Huffman encoding.)  The compressed data consists of a series of
kusano 7d535a
   elements of two types: literal bytes (of strings that have not been
kusano 7d535a
   detected as duplicated within the previous 32K input bytes), and
kusano 7d535a
   pointers to duplicated strings, where a pointer is represented as a
kusano 7d535a
   pair <length, backward="" distance="">.  The representation used in the</length,>
kusano 7d535a
   "deflate" format limits distances to 32K bytes and lengths to 258
kusano 7d535a
   bytes, but does not limit the size of a block, except for
kusano 7d535a
   uncompressible blocks, which are limited as noted above.
kusano 7d535a
kusano 7d535a
   Each type of value (literals, distances, and lengths) in the
kusano 7d535a
   compressed data is represented using a Huffman code, using one code
kusano 7d535a
   tree for literals and lengths and a separate code tree for distances.
kusano 7d535a
   The code trees for each block appear in a compact form just before
kusano 7d535a
   the compressed data for that block.
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 4]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
3. Detailed specification
kusano 7d535a
kusano 7d535a
   3.1. Overall conventions In the diagrams below, a box like this:
kusano 7d535a
kusano 7d535a
         +---+
kusano 7d535a
         |   | <-- the vertical bars might be missing
kusano 7d535a
         +---+
kusano 7d535a
kusano 7d535a
      represents one byte; a box like this:
kusano 7d535a
kusano 7d535a
         +==============+
kusano 7d535a
         |              |
kusano 7d535a
         +==============+
kusano 7d535a
kusano 7d535a
      represents a variable number of bytes.
kusano 7d535a
kusano 7d535a
      Bytes stored within a computer do not have a "bit order", since
kusano 7d535a
      they are always treated as a unit.  However, a byte considered as
kusano 7d535a
      an integer between 0 and 255 does have a most- and least-
kusano 7d535a
      significant bit, and since we write numbers with the most-
kusano 7d535a
      significant digit on the left, we also write bytes with the most-
kusano 7d535a
      significant bit on the left.  In the diagrams below, we number the
kusano 7d535a
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
kusano 7d535a
      the bits are numbered:
kusano 7d535a
kusano 7d535a
         +--------+
kusano 7d535a
         |76543210|
kusano 7d535a
         +--------+
kusano 7d535a
kusano 7d535a
      Within a computer, a number may occupy multiple bytes.  All
kusano 7d535a
      multi-byte numbers in the format described here are stored with
kusano 7d535a
      the least-significant byte first (at the lower memory address).
kusano 7d535a
      For example, the decimal number 520 is stored as:
kusano 7d535a
kusano 7d535a
             0        1
kusano 7d535a
         +--------+--------+
kusano 7d535a
         |00001000|00000010|
kusano 7d535a
         +--------+--------+
kusano 7d535a
          ^        ^
kusano 7d535a
          |        |
kusano 7d535a
          |        + more significant byte = 2 x 256
kusano 7d535a
          + less significant byte = 8
kusano 7d535a
kusano 7d535a
      3.1.1. Packing into bytes
kusano 7d535a
kusano 7d535a
         This document does not address the issue of the order in which
kusano 7d535a
         bits of a byte are transmitted on a bit-sequential medium,
kusano 7d535a
         since the final data format described here is byte- rather than
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 5]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
         bit-oriented.  However, we describe the compressed block format
kusano 7d535a
         in below, as a sequence of data elements of various bit
kusano 7d535a
         lengths, not a sequence of bytes.  We must therefore specify
kusano 7d535a
         how to pack these data elements into bytes to form the final
kusano 7d535a
         compressed byte sequence:
kusano 7d535a
kusano 7d535a
             * Data elements are packed into bytes in order of
kusano 7d535a
               increasing bit number within the byte, i.e., starting
kusano 7d535a
               with the least-significant bit of the byte.
kusano 7d535a
             * Data elements other than Huffman codes are packed
kusano 7d535a
               starting with the least-significant bit of the data
kusano 7d535a
               element.
kusano 7d535a
             * Huffman codes are packed starting with the most-
kusano 7d535a
               significant bit of the code.
kusano 7d535a
kusano 7d535a
         In other words, if one were to print out the compressed data as
kusano 7d535a
         a sequence of bytes, starting with the first byte at the
kusano 7d535a
         *right* margin and proceeding to the *left*, with the most-
kusano 7d535a
         significant bit of each byte on the left as usual, one would be
kusano 7d535a
         able to parse the result from right to left, with fixed-width
kusano 7d535a
         elements in the correct MSB-to-LSB order and Huffman codes in
kusano 7d535a
         bit-reversed order (i.e., with the first bit of the code in the
kusano 7d535a
         relative LSB position).
kusano 7d535a
kusano 7d535a
   3.2. Compressed block format
kusano 7d535a
kusano 7d535a
      3.2.1. Synopsis of prefix and Huffman coding
kusano 7d535a
kusano 7d535a
         Prefix coding represents symbols from an a priori known
kusano 7d535a
         alphabet by bit sequences (codes), one code for each symbol, in
kusano 7d535a
         a manner such that different symbols may be represented by bit
kusano 7d535a
         sequences of different lengths, but a parser can always parse
kusano 7d535a
         an encoded string unambiguously symbol-by-symbol.
kusano 7d535a
kusano 7d535a
         We define a prefix code in terms of a binary tree in which the
kusano 7d535a
         two edges descending from each non-leaf node are labeled 0 and
kusano 7d535a
         1 and in which the leaf nodes correspond one-for-one with (are
kusano 7d535a
         labeled with) the symbols of the alphabet; then the code for a
kusano 7d535a
         symbol is the sequence of 0's and 1's on the edges leading from
kusano 7d535a
         the root to the leaf labeled with that symbol.  For example:
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 6]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
                          /\              Symbol    Code
kusano 7d535a
                         0  1             ------    ----
kusano 7d535a
                        /    \                A      00
kusano 7d535a
                       /\     B               B       1
kusano 7d535a
                      0  1                    C     011
kusano 7d535a
                     /    \                   D     010
kusano 7d535a
                    A     /\
kusano 7d535a
                         0  1
kusano 7d535a
                        /    \
kusano 7d535a
                       D      C
kusano 7d535a
kusano 7d535a
         A parser can decode the next symbol from an encoded input
kusano 7d535a
         stream by walking down the tree from the root, at each step
kusano 7d535a
         choosing the edge corresponding to the next input bit.
kusano 7d535a
kusano 7d535a
         Given an alphabet with known symbol frequencies, the Huffman
kusano 7d535a
         algorithm allows the construction of an optimal prefix code
kusano 7d535a
         (one which represents strings with those symbol frequencies
kusano 7d535a
         using the fewest bits of any possible prefix codes for that
kusano 7d535a
         alphabet).  Such a code is called a Huffman code.  (See
kusano 7d535a
         reference [1] in Chapter 5, references for additional
kusano 7d535a
         information on Huffman codes.)
kusano 7d535a
kusano 7d535a
         Note that in the "deflate" format, the Huffman codes for the
kusano 7d535a
         various alphabets must not exceed certain maximum code lengths.
kusano 7d535a
         This constraint complicates the algorithm for computing code
kusano 7d535a
         lengths from symbol frequencies.  Again, see Chapter 5,
kusano 7d535a
         references for details.
kusano 7d535a
kusano 7d535a
      3.2.2. Use of Huffman coding in the "deflate" format
kusano 7d535a
kusano 7d535a
         The Huffman codes used for each alphabet in the "deflate"
kusano 7d535a
         format have two additional rules:
kusano 7d535a
kusano 7d535a
             * All codes of a given bit length have lexicographically
kusano 7d535a
               consecutive values, in the same order as the symbols
kusano 7d535a
               they represent;
kusano 7d535a
kusano 7d535a
             * Shorter codes lexicographically precede longer codes.
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 7]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
         We could recode the example above to follow this rule as
kusano 7d535a
         follows, assuming that the order of the alphabet is ABCD:
kusano 7d535a
kusano 7d535a
            Symbol  Code
kusano 7d535a
            ------  ----
kusano 7d535a
            A       10
kusano 7d535a
            B       0
kusano 7d535a
            C       110
kusano 7d535a
            D       111
kusano 7d535a
kusano 7d535a
         I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
kusano 7d535a
         lexicographically consecutive.
kusano 7d535a
kusano 7d535a
         Given this rule, we can define the Huffman code for an alphabet
kusano 7d535a
         just by giving the bit lengths of the codes for each symbol of
kusano 7d535a
         the alphabet in order; this is sufficient to determine the
kusano 7d535a
         actual codes.  In our example, the code is completely defined
kusano 7d535a
         by the sequence of bit lengths (2, 1, 3, 3).  The following
kusano 7d535a
         algorithm generates the codes as integers, intended to be read
kusano 7d535a
         from most- to least-significant bit.  The code lengths are
kusano 7d535a
         initially in tree[I].Len; the codes are produced in
kusano 7d535a
         tree[I].Code.
kusano 7d535a
kusano 7d535a
         1)  Count the number of codes for each code length.  Let
kusano 7d535a
             bl_count[N] be the number of codes of length N, N >= 1.
kusano 7d535a
kusano 7d535a
         2)  Find the numerical value of the smallest code for each
kusano 7d535a
             code length:
kusano 7d535a
kusano 7d535a
                code = 0;
kusano 7d535a
                bl_count[0] = 0;
kusano 7d535a
                for (bits = 1; bits <= MAX_BITS; bits++) {
kusano 7d535a
                    code = (code + bl_count[bits-1]) << 1;
kusano 7d535a
                    next_code[bits] = code;
kusano 7d535a
                }
kusano 7d535a
kusano 7d535a
         3)  Assign numerical values to all codes, using consecutive
kusano 7d535a
             values for all codes of the same length with the base
kusano 7d535a
             values determined at step 2. Codes that are never used
kusano 7d535a
             (which have a bit length of zero) must not be assigned a
kusano 7d535a
             value.
kusano 7d535a
kusano 7d535a
                for (n = 0;  n <= max_code; n++) {
kusano 7d535a
                    len = tree[n].Len;
kusano 7d535a
                    if (len != 0) {
kusano 7d535a
                        tree[n].Code = next_code[len];
kusano 7d535a
                        next_code[len]++;
kusano 7d535a
                    }
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 8]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
                }
kusano 7d535a
kusano 7d535a
         Example:
kusano 7d535a
kusano 7d535a
         Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
kusano 7d535a
         3, 2, 4, 4).  After step 1, we have:
kusano 7d535a
kusano 7d535a
            N      bl_count[N]
kusano 7d535a
            -      -----------
kusano 7d535a
            2      1
kusano 7d535a
            3      5
kusano 7d535a
            4      2
kusano 7d535a
kusano 7d535a
         Step 2 computes the following next_code values:
kusano 7d535a
kusano 7d535a
            N      next_code[N]
kusano 7d535a
            -      ------------
kusano 7d535a
            1      0
kusano 7d535a
            2      0
kusano 7d535a
            3      2
kusano 7d535a
            4      14
kusano 7d535a
kusano 7d535a
         Step 3 produces the following code values:
kusano 7d535a
kusano 7d535a
            Symbol Length   Code
kusano 7d535a
            ------ ------   ----
kusano 7d535a
            A       3        010
kusano 7d535a
            B       3        011
kusano 7d535a
            C       3        100
kusano 7d535a
            D       3        101
kusano 7d535a
            E       3        110
kusano 7d535a
            F       2         00
kusano 7d535a
            G       4       1110
kusano 7d535a
            H       4       1111
kusano 7d535a
kusano 7d535a
      3.2.3. Details of block format
kusano 7d535a
kusano 7d535a
         Each block of compressed data begins with 3 header bits
kusano 7d535a
         containing the following data:
kusano 7d535a
kusano 7d535a
            first bit       BFINAL
kusano 7d535a
            next 2 bits     BTYPE
kusano 7d535a
kusano 7d535a
         Note that the header bits do not necessarily begin on a byte
kusano 7d535a
         boundary, since a block does not necessarily occupy an integral
kusano 7d535a
         number of bytes.
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                      [Page 9]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
         BFINAL is set if and only if this is the last block of the data
kusano 7d535a
         set.
kusano 7d535a
kusano 7d535a
         BTYPE specifies how the data are compressed, as follows:
kusano 7d535a
kusano 7d535a
            00 - no compression
kusano 7d535a
            01 - compressed with fixed Huffman codes
kusano 7d535a
            10 - compressed with dynamic Huffman codes
kusano 7d535a
            11 - reserved (error)
kusano 7d535a
kusano 7d535a
         The only difference between the two compressed cases is how the
kusano 7d535a
         Huffman codes for the literal/length and distance alphabets are
kusano 7d535a
         defined.
kusano 7d535a
kusano 7d535a
         In all cases, the decoding algorithm for the actual data is as
kusano 7d535a
         follows:
kusano 7d535a
kusano 7d535a
            do
kusano 7d535a
               read block header from input stream.
kusano 7d535a
               if stored with no compression
kusano 7d535a
                  skip any remaining bits in current partially
kusano 7d535a
                     processed byte
kusano 7d535a
                  read LEN and NLEN (see next section)
kusano 7d535a
                  copy LEN bytes of data to output
kusano 7d535a
               otherwise
kusano 7d535a
                  if compressed with dynamic Huffman codes
kusano 7d535a
                     read representation of code trees (see
kusano 7d535a
                        subsection below)
kusano 7d535a
                  loop (until end of block code recognized)
kusano 7d535a
                     decode literal/length value from input stream
kusano 7d535a
                     if value < 256
kusano 7d535a
                        copy value (literal byte) to output stream
kusano 7d535a
                     otherwise
kusano 7d535a
                        if value = end of block (256)
kusano 7d535a
                           break from loop
kusano 7d535a
                        otherwise (value = 257..285)
kusano 7d535a
                           decode distance from input stream
kusano 7d535a
kusano 7d535a
                           move backwards distance bytes in the output
kusano 7d535a
                           stream, and copy length bytes from this
kusano 7d535a
                           position to the output stream.
kusano 7d535a
                  end loop
kusano 7d535a
            while not last block
kusano 7d535a
kusano 7d535a
         Note that a duplicated string reference may refer to a string
kusano 7d535a
         in a previous block; i.e., the backward distance may cross one
kusano 7d535a
         or more block boundaries.  However a distance cannot refer past
kusano 7d535a
         the beginning of the output stream.  (An application using a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 10]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
         preset dictionary might discard part of the output stream; a
kusano 7d535a
         distance can refer to that part of the output stream anyway)
kusano 7d535a
         Note also that the referenced string may overlap the current
kusano 7d535a
         position; for example, if the last 2 bytes decoded have values
kusano 7d535a
         X and Y, a string reference with <length 5,="" =="" distance="2"></length>
kusano 7d535a
         adds X,Y,X,Y,X to the output stream.
kusano 7d535a
kusano 7d535a
         We now specify each compression method in turn.
kusano 7d535a
kusano 7d535a
      3.2.4. Non-compressed blocks (BTYPE=00)
kusano 7d535a
kusano 7d535a
         Any bits of input up to the next byte boundary are ignored.
kusano 7d535a
         The rest of the block consists of the following information:
kusano 7d535a
kusano 7d535a
              0   1   2   3   4...
kusano 7d535a
            +---+---+---+---+================================+
kusano 7d535a
            |  LEN  | NLEN  |... LEN bytes of literal data...|
kusano 7d535a
            +---+---+---+---+================================+
kusano 7d535a
kusano 7d535a
         LEN is the number of data bytes in the block.  NLEN is the
kusano 7d535a
         one's complement of LEN.
kusano 7d535a
kusano 7d535a
      3.2.5. Compressed blocks (length and distance codes)
kusano 7d535a
kusano 7d535a
         As noted above, encoded data blocks in the "deflate" format
kusano 7d535a
         consist of sequences of symbols drawn from three conceptually
kusano 7d535a
         distinct alphabets: either literal bytes, from the alphabet of
kusano 7d535a
         byte values (0..255), or <length, backward="" distance=""> pairs,</length,>
kusano 7d535a
         where the length is drawn from (3..258) and the distance is
kusano 7d535a
         drawn from (1..32,768).  In fact, the literal and length
kusano 7d535a
         alphabets are merged into a single alphabet (0..285), where
kusano 7d535a
         values 0..255 represent literal bytes, the value 256 indicates
kusano 7d535a
         end-of-block, and values 257..285 represent length codes
kusano 7d535a
         (possibly in conjunction with extra bits following the symbol
kusano 7d535a
         code) as follows:
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 11]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
                 Extra               Extra               Extra
kusano 7d535a
            Code Bits Length(s) Code Bits Lengths   Code Bits Length(s)
kusano 7d535a
            ---- ---- ------     ---- ---- -------   ---- ---- -------
kusano 7d535a
             257   0     3       267   1   15,16     277   4   67-82
kusano 7d535a
             258   0     4       268   1   17,18     278   4   83-98
kusano 7d535a
             259   0     5       269   2   19-22     279   4   99-114
kusano 7d535a
             260   0     6       270   2   23-26     280   4  115-130
kusano 7d535a
             261   0     7       271   2   27-30     281   5  131-162
kusano 7d535a
             262   0     8       272   2   31-34     282   5  163-194
kusano 7d535a
             263   0     9       273   3   35-42     283   5  195-226
kusano 7d535a
             264   0    10       274   3   43-50     284   5  227-257
kusano 7d535a
             265   1  11,12      275   3   51-58     285   0    258
kusano 7d535a
             266   1  13,14      276   3   59-66
kusano 7d535a
kusano 7d535a
         The extra bits should be interpreted as a machine integer
kusano 7d535a
         stored with the most-significant bit first, e.g., bits 1110
kusano 7d535a
         represent the value 14.
kusano 7d535a
kusano 7d535a
                  Extra           Extra               Extra
kusano 7d535a
             Code Bits Dist  Code Bits   Dist     Code Bits Distance
kusano 7d535a
             ---- ---- ----  ---- ----  ------    ---- ---- --------
kusano 7d535a
               0   0    1     10   4     33-48    20    9   1025-1536
kusano 7d535a
               1   0    2     11   4     49-64    21    9   1537-2048
kusano 7d535a
               2   0    3     12   5     65-96    22   10   2049-3072
kusano 7d535a
               3   0    4     13   5     97-128   23   10   3073-4096
kusano 7d535a
               4   1   5,6    14   6    129-192   24   11   4097-6144
kusano 7d535a
               5   1   7,8    15   6    193-256   25   11   6145-8192
kusano 7d535a
               6   2   9-12   16   7    257-384   26   12  8193-12288
kusano 7d535a
               7   2  13-16   17   7    385-512   27   12 12289-16384
kusano 7d535a
               8   3  17-24   18   8    513-768   28   13 16385-24576
kusano 7d535a
               9   3  25-32   19   8   769-1024   29   13 24577-32768
kusano 7d535a
kusano 7d535a
      3.2.6. Compression with fixed Huffman codes (BTYPE=01)
kusano 7d535a
kusano 7d535a
         The Huffman codes for the two alphabets are fixed, and are not
kusano 7d535a
         represented explicitly in the data.  The Huffman code lengths
kusano 7d535a
         for the literal/length alphabet are:
kusano 7d535a
kusano 7d535a
                   Lit Value    Bits        Codes
kusano 7d535a
                   ---------    ----        -----
kusano 7d535a
                     0 - 143     8          00110000 through
kusano 7d535a
                                            10111111
kusano 7d535a
                   144 - 255     9          110010000 through
kusano 7d535a
                                            111111111
kusano 7d535a
                   256 - 279     7          0000000 through
kusano 7d535a
                                            0010111
kusano 7d535a
                   280 - 287     8          11000000 through
kusano 7d535a
                                            11000111
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 12]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
         The code lengths are sufficient to generate the actual codes,
kusano 7d535a
         as described above; we show the codes in the table for added
kusano 7d535a
         clarity.  Literal/length values 286-287 will never actually
kusano 7d535a
         occur in the compressed data, but participate in the code
kusano 7d535a
         construction.
kusano 7d535a
kusano 7d535a
         Distance codes 0-31 are represented by (fixed-length) 5-bit
kusano 7d535a
         codes, with possible additional bits as shown in the table
kusano 7d535a
         shown in Paragraph 3.2.5, above.  Note that distance codes 30-
kusano 7d535a
         31 will never actually occur in the compressed data.
kusano 7d535a
kusano 7d535a
      3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
kusano 7d535a
kusano 7d535a
         The Huffman codes for the two alphabets appear in the block
kusano 7d535a
         immediately after the header bits and before the actual
kusano 7d535a
         compressed data, first the literal/length code and then the
kusano 7d535a
         distance code.  Each code is defined by a sequence of code
kusano 7d535a
         lengths, as discussed in Paragraph 3.2.2, above.  For even
kusano 7d535a
         greater compactness, the code length sequences themselves are
kusano 7d535a
         compressed using a Huffman code.  The alphabet for code lengths
kusano 7d535a
         is as follows:
kusano 7d535a
kusano 7d535a
               0 - 15: Represent code lengths of 0 - 15
kusano 7d535a
                   16: Copy the previous code length 3 - 6 times.
kusano 7d535a
                       The next 2 bits indicate repeat length
kusano 7d535a
                             (0 = 3, ... , 3 = 6)
kusano 7d535a
                          Example:  Codes 8, 16 (+2 bits 11),
kusano 7d535a
                                    16 (+2 bits 10) will expand to
kusano 7d535a
                                    12 code lengths of 8 (1 + 6 + 5)
kusano 7d535a
                   17: Repeat a code length of 0 for 3 - 10 times.
kusano 7d535a
                       (3 bits of length)
kusano 7d535a
                   18: Repeat a code length of 0 for 11 - 138 times
kusano 7d535a
                       (7 bits of length)
kusano 7d535a
kusano 7d535a
         A code length of 0 indicates that the corresponding symbol in
kusano 7d535a
         the literal/length or distance alphabet will not occur in the
kusano 7d535a
         block, and should not participate in the Huffman code
kusano 7d535a
         construction algorithm given earlier.  If only one distance
kusano 7d535a
         code is used, it is encoded using one bit, not zero bits; in
kusano 7d535a
         this case there is a single code length of one, with one unused
kusano 7d535a
         code.  One distance code of zero bits means that there are no
kusano 7d535a
         distance codes used at all (the data is all literals).
kusano 7d535a
kusano 7d535a
         We can now define the format of the block:
kusano 7d535a
kusano 7d535a
               5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
kusano 7d535a
               5 Bits: HDIST, # of Distance codes - 1        (1 - 32)
kusano 7d535a
               4 Bits: HCLEN, # of Code Length codes - 4     (4 - 19)
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 13]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
               (HCLEN + 4) x 3 bits: code lengths for the code length
kusano 7d535a
                  alphabet given just above, in the order: 16, 17, 18,
kusano 7d535a
                  0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
kusano 7d535a
kusano 7d535a
                  These code lengths are interpreted as 3-bit integers
kusano 7d535a
                  (0-7); as above, a code length of 0 means the
kusano 7d535a
                  corresponding symbol (literal/length or distance code
kusano 7d535a
                  length) is not used.
kusano 7d535a
kusano 7d535a
               HLIT + 257 code lengths for the literal/length alphabet,
kusano 7d535a
                  encoded using the code length Huffman code
kusano 7d535a
kusano 7d535a
               HDIST + 1 code lengths for the distance alphabet,
kusano 7d535a
                  encoded using the code length Huffman code
kusano 7d535a
kusano 7d535a
               The actual compressed data of the block,
kusano 7d535a
                  encoded using the literal/length and distance Huffman
kusano 7d535a
                  codes
kusano 7d535a
kusano 7d535a
               The literal/length symbol 256 (end of data),
kusano 7d535a
                  encoded using the literal/length Huffman code
kusano 7d535a
kusano 7d535a
         The code length repeat codes can cross from HLIT + 257 to the
kusano 7d535a
         HDIST + 1 code lengths.  In other words, all code lengths form
kusano 7d535a
         a single sequence of HLIT + HDIST + 258 values.
kusano 7d535a
kusano 7d535a
   3.3. Compliance
kusano 7d535a
kusano 7d535a
      A compressor may limit further the ranges of values specified in
kusano 7d535a
      the previous section and still be compliant; for example, it may
kusano 7d535a
      limit the range of backward pointers to some value smaller than
kusano 7d535a
      32K.  Similarly, a compressor may limit the size of blocks so that
kusano 7d535a
      a compressible block fits in memory.
kusano 7d535a
kusano 7d535a
      A compliant decompressor must accept the full range of possible
kusano 7d535a
      values defined in the previous section, and must accept blocks of
kusano 7d535a
      arbitrary size.
kusano 7d535a
kusano 7d535a
4. Compression algorithm details
kusano 7d535a
kusano 7d535a
   While it is the intent of this document to define the "deflate"
kusano 7d535a
   compressed data format without reference to any particular
kusano 7d535a
   compression algorithm, the format is related to the compressed
kusano 7d535a
   formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
kusano 7d535a
   since many variations of LZ77 are patented, it is strongly
kusano 7d535a
   recommended that the implementor of a compressor follow the general
kusano 7d535a
   algorithm presented here, which is known not to be patented per se.
kusano 7d535a
   The material in this section is not part of the definition of the
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 14]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
   specification per se, and a compressor need not follow it in order to
kusano 7d535a
   be compliant.
kusano 7d535a
kusano 7d535a
   The compressor terminates a block when it determines that starting a
kusano 7d535a
   new block with fresh trees would be useful, or when the block size
kusano 7d535a
   fills up the compressor's block buffer.
kusano 7d535a
kusano 7d535a
   The compressor uses a chained hash table to find duplicated strings,
kusano 7d535a
   using a hash function that operates on 3-byte sequences.  At any
kusano 7d535a
   given point during compression, let XYZ be the next 3 input bytes to
kusano 7d535a
   be examined (not necessarily all different, of course).  First, the
kusano 7d535a
   compressor examines the hash chain for XYZ.  If the chain is empty,
kusano 7d535a
   the compressor simply writes out X as a literal byte and advances one
kusano 7d535a
   byte in the input.  If the hash chain is not empty, indicating that
kusano 7d535a
   the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
kusano 7d535a
   same hash function value) has occurred recently, the compressor
kusano 7d535a
   compares all strings on the XYZ hash chain with the actual input data
kusano 7d535a
   sequence starting at the current point, and selects the longest
kusano 7d535a
   match.
kusano 7d535a
kusano 7d535a
   The compressor searches the hash chains starting with the most recent
kusano 7d535a
   strings, to favor small distances and thus take advantage of the
kusano 7d535a
   Huffman encoding.  The hash chains are singly linked. There are no
kusano 7d535a
   deletions from the hash chains; the algorithm simply discards matches
kusano 7d535a
   that are too old.  To avoid a worst-case situation, very long hash
kusano 7d535a
   chains are arbitrarily truncated at a certain length, determined by a
kusano 7d535a
   run-time parameter.
kusano 7d535a
kusano 7d535a
   To improve overall compression, the compressor optionally defers the
kusano 7d535a
   selection of matches ("lazy matching"): after a match of length N has
kusano 7d535a
   been found, the compressor searches for a longer match starting at
kusano 7d535a
   the next input byte.  If it finds a longer match, it truncates the
kusano 7d535a
   previous match to a length of one (thus producing a single literal
kusano 7d535a
   byte) and then emits the longer match.  Otherwise, it emits the
kusano 7d535a
   original match, and, as described above, advances N bytes before
kusano 7d535a
   continuing.
kusano 7d535a
kusano 7d535a
   Run-time parameters also control this "lazy match" procedure.  If
kusano 7d535a
   compression ratio is most important, the compressor attempts a
kusano 7d535a
   complete second search regardless of the length of the first match.
kusano 7d535a
   In the normal case, if the current match is "long enough", the
kusano 7d535a
   compressor reduces the search for a longer match, thus speeding up
kusano 7d535a
   the process.  If speed is most important, the compressor inserts new
kusano 7d535a
   strings in the hash table only when no match was found, or when the
kusano 7d535a
   match is not "too long".  This degrades the compression ratio but
kusano 7d535a
   saves time since there are both fewer insertions and fewer searches.
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 15]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
5. References
kusano 7d535a
kusano 7d535a
   [1] Huffman, D. A., "A Method for the Construction of Minimum
kusano 7d535a
       Redundancy Codes", Proceedings of the Institute of Radio
kusano 7d535a
       Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.
kusano 7d535a
kusano 7d535a
   [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
kusano 7d535a
       Compression", IEEE Transactions on Information Theory, Vol. 23,
kusano 7d535a
       No. 3, pp. 337-343.
kusano 7d535a
kusano 7d535a
   [3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
kusano 7d535a
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/
kusano 7d535a
kusano 7d535a
   [4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
kusano 7d535a
       available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/
kusano 7d535a
kusano 7d535a
   [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
kusano 7d535a
       encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
kusano 7d535a
kusano 7d535a
   [6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
kusano 7d535a
       Comm. ACM, 33,4, April 1990, pp. 449-459.
kusano 7d535a
kusano 7d535a
6. Security Considerations
kusano 7d535a
kusano 7d535a
   Any data compression method involves the reduction of redundancy in
kusano 7d535a
   the data.  Consequently, any corruption of the data is likely to have
kusano 7d535a
   severe effects and be difficult to correct.  Uncompressed text, on
kusano 7d535a
   the other hand, will probably still be readable despite the presence
kusano 7d535a
   of some corrupted bytes.
kusano 7d535a
kusano 7d535a
   It is recommended that systems using this data format provide some
kusano 7d535a
   means of validating the integrity of the compressed data.  See
kusano 7d535a
   reference [3], for example.
kusano 7d535a
kusano 7d535a
7. Source code
kusano 7d535a
kusano 7d535a
   Source code for a C language implementation of a "deflate" compliant
kusano 7d535a
   compressor and decompressor is available within the zlib package at
kusano 7d535a
   ftp://ftp.uu.net/pub/archiving/zip/zlib/.
kusano 7d535a
kusano 7d535a
8. Acknowledgements
kusano 7d535a
kusano 7d535a
   Trademarks cited in this document are the property of their
kusano 7d535a
   respective owners.
kusano 7d535a
kusano 7d535a
   Phil Katz designed the deflate format.  Jean-Loup Gailly and Mark
kusano 7d535a
   Adler wrote the related software described in this specification.
kusano 7d535a
   Glenn Randers-Pehrson converted this document to RFC and HTML format.
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 16]
kusano 7d535a

kusano 7d535a
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996
kusano 7d535a
kusano 7d535a
kusano 7d535a
9. Author's Address
kusano 7d535a
kusano 7d535a
   L. Peter Deutsch
kusano 7d535a
   Aladdin Enterprises
kusano 7d535a
   203 Santa Margarita Ave.
kusano 7d535a
   Menlo Park, CA 94025
kusano 7d535a
kusano 7d535a
   Phone: (415) 322-0103 (AM only)
kusano 7d535a
   FAX:   (415) 322-1734
kusano 7d535a
   EMail: <ghost@aladdin.com></ghost@aladdin.com>
kusano 7d535a
kusano 7d535a
   Questions about the technical content of this specification can be
kusano 7d535a
   sent by email to:
kusano 7d535a
kusano 7d535a
   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and</gzip@prep.ai.mit.edu>
kusano 7d535a
   Mark Adler <madler@alumni.caltech.edu></madler@alumni.caltech.edu>
kusano 7d535a
kusano 7d535a
   Editorial comments on this specification can be sent by email to:
kusano 7d535a
kusano 7d535a
   L. Peter Deutsch <ghost@aladdin.com> and</ghost@aladdin.com>
kusano 7d535a
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu></randeg@alumni.rpi.edu>
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
kusano 7d535a
Deutsch                      Informational                     [Page 17]
kusano 7d535a