
zlib Usage Example

We often get questions about how the <tt>deflate()</tt> and <tt>inflate()</tt> functions should be used.
Users wonder when they should provide more input, when they should use more output,
what to do with a <tt>Z_BUF_ERROR</tt>, how to make sure the process terminates properly, and
so on.  So for those who have read <tt>zlib.h</tt> (a few times), and
would like further edification, below is an annotated example in C of simple routines to compress and decompress
from an input file to an output file using <tt>deflate()</tt> and <tt>inflate()</tt> respectively.  The
annotations are interspersed between lines of the code.  So please read between the lines.
We hope this helps explain some of the intricacies of zlib.

Without further ado, here is the program <tt>zpipe.c</tt>:

/* zpipe.c: example of proper use of zlib's inflate() and deflate()
   Not copyrighted -- provided to the public domain
   Version 1.4  11 December 2005  Mark Adler */

/* Version history:
   1.0  30 Oct 2004  First version
   1.1   8 Nov 2004  Add void casting for unused return values
                     Use switch statement for inflate() return values
   1.2   9 Nov 2004  Add assertions to document zlib guarantees
   1.3   6 Apr 2005  Remove incorrect assertion in inf()
   1.4  11 Dec 2005  Add hack to avoid MSDOS end-of-line conversions
                     Avoid some compiler warnings for input and output buffers
 */

We now include the header files for the required definitions.  From
<tt>stdio.h</tt> we use <tt>fopen()</tt>, <tt>fread()</tt>, <tt>fwrite()</tt>,
<tt>feof()</tt>, <tt>ferror()</tt>, and <tt>fclose()</tt> for file i/o, and
<tt>fputs()</tt> for error messages.  From <tt>string.h</tt> we use
<tt>strcmp()</tt> for command line argument processing.
From <tt>assert.h</tt> we use the <tt>assert()</tt> macro.
From <tt>zlib.h</tt>
we use the basic compression functions <tt>deflateInit()</tt>,
<tt>deflate()</tt>, and <tt>deflateEnd()</tt>, and the basic decompression
functions <tt>inflateInit()</tt>, <tt>inflate()</tt>, and
<tt>inflateEnd()</tt>.

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include "zlib.h"

This is an ugly hack required to avoid corruption of the input and output data on
Windows/MS-DOS systems.  Without this, those systems would assume that the input and output
files are text, and try to convert the end-of-line characters from one standard to
another.  That would corrupt binary data, and in particular would render the compressed data unusable.
This sets the input and output to binary which suppresses the end-of-line conversions.
<tt>SET_BINARY_MODE()</tt> will be used later on <tt>stdin</tt> and <tt>stdout</tt>, at the beginning of <tt>main()</tt>.

#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
#  include <fcntl.h>
#  include <io.h>
#  define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
#else
#  define SET_BINARY_MODE(file)
#endif

<tt>CHUNK</tt> is simply the buffer size for feeding data to and pulling data
from the zlib routines.  Larger buffer sizes would be more efficient,
especially for <tt>inflate()</tt>.  If the memory is available, buffer sizes
on the order of 128K or 256K bytes should be used.

#define CHUNK 16384

The <tt>def()</tt> routine compresses data from an input file to an output file.  The output data
will be in the zlib format, which is different from the gzip or zip
formats.  The zlib format has a very small header of only two bytes to identify it as
a zlib stream and to provide decoding information, and a four-byte trailer with a fast
check value to verify the integrity of the uncompressed data after decoding.
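
As an aside, the two-byte header and the four-byte trailer can be checked by hand.  The sketch
below is not part of <tt>zpipe.c</tt>.  It assumes the zlib format as specified in RFC 1950, where the
trailer is an Adler-32 check value of the uncompressed data, stored most significant byte first.  The
names <tt>plausible_zlib_header()</tt> and <tt>adler32_simple()</tt> are made up for this illustration.
zlib itself provides a much faster <tt>adler32()</tt>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the two bytes at p form a plausible zlib header per RFC 1950:
   compression method 8 (deflate), window size code of at most 7, and the
   16-bit value CMF*256 + FLG a multiple of 31.  (Hypothetical helper.) */
static int plausible_zlib_header(const unsigned char *p)
{
    unsigned cmf, flg;

    assert(p != NULL);
    cmf = p[0];
    flg = p[1];
    if ((cmf & 0x0f) != 8)      /* CM: deflate is the only defined method */
        return 0;
    if ((cmf >> 4) > 7)         /* CINFO: window of at most 32K bytes */
        return 0;
    return ((cmf << 8) | flg) % 31 == 0;    /* FCHECK */
}

/* Straightforward, unoptimized Adler-32 over len bytes.  (Hypothetical
   helper; a stand-in for zlib's own adler32().) */
static uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;   /* 65521: largest prime below 65536 */
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```

For example, the common header bytes <tt>0x78 0x9c</tt> pass the header test, whereas the first two
bytes of a gzip file, <tt>0x1f 0x8b</tt>, do not.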

/* Compress from file source to file dest until EOF on source.
   def() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_STREAM_ERROR if an invalid compression
   level is supplied, Z_VERSION_ERROR if the version of zlib.h and the
   version of the library linked do not match, or Z_ERRNO if there is
   an error reading or writing the files. */
int def(FILE *source, FILE *dest, int level)
{

Here are the local variables for <tt>def()</tt>.  <tt>ret</tt> will be used for zlib
return codes.  <tt>flush</tt> will keep track of the current flushing state for <tt>deflate()</tt>,
which is either no flushing, or flush to completion after the end of the input file is reached.
<tt>have</tt> is the amount of data returned from <tt>deflate()</tt>.  The <tt>strm</tt> structure
is used to pass information to and from the zlib routines, and to maintain the
<tt>deflate()</tt> state.  <tt>in</tt> and <tt>out</tt> are the input and output buffers for
<tt>deflate()</tt>.

    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

The first thing we do is to initialize the zlib state for compression using
<tt>deflateInit()</tt>.  This must be done before the first use of <tt>deflate()</tt>.
The <tt>zalloc</tt>, <tt>zfree</tt>, and <tt>opaque</tt> fields in the <tt>strm</tt>
structure must be initialized before calling <tt>deflateInit()</tt>.  Here they are
set to the zlib constant <tt>Z_NULL</tt> to request that zlib use
the default memory allocation routines.  An application may also choose to provide
custom memory allocation routines here.  <tt>deflateInit()</tt> will allocate on the
order of 256K bytes for the internal state.
(See zlib Technical Details.)

<tt>deflateInit()</tt> is called with a pointer to the structure to be initialized and
the compression level, which is an integer in the range of -1 to 9.  Lower compression
levels result in faster execution, but less compression.  Higher levels result in
greater compression, but slower execution.  The zlib constant <tt>Z_DEFAULT_COMPRESSION</tt>,
equal to -1,
provides a good compromise between compression and speed and is equivalent to level 6.
Level 0 actually does no compression at all, and in fact expands the data slightly to produce
the zlib format (it is not a byte-for-byte copy of the input).
More advanced applications of zlib
may use <tt>deflateInit2()</tt> here instead.  Such an application may want to reduce how
much memory will be used, at some price in compression.  Or it may need to request a
gzip header and trailer instead of a zlib header and trailer, or raw
encoding with no header or trailer at all.

We must check the return value of <tt>deflateInit()</tt> against the zlib constant
<tt>Z_OK</tt> to make sure that it was able to
allocate memory for the internal state, and that the provided arguments were valid.
<tt>deflateInit()</tt> will also check that the version of zlib that the <tt>zlib.h</tt>
file came from matches the version of zlib actually linked with the program.  This
is especially important for environments in which zlib is a shared library.

Note that an application can initialize multiple, independent zlib streams, which can
operate in parallel.  The state information maintained in the structure allows the zlib
routines to be reentrant.

    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, level);
    if (ret != Z_OK)
        return ret;

With the pleasantries out of the way, now we can get down to business.  The outer <tt>do</tt>-loop
reads all of the input file and exits at the bottom of the loop once end-of-file is reached.
This loop contains the only call of <tt>deflate()</tt>.  So we must make sure that all of the
input data has been processed and that all of the output data has been generated and consumed
before we fall out of the loop at the bottom.

    /* compress until end of file */
    do {

We start off by reading data from the input file.  The number of bytes read is put directly
into <tt>avail_in</tt>, and a pointer to those bytes is put into <tt>next_in</tt>.  We also
check to see if end-of-file on the input has been reached.  If we are at the end of file, then <tt>flush</tt> is set to the
zlib constant <tt>Z_FINISH</tt>, which is later passed to <tt>deflate()</tt> to
indicate that this is the last chunk of input data to compress.  We need to use <tt>feof()</tt>
to check for end-of-file as opposed to seeing if fewer than <tt>CHUNK</tt> bytes have been read.  The
reason is that if the input file length is an exact multiple of <tt>CHUNK</tt>, we will miss
the fact that we got to the end-of-file, and not know to tell <tt>deflate()</tt> to finish
up the compressed stream.  If we are not yet at the end of the input, then the zlib
constant <tt>Z_NO_FLUSH</tt> will be passed to <tt>deflate()</tt> to indicate that we are still
in the middle of the uncompressed data.

If there is an error in reading from the input file, the process is aborted with
<tt>deflateEnd()</tt> being called to free the allocated zlib state before returning
the error.  We wouldn't want a memory leak, now would we?  <tt>deflateEnd()</tt> can be called
at any time after the state has been initialized.  Once that's done, <tt>deflateInit()</tt> (or
<tt>deflateInit2()</tt>) would have to be called to start a new compression process.  There is
no point here in checking the <tt>deflateEnd()</tt> return code.  The deallocation can't fail.

        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&strm);
            return Z_ERRNO;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;
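
To see how <tt>fread()</tt> and <tt>feof()</tt> interact in a loop shaped like this one, here is a
small experiment that is not part of <tt>zpipe.c</tt>.  The names <tt>count_reads()</tt> and
<tt>reads_for_length()</tt> are hypothetical, and the tiny 64-byte buffer is an arbitrary stand-in
for <tt>CHUNK</tt>.  The behavior noted afterwards is what we would expect from typical C libraries,
offered as an observation rather than a guarantee.

```c
#include <assert.h>
#include <stdio.h>

/* Read fp to the end in pieces of at most chunk bytes (chunk <= 64).
   Returns the number of fread() calls made, stores the size of the last
   read in *last and the total number of bytes in *total.  Returns -1 on
   a read error. */
static int count_reads(FILE *fp, size_t chunk, size_t *last, size_t *total)
{
    unsigned char buf[64];
    int calls = 0;

    assert(chunk > 0 && chunk <= sizeof(buf));
    *total = 0;
    do {
        *last = fread(buf, 1, chunk, fp);
        if (ferror(fp))
            return -1;
        *total += *last;
        calls++;
    } while (!feof(fp));        /* same end test as zpipe's flush logic */
    return calls;
}

/* Build a temporary file of n bytes and count the reads needed to reach
   end-of-file.  Returns -1 if the file could not be created or read. */
static int reads_for_length(size_t n, size_t chunk, size_t *last)
{
    FILE *fp = tmpfile();
    size_t i, total;
    int calls;

    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++)
        fputc('x', fp);
    rewind(fp);
    calls = count_reads(fp, chunk, last, &total);
    fclose(fp);
    return total == n ? calls : -1;
}
```

With a 12-byte file and 8-byte reads, the second read comes up short and end-of-file is flagged
along with it.  With a 16-byte file, an exact multiple, the second read is full and end-of-file is
typically flagged only by a further, empty read.  In <tt>def()</tt> that corresponds to a final
<tt>deflate()</tt> call with <tt>avail_in</tt> equal to zero and <tt>flush</tt> set to <tt>Z_FINISH</tt>.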

The inner <tt>do</tt>-loop passes our chunk of input data to <tt>deflate()</tt>, and then
keeps calling <tt>deflate()</tt> until it is done producing output.  Once there is no more
new output, <tt>deflate()</tt> is guaranteed to have consumed all of the input, i.e.,
<tt>avail_in</tt> will be zero.

        /* run deflate() on input until output buffer not full, finish
           compression if all of source has been read in */
        do {

Output space is provided to <tt>deflate()</tt> by setting <tt>avail_out</tt> to the number
of available output bytes and <tt>next_out</tt> to a pointer to that space.

            strm.avail_out = CHUNK;
            strm.next_out = out;

Now we call the compression engine itself, <tt>deflate()</tt>.  It takes as many of the
<tt>avail_in</tt> bytes at <tt>next_in</tt> as it can process, and writes as many as
<tt>avail_out</tt> bytes to <tt>next_out</tt>.  Those counters and pointers are then
updated past the input data consumed and the output data written.  It is the amount of
output space available that may limit how much input is consumed.
Hence the inner loop to make sure that
all of the input is consumed by providing more output space each time.  Since <tt>avail_in</tt>
and <tt>next_in</tt> are updated by <tt>deflate()</tt>, we don't have to mess with those
between <tt>deflate()</tt> calls until it's all used up.

The parameters to <tt>deflate()</tt> are a pointer to the <tt>strm</tt> structure containing
the input and output information and the internal compression engine state, and a parameter
indicating whether and how to flush data to the output.  Normally <tt>deflate()</tt> will consume
several K bytes of input data before producing any output (except for the header), in order
to accumulate statistics on the data for optimum compression.  It will then put out a burst of
compressed data, and proceed to consume more input before the next burst.  Eventually,
<tt>deflate()</tt>
must be told to terminate the stream, complete the compression with provided input data, and
write out the trailer check value.  <tt>deflate()</tt> will continue to compress normally as long
as the flush parameter is <tt>Z_NO_FLUSH</tt>.  Once the <tt>Z_FINISH</tt> parameter is provided,
<tt>deflate()</tt> will begin to complete the compressed output stream.  However, depending on how
much output space is provided, <tt>deflate()</tt> may have to be called several times until it
has provided the complete compressed stream, even after it has consumed all of the input.  The flush
parameter must continue to be <tt>Z_FINISH</tt> for those subsequent calls.

There are other values of the flush parameter that are used in more advanced applications.  You can
force <tt>deflate()</tt> to produce a burst of output that encodes all of the input data provided
so far, even if it wouldn't have otherwise, for example to control data latency on a link with
compressed data.  You can also ask that <tt>deflate()</tt> do that as well as erase any history up to
that point so that what follows can be decompressed independently, for example for random access
applications.  Both requests will degrade compression by an amount depending on how often such
requests are made.

<tt>deflate()</tt> has a return value that can indicate errors, yet we do not check it here.  Why
not?  Well, it turns out that <tt>deflate()</tt> can do no wrong here.  Let's go through
<tt>deflate()</tt>'s return values and dispense with them one by one.  The possible values are
<tt>Z_OK</tt>, <tt>Z_STREAM_END</tt>, <tt>Z_STREAM_ERROR</tt>, or <tt>Z_BUF_ERROR</tt>.  <tt>Z_OK</tt>
is, well, ok.  <tt>Z_STREAM_END</tt> is also ok and will be returned for the last call of
<tt>deflate()</tt>.  This is already guaranteed by calling <tt>deflate()</tt> with <tt>Z_FINISH</tt>
until it has no more output.  <tt>Z_STREAM_ERROR</tt> is only possible if the stream is not
initialized properly, but we did initialize it properly.  There is no harm in checking for
<tt>Z_STREAM_ERROR</tt> here, for example to check for the possibility that some
other part of the application inadvertently clobbered the memory containing the zlib state.
<tt>Z_BUF_ERROR</tt> will be explained further below, but
suffice it to say that this is simply an indication that <tt>deflate()</tt> could not consume
more input or produce more output.  <tt>deflate()</tt> can be called again with more output space
or more available input, which it will be in this code.

            ret = deflate(&strm, flush);    /* no bad return value */
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */

Now we compute how much output <tt>deflate()</tt> provided on the last call, which is the
difference between how much space was provided before the call, and how much output space
is still available after the call.  Then that data, if any, is written to the output file.
We can then reuse the output buffer for the next call of <tt>deflate()</tt>.  Again, if there
is a file i/o error, we call <tt>deflateEnd()</tt> before returning to avoid a memory leak.

            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&strm);
                return Z_ERRNO;
            }

The inner <tt>do</tt>-loop is repeated until the last <tt>deflate()</tt> call fails to fill the
provided output buffer.  Then we know that <tt>deflate()</tt> has done as much as it can with
the provided input, and that all of that input has been consumed.  We can then fall out of this
loop and reuse the input buffer.

The way we tell that <tt>deflate()</tt> has no more output is by seeing that it did not fill
the output buffer, leaving <tt>avail_out</tt> greater than zero.  However, suppose that
<tt>deflate()</tt> has no more output, but just so happened to exactly fill the output buffer!
<tt>avail_out</tt> is zero, and we can't tell that <tt>deflate()</tt> has done all it can.
As far as we know, <tt>deflate()</tt>
has more output for us.  So we call it again.  But now <tt>deflate()</tt> produces no output
at all, and <tt>avail_out</tt> remains unchanged as <tt>CHUNK</tt>.  That <tt>deflate()</tt> call
wasn't able to do anything, either consume input or produce output, and so it returns
<tt>Z_BUF_ERROR</tt>.  (See, I told you I'd cover this later.)  However, this is not a problem at
all.  Now we finally have the desired indication that <tt>deflate()</tt> is really done,
and so we drop out of the inner loop to provide more input to <tt>deflate()</tt>.

With <tt>flush</tt> set to <tt>Z_FINISH</tt>, this final set of <tt>deflate()</tt> calls will
complete the output stream.  Once that is done, subsequent calls of <tt>deflate()</tt> would return
<tt>Z_STREAM_ERROR</tt> if the flush parameter is not <tt>Z_FINISH</tt>, and do no more processing
until the state is reinitialized.

Some applications of zlib have two loops that call <tt>deflate()</tt>
instead of the single inner loop we have here.  The first loop would call
without flushing and feed all of the data to <tt>deflate()</tt>.  The second loop would call
<tt>deflate()</tt> with no more
data and the <tt>Z_FINISH</tt> parameter to complete the process.  As you can see from this
example, that can be avoided by simply keeping track of the current flush state.

        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);     /* all input will be used */
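
The exact-fill case is easy to misread, so here is a toy model of it that does not use zlib at
all.  Everything in it is hypothetical: <tt>struct toy_stream</tt> and <tt>toy_produce()</tt> merely
stand in for the output side of <tt>deflate()</tt>, and <tt>TOY_CHUNK</tt> is a small stand-in for
<tt>CHUNK</tt>.  Only the shape of the <tt>do</tt>-loop is the same as in <tt>def()</tt>.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHUNK 8     /* stand-in for CHUNK, small to keep numbers readable */

/* Hypothetical stand-in for the output side of a z_stream. */
struct toy_stream {
    size_t pending;     /* bytes the engine still has to hand over */
    size_t avail_out;   /* space left in the caller's output buffer */
};

/* Hand over as many pending bytes as fit, the way deflate() fills next_out. */
static void toy_produce(struct toy_stream *s)
{
    size_t n = s->pending < s->avail_out ? s->pending : s->avail_out;

    s->pending -= n;
    s->avail_out -= n;
}

/* Drain the toy engine with the same do-while shape as the inner loop.
   Returns the number of calls made and stores the bytes collected. */
static int toy_drain(size_t pending, size_t *collected)
{
    struct toy_stream s;
    int calls = 0;

    s.pending = pending;
    *collected = 0;
    do {
        s.avail_out = TOY_CHUNK;
        toy_produce(&s);
        calls++;
        *collected += TOY_CHUNK - s.avail_out;  /* "have" in def() */
    } while (s.avail_out == 0);
    assert(s.pending == 0);     /* nothing is left behind */
    return calls;
}
```

With 10 pending bytes the loop makes two calls (8 bytes, then 2).  With 16 pending bytes, an exact
multiple of the buffer size, it makes three: two full buffers and then one call that produces
nothing, which is the call for which the real <tt>deflate()</tt> would return <tt>Z_BUF_ERROR</tt>.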

Now we check to see if we have already processed all of the input file.  That information was
saved in the <tt>flush</tt> variable, so we see if that was set to <tt>Z_FINISH</tt>.  If so,
then we're done and we fall out of the outer loop.  We're guaranteed to get <tt>Z_STREAM_END</tt>
from the last <tt>deflate()</tt> call, since we ran it until the last chunk of input was
consumed and all of the output was generated.

        /* done when last data in file processed */
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);        /* stream will be complete */

The process is complete, but we still need to deallocate the state to avoid a memory leak
(or rather more like a memory hemorrhage if you didn't do this).  Then
finally we can return with a happy return value.

    /* clean up and return */
    (void)deflateEnd(&strm);
    return Z_OK;
}

Now we do the same thing for decompression in the <tt>inf()</tt> routine. <tt>inf()</tt>
decompresses what is hopefully a valid zlib stream from the input file and writes the
uncompressed data to the output file.  Much of the discussion above for <tt>def()</tt>
applies to <tt>inf()</tt> as well, so the discussion here will focus on the differences between
the two.

/* Decompress from file source to file dest until stream ends or EOF.
   inf() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_DATA_ERROR if the deflate data is
   invalid or incomplete, Z_VERSION_ERROR if the version of zlib.h and
   the version of the library linked do not match, or Z_ERRNO if there
   is an error reading or writing the files. */
int inf(FILE *source, FILE *dest)
{

The local variables have the same functionality as they do for <tt>def()</tt>.  The
only difference is that there is no <tt>flush</tt> variable, since <tt>inflate()</tt>
can tell from the zlib stream itself when the stream is complete.

    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

The initialization of the state is the same, except that there is no compression level,
of course, and two more elements of the structure are initialized.  <tt>avail_in</tt>
and <tt>next_in</tt> must be initialized before calling <tt>inflateInit()</tt>.  This
is because the application has the option to provide the start of the zlib stream in
order for <tt>inflateInit()</tt> to have access to information about the compression
method to aid in memory allocation.  In the current implementation of zlib
(up through versions 1.2.x), the method-dependent memory allocations are deferred to the first call of
<tt>inflate()</tt> anyway.  However, those fields must be initialized since later versions
of zlib that provide more compression methods may take advantage of this interface.
In any case, no decompression is performed by <tt>inflateInit()</tt>, so the
<tt>avail_out</tt> and <tt>next_out</tt> fields do not need to be initialized before calling.

Here <tt>avail_in</tt> is set to zero and <tt>next_in</tt> is set to <tt>Z_NULL</tt> to
indicate that no input data is being provided.

    /* allocate inflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    ret = inflateInit(&strm);
    if (ret != Z_OK)
        return ret;

The outer <tt>do</tt>-loop decompresses input until <tt>inflate()</tt> indicates
that it has reached the end of the compressed data and has produced all of the uncompressed
output.  This is in contrast to <tt>def()</tt> which processes all of the input file.
If end-of-file is reached before the compressed data self-terminates, then the compressed
data is incomplete and an error is returned.

    /* decompress until deflate stream ends or end of file */
    do {

We read input data and set the <tt>strm</tt> structure accordingly.  If we've reached the
end of the input file, then we leave the outer loop and report an error, since the
compressed data is incomplete.  Note that we may read more data than is eventually consumed
by <tt>inflate()</tt>, if the input file continues past the zlib stream.
For applications where zlib streams are embedded in other data, this routine would
need to be modified to return the unused data, or at least indicate how much of the input
data was not used, so the application would know where to pick up after the zlib stream.

        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&strm);
            return Z_ERRNO;
        }
        if (strm.avail_in == 0)
            break;
        strm.next_in = in;

The inner <tt>do</tt>-loop has the same function it did in <tt>def()</tt>, which is to
keep calling <tt>inflate()</tt> until it has generated all of the output it can with the
provided input.

        /* run inflate() on input until output buffer not full */
        do {

Just like in <tt>def()</tt>, the same output space is provided for each call of <tt>inflate()</tt>.

            strm.avail_out = CHUNK;
            strm.next_out = out;

Now we run the decompression engine itself.  There is no need to adjust the flush parameter, since
the zlib format is self-terminating. The main difference here is that there are
return values that we need to pay attention to.  <tt>Z_DATA_ERROR</tt>
indicates that <tt>inflate()</tt> detected an error in the zlib compressed data format,
which means that either the data is not a zlib stream to begin with, or that the data was
corrupted somewhere along the way since it was compressed.  The other error to be processed is
<tt>Z_MEM_ERROR</tt>, which can occur since memory allocation is deferred until <tt>inflate()</tt>
needs it, unlike <tt>deflate()</tt>, whose memory is allocated at the start by <tt>deflateInit()</tt>.

Advanced applications may use
<tt>deflateSetDictionary()</tt> to prime <tt>deflate()</tt> with a set of likely data to improve the
first 32K or so of compression.  This is noted in the zlib header, so <tt>inflate()</tt>
requests that that dictionary be provided before it can start to decompress.  Without the dictionary,
correct decompression is not possible.  For this routine, we have no idea what the dictionary is,
so the <tt>Z_NEED_DICT</tt> indication is converted to a <tt>Z_DATA_ERROR</tt>.

<tt>inflate()</tt> can also return <tt>Z_STREAM_ERROR</tt>, which should not be possible here,
but could be checked for as noted above for <tt>def()</tt>.  <tt>Z_BUF_ERROR</tt> does not need to be
checked for here, for the same reasons noted for <tt>def()</tt>.  <tt>Z_STREAM_END</tt> will be
checked for later.

            ret = inflate(&strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
            switch (ret) {
            case Z_NEED_DICT:
                ret = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&strm);
                return ret;
            }

The output of <tt>inflate()</tt> is handled identically to that of <tt>deflate()</tt>.

            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&strm);
                return Z_ERRNO;
            }

The inner <tt>do</tt>-loop ends when <tt>inflate()</tt> has no more output as indicated
by not filling the output buffer, just as for <tt>deflate()</tt>.  In this case, we cannot
assert that <tt>strm.avail_in</tt> will be zero, since the deflate stream may end before the file
does.

        } while (strm.avail_out == 0);

The outer <tt>do</tt>-loop ends when <tt>inflate()</tt> reports that it has reached the
end of the input zlib stream, has completed the decompression and integrity
check, and has provided all of the output.  This is indicated by the <tt>inflate()</tt>
return value <tt>Z_STREAM_END</tt>.  The inner loop is guaranteed to leave <tt>ret</tt>
equal to <tt>Z_STREAM_END</tt> if the last chunk of the input file read contained the end
of the zlib stream.  So if the return value is not <tt>Z_STREAM_END</tt>, the
loop continues to read more input.

        /* done when inflate() says it's done */
    } while (ret != Z_STREAM_END);

At this point, decompression successfully completed, or we broke out of the loop due to no
more data being available from the input file.  If the last <tt>inflate()</tt> return value
is not <tt>Z_STREAM_END</tt>, then the zlib stream was incomplete and a data error
is returned.  Otherwise, we return with a happy return value.  Of course, <tt>inflateEnd()</tt>
is called first to avoid a memory leak.

    /* clean up and return */
    (void)inflateEnd(&strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}

That ends the routines that directly use zlib.  The following routines make this
a command-line program by running data through the above routines from <tt>stdin</tt> to
<tt>stdout</tt>, and handling any errors reported by <tt>def()</tt> or <tt>inf()</tt>.

<tt>zerr()</tt> is used to interpret the possible error codes from <tt>def()</tt>
and <tt>inf()</tt>, as detailed in their comments above, and print out an error message.
Note that these are only a subset of the possible return values from <tt>deflate()</tt>
and <tt>inflate()</tt>.

/* report a zlib or i/o error */
void zerr(int ret)
{
    fputs("zpipe: ", stderr);
    switch (ret) {
    case Z_ERRNO:
        if (ferror(stdin))
            fputs("error reading stdin\n", stderr);
        if (ferror(stdout))
            fputs("error writing stdout\n", stderr);
        break;
    case Z_STREAM_ERROR:
        fputs("invalid compression level\n", stderr);
        break;
    case Z_DATA_ERROR:
        fputs("invalid or incomplete deflate data\n", stderr);
        break;
    case Z_MEM_ERROR:
        fputs("out of memory\n", stderr);
        break;
    case Z_VERSION_ERROR:
        fputs("zlib version mismatch!\n", stderr);
    }
}

Here is the <tt>main()</tt> routine used to test <tt>def()</tt> and <tt>inf()</tt>.  The
<tt>zpipe</tt> command is simply a compression pipe from <tt>stdin</tt> to <tt>stdout</tt>, if
no arguments are given, or it is a decompression pipe if <tt>zpipe -d</tt> is used.  If any other
arguments are provided, no compression or decompression is performed.  Instead, a usage
message is displayed.  Examples are <tt>zpipe < foo.txt > foo.txt.z</tt> to compress, and
<tt>zpipe -d < foo.txt.z > foo.txt</tt> to decompress.

/* compress or decompress from stdin to stdout */
int main(int argc, char **argv)
{
    int ret;

    /* avoid end-of-line conversions */
    SET_BINARY_MODE(stdin);
    SET_BINARY_MODE(stdout);

    /* do compression if no arguments */
    if (argc == 1) {
        ret = def(stdin, stdout, Z_DEFAULT_COMPRESSION);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* do decompression if -d specified */
    else if (argc == 2 && strcmp(argv[1], "-d") == 0) {
        ret = inf(stdin, stdout);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* otherwise, report usage */
    else {
        fputs("zpipe usage: zpipe [-d] < source > dest\n", stderr);
        return 1;
    }
}

Copyright (c) 2004, 2005 by Mark Adler
Last modified 11 December 2005