# LZ4 Streaming API Example: Line by Line Text Compression

by *Takayuki Matsuoka*

`blockStreaming_lineByLine.c` is an LZ4 Streaming API example that implements line-by-line incremental (de)compression.

Please note the following restrictions:

 - Read "LZ4 Streaming API Basics" first.
 - This is a relatively advanced application example.
 - The output file is not compatible with lz4frame and is platform dependent.
## What's the point of this example?

 - Line-by-line incremental (de)compression
 - Handling a huge file with a small amount of memory
 - Generally a better compression ratio than the Block API
 - Non-uniform block size
## How the compression works
First, allocate a ring buffer for the input and a buffer for the LZ4 compressed output.
```
(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}


(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}


(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}


(4)
                        External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
                            ^     |
                            |     v
                            |  {Out#X+1}
                            |
                          Reset


(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
                            ^                |
                            |                v
                            |            {Out#X+2}
                            |
                          Reset
```
Next (see (1)), read the first line into the ring buffer and compress it with `LZ4_compress_continue()` (newer LZ4 releases deprecate this name in favor of `LZ4_compress_fast_continue()`).
On this first call, LZ4 has no previous data to refer to,
so it compresses the line without dependencies and writes the compressed line {Out#1} to the LZ4 compressed data buffer.
After that, write {Out#1} to the file and advance the ring buffer offset.

Do the same for the second line (see (2)).
This time, LZ4 can refer back to Line#1 to improve the compression ratio.
This dependency is called "Prefix mode".

Eventually, we reach the end of the ring buffer at Line#X (see (4)).
At this point, we reset the ring buffer offset.
After the reset, Line#X+1 is no longer adjacent in memory to the data before it, but LZ4 still remembers that data and can refer to it.
This is called "External Dictionary Mode".

At Line#X+2 (see (5)), LZ4 has forgotten almost all of the earlier data, but it still remembers Line#X+1.
This is the same situation as Line#2.

Repeat this procedure until the end of the text file.
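
The offset handling described above ("advance the offset, reset it at the end of the ring buffer") can be sketched in plain C. This is only a minimal illustration, not the code from `blockStreaming_lineByLine.c`: the names `RING_BYTES`, `MAX_LINE_BYTES` and `ring_next_offset()` and the sizes are assumptions made up for this sketch. It also assumes that the offset is reset as soon as a maximum-length line would no longer fit, so that a line never straddles the end of the buffer.

```c
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum {
    MAX_LINE_BYTES = 1024,        /* longest line we accept     */
    RING_BYTES     = 16 * 1024    /* total ring buffer capacity */
};

/*
 * Return the offset at which the next line should be stored, given that a
 * line of `line_bytes` bytes was just stored at `offset`.
 * If fewer than MAX_LINE_BYTES bytes would remain after the line, wrap
 * around to 0 so that the next line cannot run past the end of the buffer.
 */
static size_t ring_next_offset(size_t offset, size_t line_bytes)
{
    offset += line_bytes;
    if (offset > (size_t)(RING_BYTES - MAX_LINE_BYTES)) {
        offset = 0;   /* the "Reset" arrow in (4) and (5) */
    }
    return offset;
}
```

In a compression loop, each line would be read to `ring + offset`, compressed from there, and then `offset = ring_next_offset(offset, line_bytes)` would select the position for the next line.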
## How the decompression works

Decompression follows the same procedure in reverse.

 - Read a compressed line from the file into a buffer.
 - Decompress it into the ring buffer.
 - Write the decompressed plain-text line to the output file.
 - Advance the ring buffer offset. If the offset exceeds the end of the ring buffer, reset it.

Repeat this procedure until the end of the compressed file.
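
To read one compressed line back from the file, the decompressor has to know where that line ends, which the text above does not spell out. One simple possibility, assumed here purely for illustration, is to store a native-endian 16-bit length in front of each compressed line. A scheme like this would also explain why such a file is platform dependent. The helpers `record_write()` and `record_read()` are hypothetical names; they work on memory buffers so that the sketch stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Append one record (16-bit native-endian length, then payload) to `dst`.
 * Returns the number of bytes written, or 0 if the payload is too long
 * for a 16-bit length or the record does not fit into `dst_cap` bytes.
 */
static size_t record_write(uint8_t *dst, size_t dst_cap,
                           const void *payload, size_t payload_bytes)
{
    uint16_t len;
    if (payload_bytes > UINT16_MAX) return 0;
    if (dst_cap < sizeof len || dst_cap - sizeof len < payload_bytes) return 0;
    len = (uint16_t)payload_bytes;
    memcpy(dst, &len, sizeof len);                    /* length prefix   */
    memcpy(dst + sizeof len, payload, payload_bytes); /* compressed line */
    return sizeof len + payload_bytes;
}

/*
 * Read one record from `src`. On success, stores a pointer to the payload
 * and its length, and returns the number of bytes consumed.
 * Returns 0 if the input is truncated.
 */
static size_t record_read(const uint8_t *src, size_t src_bytes,
                          const uint8_t **payload, size_t *payload_bytes)
{
    uint16_t len;
    if (src_bytes < sizeof len) return 0;
    memcpy(&len, src, sizeof len);
    if (src_bytes - sizeof len < len) return 0;
    *payload = src + sizeof len;
    *payload_bytes = len;
    return sizeof len + len;
}
```

With a layout like this, the decompression loop would call the equivalent of `record_read()` once per line, pass the payload to the LZ4 streaming decompression function, and stop when no further record can be read.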