# LZ4 Streaming API Example : Line by Line Text Compression
by *Takayuki Matsuoka*

`blockStreaming_lineByLine.c` is an LZ4 Streaming API example that implements line by line incremental (de)compression.

Please note the following restrictions:

- First, read "LZ4 Streaming API Basics".
- This is a relatively advanced application example.
- The output file is not compatible with lz4frame and is platform dependent.
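
One likely source of the platform dependence is the way block lengths are stored. The sketch below assumes that each compressed line is preceded by a 16-bit length written straight from memory; that layout is an assumption made for illustration, not something stated above. All three helpers are hypothetical. They contrast a native byte order prefix, whose bytes differ between little-endian and big-endian machines, with a fixed little-endian prefix that reads back the same everywhere.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 16-bit block length exactly as it sits in
 * memory. The resulting byte order depends on the CPU, so a file written
 * this way is only readable on machines with the same byte order. */
static void put_len_native(unsigned char out[2], uint16_t len)
{
    memcpy(out, &len, sizeof len);
}

/* Hypothetical helper: store the same length in a fixed little-endian
 * order, which decodes identically on any platform. */
static void put_len_le(unsigned char out[2], uint16_t len)
{
    out[0] = (unsigned char)(len & 0xFF);
    out[1] = (unsigned char)(len >> 8);
}

/* Hypothetical helper: read back a length written by put_len_le(). */
static uint16_t get_len_le(const unsigned char in[2])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}
```

With `put_len_native()`, the two bytes for the length `0x1234` come out as `34 12` on a little-endian machine and `12 34` on a big-endian one, which is enough to make the files incompatible between the two.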


## What's the point of this example ?

- Line by line incremental (de)compression.
- Handling a huge file with a small amount of memory.
- Generally a better compression ratio than the Block API.
- Non-uniform block sizes.


## How the compression works

First of all, allocate a "Ring Buffer" for input and an LZ4 compressed data buffer for output.

```
(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}


(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}


(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}


(4)
           External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
    ^                             |
    |                             v
    |                         {Out#X+1}
    |
    Reset


(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
    ^                                        |
    |                                        v
    |                                    {Out#X+2}
    |
    Reset
```

Next (see (1)), read the first line into the ring buffer and compress it with `LZ4_compress_continue()`.
The first time, LZ4 doesn't know of any previous dependencies,
so it just compresses the line without dependencies and writes the compressed line {Out#1} to the LZ4 compressed data buffer.
After that, write {Out#1} to the file and advance the ring buffer offset.

Do the same for the second line (see (2)).
This time, however, LZ4 can use a dependency on Line#1 to improve the compression ratio.
This dependency is called "Prefix mode".

Eventually, we'll reach the end of the ring buffer at Line#X (see (4)).
At this point, we should reset the ring buffer offset.
After the reset, the pointer for Line#X+1 is no longer adjacent to the previous data, but LZ4 still maintains its memory of that data.
This is called "External Dictionary Mode".

At Line#X+2 (see (5)), LZ4 finally forgets almost all of its memory, but Line#X+1 still remains.
This is the same situation as Line#2.
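
The offset bookkeeping behind cases (1) to (5) can be modelled without calling LZ4 at all. The sketch below is a simplified model under stated assumptions: the struct, the enum, and both functions are hypothetical names, and the reset rule (reset as soon as a maximum-length line might no longer fit) is one reasonable choice, not necessarily the one used in `blockStreaming_lineByLine.c`. It only tracks whether the next line is adjacent to the previously compressed one.

```c
#include <stddef.h>

/* Kind of dependency a line can use, following the figure. */
enum dep_mode { DEP_NONE, DEP_PREFIX, DEP_EXT_DICT };

/* Hypothetical bookkeeping for the input ring buffer. */
struct ring {
    size_t size;      /* total bytes in the ring buffer */
    size_t max_line;  /* longest line accepted (must be <= size) */
    size_t offset;    /* where the next line will be stored */
    int    has_prev;  /* non-zero once at least one line was compressed */
    size_t prev_end;  /* offset just past the previously compressed line */
};

/* Dependency available to the line that will be stored at r->offset:
 * none for the very first line, "prefix" when the line directly follows
 * the previous one, "external dictionary" right after a reset. */
static enum dep_mode ring_mode(const struct ring *r)
{
    if (!r->has_prev) return DEP_NONE;
    return (r->offset == r->prev_end) ? DEP_PREFIX : DEP_EXT_DICT;
}

/* Record that a line of line_len bytes was compressed at r->offset, then
 * advance the offset. Assumed rule: reset to 0 when a maximum-length
 * line might no longer fit before the end of the buffer. */
static void ring_advance(struct ring *r, size_t line_len)
{
    r->prev_end = r->offset + line_len;
    r->has_prev = 1;
    r->offset   = r->prev_end;
    if (r->offset + r->max_line > r->size) r->offset = 0;
}
```

Feeding 10-byte lines into a 32-byte ring with `max_line = 10` walks through the figure: the first line has no dependency, the second and third use prefix mode, the fourth lands at offset 0 after the reset and uses the external dictionary, and the fifth is back in prefix mode.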

Continue this procedure to the end of the text file.


## How the decompression works

Decompression performs the same steps in reverse.

- Read a compressed line from the file into the buffer.
- Decompress it into the ring buffer.
- Write the decompressed plain text line to the file.
- Advance the ring buffer offset. If the offset exceeds the end of the ring buffer, reset it.

Continue this procedure to the end of the compressed file.
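
The loop described above can be sketched as follows. To keep the example self-contained, file I/O is replaced by in-memory blocks, and the decoder is passed in as a callback. `decode_fn`, `struct block`, `decode_all()`, and `copy_decode()` are all hypothetical names. In a real program the callback's role would be played by LZ4's streaming decoder, such as `LZ4_decompress_safe_continue()`, which additionally takes a stream state. The reset rule is an assumption and has to match whatever rule the compressor used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the streaming decoder: decodes one compressed
 * line into dst and returns the number of bytes produced, or a negative
 * value on error. */
typedef int (*decode_fn)(const char *src, int src_len, char *dst, int dst_cap);

/* One compressed line as it would be read from the file. */
struct block { const char *data; int len; };

/* Decode every block through a ring buffer and append the plain text to
 * out. Requires max_line <= ring_size. Returns the total number of bytes
 * written, or -1 on error. */
static long decode_all(const struct block *blocks, size_t n_blocks,
                       decode_fn decode,
                       char *ring, size_t ring_size, size_t max_line,
                       char *out, size_t out_cap)
{
    size_t offset = 0, total = 0;
    for (size_t i = 0; i < n_blocks; i++) {
        /* Decompress the line into the ring buffer at the current offset. */
        int n = decode(blocks[i].data, blocks[i].len,
                       ring + offset, (int)max_line);
        if (n < 0 || total + (size_t)n > out_cap) return -1;

        /* Output the decompressed plain text line. */
        memcpy(out + total, ring + offset, (size_t)n);
        total += (size_t)n;

        /* Advance the offset; reset it when a maximum-length line might
         * no longer fit before the end of the ring buffer. */
        offset += (size_t)n;
        if (offset + max_line > ring_size) offset = 0;
    }
    return (long)total;
}

/* Identity "decoder", used only to exercise the loop without LZ4. */
static int copy_decode(const char *src, int src_len, char *dst, int dst_cap)
{
    if (src_len > dst_cap) return -1;
    memcpy(dst, src, (size_t)src_len);
    return src_len;
}
```

Decoding in place inside the ring buffer, instead of into a scratch buffer, is what lets a streaming decoder refer back to earlier lines: the previously decoded data stays at the address where it was produced.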