Skip to content

Commit

Permalink
Initial tests with 2.3.0
Browse files Browse the repository at this point in the history
  • Loading branch information
flanglet authored May 11, 2024
1 parent c84e2dc commit fd98764
Showing 1 changed file with 47 additions and 44 deletions.
91 changes: 47 additions & 44 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@ Kanzi is a modern, modular, expandable and efficient lossless data compressor im
* expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
* efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).

Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result. Most usual compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is concurrent by design and uses several go routines by default to compress blocks concurrently. It is not compatible with standard compression formats. Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files.
Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result. Most usual compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is concurrent by design and uses threads to compress several blocks in parallel. It is not compatible with standard compression formats.

Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files. However, Kanzi generates a bitstream that is seekable (one or several consecutive blocks can be decompressed without the need for the whole bitstream to be decompressed).

For more details, check https://github.com/flanglet/kanzi-go/wiki.

Expand All @@ -25,6 +27,7 @@ There is Java implementation available here: https://github.com/flanglet/kanzi
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=flanglet_kanzi-go&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=flanglet_kanzi-go)
[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=flanglet_kanzi-go&metric=ncloc)](https://sonarcloud.io/summary/new_code?id=flanglet_kanzi-go)
[![Documentation](https://godoc.org/github.com/flanglet/kanzi-go?status.svg)](http://godoc.org/github.com/flanglet/kanzi-go/v2)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)


## Why Kanzi
Expand All @@ -33,7 +36,7 @@ There are many excellent, open-source lossless data compressors available alread

If gzip is starting to show its age, zstd and brotli are open-source, standardized and used
daily by millions of people. Zstd is incredibly fast and probably the best choice in many cases.
There are a few scenarios where Kanzi could be a better choice:
There are a few scenarios where Kanzi czn be a better choice:

- gzip, lzma, brotli, zstd are all LZ based. It means that they can reach certain compression
ratios only. Kanzi also makes use of BWT and CM which can compress beyond what LZ can do.
Expand All @@ -48,28 +51,28 @@ at compression time to better compress specific kinds of data.

- Kanzi can take advantage of the multiple cores of a modern CPU to improve performance

- It is easy to implement a new transform or entropy codec to either test an idea or improve
compression ratio on specific kinds of data.
- Implementing a new transform or entropy codec (to either test an idea or improve compression ratio on specific kinds of data) is simple.



## Benchmarks

Test machine:

AWS c5a8xlarge: AMD EPYC 7R32 (32 vCPUs), 64 GB RAM

go 1.21.3
go 1.21.10

Ubuntu 24.04 LTS

Ubuntu 22.04.3 LTS
Kanzi version 2.3.0

Kanzi version 2.2
On this machine, Kanzi uses up to 16 threads (half of CPUs by default).

bzip3 uses 16 threads. zstd uses 16 threads for compression and 1 for decompression,
other compressors are single threaded.

The default block size at level 9 is 32MB, severely limiting the number of threads
in use, especially with enwik8, but all tests are performed with default values.

On this machine, Kanzi can use up to 16 threads depending on compression level
(the default block size at level 9 is 32MB, severly limiting the number of threads
in use, especially with enwik8, but all tests are performed with default values).
bzip3 uses 16 threads. zstd can use 2 for compression, other compressors
are single threaded.


### silesia.tar
Expand All @@ -79,30 +82,31 @@ Download at http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
| Compressor | Encoding (sec) | Decoding (sec) | Size |
|---------------------------------|-----------------|-----------------|------------------|
|Original | | | 211,957,760 |
|s2 -cpu 16 | 0.494 | 0.868 | 86,650,932 |
|**Kanzi -l 1** | **0.683** | **0.255** | **80,284,705** |
|s2 -cpu 16 -better | 1.517 | 0.868 | 79,555,929 |
|Zstd 1.5.5 -2 | 0.761 | 0.286 | 69,590,245 |
|**Kanzi -l 2** | **0.707** | **0.302** | **68,231,498** |
|s2 -cpu 16 | 0.494 | 0.868 | 86,650,932 |
|**Kanzi -l 1** | **0.504** | **0.366** | **80,277,212** |
|Lz4 1.9.5 -4 | 0.321 | 0.330 | 79,912,419 |
|s2 -cpu 16 -better | 1.517 | 0.868 | 79,555,929 |
|Zstd 1.5.6 -2 -T16 | 0.151 | 0.271 | 69,556,157 |
|**Kanzi -l 2** | **0.533** | **0.263** | **68,195,845** |
|Brotli 1.1.0 -2 | 1.749 | 2.459 | 68,044,145 |
|Gzip 1.10 -9 | 20.15 | 1.316 | 67,652,229 |
|**Kanzi -l 3** | **1.204** | **0.368** | **64,916,444** |
|Zstd 1.5.5 -5 | 2.003 | 0.324 | 63,103,408 |
|**Kanzi -l 4** | **1.272** | **0.681** | **60,770,201** |
|Zstd 1.5.5 -9 | 4.166 | 0.282 | 59,444,065 |
|Gzip 1.12 -9 | 20.09 | 1.403 | 67,652,449 |
|**Kanzi -l 3** | **1.057** | **0.359** | **65,613,695** |
|Zstd 1.5.6 -5 -T16 | 0.356 | 0.289 | 63,131,656 |
|**Kanzi -l 4** | **1.125** | **0.519** | **61,249,959** |
|Zstd 1.5.5 -9 -T16 | 0.690 | 0.278 | 59,429,335 |
|Brotli 1.1.0 -6 | 14.53 | 4.263 | 58,552,177 |
|Zstd 1.5.5 -13 | 19.15 | 0.276 | 58,061,115 |
|Zstd 1.5.6 -13 -T16 | 3.244 | 0.272 | 58,041,112 |
|Brotli 1.1.0 -9 | 70.07 | 7.149 | 56,408,353 |
|Bzip2 1.0.8 -9 | 16.94 | 6.734 | 54,572,500 |
|**Kanzi -l 5** | **2.355** | **1.055** | **54,051,139** |
|Zstd 1.5.5 -19 | 92.82 | 0.302 | 52,989,654 |
|**Kanzi -l 6** | **3.414** | **2.235** | **49,517,823** |
|Lzma 5.2.5 -9 | 92.6 | 3.075 | 48,744,632 |
|**Kanzi -l 7** | **4.387** | **3.098** | **47,308,484** |
|**Kanzi -l 5** | **2.161** | **1.043** | **54,039,773** |
|Zstd 1.5.6 -19 -T16 | 20.87 | 0.303 | 52,889,925 |
|**Kanzi -l 6** | **2.779** | **2.056** | **49,567,817** |
|Lzma 5.4.5 -9 | 95.97 | 3.172 | 48,745,354 |
|**Kanzi -l 7** | **3.738** | **2.888** | **47,520,629** |
|bzip3 1.3.2.r4-gb2d61e8 -j 16 | 2.682 | 3.221 | 47,237,088 |
|**Kanzi -l 8** | **19.64** | **21.33** | **43,247,248** |
|**Kanzi -l 9** | **42.41** | **48.37** | **41,807,179** |
|zpaq 7.15 -m5 -t16 | 213.8 | 213.8 | 40,050,429 |
|**Kanzi -l 8** | **19.47** | **20.01** | **43,167,429** |
|**Kanzi -l 9** | **45.17** | **45.55** | **41,497,835** |
|zpaq 7.15 -m5 -t16 | 213.8 | 213.8 | 40,050,429 |



Expand All @@ -113,24 +117,23 @@ Download at https://mattmahoney.net/dc/enwik8.zip
| Compressor | Encoding (sec) | Decoding (sec) | Size |
|------------------------|------------------|------------------|------------------|
|Original | | | 100,000,000 |
|**Kanzi -l 1** | **0.465** | **0.171** | **43,747,730** |
|**Kanzi -l 2** | **0.481** | **0.196** | **37,745,093** |
|**Kanzi -l 3** | **0.761** | **0.301** | **33,839,184** |
|**Kanzi -l 4** | **0.764** | **0.472** | **29,598,635** |
|**Kanzi -l 5** | **0.896** | **0.494** | **26,527,955** |
|**Kanzi -l 6** | **1.433** | **1.104** | **24,076,669** |
|**Kanzi -l 7** | **3.093** | **2.277** | **22,817,376** |
|**Kanzi -l 8** | **13.63** | **13.17** | **21,181,978** |
|**Kanzi -l 9** | **32.44** | **34.11** | **20,035,133** |

|**Kanzi -l 1** | **0.425** | **0.149** | **43,746,017** |
|**Kanzi -l 2** | **0.432** | **0.179** | **37,816,913** |
|**Kanzi -l 3** | **0.683** | **0.245** | **33,865,383** |
|**Kanzi -l 4** | **0.621** | **0.365** | **29,597,577** |
|**Kanzi -l 5** | **0.808** | **0.437** | **26,528,023** |
|**Kanzi -l 6** | **1.212** | **0.916** | **24,076,674** |
|**Kanzi -l 7** | **2.321** | **2.755** | **22,817,373** |
|**Kanzi -l 8** | **12.52** | **12.27** | **21,181,983** |
|**Kanzi -l 9** | **32.24** | **32.27** | **20,035,138** |



# Build

Using formal releases is recommended (see https://github.com/flanglet/kanzi-go/releases).
```
go install github.com/flanglet/kanzi-go/v2/app@v2.2.0
go install github.com/flanglet/kanzi-go/v2/app@v2.3.0
```

Otherwise, to build manually from the latest tag, follow the instructions below:
Expand Down

0 comments on commit fd98764

Please sign in to comment.