data_simple.txt - This dataset contains characters with long consecutive runs. Example: aaaaabbbbbcccc
data_random.txt - This dataset contains characters generated uniformly at random, so every character occurs with equal probability. Example: svdabmdo23
Both datasets draw from the character set {abcdefghijklmnopqrstuvwxyz0123456789} and contain 10 million characters each.
First, the results for data_simple.txt:
Algorithm | Compression speed (MB/s) | Decompression speed (MB/s) | Compression ratio | Memory usage during compression (KB) | Memory usage during decompression (KB)
---|---|---|---|---|---
Run-length encoding | 3175.75 | 180.429 | 259.337 | 0 | 0
Dictionary coder (LZW) | 4.49374 | 37.9599 | 247.299 | 2529.56 | 2529.5
Observing the values in the table above, run-length encoding outperforms the dictionary coder by a large margin: it has far faster compression and decompression speeds, a better compression ratio, and almost zero memory usage (it stores no dictionary or auxiliary data at runtime beyond the input itself). Run-length encoding wins here because the data consists of long consecutive runs of the same character, which is exactly the pattern the algorithm is designed to exploit.
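The report does not show the implementations, so as a minimal sketch, run-length encoding can be expressed as a list of (character, run length) pairs; the actual on-disk format the author used may differ:

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    # One (character, run_length) pair per consecutive run.
    # "aaaab" -> [("a", 4), ("b", 1)]
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    # Expand each pair back into a run of characters.
    return "".join(ch * n for ch, n in pairs)
```

On run-heavy input like `aaaaabbbbbcccc` this collapses millions of characters into a handful of pairs; on random input every character becomes its own pair, which is why the ratio drops below 1 in the second experiment.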
However, the results reverse when the data is randomised and consecutive repetition is removed. Consider the following results from data_random.txt:
Algorithm | Compression speed (MB/s) | Decompression speed (MB/s) | Compression ratio | Memory usage during compression (KB) | Memory usage during decompression (KB)
---|---|---|---|---|---
Run-length encoding | 176.993 | 8552.4 | 0.514291 | 0 | 0
Dictionary coder (LZW) | 2.96384 | 82.4991 | 3.49742 | 178705 | 178704
Observing the table, the dictionary coder is slower than run-length encoding but compresses the data far better, as the compression ratios show. Run-length encoding actually expands the data here (a ratio below 1), whereas the dictionary coder still compresses it effectively.
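The dictionary coder handles random data because LZW builds its dictionary from whatever substrings actually recur, not just consecutive runs. A minimal sketch of the standard LZW scheme (again, the author's real implementation and code widths may differ):

```python
def lzw_encode(s: str) -> list[int]:
    # Start with one code per single byte; add a new code for each
    # previously unseen string w+c encountered in the input.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for ch in s:
        wc = w + ch
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes: list[int]) -> str:
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = chr(codes[0])
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[0]  # the classic cScSc edge case
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        # Mirror the encoder: register w + first char of entry.
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)
```

The growing dictionary also explains the memory figures in the tables: random input produces many distinct substrings, so the dictionary (and memory usage) grows much larger than it does on run-heavy input.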
Thus, run-length encoding works well when the data contains consecutive runs but fails on data where characters appear randomly without any visible pattern. It is commonly used to compress images, audio, and video.
The dictionary-coder-based LZW algorithm, however, works in either case; it is slower than run-length encoding but consistently effective at compressing the data. Dictionary-based algorithms are more versatile, and LZW was notably used by the classic Unix compress utility.