data-compression-algorithms

A comparative study of two lossless data compression algorithms: Run-Length Encoding and the dictionary coder LZW. Multiple datasets were also generated to perform stress testing and comparative analysis.

data_simple.txt - This dataset contains letters and digits with long runs of consecutively repeated characters. Example: aaaaabbbbbcccc

data_random.txt - This dataset contains letters and digits generated uniformly at random, so every character is equally likely at each position. Example: svdabmdo23

Both datasets are built from the character set {abcdefghijklmnopqrstuvwxyz0123456789} and contain 10 million characters each.
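For reference, here is a minimal Python sketch of how such datasets could be generated. It is an illustration, not the repository's own generation script; the function names `make_simple`/`make_random` and the maximum run length of 100 are assumptions, since the README does not specify them.

```python
import random
import string

CHARSET = string.ascii_lowercase + string.digits  # the README's character set {a-z, 0-9}
N = 10_000_000                                    # 10 million characters per dataset

def make_simple(n: int = N, max_run: int = 100) -> str:
    """Runs of consecutively repeated characters, e.g. 'aaaaabbbbbcccc'."""
    parts, total = [], 0
    while total < n:
        run = random.randint(1, max_run)           # assumed run-length range
        parts.append(random.choice(CHARSET) * run)
        total += run
    return "".join(parts)[:n]

def make_random(n: int = N) -> str:
    """Each character drawn uniformly at random, e.g. 'svdabmdo23'."""
    return "".join(random.choices(CHARSET, k=n))

if __name__ == "__main__":
    with open("data_simple.txt", "w") as f:
        f.write(make_simple())
    with open("data_random.txt", "w") as f:
        f.write(make_random())
```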

First, the results for data_simple.txt:

| Algorithm | Compression speed (MB/s) | Decompression speed (MB/s) | Compression ratio | Memory usage during compression (KB) | Memory usage during decompression (KB) |
| --- | --- | --- | --- | --- | --- |
| Run-Length Encoding | 3175.75 | 180.429 | 259.337 | 0 | 0 |
| Dictionary coder (LZW) | 4.49374 | 37.9599 | 247.299 | 2529.56 | 2529.5 |

The table shows that run-length encoding outperforms the dictionary coder by a large margin on this dataset: far higher compression and decompression speeds, a slightly better compression ratio, and essentially zero memory overhead, since it keeps no dictionary or auxiliary state at runtime beyond the data itself. Run-length encoding wins here because the data consists of long runs of repeated characters, which is exactly the pattern the algorithm is designed to exploit.
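To make the mechanism concrete, here is a minimal run-length encoding sketch in Python. It is an illustration, not the repository's implementation; in particular, how the (character, count) pairs are serialised to disk is left out.

```python
def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into one (char, count) pair."""
    runs, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                      # extend to the end of the current run
        runs.append((text[i], j - i))
        i = j
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (char, count) pair back into its run."""
    return "".join(char * count for char, count in runs)

# 14 input characters collapse into just 3 pairs on run-heavy data.
assert rle_encode("aaaaabbbbbcccc") == [("a", 5), ("b", 5), ("c", 4)]
assert rle_decode(rle_encode("aaaaabbbbbcccc")) == "aaaaabbbbbcccc"
```

On run-heavy input the pair list is tiny; on random input the same loop emits one pair per character, which foreshadows the results below.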

However, the results reverse once the data is randomised and the runs of repeated characters disappear.

Consider the results for data_random.txt:

| Algorithm | Compression speed (MB/s) | Decompression speed (MB/s) | Compression ratio | Memory usage during compression (KB) | Memory usage during decompression (KB) |
| --- | --- | --- | --- | --- | --- |
| Run-Length Encoding | 176.993 | 8552.4 | 0.514291 | 0 | 0 |
| Dictionary coder (LZW) | 2.96384 | 82.4991 | 3.49742 | 178705 | 178704 |

Here the dictionary coder is still slower than run-length encoding, but it compresses far more effectively, as the compression ratios show. Run-length encoding in fact fails to compress at all: a ratio of 0.514 means its output is roughly twice the size of the input, because almost every character becomes its own (character, count) pair. The dictionary coder, by contrast, still achieves close to a 3.5x reduction.

Thus, run-length encoding works well when the data contains consecutive runs of repeated characters, but fails on data where characters occur randomly with no such pattern. It is typically used to compress images, audio, and video.

The dictionary-coder-based LZW algorithm, however, works in either case: it is slower than run-length encoding but compresses the data effectively regardless of its structure. Dictionary-based algorithms are therefore more broadly reliable; LZW, notably, was the algorithm behind the classic Unix compress utility.
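For completeness, here is a minimal LZW sketch in Python. This is a simplified illustration rather than the repository's implementation: it seeds the dictionary from the input's own alphabet and returns the codes as a plain list of integers, whereas a practical codec would start from a fixed alphabet (e.g. all 256 byte values) and pack the codes into bits.

```python
def lzw_encode(text: str) -> list[int]:
    """Grow a dictionary of previously seen substrings and emit their codes."""
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes, current = [], ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new substring
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes: list[int], alphabet: str) -> str:
    """Rebuild the encoder's dictionary on the fly from the code stream."""
    inv = {i: ch for i, ch in enumerate(sorted(set(alphabet)))}
    prev = inv[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # A code one step ahead of the dictionary is the classic cScSc case.
        entry = inv[code] if code in inv else prev + prev[0]
        out.append(entry)
        inv[len(inv)] = prev + entry[0]              # mirror the encoder's dictionary
        prev = entry
    return "".join(out)

msg = "aaaaabbbbbcccc"
assert lzw_decode(lzw_encode(msg), msg) == msg
```

The growing dictionary is also what the memory columns in the tables above are measuring: on random data many more substrings get learned, which is consistent with the much larger footprint LZW shows on data_random.txt.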
