data_simple.txt - This dataset contains characters with long consecutive runs. Example: aaaaabbbbbcccc
data_random.txt - This dataset contains characters generated uniformly at random, so every character occurs with equal probability. Example: svdabmdo23
Both datasets draw from the character set {abcdefghijklmnopqrstuvwxyz0123456789} and contain 10 million characters each.
First, the results for data_simple.txt:
Algorithm | Compression speed (MB/s) | Decompression speed (MB/s) | Compression ratio | Memory usage during compression (KB) | Memory usage during decompression (KB)
---|---|---|---|---|---
Run-length encoding | 3175.75 | 180.429 | 259.337 | 0 | 0
Dictionary coder (LZW) | 4.49374 | 37.9599 | 247.299 | 2529.56 | 2529.5
Observing the values in the table above, run-length encoding outperforms the dictionary coder by a large margin: it has far faster compression and decompression speeds, a better compression ratio, and almost zero memory usage (it stores no dictionary or auxiliary data at runtime beyond the input itself). Run-length encoding wins here because the data consists of long consecutive runs of the same character, which is exactly the pattern the algorithm is designed to exploit.
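The report does not show the implementations, so as a minimal sketch, run-length encoding can be expressed as a list of (character, run length) pairs; the actual on-disk format the author used may differ:

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    # One (character, run_length) pair per consecutive run.
    # "aaaab" -> [("a", 4), ("b", 1)]
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    # Expand each pair back into a run of characters.
    return "".join(ch * n for ch, n in pairs)
```

On run-heavy input like `aaaaabbbbbcccc` this collapses millions of characters into a handful of pairs; on random input every character becomes its own pair, which is why the ratio drops below 1 in the second experiment.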
However, the results reverse when the data is randomised and consecutive repetition is removed. Consider the following results from data_random.txt:
Algorithm | Compression speed (MB/s) | Decompression speed (MB/s) | Compression ratio | Memory usage during compression (KB) | Memory usage during decompression (KB)
---|---|---|---|---|---
Run-length encoding | 176.993 | 8552.4 | 0.514291 | 0 | 0
Dictionary coder (LZW) | 2.96384 | 82.4991 | 3.49742 | 178705 | 178704
Observing the table, the dictionary coder is slower than run-length encoding but compresses the data far better, as the compression ratios show. Run-length encoding actually expands the data here (a ratio below 1), whereas the dictionary coder still compresses it effectively.
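The dictionary coder handles random data because LZW builds its dictionary from whatever substrings actually recur, not just consecutive runs. A minimal sketch of the standard LZW scheme (again, the author's real implementation and code widths may differ):

```python
def lzw_encode(s: str) -> list[int]:
    # Start with one code per single byte; add a new code for each
    # previously unseen string w+c encountered in the input.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for ch in s:
        wc = w + ch
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes: list[int]) -> str:
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = chr(codes[0])
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[0]  # the classic cScSc edge case
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        # Mirror the encoder: register w + first char of entry.
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)
```

The growing dictionary also explains the memory figures in the tables: random input produces many distinct substrings, so the dictionary (and memory usage) grows much larger than it does on run-heavy input.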
Thus, run-length encoding works well when the data contains consecutive runs but fails on data where characters appear randomly without any visible pattern. It is commonly used to compress images, audio, and video.
The dictionary-coder-based LZW algorithm, however, works in either case; it is slower than run-length encoding but consistently effective at compressing the data. Dictionary-based algorithms are more versatile, and LZW was notably used by the classic Unix compress utility.