-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.orig
171 lines (154 loc) · 7.35 KB
/
README.orig
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
DESCRIPTION
-----------
This archive contains a simple and readable ANSI C library implementing LZSS
encoding and decoding. This implementation is not intended to be the best,
fastest, smallest, or any other performance related adjective.
Included in this library is a sample program demonstrating the usage of the
encode and decoding routines. The library has been designed so that it may
be linked with different sliding window search routines which can speed up
the compression time at the cost of additional memory space.
More information on LZSS encoding may be found at:
http://michael.dipperstein.com/lzss
http://www.datacompression.info/LZSS.shtml
FILES
-----
COPYING - Rules for copying and distributing LGPL software
bitfile.c - Library to allow bitwise reading and writing of files.
bitfile.h - Header for bitfile library.
brute.c - File implementing brute force search for strings matching the
strings to be encoded.
COPYING - Rules for copying and distributing GPL software
COPYING.LESSER - Rules for copying and distributing LGPL software
hash.c - File implementing hash table search for strings matching the
strings to be encoded.
kmp.c - File implementing the Knuth-Morris-Pratt string matching
algorithm to search for strings matching the strings to be
encoded.
list.c - File implementing linked list indexed search for strings
matching the strings to be encoded.
lzlocal.h - Header file defining interface to be used by files
implementing searches for strings matching strings to be
encoded.
lzdecode.c - LZSS decoding source
lzencode.c - LZSS encoding source
lzss.h - LZSS encoding/decoding header files
lzvars.c - Variables used by both lzencode and lzdecode
Makefile - makefile for this project (assumes gcc compiler and GNU make)
optlist.c - Source code for GetOptlist function and supporting functions
optlist.h - Header file to be included by code using the optlist library
README - this file
sample.c - Sample program demonstrating usage of encode and decode
routines.
tree.c - File implementing a sorted binary tree to index and search
for strings matching the strings to be encoded.
BUILDING
--------
To build these files with GNU make and gcc:
1. Edit the Makefile variable FMOBJ to choose the string matching technique
which will be used.
2. Windows users should define the environment variable OS to be Windows or
Windows_NT. This is often already done.
3. Enter the command "make" from the command line.
The sample programs comp and decomp are not built by default. To build these
programs on Unix/Linux use the commands "make comp" and "make decomp". Windows
users should use the commands "make comp.exe" and "make decomp.exe"
USAGE
-----
Usage: sample <options>
options:
-c : Encode input file to output file.
-d : Decode input file to output file.
-i <filename> : Name of input file.
-o <filename> : Name of output file.
-h|? : Print out command line options.
-c Performs LZSS style compression on specified input file (see -i)
writing the results to the specified output file (see -o).
-d Decompresses the specified input file (see -i) writing the results to
the specified output file (see -o). Only files compressed by this
program may be decompressed.
-i <filename> The name of the input file. There is no valid usage of this
program without a specified input file.
-o <filename> The name of the output file. There is no valid usage of this
program without a specified input file.
LIBRARY API
-----------
Encoding Data:
int EncodeLZSS(FILE *fpIn, FILE *fpOut);
fpIn
The file stream to be encoded. It must opened. NULL pointers will return
an error.
fpOut
The file stream receiving the encoded results. It must be opened. NULL
pointers will return an error.
Return Value
Zero for success, -1 for failure. Error type is contained in errno. Files
will remain open.
Decoding Data:
int DecodeLZSS(FILE *fpIn, FILE *fpOut);
fpIn
The file stream to be decoded. It must be opened. NULL pointers will
return an error.
fpOut
The file stream receiving the decoded results. It must be opened. NULL
pointers will return an error.
Return Value
Zero for success, -1 for failure. Error type is contained in errno. Files
will remain open.
HISTORY
-------
11/24/03 - Initial release
12/10/03 - Changed handling of sliding window to better match standard
algorithm description.
12/11/03 - Added version with linked lists to speed up encode.
12/12/03 - Added version with hash table to speed up encode.
02/21/04 - Major changes:
* Separated encode/decode, match finding, and main.
* Use bitfiles for reading/writing files
* Use traditional LZSS encoding where the coded/uncoded bits
precede the symbol they are associated with, rather than
aggregating the bits.
11/08/04 - Major changes:
* Split encode and decode routines for to allow for separate
linking (see comp.c and decomp.c)
* Makefile now builds code as libraries. This should make LGPL
compliance a bit easier.
* Upgraded to latest bitfile library.
* Add the option to pass compression/decompression routines file
pointers instead of file names.
06/21/05 - Corrected BitFileGetBits/PutBits error that accessed an extra
byte when given an integral number of bytes.
12/27/05 - Major changes:
* Use slower but clearer Get/PutBitsInt for reading/writing bits.
* Replace mod with conditional Wrap macro.
* Allow hash table to change size with dictionary.
12/25/06 - Corrected bug in allocation of default decode output.
- Minor comment corrections
03/24/07 - Closes output file in EncodeLZSSByName and DecodeLZSSByName
08/30/07 - Explicitly licensed under LGPL version 3.
- Replaces getopt() with optlist library.
02/11/10 - Added code that uses the Knuth-Morris-Pratt string matching
algorithm to search for string matches during the encoding process.
08/16/10 - Added code that uses a binary tree to sort strings as they are
added to the sliding window and to search for string matches during
the encoding process.
11/30/14 - Corrected binary tree code for adding a new character into the
sliding window dictionary.
- Changed return value to 0 for success and -1 for failure with
reason in errno.
- Upgraded to latest oplist and bitfile libraries.
- Tighter adherence to Michael Barr's "Top 10 Bug-Killing Coding
Standard Rules" (http://www.barrgroup.com/webinars/10rules).
TODO
----
- Experiment with string matching techniques and data structures
- suffix trees
- suffix arrays
- multi-byte hash keys
- Boyer-Moore
- hash/binary tree combo, using one tree for each hash key
- Use a lazy encoding process.
- Defer writing a code word until it is determined that a longer match
isn't formed by adding the next symbols from the unecoded stream.
AUTHOR
------
Michael Dipperstein ([email protected])