I'm currently taking a beginner's C++ class, and this program has been throwing me for a loop. I've completed a section of the code but am clueless about what to do next. I haven't had the best of teachers and have mostly relied on our textbook to get this far. Any help is appreciated. I will post the code I've completed so far in the next post. Below are the program's requirements and purpose.
PURPOSE:
Ebook text files are to be compressed into a small, portable file format with the following characteristics.
1) The file should be compressed in such a way that it need not be entirely decompressed in order to be used. Unlike a ZIP or other general-purpose compression algorithm, this one will compress the data without inter-dependency of the bytes.
2) The compressed file should be searchable and displayable as-is, without a decompression phase. This allows very large files to be displayed and searched without storing the entire file in memory.
APPROACH:
The compression technique to be used will be based on a word-to-byte look-up table consisting of a one-byte encoding for the (approximately) 255 most common words in the text, with the 256th byte value representing an ‘escape’ character indicating that the following characters, up to the next delimiter, are not encoded. Following is a tiny example, where the notation <153> represents a single byte containing the value 153 decimal.
Given some text file, an encoding table of the most commonly occurring words might be constructed like:
<256> the
<255> a
<254> an
...
<240> of
...
<200> this
<199> is
...
<169> interesting
...
<0> This means ‘following text is uncompressed’
etc.
A sentence fragment such as the following:
...the interesting part of a problem like this is the use of an
algorithmic...
could be encoded as (spaces included below only for readability):
<256> <169> <0> part <240> <255> <0> problem <0> like <200> <199> <256> <0> use <240> <254> <0> algorithmic
The uncompressed text is about 72 characters (bytes), whereas the compressed version would require 43 bytes.
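To make the example above concrete, here is a minimal C++ sketch of the table look-up and escape idea. It is not the assignment's required design, and it rests on several assumptions of mine: the names `encodeText`, `decodeText`, and `kEscape` are hypothetical, words are separated by single implied spaces, byte 0 is the escape marker, and word codes are kept in the range 128..255 so they can never be confused with plain ASCII characters. Note that a single byte can only hold 0 to 255, so the value <256> from the table cannot be stored as-is; the sketch assumes 255 would be used for "the" instead.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Assumed layout: byte 0 is the escape marker, bytes 128..255 are word codes.
// Plain ASCII text uses 1..127, so a literal run ends at the next 0 or code byte.
const unsigned char kEscape = 0;

// Replace each space-separated word with its one-byte code when it has one;
// otherwise write the escape byte followed by the word's characters.
std::string encodeText(const std::string& text,
                       const std::map<std::string, unsigned char>& codes) {
    std::istringstream in(text);
    std::string word, out;
    while (in >> word) {
        auto it = codes.find(word);
        if (it != codes.end()) {
            out.push_back(static_cast<char>(it->second));
        } else {
            out.push_back(static_cast<char>(kEscape));
            out += word;
        }
    }
    return out;
}

// Reverse of encodeText: rebuild the words, joined by single spaces.
// Throws std::out_of_range if a code byte is missing from the table.
std::string decodeText(const std::string& packed,
                       const std::map<unsigned char, std::string>& words) {
    std::string out;
    std::size_t i = 0;
    while (i < packed.size()) {
        unsigned char b = static_cast<unsigned char>(packed[i++]);
        if (!out.empty()) out += ' ';
        if (b == kEscape) {
            // Copy literal characters until the next escape or code byte.
            while (i < packed.size()) {
                unsigned char c = static_cast<unsigned char>(packed[i]);
                if (c == kEscape || c >= 128) break;
                out += packed[i++];
            }
        } else {
            out += words.at(b);
        }
    }
    return out;
}
```

With a table holding the, a, an, of, this, is, and interesting, calling `encodeText` on the 72-character fragment above yields 43 bytes, matching the count in the example. Because this sketch collapses all whitespace to single spaces, it would not by itself reproduce an exact copy of the original TXT file; line breaks, punctuation, and capitalization would need their own handling.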
_____________________________________________________________________________
Other requirements:
1. Execution- Console program, accepts file name ending in TXT or
CMP; with no parameter, or other extension, prints USAGE explanation
2. Execution- If the TXT or CMP file named does not exist, print ‘File
not found’ error message OR if the .KEY file matching the .CMP does not
exist, print ‘Matching KEY file does not exist’.
3. Execution- Same TXT behavior; If the filename has CMP extension,
the program will open the CMP and KEY files, and create a TXT file
which is an exact copy of the original TXT file.
4. Execution- Compression of the TXT file is based on (# occurrences) x
(length of word), with approximately the 255 words having the highest
products each represented by a single byte. Capitalization, implied
spaces, and other considerations are left to the programmer’s discretion.
5. Execution- In either mode, after compression or decompression, the
program will display:
Uncompressed bytes: 9999 Compressed bytes: 9999 99%
where the % value (1 or 2 digits) is compressed / uncompressed bytes
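
For requirements 4 and 5, a rough C++ sketch of the scoring and the final summary line might look like the following. The function names `pickWords` and `summaryLine` are hypothetical, words are assumed to be separated by whitespace only, ties are broken alphabetically, and the percentage is truncated rather than rounded; none of those choices are stated in the assignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Score every whitespace-separated word as (occurrences x length) and return
// the highest-scoring words first, keeping at most `limit` of them.
std::vector<std::string> pickWords(const std::string& text, std::size_t limit) {
    std::map<std::string, std::size_t> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++counts[word];

    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const auto& entry : counts)
        scored.push_back({entry.second * entry.first.size(), entry.first});

    // Highest score first; ties fall back to alphabetical order (an assumption).
    std::sort(scored.begin(), scored.end(),
              [](const std::pair<std::size_t, std::string>& a,
                 const std::pair<std::size_t, std::string>& b) {
                  if (a.first != b.first) return a.first > b.first;
                  return a.second < b.second;
              });

    std::vector<std::string> best;
    for (std::size_t i = 0; i < scored.size() && i < limit; ++i)
        best.push_back(scored[i].second);
    return best;
}

// Build the summary line from requirement 5. The percentage is
// compressed / uncompressed, truncated to a whole number.
std::string summaryLine(std::size_t uncompressed, std::size_t compressed) {
    std::size_t percent =
        uncompressed == 0 ? 0 : compressed * 100 / uncompressed;
    std::ostringstream out;
    out << "Uncompressed bytes: " << uncompressed
        << " Compressed bytes: " << compressed << " " << percent << "%";
    return out.str();
}
```

Calling `pickWords(text, 255)` on the whole file would give the candidate words for the KEY table, and `summaryLine(72, 43)` produces "Uncompressed bytes: 72 Compressed bytes: 43 59%" for the sentence fragment used earlier.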