Huffman decoder

updated:
----------
I came up with code to build the binary tree from my key (Huffman encoder).

.h
#pragma once
#include <fstream>
#include <sstream>
#include <string>

namespace file_info {
	class scanner {
	private:
		std::ifstream _scanner_file;
		std::stringstream _buffer;
	public:
		void set_file_copy(std::string& file_loc);
		std::string get_file_copy() const;
	};
}

namespace decoder {
	class Node {
	public:
		char _character;
		Node *_left, *_right;

		// Initialize _character as well: reading an uninitialized member later is undefined behavior.
		Node() : _character('\0') { _left = _right = nullptr; }
	};

	class decompressor {
	private:
		file_info::scanner _decode_key;
		std::string _file_type;   // file extension read from the end of the key
		std::string buffer;       // full text of the key file
		std::ofstream _decode_file;
		std::ifstream _encode_file;
		Node* _root = new Node;
		Node* _root_cpy = _root;  // saved root pointer; _root is reused as a cursor while building
	public:
		void read_key(std::string& decoding_key_loc);
		void build_decoding_tree(Node *&root, int index);
		void test(std::string& encoded_file_loc);
		void print(Node* root, unsigned k);
	};
}


.cpp
void file_info::scanner::set_file_copy(std::string& file_loc) {
	_scanner_file.open(file_loc, std::ios::in);
	if (_scanner_file.is_open()) {
		_buffer << _scanner_file.rdbuf();
		_scanner_file.close();
	}
	else std::cout << "Unable to open the file for copy operation." << std::endl;
}

std::string file_info::scanner::get_file_copy() const {
	return _buffer.str();
}

void decoder::decompressor::read_key(std::string& decoding_key_loc) {
	_decode_key.set_file_copy(decoding_key_loc);
	buffer = _decode_key.get_file_copy();
	build_decoding_tree(_root, 0);
	print(_root, 0);
	std::cout << std::endl;
	std::cout << buffer << std::endl;
}

void decoder::decompressor::build_decoding_tree(Node *&root, int index) {
	if (buffer[index] == '0') {
		if (root->_left != nullptr)
			build_decoding_tree(root->_left, index);
		else {
			root->_left = new Node;
			++index;
			build_decoding_tree(root->_left, index);
		}
	}
	else if (buffer[index] == '1') {
		if (root->_right != nullptr)
			build_decoding_tree(root->_right, index);
		else {
			root->_right = new Node;
			++index;
			build_decoding_tree(root->_right, index);
		}
	}
	else if (buffer[index] == ' ') {	// a space means the next character is the symbol for this code
		++index;
		if ((buffer[index + 1] == '0') || (buffer[index + 1] == '1')) {
			root->_character = buffer[index];
			_root = _root_cpy;	// restart from the saved root for the next code
			++index;
			build_decoding_tree(_root, index);
		}
		else
			_file_type = buffer.substr(index);	// no more codes: the rest is the file extension
	}
}


void decoder::decompressor::test(std::string& encoded_file_loc) {
	Node *p = _root;
	_encode_file.open(encoded_file_loc, std::ios::in | std::ios::binary);
	char byte = _encode_file.get();	// only the first byte is decoded in this test
	for (int count = 0; count < 8; count++) {
		bool b = byte & (1 << (7 - count));	// extract bits, most significant first
		if (b)
			p = p->_right;
		else
			p = p->_left;
		if (p->_character) { std::cout << p->_character; p = _root; }
	}
}

void decoder::decompressor::print(Node* root, unsigned k) {
	if (root != nullptr) {
		print(root->_left, k + 3);
		for (unsigned i = 0; i < k; i++) {
			std::cout << " ";
		}
		if (root->_character) { std::cout << "-" << "(" << root->_character << ")" << std::endl; }
		else { std::cout << "-" << std::endl; }
		print(root->_right, k + 3);
	}
}


main.cpp
int main() {
	decoder::decompressor ts;
	std::string t = "C:/Users/User/Desktop/Key.txt";
	std::string enc = "C:/Users/User/Desktop/encoded.bin";

	ts.read_key(t);
	ts.test(enc);
	return 0;
}


The encoded.bin file contains the following (raw binary, shown here as text):
כּׂF¯ֻn$‹ׁx§ֿב׃;d¶ײ‚

the key.txt contains:
00010 010
110 11100 !00011 c0110 d0111 e00100 g11101 h1000 i1001 l00101 m11110 n1010 o11111 r1011 s0000 t00110 w00111 y.txt

If you replicate this code with two files containing the above and run it, it will say that "programname.exe has stopped working", but if you remove the ts.test(enc) call it works.

The print function shows the binary tree that was created for the decoding process. The problem is that I can't seem to find the bug; I spent many hours trying to fix this section with no success. I assumed the binary tree was not generated correctly, but the print function shows that the tree formed correctly, and the test function doesn't work although in theory it should, as it is very simple. (Note that in the test function I tried changing if (p->_character) to if ((p->_left == NULL) && (p->_right == NULL)), but the problem remains.)
- The first  character's value is -1 (ASCII).

There are no negative numbers in the ASCII standard. All the values must be between 0 and 127; if you have any other value, then you are not dealing with an ASCII value. You seem to be using "ASCII value" when you really mean "decimal value"; there is a difference between these two terms.

As I told you in your other post, that strange character has a value of over 127 and appears to consist of multiple bytes.

Don't forget that when you overflow a signed integral value (a signed char in this case) you invoke Undefined Behavior.



1) If you use a static_cast<int> on a char, shouldn't it show you its ASCII code in decimal? From the ASCII table, all of the other characters' codes appear there, i.e. space / newline / '1' / '0'.
2) I can read the  character with no problem into a map<char, vector<bool>>, which I thought solved my problem in the last post, but as I said, using the map to decode leads to ambiguity. So my question is: how can I spot the strange behavior if it happens?
3) Aside from the  character, can you recommend a way to create a binary tree for decoding with the key?

I hope it doesn't seem too demanding/aggressive (my English isn't that good).
1) If you use a static_cast<int> on a char, shouldn't it show you its ASCII code in decimal? From the ASCII table, all of the other characters' codes appear there, i.e. space / newline / '1' / '0'.

That cast happens after you have already invoked Undefined Behavior, and once you invoke UB anything can happen.

2) I can read the  character with no problem into a map<char, vector<bool>>, which I thought solved my problem in the last post, but as I said, using the map to decode leads to ambiguity. So my question is: how can I spot the strange behavior if it happens?

No, that's the point I am making: you can't read that stupid "character" into a map<char, anything>, because that "character" has a decimal value larger than what a signed char can hold. A signed char can only hold decimal values up to and including 127; anything larger than this overflows the char, invoking Undefined Behavior. Try switching to an unsigned char and see what you get.

I will say again, as I said many times in the previous post: the  character is not the problem I have, and I am not interested in solving it at the moment at all. I am more concerned with how to store the key back into a binary tree.
Good luck.