I want to know whether there's any reasonable way to take a 32-bit int and turn it into a character that will be read back as that same number from a formatted file. I have done this before. I saved a long number as a char, and I then extracted that same number later. I want to know how I must've done it. I'm very, very close to finishing a program and I need to know what it was I did. I tried to hire a programmer and he just gave me crap that it can't be done. I have seen it be done. I realize I'll be using char32_t. But if you can tell me how to input and output those, I'd be much obliged. Thanks.
char d;
for (int x = 0 ; x < H.size(); ++x) {
    m++;
    long int b = H[x].to_ulong();
    d = b;
    out << d;
}
H is a vector of bitset<32>'s. I'm curious that I can do this, and I wanted someone with professional experience to validate my claim. This actually works as far as I can tell. I don't know if you knew that. But you can get maximum space relief with it in a compression scenario.
If you're concerned about space then <vector> is the wrong approach. std::vector allows (amortized) constant time insertion at the end by increasing the container capacity exponentially.
Therefore at most half of the allocated memory in the vector doesn't contain bitsets at any given time.
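To make the over-allocation visible, here is a minimal sketch that compares size() with capacity(). The helper names (unused_slots, fill_incrementally, fill_reserved) are made up for this example, and the exact growth factor depends on your standard library, so the numbers will differ between implementations.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Number of element slots that are allocated but not in use.
std::size_t unused_slots(const std::vector<std::bitset<32>>& v) {
    return v.capacity() - v.size();
}

// Grow the vector one push_back at a time; capacity grows in jumps,
// so some slots are usually left unused at the end.
std::vector<std::bitset<32>> fill_incrementally(std::size_t n) {
    std::vector<std::bitset<32>> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(std::bitset<32>(i));
    }
    return v;
}

// Same contents, but the storage is requested once up front.
std::vector<std::bitset<32>> fill_reserved(std::size_t n) {
    std::vector<std::bitset<32>> v;
    v.reserve(n);  // guarantees capacity() >= n
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(std::bitset<32>(i));
    }
    return v;
}
```

Calling unused_slots(fill_incrementally(1000)) and unused_slots(fill_reserved(1000)) on your own toolchain shows how much slack each approach leaves. If the element count is known in advance, reserve() is usually enough to avoid the extra allocation.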
In your code, b should be unsigned. If H[x].to_ulong() evaluates to a value that's too big to fit in the signed integer, then the value of that signed integer is implementation-defined.
You also can't rely on long int (unsigned or not) being 32 bits wide. If you require that, use std::uint32_t from <cstdint> instead.
The conversion from long int to char in d = b; discards information.
As for whether or not this actually does what you intend, I can't tell since you never clarified anything. You are dropping the top bits of the value, which can be done with a simple assignment like on line 6.
Okay, you're right on this, you almost nailed what I was looking for with the things you said there, @mbozzi. An unsigned char, according to climits.h, allows for 256 or more values for chars. Do I have to use climits in my code to stop the discarding of information?
The left side of the asterisk is the read in value of the 'H' vector after output to file. The left (which apparently wants to be a bugger right now) is on the right of the asterisk. I know, that if I come out with the left, I'm going to be able to get the information back. That's my goal. I'm making a compression utility. Just a neat wrapper I made, isn't it? According to climits.h a char is not represented only in 8 bits. But it can be made of any number of bits.
A char is not necessarily 8 bits. It is necessarily one machine byte. The size of a char is a constant expression. The constant expression sizeof(char) evaluates to 1. The precise number of bits in a char can be obtained from the C macro constant CHAR_BIT.
Since your long int is 32 bits I feel like it is a reasonable assumption that CHAR_BIT is 8.
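You can check these facts on your own compiler rather than take them on faith. This is a small sketch; bits_in is a helper name made up for the example. Including <climits> only lets you read the limits, it does not change how many bits a char can hold.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// sizeof(char) is 1 by definition; CHAR_BIT says how many bits that one byte has.
static_assert(sizeof(char) == 1, "char is always exactly one byte");
static_assert(CHAR_BIT >= 8, "the standard requires at least 8 bits per char");

// Number of bits occupied by an object of type T on this implementation.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// How many chars are needed to hold one 32-bit value without losing bits.
constexpr std::size_t chars_per_u32() {
    return sizeof(std::uint32_t);
}
```

On a typical desktop platform bits_in<char>() is 8 and chars_per_u32() is 4, which is why a single char cannot carry a full 32-bit value.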
--
So where's the compression going on? What does the left side of the asterisk represent? You haven't really explained your problem.
Compression relies on the fact that in most messages there is less entropy than is usually apparent per the length of the message. This entropy is the information-theoretic kind, i.e., "Shannon entropy".
If you hope to compress data but not lose any information (called "lossless compression"), you must have (or obtain through analysis) information about the data you are compressing. The total entropy of a message represents the absolute limit on the size of the compressed message.
You can't fit 32 bits of message into 8 bits unless you know something about the 32 bits. Truncating the top 24 just won't work.
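To get a feel for how much room there is to compress a given buffer, you can estimate its entropy. The sketch below computes the order-0 (byte frequency) Shannon entropy in bits per byte. It is only an estimate: it ignores patterns between bytes, so it is not the true limit for the message, and the function name entropy_bits_per_byte is made up for this example.

```cpp
#include <array>
#include <climits>
#include <cmath>
#include <cstddef>
#include <string>

// Order-0 Shannon entropy of the bytes in `data`, in bits per byte.
// 0 means every byte is identical; 8 (with 8-bit chars) means the byte
// values are uniformly spread and byte-wise coding cannot shrink them.
double entropy_bits_per_byte(const std::string& data) {
    if (data.empty()) {
        return 0.0;
    }
    std::array<std::size_t, UCHAR_MAX + 1> counts{};  // one counter per byte value
    for (unsigned char c : data) {
        ++counts[c];
    }
    const double n = static_cast<double>(data.size());
    double h = 0.0;
    for (std::size_t count : counts) {
        if (count == 0) {
            continue;  // a value that never occurs contributes nothing
        }
        const double p = static_cast<double>(count) / n;
        h -= p * std::log2(p);
    }
    return h;
}
```

Multiplying the result by the number of bytes gives a rough lower bound, in bits, for any coder that treats each byte independently. If that figure is close to 8 bits per byte, a 75% saving is not achievable without losing information.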
while (counter < length) {
    int p = static_cast<int>(mybuffer[counter]);
    counter++;
    // Here we give the relative bit size
    // 8 can be changed to whatever.
    // you'll always end up with 1/2 + 1
    long long a = static_cast<long long>(pow(2, MAX_BITS));
    if (p < 0)
        p = p * (-1);
    totalCnt++;
    do {
        if (0 <= p - a/2 - a/4) {
            insert_leaf(Hi, RS);
            p = p - a/2 - a/4;
        }
        else if (0 <= p - a/2) {
            insert_leaf(Hi, LS);
            p = p - a/2;
        }
        else if (0 <= p - a/4) {
            insert_leaf(Lo, LS);
            p = p - a/4;
        }
        else
            insert_leaf(Lo, RS);
        a = a/4;
    } while (p != 0);
}
That's my compressor. It looks like I'm taking 4 bits, but I'm really recording 8. It's just a strange way to count them (CONFIDENTIAL TASK). I just want a software version that does the same. I showed you the wrapper to get 32-bit chars, in which I save 75% of my file size. The top wrapper just saves them into sequential bytes. I know it sounds lame, but I save 75% (somewhere in the 32-bit chars, as stated).
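As far as the loop above can be read on its own, each pass compares p against 3/4, 1/2 and 1/4 of a, which amounts to peeling off one base-4 digit per pass, most significant first. Below is a sketch of that reading only. It assumes MAX_BITS is even and p is below 2^MAX_BITS, replaces insert_leaf with a plain vector of digits (3 for Hi/RS, 2 for Hi/LS, 1 for Lo/LS, 0 for Lo/RS), and adds a guard on a that the original does not have so the loop cannot spin forever. The name base4_digits is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Base-4 digits of p, most significant first, stopping as soon as the
// remainder reaches zero (mirrors the do/while (p != 0) in the post).
std::vector<int> base4_digits(std::uint32_t value, unsigned max_bits) {
    std::vector<int> digits;
    std::uint64_t p = value;
    std::uint64_t a = std::uint64_t{1} << max_bits;  // same role as pow(2, MAX_BITS)
    do {
        if (p >= a / 2 + a / 4) {        // Hi, RS in the original
            digits.push_back(3);
            p -= a / 2 + a / 4;
        } else if (p >= a / 2) {         // Hi, LS
            digits.push_back(2);
            p -= a / 2;
        } else if (p >= a / 4) {         // Lo, LS
            digits.push_back(1);
            p -= a / 4;
        } else {                         // Lo, RS
            digits.push_back(0);
        }
        a /= 4;
    } while (p != 0 && a >= 4);  // the `a >= 4` guard is an addition, not in the post
    return digits;
}
```

Each digit has four possible values, so it carries two bits. A byte such as 255 produces four digits, which is the same eight bits that went in, and shorter outputs such as the single digit for 64 can only be decoded if the reader is told where each value ends.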