bit-shifting

Feb 24, 2009 at 3:34pm

Hey,

I wrote a little program that encrypts textfiles. It has the name "bitshift" and it accepts parameters from the command line (program itself,source text,destination text). An other program "unbitshift" decrypts the encrypted text-files, and that also accepts parameters from the commandline(unbitshift,encrypted text,destination text). But something goes wrong when I try to decrypt the text file: newline and other characters are no longer there...(but the text is good!)
This is the code of "bitshift":

#include <iostream>
#include <ios>
#include <fstream>
using namespace std;

int main(int argc,char* argv[])//enter in the commandline:bitshift mysource.txt mydestination.txt
{
    ifstream infile;infile.open(argv[1],ios::in | ios::binary);//first file
    ofstream outfile;outfile.open(argv[2],ios::out | ios::binary);//second file
    infile.seekg(0,ios::beg);
    int posB=infile.tellg();
    infile.seekg(0,ios::end);
    int posE=infile.tellg();
    int length=posE-posB;
    infile.seekg(0,ios::beg);
    outfile.seekp(0,ios::beg);
    char* a=new char[length];
    infile.read(a,length);
    bool currentBit;char currentByte;
    for(int i=0;i <length;i++)
         {currentByte=a[i];
          currentBit=(int)((unsigned char)currentByte/128);//test if the first bit is zero
          currentByte=currentByte<<1;//left-shifting          
          if(currentBit){currentByte+=1;}//if the first bit was zero,then this zero should be added at the end
          a[i]=currentByte;
         }
    outfile.write(a,length);
    return 0;
}

Edit & run on cpp.sh

This is "unbitshift" that decrypts:

#include <iostream>
#include <ios>
#include <fstream>
using namespace std;

int main(int argc,char* argv[])
{
    ifstream infile;infile.open(argv[1],ios::in | ios::binary);//first file
    ofstream outfile;outfile.open(argv[2],ios::out | ios::binary);//second file
    infile.seekg(0,ios::beg);
    int posB=infile.tellg();
    infile.seekg(0,ios::end);
    int posE=infile.tellg();
    int length=posE-posB;
    infile.seekg(0,ios::beg);
    outfile.seekp(0,ios::beg);
    char* a=new char[length];
    infile.read(a,length);
    bool currentBit;char currentByte;
    for(int i=0;i <length;i++)
         {currentByte=a[i];
          currentBit=(int)(currentByte/1);
          currentByte=currentByte>>1;          
          if(currentBit){currentByte+=128;}
          a[i]=currentByte;
         }
    outfile.write(a,length);
    return 0;
}

Edit & run on cpp.sh

Thanks in advance

Feb 24, 2009 at 6:09pm

kbw (9488)

In unbitshift, currentBit is incorrectly initialised. It's probably always set to 1, even when the low bit of a[i] is zero, resulting in the high bit always being set.

Feb 24, 2009 at 7:00pm

nicolasfranck (36)

No, it doesn't change a thing, as I expected. In this case currentBit doesn't have to be initialized: it derives its value from the expression "(int)(currentByte/1);"

Feb 24, 2009 at 11:33pm

kbw (9488)

Yes, but currentByte/1 == currentByte. It does not tell you whether the first bit is set or not.

Feb 25, 2009 at 12:38am

helios (17607)

For the first loop:

currentByte=a[i];
//currentBit=(int)((unsigned char)currentByte/128);//test if the first bit is zero
//Shifting is faster than dividing (actually, the compiler should optimize this
//expression automatically).
currentBit=currentByte>>7;
//currentByte=currentByte<<1;//left-shifting          
//if(currentBit){currentByte+=1;}//if the first bit was zero,then this zero should be added at the end
currentByte=(currentByte<<1)&currentBit;
a[i]=currentByte;

For the second loop:

currentByte=a[i];
//currentBit=(int)(currentByte/1);
currentBit=currentByte&1;
//currentByte=currentByte>>1;          
//if(currentBit){currentByte+=128;}
currentByte=(currentByte>>1)&(currentBit<<7);
a[i]=currentByte;

Feb 25, 2009 at 12:42am

kbw (9488)

For the first loop:
currentByte=(currentByte<<1) | currentBit;

For the second loop:
currentByte=(currentByte>>1) | (currentBit<<7);

You need ORs instead of ANDs. currentByte should be an unsigned char rather than char.

Last edited on Feb 25, 2009 at 12:44am

Feb 25, 2009 at 1:26am

helios (17607)

You need ORs instead of ANDs.

Oops.

The signedness of the type doesn't matter as long as it's not converted at any point to a bigger size.

Feb 25, 2009 at 1:30am

Ganellon (32)

    for(int i=0;i <length;i++)
         {currentByte=a[i];
          // currentBit=(int)(currentByte/1); this does nothing
          currentByte=currentByte>>1;
          if(currentByte<0) // changed
            currentByte+=128;
          a[i]=currentByte;
         }
    outfile.write(a,length);

Change that one line, and that should work for normal input.

It should be noted that this is not an especially secure form of encryption.

edit: the reason your program doesn't work is because right bitshift doesn't change the sign bit.
10001000 >> 1 = 11000100

letter T: 84: 0101 0100
left shift 1 = -88: 1010 1000
right shift 1 = -44?! 1101 0100
add 128 = 84 01010100

So the problem is, when you have chars that don't set the sign bit when you left shift, you are mistakenly setting the sign bit when you add 128. So your spaces and newlines are being converted to signed values.
letter <space>: 32: 0010 0000
left shift 1 = 64: 0100 0000 (sign bit not set)
right shift 1 = 32: 0010 0000
add 128 = -96: 1010 0000
= garbage characters in your output

Last edited on Feb 25, 2009 at 2:11am

Feb 25, 2009 at 8:15pm

nicolasfranck (36)

Your code works perfectly, Ganellon! Thanks!

What I don't understand, Ganellon: currentByte is right-shifted without knowing the bit-value that gets lost at the right side.

That is why I changed the code in "unbitshift" (although it gives the same result):

for(int i=0;i <length;i++)
         {currentByte=a[i];
          currentBit=currentByte & 1;
          currentByte=currentByte >>1;
          //changing the first value to zero!!=>-128!!
          if((currentBit==0)&&(currentByte<0)){currentByte=currentByte-128;}

          a[i]=currentByte;cout << currentByte;
         }

Something goes wrong in the code of kbw, I think:

1
2
3

currentBit=currentByte&1;
currentByte=(currentByte>>1)|(currentBit<<7);
a[i]=currentByte;

For example:
currentByte is 1000 0000
then is right-shifted by one: 1100 0000 (the sign bit is inserted from the left!)
1100 0000 | 0000 0000 becomes 1100 0000 (nothing changes!!!)

But thank you all for helping, it helped me a lot,

cheers

Last edited on Feb 25, 2009 at 8:25pm

Feb 25, 2009 at 9:26pm

helios (17607)

I would use kbw's code and change the type of currentByte to unsigned char. That code you're using there doesn't look very good (that if seems really out of place).

Last edited on Feb 25, 2009 at 9:27pm

Feb 25, 2009 at 11:13pm

Ganellon (32)

I would use kbw's code and change the type of currentByte to unsigned char. That code you're using there doesn't look very good (that if seems really out of place).

Helios is correct. I wasn't trying to fix your program -- just trying to show you where the fundamental error was. There is more work involved in really fixing it. KBW's solution is close, and more elegant, but doesn't tell the whole story; you will need to do a comparison, and I'll explain why.

If you have a character in a[i] that is in the extended ascii range (-1 through -128), the sign bit is set. So rather than explicitly checking that one particular bit, you can compare a[i] < 0. If true, then a[i] >> 7 = 1. They are equivalent. I suspect, however, that this:
currentBit = a[i] >> 7;
is significantly faster than this:
currentBit = (a[i] < 0) ? 1 : 0;
as Helios already pointed out.

So here's the rub. In your code, you are grabbing that sign bit, and looping around to the first bit, and storing it there like a pseudo-stack that you're going to pop off when you decrypt the text:

encrypt:

for(int i = 0; i < length; i++)
{
    currentByte = a[i];
    currentBit = (currentByte < 0); // true if sign bit, false if not
    currentByte = currentByte << 1;
    if(currentBit) // this will be true if extended ascii, false otherwise
      currentByte+=1; // store the fact of sign bit in 1's place
    a[i] = currentByte; // store the character into the array
}

Your current code is NOT checking whether the sign bit is already set after your right shift, so in order for your code to be 100%, you absolutely need to check the sign bit before doing anything else. If you stored a 1 in the 1's place when encrypting, that means the initial unencrypted value of currentByte was negative, i.e. in the extended ascii range. Therefore, your final unencrypted value must ALSO be < 0, so you can't just +=128. You first have to check to see if the sign bit was set by your bitshift left when you first encrypted, and you can do that by comparing to < 0 again, or by >> 7.See explanation in the code.

decrypt:

for(int i = 0; i<length; i++)
{
  currentByte = a[i];
  currentBit = currentByte & 1; // if the first bit is set, currentBit = true
  currentByte >> 1; // moves your bits back over, but doesn't change sign bit
  if(currentBit) // if true, then this char value was initially < 0
  {
    if(currentByte>=0) // value needs sign bit reset
      currentByte+=128; // replaces the sign bit
  }
  else // the initial value was not extended ascii, therefore > 0
  {
    if(currentByte<0) // it got set during bit shift left << in encryption, turn off  
      currentByte +=128; // turn off the sign bit
  }
  a[i] = currentByte;
}

The reason the original code I posted worked was because, as I said, for NORMAL input (keystrokes) you can probably not expect to find a[i] < 0. In the event you might encounter those characters, you need to use the code above. It's "written out" so you can see what each piece of the logic is examining, but can be simplified using logical operators. You can also turn the sign bit on and off with (XOR) ^-128, but I don't think you gain anything by it.

Hope this helps explain it better. Helios and kbw called it, though. kbw had it nailed in his first post.

Feb 26, 2009 at 7:14pm

nicolasfranck (36)

That's interesting. Now I understand the reason between signed and unsigned integer in terms of binary codes!
So when something is signed, then the right bit is also interpreted as the sign bit, and every right shift doesn't change the sign bit. So in this case 1000 0000 is not 128 but -128!
But if it is not signed, then the right bit is used to store higher values (unsigned int: 0...64000) and right shifting does change the right bit (to zero).
That's why it became so complicated. I just could not understand why a char with the value 127 could become negative when something was added to it.

I changed the chars to unsigned and every becomes more clear to me. Working with values from 0 to 255 is more logical when used in mathematical expression(127+2 doesn't become 129 but -127), and it follows the standard ascii-codes (instead of using -128 to -1 for the extended ascii) as given in many tables on the internet.

So my code could be much simpler with unsigned chars:
encrypt:

bool currentBit;unsigned char currentByte;
    for(int i=0;i <length;i++)
         {currentByte=a[i];
          currentBit=currentByte>>7;
          currentByte=(currentByte<<1)|(currentBit);
          a[i]=currentByte;
         }

decrypt:

bool currentBit;unsigned char currentByte;
    for(int i=0;i <length;i++)
         {currentByte = a[i];
          currentBit = currentByte & 1; 
          currentByte=(currentBit<<7)|(currentByte>>1);
          a[i] = currentByte;
         }

Thanks a lot you all, you really helped me!

PS. I tried it also on pictures, databases...and it works perfect!

Last edited on Feb 26, 2009 at 8:00pm

Feb 26, 2009 at 8:55pm

Ganellon (32)

Your code works! Good.
You learned something! Outstanding.

Things to do:
Notice how similar your encrypt and decrypt programs are. You can re-write your program into a single executable that accepts an additional "switch" command line argument.

crypter.exe -e sourcefile outputfile = encrypt
crypter.exe -d sourcefile outputfile = decrypt

In your program, you'd only need to check if -e or -d (set a bool), and change the cryptographic portion of your code (the loop) accordingly.

Afterward, think about how you can use your length value and/or loop counter to modify the content and/or placement of each character you're storing in your array. Since you aren't changing the byte count, you can use them reliably to modify values so that if your input is "AAA" you don't get "ééé" in your output file. (That's the kind of output that guarantees your code is broken in 10 seconds or less.) It doesn't need to be brilliant, but it could be something less obvious, and you've got some tools to work with already handy in your existing code.

Topic archived. No new replies allowed.