I was researching various crypto algorithms when I came across this paragraph on Wikipedia:
Furthermore, some recurring properties may be found in the ciphertexts generated by the first cipher. Since those ciphertexts are the plaintexts used by the second cipher, the second cipher may be rendered vulnerable to attacks based on known plaintext properties (see references below). This is the case when the first layer is a program P that always adds the same string S of characters at the beginning (or end) of all ciphertexts (commonly known as a magic number). [...] This string should be removed before adding a second layer.
My plaintexts are LZMA streams, which always start with "\xfd7zXZ". I'm using CBC mode with a random key and initialization vector, which are encrypted through RSA and prepended to the encrypted stream. It seems to me that this defeats the potential attack mentioned in that paragraph, since the encryptor starts in a randomized state and even encrypting the exact same plaintext with the same key produces different ciphertexts.
Am I correct?
Don't confuse encryption with decryption (or breaking the encryption).
In the case of the vulnerability, it doesn't matter how many different randomized states may exist when generating the ciphertext -- the only one that matters is the one state that contains your known plaintext.
When you are trying to break the cipher, you don't care how many different ciphertexts could have been produced by the encryption algorithm.
You only care about the instance of ciphertext in your possession.
The capacity to produce different ciphertexts does not increase encryption strength.
(Also, knowing that the same algorithm can be used to produce multiple different ciphertexts for the same plaintext means that several instances of ciphertexts over the same plaintext can be used to learn more about the encryption algorithm as well.)
Well, it's impossible to learn more about the algorithm, since I'm just using Twofish from Crypto++, and the particular way I'm using it is open source, as well.
I think the number of different ciphertexts that are produced from a single plaintext does matter, as far as a known plaintext attack is concerned. Such an attack, as I understand it, would involve trying to find bits of the key that were leaked into the ciphertext because of regularities in the plaintext. In CBC mode the first block is XORed by the initialization vector, so if you have a randomized IV for each separate plaintext, to me this would seem equivalent to having a random plaintext. Or, more accurately, a plaintext where the first 16 bytes are random.
This is of course assuming that the key and IV are secure in the RSA block, and that the randomness source for the IV is sufficiently high quality.
Disclaimer: I am not a security expert (I am presently implementing SHA1 so I am not completely uninformed).
It seems to me that this defeats the potential attack mentioned in that paragraph
With the preceding disclaimer: I think yes, your precautions do defeat the potential attack described above.
In CBC mode the first block is XORed by the initialization vector, so if you have a randomized IV for each separate plaintext, to me this would seem equivalent to having a random plaintext.
This makes a lot of sense, I think I agree completely. If I were you, I wouldn't worry any longer and declare the problem solved. In fact, in my opinion you may have even overdone it a little bit.
I understand that if you have a real reason to pursue such extraordinary security measures you wouldn't tell us about it, so I would speculate that you are simply making your system so secure to scratch your itch?
I just realized that this is a computer encryption thread, so we cannot continue without the mandatory xkcd reference:
Well, I treat it partly as a learning exercise, but there is a reason to be careful. This scheme will be used to encrypt backups to be uploaded to online servers, so I need to ensure both confidentiality and integrity.
Lately I've been reading about cases of crypto done wrong (e.g. KeePass ensures confidentiality but not integrity, so an attacker could silently alter it when if you upload it somewhere. It's also vulnerable to progressive corruption by hardware failure), so I want to make sure at least that I'm not making any obvious blunders.
Have you seen this? I used this SHA-256 implementation to add a type hash to my serialization project without adding complex dependencies. Seems like a good place to start if you're implementing. https://github.com/B-Con/crypto-algorithms
Thanks for the link! Certainly I will look at it for the other SHA algorithms (with SHA1 I am mostly done, so I guess I will stick to my implementation).
I am a bit worried on the endianness: I am making extra effort to be endian-agnostic (I don't use any bitwise operations at all). For example, to extract oe of the middle bytes in an uint32_t I use (x/256)%256 - with the same input this should produce the same result on both bigendian and littleendian architectures. Of course, it is slower but unless I'll be running bruteforce cracking I doubt I will ever need the extra speed.
Yes, they are performed between CPU registers, which don't have endianness, since endianness is a property of a byte-addressed memory layout, and CPUs (generally) don't have an internal address space of any kind.
You only need to watch out for endianness issues when feeding the input stream into the hash state. Avoiding, for example, things like *(std::uint32_t *)buffer.
By the way, if x is unsigned the compiler may optimize (x / 256) % 256 into (x >> 8) & 0xFF.