Has anyone in here ever tried to make a codebreaker from scratch?
I'm thinking about trying to build a codebreaker to crack a Vigenere (stright alphebetic substitution using a keyword) cipher. The things to check would be frequency of letters, frequency of letter pairs, that sort of thing. Ideally the user would be able to see both the ciphertext and plaintext during and after decrypting at the same time (split screen?) Of course the program would spit out the keyword at the end of the analysis if it works. i would like to do this without using any sort of a dictionary helper function if possible.
I would be interested to hear about your frustrations/successes if you have tried something like this.
I have, it's a great way to practice I\O and arrays. The only problem with what you want to do is that you will never likley find a perfect way to break even single alphabet ciphers using your own formulas so a dictionary brute force, using assumptions based on the data you've gathered with the frequency analysis, is required at some level; enter the dictionary. Or is that not what you meant by dictionary helper?
What I had in mind is that I don't want to use a dictionary to validate words, eg: the most common 3 letter word in a sufficiently large ciphertext would be 'the'. I would rather do it by finding letter pairs such as 'qu'. The thinking being that if you find a character that is only ever followed by another character you can make some assumptions, such as q is always followed by u.
Outside the topic though, I found some info on the Ubuntu Programming forums where a challenge was issued for a Vigenere program.
There were 16 languages represented in the submissions, from Ada to Cobol etc... but no C++.
Another page (http://pajhome.org.uk/crypt/vigenere.html) suggests "using the method of coincidence indexes" that looks interesting. The more I read the more I want to do this!
That would be fine, even necessary with words like 'the' or 'is'. But once you have those figured out you have 5 out of 26 letters figured out, that's if your "opponent" doesn't use characters to delimit sentances instead of '.'.
This is doable but at some point your program is only a program and you have to hard code a brute force attempt at some words. It's like this
- Your program finds that 'q' is the most common letter found in the large text, 'r' is second and 'p' is third.
- You know that the text is in english so you can assume that they are 'q' = 'e', 'r' = 't', 'p' = 'o'.
- The word you are deciphering is displayed as 'rgqmq'. You insert the letter you know so you have 'tgeme'. Off the top of my head this could be 'there' or 'theme' etc.
- Your program should be able to match these known characters to a list of words and use that. Example cont.: You assume that 'g'='h', 'm'='r' so the word becomes 'there' if you then assign all of the 'g's to 'h's and you later find a word in the text that isn't in your dictionary then you might rule that possibility out.
- I was never able to fully automate this. I always needed a human to check the text afterwords.