Hi, I'm currently creating a brute-force password cracker that lets the user enter a password, which the program then attempts to crack by cycling through all possible combinations. I would like to expand on this so the user can select a predetermined option to crack passwords using the RockYou password list (around 30 million passwords).
Would it be better to compare the program's current guess against every password in the list, or to import all the passwords into an array? The array would hold around 30 million elements, so my question is twofold: which of these two methods is better/more efficient, and how do you fit 30 million elements into an array?
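On the second half of the question: in C++ (assumed here, since the question doesn't name a language and the `<map>` suggestion later in the thread points to C++) you wouldn't use a fixed-size array for 30 million entries. A `std::vector` grows on the heap as you read. A minimal sketch, assuming one password per line; `load_wordlist` is a hypothetical helper name:

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream can stand in for the file when testing
#include <string>
#include <vector>

// Read one password per line into a std::vector. The vector keeps its
// elements on the heap and grows as needed, so 30 million entries are
// fine; a fixed-size local array that large would likely overflow the stack.
std::vector<std::string> load_wordlist(std::istream& in) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing '\r' in case the file uses Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        words.push_back(line);
    }
    return words;
}
```

To use it on the real file, open it with `std::ifstream file("rockyou.txt");` (path hypothetical, needs `<fstream>`) and call `load_wordlist(file)`.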
First try the database. Then, if none of those worked, just brute force it. 30M repeats are basically nothing in the grand scheme of possible combinations: by the time you check whether a guess was already in the database so you can skip it, you could have simply tested it again, so it's a wash.
I can't think of a clever way to avoid re-checking the database values that doesn't cost as much as, or more than, just checking them twice.
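The brute-force phase is an enumeration of every string over some character set, shortest first. The size of that space is why 30M repeats don't matter: with the 95 printable ASCII characters, length 8 alone is 95^8 ≈ 6.6 × 10^15 guesses. A minimal C++ sketch, assuming (as in the question) that the user typed in the target so it is known in plaintext; `brute_force`, `alphabet`, and `max_len` are illustrative names, not from the thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Number of guesses tried and, if found, the matching guess.
struct BruteForceResult {
    std::uint64_t attempts = 0;
    std::optional<std::string> match;
};

// Try every string over `alphabet` of length 1..max_len, shortest first.
// Works like an odometer: `idx` holds one alphabet index per position, and
// each step increments the rightmost digit, carrying left when it wraps.
BruteForceResult brute_force(const std::string& target,
                             const std::string& alphabet,
                             std::size_t max_len) {
    assert(!alphabet.empty());
    BruteForceResult result;
    for (std::size_t len = 1; len <= max_len; ++len) {
        std::vector<std::size_t> idx(len, 0);
        std::string guess(len, alphabet[0]);
        while (true) {
            ++result.attempts;
            if (guess == target) {
                result.match = guess;
                return result;
            }
            // Advance to the next guess of this length.
            std::size_t pos = len;
            bool carried_out = true;
            while (pos > 0) {
                --pos;
                if (++idx[pos] < alphabet.size()) {
                    guess[pos] = alphabet[idx[pos]];
                    carried_out = false;
                    break;
                }
                idx[pos] = 0;              // this digit wraps around...
                guess[pos] = alphabet[0];  // ...and carries into the next one
            }
            if (carried_out) break;  // every combination of this length tried
        }
    }
    return result;
}
```

For example, `brute_force("ba", "ab", 3)` tries "a", "b", "aa", "ab", "ba" and stops after 5 attempts.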
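When checking the database, use a container with fast lookup so you don't have to search through it. A hash-based container such as std::unordered_set or std::unordered_map gives average O(1) lookup; std::map is a balanced tree with O(log n) lookup, which is still far faster than a linear search.
30 million items of 20 bytes each is about 600 MB of raw data, still less than 1 GB. The container adds per-element overhead on top of that, but it should still fit in memory on a modern machine.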
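A minimal sketch of that lookup in C++, assuming one password per line; `load_wordlist_set` and `in_wordlist` are hypothetical helper names. A std::unordered_set is used because only membership matters (there is no value to map each password to):

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream can stand in for the file when testing
#include <string>
#include <unordered_set>
#include <utility>

// Load one password per line into a hash set. Duplicate lines collapse
// into a single entry, and lookups take average O(1) time.
std::unordered_set<std::string> load_wordlist_set(std::istream& in) {
    std::unordered_set<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing '\r' in case the file uses Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        words.insert(std::move(line));  // getline refills `line` next pass
    }
    return words;
}

// Average O(1): hashes `guess` and checks one bucket, no scanning.
bool in_wordlist(const std::unordered_set<std::string>& words,
                 const std::string& guess) {
    return words.count(guess) != 0;
}
```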
Regarding the database: it's just a text file that contains the RockYou passwords. So would it be better to import all the passwords from the text file into an array, or, when the program starts guessing (e.g. "aaaaa"), to compare that guess against every entry in the text file by reading it line by line via nextline?
Doesn't matter. Everything I said still holds true.
Read the text file into a map once (<map>, or <unordered_map>/<unordered_set> for hash-based lookup), check all those, then kick off the brute force afterward.
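Putting the two phases together in that order (load once, check the wordlist, then brute force) might look like the C++ sketch below. It assumes, as in the question, that the program already knows the plaintext target, so the wordlist check collapses to a single hash lookup. `crack`, `try_length`, `Phase`, and `CrackResult` are hypothetical names, and the brute force here is written recursively rather than as a loop:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream can stand in for the file when testing
#include <string>
#include <unordered_set>

enum class Phase { Wordlist, BruteForce, NotFound };

struct CrackResult {
    Phase phase;
    std::uint64_t attempts;  // brute-force guesses tried (0 for a wordlist hit)
};

// Extend `guess` one character at a time until it reaches `len`, then
// compare it to the target. Returns true as soon as the target is hit.
bool try_length(std::string& guess, std::size_t len, const std::string& target,
                const std::string& alphabet, std::uint64_t& attempts) {
    if (guess.size() == len) {
        ++attempts;
        return guess == target;
    }
    for (char c : alphabet) {
        guess.push_back(c);
        bool hit = try_length(guess, len, target, alphabet, attempts);
        guess.pop_back();
        if (hit) return true;
    }
    return false;
}

CrackResult crack(const std::string& target, std::istream& wordlist,
                  const std::string& alphabet, std::size_t max_len) {
    assert(!alphabet.empty());
    // Phase 1: read the whole wordlist into a hash set once...
    std::unordered_set<std::string> words;
    std::string line;
    while (std::getline(wordlist, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        words.insert(line);
    }
    // ...so checking it is a single average-O(1) lookup.
    if (words.count(target) != 0) return {Phase::Wordlist, 0};

    // Phase 2: only if the wordlist missed, fall back to brute force.
    std::uint64_t attempts = 0;
    for (std::size_t len = 1; len <= max_len; ++len) {
        std::string guess;
        if (try_length(guess, len, target, alphabet, attempts)) {
            return {Phase::BruteForce, attempts};
        }
    }
    return {Phase::NotFound, attempts};
}
```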
Don't put them into an array unless you hash into it yourself, computing the index from the string somehow. Use a map. You don't want to search an array for strings; that is terribly slow, since every lookup may scan all 30 million entries. You want to know instantly whether a string is in the container, and a map (especially a hash map) does this. Strings are annoying to convert to indices for DIY hashing; it's not worth the trouble.
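To make the cost difference concrete, here is a C++ sketch contrasting a linear scan with one array-based alternative not mentioned in the thread: sort the array once and binary-search it. That gives O(log n) lookups (about 25 comparisons for 30 million entries) without any DIY hashing, though a hash set is still faster on average. The function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Linear scan: may touch every element on each lookup, O(n). With 30
// million entries and one lookup per guess, this is what makes a plain
// unsorted array so slow.
bool contains_linear(const std::vector<std::string>& words,
                     const std::string& guess) {
    return std::find(words.begin(), words.end(), guess) != words.end();
}

// Sort once, O(n log n), and drop duplicate passwords.
void prepare_sorted(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
}

// Binary search on the sorted vector: O(log n) per lookup.
// Precondition: `sorted_words` was passed through prepare_sorted first.
bool contains_sorted(const std::vector<std::string>& sorted_words,
                     const std::string& guess) {
    return std::binary_search(sorted_words.begin(), sorted_words.end(), guess);
}
```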