Generating randomized values within a given range with no redundancy.

Apr 21, 2009 at 6:20pm
I'm working on a project right now which processes a lot of data. The gist of it is that you give it a string of hashed text, and it generates random text which is then hashed and compared to the original string until a match is found. However, this is very inefficient, because the random number generator produces a lot of redundant values, and the proportion of redundant values climbs sharply as the key space is exhausted. I just realized that if I could remove the redundancy from the RNG, the cracking times of these hashes could be dramatically reduced.

So now the problem is how to go about doing so. Storing every used number in an array and comparing new values against the used ones is infeasible: the amount of memory required to store that much data just doesn't exist, and the lookup times of such a table would more than negate any benefit.

A sequential search of the key space is infeasible as well; it would completely remove the redundancy, but it would also make keys at the end of the key space take much longer to crack than keys at the beginning. Randomization has to be part of the algorithm.

Anyone have any ideas? This is certainly an interesting problem, I'd love to see what sort of solutions you can throw at me. (I don't necessarily need code, just an idea of how a professional programmer would go about doing this.) Thanks for any help you can offer, I greatly appreciate it.
Last edited on Apr 21, 2009 at 6:23pm
Apr 21, 2009 at 6:30pm
What makes you think it is a solvable problem?
Apr 21, 2009 at 6:37pm
Everything is a solvable problem. I'm confident I could find a solution to this problem on my own as well, but it will take me a lot less time if I can get some ideas from other people first.

Last edited on Apr 21, 2009 at 6:38pm
Apr 21, 2009 at 6:41pm
I'm a little confused.

Why are you generating a random string looking for a match? If the goal is to select a random item, just pick a random number between [0,n-1] (where 'n' is the total number of items), and extract the item that way.
Apr 21, 2009 at 6:45pm
I'm confused too.

The runtime efficiency of the algorithm is not affected by how you search.

Furthermore, one-way hash functions are by definition supposed to be written such that it is computationally infeasible to find a string S' which hashes to the same value as the string S. This is one of the foundations of computer security.
Apr 21, 2009 at 6:52pm
I'm generating a random string because I need a random string. The key space in which I'm searching can be up to trillions of keys in size, depending on the length of the keys and character set.

So basically you have a key (which we'll call X) of length N, which is hashed in an attempt to keep people from knowing what it is. The key space (S) is the size of your character set (C) raised to the power of N: S = C^N.

S is the number of keys you will have to search to find a match if the very last key is your target; this is without any redundancy. If C is 62 and N is 5, S is about 916,000,000. At a rate of 3 million keys per second, that would take roughly five minutes under ideal conditions. In a real-world scenario, you won't sustain 3 million keys per second, you don't know N, and the generated keys become increasingly redundant, which means that out of those 3 million keys checked per second only a fraction are unique, and that fraction shrinks over time.
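
Just to make the arithmetic concrete, here's a quick back-of-the-envelope sketch in C++ (the character set and key length are the ones from the example above, and the 3 million keys/sec rate is just an assumed figure):

#include <cstdint>
#include <iostream>

int main()
{
    // Parameters from the example above (the rate is only an assumption).
    const std::uint64_t charsetSize = 62;        // C: a-z, A-Z, 0-9
    const unsigned      keyLength   = 5;         // N
    const double        keysPerSec  = 3000000.0; // assumed cracking rate

    // S = C^N, the total number of keys in the space.
    std::uint64_t keySpace = 1;
    for (unsigned i = 0; i < keyLength; ++i)
        keySpace *= charsetSize;

    std::cout << "Key space:    " << keySpace << " keys\n";            // 916,132,832
    std::cout << "Worst case:   " << keySpace / keysPerSec << " s\n";  // ~305 s, about five minutes
    std::cout << "Average case: " << keySpace / (2.0 * keysPerSec) << " s\n";
    return 0;
}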

If I can remove the redundancy rate altogether, I'm that much closer to practically cracking any given hash.

EDIT: jsmith: I know one-way hash functions are supposed to be secure. I'm attempting to prove that they're not.
Last edited on Apr 21, 2009 at 6:55pm
Apr 21, 2009 at 7:14pm
Everything is a solvable problem.


When you get to solving FTL travel, send me a PM. I'll be interested in investing.

How long have you been doing cryptanalysis?
Apr 21, 2009 at 7:18pm
Moving faster than the speed of light isn't a problem. There's no practical use for it.

What I was saying is that, given a computing problem like mine, everything is solvable. It's just a matter of how much time and processing power you can invest, and how far you can bring down the time and processing power required.

If you had an infinite amount of processing power coupled with an infinite amount of time, my problem wouldn't exist. The problem I've shown you only comes into play with the real-world consideration that it has to work with the given computing and time restraints.

And it's entirely possible to work within these restraints; it's just a matter of how much time and effort you're willing to put into optimizing and improving your code, which is what I'm doing here.
Last edited on Apr 21, 2009 at 7:31pm
Apr 21, 2009 at 7:46pm
If you had an infinite amount of processing power coupled with an infinite amount of time, my problem wouldn't exist.


Nah -- you only need one of those. :-)

What I was saying is that given a computing problem like mine, everything is solvable.


That's a bit naive. Cryptographers are really good at understanding computational complexity. There are computational limits outside of your code. You cannot optimize away every constraint.
Apr 21, 2009 at 7:59pm
You cannot optimize away every constraint.


That's very true. At the same time, it's foolish not to even try to solve a problem just because it isn't easy. While I realize that some problems are completely impractical to solve under today's constraints, I also realize that this is not one of them.

My statement still stands: this is completely solvable. In fact, while we were debating this, I came up with my own solution.

Let me know if you're interested, and I'll tell you about it. Otherwise, I have some coding to do. ;)
Last edited on Apr 21, 2009 at 8:03pm
Apr 21, 2009 at 8:11pm
In fact, while we were debating this, I came up with my own solution.


I'm here to help in any way I can. ;-)

Good luck!
Apr 21, 2009 at 8:14pm
Thanks! I'll come back if I have any issues.

EDIT: Here are my notes on the solution I've worked out, if anyone's interested. They're written in a format that I understand, which relate to aspects specific to my program, so sorry if some of it is unclear.

*Create a vector array in the base implementation of the processing path class from which all paths derive.

*Each thread has its own key space to search. For eight threads and a key space of 4096, each thread would have 512 keys to search.

Thread 0 -- 0-511
Thread 1 -- 512-1023
Thread 2 -- 1024-1535
Thread 3 -- 1536-2047
Thread 4 -- 2048-2559
Thread 5 -- 2560-3071
Thread 6 -- 3072-3583
Thread 7 -- 3584-4095

In this way, no thread overlaps another thread's key space, and no mutexes are required to modify the array which stores the used values.

*After a key has been checked, its numerical form is inserted into the vector defined in the base class, keeping the vector sorted in ascending order. Every newly generated key is first checked against that vector with a binary search; if the key is already there, it's thrown out and a new one is generated. (A rough sketch of this follows these notes.)


This should eliminate the problem I've been having with redundancy.
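
Here's a rough, stripped-down sketch of what those notes describe (the names are placeholders, not the actual classes from my code):

#include <algorithm>
#include <cstdint>
#include <vector>

// One thread's slice of the key space plus its local record of spent keys.
struct ThreadSlice
{
    std::uint64_t begin;               // first key this thread owns
    std::uint64_t end;                 // one past the last key it owns
    std::vector<std::uint64_t> spent;  // used keys, kept sorted ascending

    // Returns true if 'key' hasn't been tried yet, and records it.
    // Returns false if it's a duplicate, so the caller generates a new one.
    bool tryClaim(std::uint64_t key)
    {
        std::vector<std::uint64_t>::iterator it =
            std::lower_bound(spent.begin(), spent.end(), key);  // binary search
        if (it != spent.end() && *it == key)
            return false;              // already checked -- throw it out
        spent.insert(it, key);         // insert in place, vector stays sorted
        return true;
    }
};

// Split a key space of 'total' keys across 'threads' workers;
// e.g. total = 4096 and threads = 8 gives the ranges listed above.
ThreadSlice sliceFor(unsigned index, unsigned threads, std::uint64_t total)
{
    std::uint64_t perThread = total / threads;
    ThreadSlice s;
    s.begin = index * perThread;
    s.end   = (index + 1 == threads) ? total : s.begin + perThread;
    return s;
}

The point being that each thread only ever touches its own slice and its own spent vector, so nothing needs locking. The insert does shift elements around, which is the cost of keeping the vector sorted.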

Also, if anyone's interested, you can find more about my project here: http://lavernasbrute.googlecode.com
Last edited on Apr 21, 2009 at 8:28pm
Apr 22, 2009 at 12:32am
Well I'll be....

So you aren't proving that one way hashes are insecure; you're just proving that 99.9999% of all passwords in use today are insecure.

Still, though, I think your logic is off in that your random method is no faster than the "start at the beginning" approach, unless the creator of the password knows your cracking algorithm, in which case they can devise a password that takes close to N iterations to crack instead of the expected N/2 (where N is so large that the factor of 1/2 is irrelevant).

Interesting project though, using the ATI GPU as a number cruncher.
Apr 22, 2009 at 1:08am
I don't think I'm following.
So you have 8 threads randomizing strings and hashing them, right? What good is assigning the threads key spaces to search? If the strings are random*, the resulting hashes should also be random*, so thread 0 could very well generate 3584.
Am I missing something, here?

*By "random" I of course mean pseudo-random.
Apr 22, 2009 at 8:13pm
I don't think I'm following.
So you have 8 threads randomizing strings and hashing them, right? What good is assigning the threads key spaces to search?


Dividing and mapping the key space among all available threads is how I'm planning to get around using mutexes, as they tend to be artificial performance killers. Instead of using a single master array where values are stored, I divide the key space by the number of available threads, and each thread gets its own portion of the key space, and only that portion. (Using the example given above, thread 4 would only generate values between 2048 and 2559.) This means I can use a localized array of spent values for every thread, and I don't need to lock any resources.

Pretty clever, eh?

The example I gave showing the key space mapped to eight threads was just an example. In the actual implementation, the number of threads would be dynamic, as well as the way the key space is mapped to the threads.

So you aren't proving that one way hashes are insecure; you're just proving that 99.9999% of all passwords in use today are insecure.


That's the idea.

Still, though, I think your logic is not correct in that your random method is not faster than the "start at the beginning" approach...


That's also correct: the PRNG I'm using requires a certain amount of processing power, which lowers the number of keys generated per second. However, if the search order were fixed, it would introduce a weakness in that any potential target could choose a string at the end of the key space, crippling the effectiveness of the attack.

I realize it's a trade-off, but in my mind it's one worth the performance hit. If the target can't anticipate the order in which keys are generated, there's no vulnerability.

Also, the PRNG uses a very small percentage of the program's available resources compared to other aspects, such as the NTLM hashing algorithm.
Apr 23, 2009 at 1:52am
You do realize that you can start "anywhere" in the array and iterate up to the end, then go back to the beginning, e.g.:

abcdefgh ... one way
defghabc ... another way
etc...

And doing this would probably take next to no extra time (you're simply changing the starting location). You could randomize where you start, and then the potential target wouldn't be able to do anything about it.
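
Something like this, just to illustrate the wrap-around (checkKey() is a stand-in for whatever your hash-and-compare step is, and rand() is only for illustration):

#include <cstdint>
#include <cstdlib>
#include <ctime>

// Placeholder: hashes a candidate key and compares it to the target.
bool checkKey(std::uint64_t key);

// Walk the whole key space [0, total) sequentially, but begin at a
// random offset and wrap around past the end.
bool searchWithRandomStart(std::uint64_t total)
{
    std::srand(static_cast<unsigned>(std::time(0)));
    // rand() is only for illustration; a PRNG with a larger range would
    // be needed to reach every possible starting point.
    std::uint64_t offset = static_cast<std::uint64_t>(std::rand()) % total;

    for (std::uint64_t i = 0; i < total; ++i)
    {
        std::uint64_t key = (offset + i) % total;  // wraps back to 0 at the end
        if (checkKey(key))
            return true;                           // match found
    }
    return false;                                  // whole space exhausted
}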
Apr 23, 2009 at 5:25pm
You're a genius, thank you so much for that.

Now there's the problem of getting X number of threads to evenly work on the key space in a sequential order, with a variable offset. But that's an entirely different problem.
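
Off the top of my head, something like this per thread might do it (just a first pass, not the real code; checkKey() is a placeholder for the hash-and-compare step):

#include <cstdint>
#include <cstdlib>

bool checkKey(std::uint64_t key);  // placeholder for the hash-and-compare step

// Each thread walks its own slice [begin, end) sequentially, starting at a
// random offset inside the slice and wrapping around, so the slices never
// overlap and no locking is needed.
void searchSlice(std::uint64_t begin, std::uint64_t end)
{
    std::uint64_t size   = end - begin;
    std::uint64_t offset = static_cast<std::uint64_t>(std::rand()) % size;

    for (std::uint64_t i = 0; i < size; ++i)
    {
        std::uint64_t key = begin + (offset + i) % size;  // always stays inside the slice
        if (checkKey(key))
            return;                                       // match found
    }
}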

Thanks for the idea, that's exactly what I needed!

EDIT: I've marked this topic as solved, but feel free to continue posting suggestions.
Last edited on Apr 23, 2009 at 5:37pm
Apr 24, 2009 at 11:49am
OK... now something off-topic:

Can you make a squirrel fly to Sirius by paddling with its tail (without a spaceship)? :)

Just joking. :P


BTW... interesting stuff. ^^
Topic archived. No new replies allowed.