I am trying to load a very large text file into a graph, where all words that have the same length and share all letters except one are connected. What would be the fastest way to do so? I know having just one huge graph would be slow; what's a better way of doing it?
It seems to me that the hard part here is, given a word W, finding the words that differ from it by exactly one letter. If you can solve that then the rest is easy. This seems related to Hamming Distance so you might start there.
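For what it's worth, the pairwise check itself is short. A rough Python sketch (the function name differ_by_one is just one I picked):

def differ_by_one(a, b):
    # Same length and a Hamming distance of exactly 1.
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1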
LB, I need to create a program that, when given two words, returns the shortest path between them by changing one letter at a time...
For example, for the words pool and food the answer would be
pool
fool
food
I'm not sure if this is the best way, but here's my two cents.
Use the Hamming distance idea mentioned above together with the partition step from quicksort: partition the strings from the file so that all the strings to the left of the pivot have a smaller Hamming distance and all the strings to the right have a greater than or equal distance from it.
In the end, the string you are looking for is in the center (along with any duplicates), and it then becomes trivial to calculate the minimal number of changes required to transform every string in the text file into the one in the middle.
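In case it helps, here is one way the partition step could look in Python. The answer doesn't spell out what the distances are measured against, so the reference word, the cut-off d (the pivot string's own distance to that reference), and the names hamming and partition below are my own reading of it, not something it states:

def hamming(a, b):
    # Number of positions at which two equal-length strings differ.
    return sum(x != y for x, y in zip(a, b))

def partition(strings, reference, d):
    # Quicksort-style split by Hamming distance to a reference word:
    # strings closer than d go to the left, the rest go to the right.
    left = [s for s in strings if hamming(s, reference) < d]
    right = [s for s in strings if hamming(s, reference) >= d]
    return left, right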
Start with all words of a given length.
Sort them, starting with the second character.
Go through the sorted collection. All adjacent words that are equal in their 2nd-nth characters are adjacent in the graph.
Now rotate the characters in the words and repeat the process.
Do this n-1 times. You've now built the graph.
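A rough Python sketch of those steps, in case it's useful. It assumes the words have already been read from the file into a list, and build_graph is just a name I picked:

from collections import defaultdict
from itertools import combinations

def build_graph(words):
    graph = defaultdict(set)
    # Only words of the same length can be neighbours, so handle each length separately.
    by_length = defaultdict(list)
    for w in words:
        by_length[len(w)].append(w)

    for n, bucket in by_length.items():
        # Keep (rotated form, original word) pairs; one pass per character position.
        rotated = [(w, w) for w in bucket]
        for _ in range(n):
            # Sort by everything except the (current) first character.
            rotated.sort(key=lambda pair: pair[0][1:])
            i = 0
            while i < len(rotated):
                j = i
                while j < len(rotated) and rotated[j][0][1:] == rotated[i][0][1:]:
                    j += 1
                # Words in rotated[i:j] agree on characters 2..n, so they differ
                # only in the first character: connect every pair.
                for (_, a), (_, b) in combinations(rotated[i:j], 2):
                    if a != b:
                        graph[a].add(b)
                        graph[b].add(a)
                i = j
            # Rotate: move the first character to the end so the next pass
            # compares on a different excluded position.
            rotated = [(r[1:] + r[0], orig) for r, orig in rotated]
    return graph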
However, this is solving the first problem that you posted. The actual problem that you're trying to solve is in your second post, and that might be much easier:
for each letter in the first word that's different from the second word
    change it to the corresponding letter in the second word.
    if the new word is valid (i.e. it's in the original set from the file)
        recurse on the new word.
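In case it's useful, a small Python sketch of that recursion. It assumes the file's words are already loaded into a set, the name greedy_paths is mine, and note that it only ever steps toward the target word:

def greedy_paths(word, target, dictionary, path=None):
    # Flip one letter that differs from the target, keep the result only if
    # it's a word from the file, and recurse; yields every path found.
    if path is None:
        path = [word]
    if word == target:
        yield path
        return
    for i, (a, b) in enumerate(zip(word, target)):
        if a != b:
            candidate = word[:i] + b + word[i + 1:]
            if candidate in dictionary:  # valid word from the file?
                yield from greedy_paths(candidate, target, dictionary, path + [candidate])

# Example with the words from the question:
words = {"pool", "fool", "food"}
print(min(greedy_paths("pool", "food", words), key=len))  # ['pool', 'fool', 'food']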