Unscrambler

Oct 29, 2014 at 1:21am

This program should be easy for those who are experienced //not me
So i want to make a simple word unscrambler but dont know how //obviously
I am thinking that i can set a value to an individual word.
For example "bucky"
the scrambled word would be "ubcyk"
shouldnt both words contain the same value since both have the same characters?
I tried &ubcyk but no matter the word it will always be 4 bytes.
is there any sort of value that is stored based on the characters that make a word? If there are then i could just create a list and search each word for this value until the pairs match.

Oct 29, 2014 at 3:12am

helios (17607)

shouldnt both words contain the same value since both have the same characters?
[...]
is there any sort of value that is stored based on the characters that make a word? If there are then i could just create a list and search each word for this value until the pairs match.

No. The minimal value that would hold the relevant information would be a list of pairs of characters and character counts, such that f("bucky") = [('b', 1), ('c', 1), ('k', 1), ('u', 1), ('y', 1)], and f("ubcyk") = [('b', 1), ('c', 1), ('k', 1), ('u', 1), ('y', 1)], and therefore f("bucky") = f("ubcyk") (f(x) = f(y) if and only if x is an anagram of y). Computing this value is non-trivial, and no language implementation would do it by default.
You can implement f() yourself, but in reality you don't want f(), ~~because comparing those values is also, non-trivial~~. You really want a function that tells you if two strings are anagrams of each other.
However,

For example "bucky"
the scrambled word would be "ubcyk"

What makes "ubcyk" scrambled and "bucky" unscrambled? How would you tell the computer this? You mentioned creating a list, and that would sort of work, but what if the scrambled word happens to be in the list?
For example, suppose the original word is "rental", and the scrambled word is "antler". Should the program leave the word as is, replace it with "rental", or replace it with "learnt"? How would the program know which is right?

Last edited on Oct 31, 2014 at 4:29pm

Oct 30, 2014 at 5:54pm

IAmAFailedProgram (4)

I think I have an idea on how to program this.
I could set a numerical value to each alphabetical character and then i could just find the product of all the characters in a word and compare that product to the product of a wordlist.
EX: Scrambled word = ubcyk
u=3
b=5
c=8
y=9
k=22
thus 3*5*8*9*22 = x \\too lazy to do the math
so since bucky has the same letters than the product would be the same.
All I would need to do is program a converter of the original word to a product. Then every step afterwards would be simple as long as the wordlist is not HUGE because some words have same letters but in different positions EX: dog and god.

Oct 30, 2014 at 6:15pm

helios (17607)

How would you tell apart 3*5*8*9*22 from 2*3*4*5*9*22, or from 2*2*2*2*3*3*3*5*11?

Last edited on Oct 30, 2014 at 6:17pm

Oct 31, 2014 at 9:23am

IAmAFailedProgram (4)

Again, it only works for small lists of words. Lets say your words are limited to 20-5 character words. Meaning that possibly non of them will have the same combination since the possibilities are far too huge. I've come across another problem.
How can you make multiple inputs?
Example: I need for the user to input the letters of the word. I can do something like

cin >> letterOne;
if (letterOne =! 0){
cin >> letterTwo;
else blahblahblah }
if (letterTwo =! 0) {
cin >> letterThree}

\\this allows for the user to input a 0 whenever they have no more letters but is there a way to maybe create a list of inputs automatically? Or anyway to make this much more simple?

this allows the user to enter a 0 whenever

Oct 31, 2014 at 3:18pm

helios (17607)

So at this point you're requiring that:
1. There are no mutual anagrams in the list of unscrambled words.
2. There are no words with the same multiplied value.
3. The scrambled input words unscramble to a word that is definitely in the list.

Regarding #3, you can't tell with certainty if a word is on the list (you can tell with certainty that a word is not on the list) because for any given word's multiplied value there may be several unrelated words with the same value.
For example, suppose your list only contains "book". Then
value("book") = value("obko")
But it's possible that
value("book") = value("rty")
even though "try" is not on the list.

Now, there is a way to salvage this situation by choosing carefully the number you assign to each letter, and I'm trying to get you to figure it out.

Oct 31, 2014 at 4:26pm

m4ster r0shi (2201)

I'm trying to teach my mom programming (don't ask why) and I settled with something like what you're trying to do as a final goal for the course. I caught her the other day watching a tv quiz show where you are given a scrambled word and have to find the original one [1] and I said: "You know, you can write a program to do that...", to which she replied: "Really??? How???"

I'd recommend the approach helios outlined in his first post. If you want to do this in order to learn, implementing the proposed f function ~~and the necessary equality operation~~ will be a good exercise. If you want to do this fast, there are languages (e.g. Clojure [2], the one I'll be trying to teach my mom) where ~~both operations are~~ this operation is essentially built-in:

; definition
(defn find-originals [scrambled-word]
  (let [all-words (-> "dictionary_with_one_word_per_line.txt" 
                    slurp (clojure.string/split #"\n"))
        target-frequencies (frequencies scrambled-word)]
    (filter #(= (frequencies %) target-frequencies) all-words)))

; usage
(find-originals "ertlan")
; ("antler" "learnt" "rental")

[1] http://www.youtube.com/results?search_query=%CE%A4%CE%97%CE%9B%CE%95%CE%9A%CE%A5%CE%92%CE%9F%CE%A3

[2] http://clojure.org/

Last edited on Nov 2, 2014 at 3:06am

Oct 31, 2014 at 7:56pm

JLBorges (13770)

The aha! insight is to sign each word in the dictionary so that words in the same anagram class have the same signature, ... We'll use a signature based in sorting: order the letters within the word alphabetically. (Footnote: This anagram algorithm has been independently discovered by many people, dating at least as far back as the mid-1960's.)
...
When an equivalence relation defines classes, it is helpful to define a signature such that every item in the class has the same signature and no other item does. Sorting the ~~words~~ letters within a word yields one signature for an anagram class; other signatures are given by sorting and then representing duplicates by a count (so the signature of "mississippi" might be "i4m1p2s4") or by keeping a 26-integer vector telling how many times each letter occurs. ...

Jon Bentley in 'Programming Pearls' http://www.amazon.com/Programming-Pearls-Jon-Bentley/dp/8177588583

Note: keeping a vector telling how many times each letter occurs is essentially what m4ster r0shi did.

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <unordered_map>
#include <sstream>

using anagram_map = std::unordered_map< std::string, std::vector<std::string> > ;

std::string make_key( std::string letters )
{
    for( char& c : letters ) c = std::tolower(c) ;
    std::sort( letters.begin(), letters.end() ) ;
    return letters ;
}

anagram_map make_anagram_map( std::istream& stm )
{
    anagram_map result ;
    std::string word ;
    while( stm >> word ) result[ make_key(word) ].push_back(word) ;
    return result ;
}

std::vector<std::string> look_up( const std::string& letters, const anagram_map& amap )
{
    auto iter = amap.find( make_key(letters) ) ;
    return iter != amap.end() ? iter->second : std::vector<std::string>{} ;
}

int main()
{
    std::istringstream word_list( "leapt palet petal plate pleat keel leek blase sable "
                                  "sleep peels builder rebuild "
                                  "again it only works for small lists of words "
                                  "lets say your words are limited to " ) ;

    auto amap = make_anagram_map(word_list) ;

    std::string letters ;
    while( std::cin >> letters )
    {
        std::cout << "scrambled letters: '" << letters ;
        const auto anagrams = look_up( letters, amap )  ;
        if( anagrams.empty() ) std::cout << "' => no words were found\n" ;
        else
        {
            std::cout << "' => words: " ;
            for( const std::string& ana : anagrams ) std::cout << "'" << ana << "' " ;
            std::cout << '\n' ;
        }
    }
}

Edit & run on cpp.sh

http://coliru.stacked-crooked.com/a/8e5bdeba59217675

Last edited on Oct 31, 2014 at 8:47pm

Topic archived. No new replies allowed.