Algorithm to generate anagrams
Solution 1:
Most of these answers are horribly inefficient and/or will only give one-word solutions (no spaces). My solution will handle any number of words and is very efficient.
What you want is a trie data structure. Here's a complete Python implementation. You just need a word list saved in a file named words.txt
You can try the Scrabble dictionary word list here:
http://www.isc.ro/lists/twl06.zip
MIN_WORD_SIZE = 4 # min size of a word in the output
class Node(object):
def __init__(self, letter='', final=False, depth=0):
self.letter = letter
self.final = final
self.depth = depth
self.children = {}
def add(self, letters):
node = self
for index, letter in enumerate(letters):
if letter not in node.children:
node.children[letter] = Node(letter, index==len(letters)-1, index+1)
node = node.children[letter]
def anagram(self, letters):
tiles = {}
for letter in letters:
tiles[letter] = tiles.get(letter, 0) + 1
min_length = len(letters)
return self._anagram(tiles, [], self, min_length)
def _anagram(self, tiles, path, root, min_length):
if self.final and self.depth >= MIN_WORD_SIZE:
word = ''.join(path)
length = len(word.replace(' ', ''))
if length >= min_length:
yield word
path.append(' ')
for word in root._anagram(tiles, path, root, min_length):
yield word
path.pop()
for letter, node in self.children.iteritems():
count = tiles.get(letter, 0)
if count == 0:
continue
tiles[letter] = count - 1
path.append(letter)
for word in node._anagram(tiles, path, root, min_length):
yield word
path.pop()
tiles[letter] = count
def load_dictionary(path):
result = Node()
for line in open(path, 'r'):
word = line.strip().lower()
result.add(word)
return result
def main():
print 'Loading word list.'
words = load_dictionary('words.txt')
while True:
letters = raw_input('Enter letters: ')
letters = letters.lower()
letters = letters.replace(' ', '')
if not letters:
break
count = 0
for word in words.anagram(letters):
print word
count += 1
print '%d results.' % count
if __name__ == '__main__':
main()
When you run the program, the words are loaded into a trie in memory. After that, just type in the letters you want to search with and it will print the results. It will only show results that use all of the input letters, nothing shorter.
It filters short words from the output, otherwise the number of results is huge. Feel free to tweak the MIN_WORD_SIZE
setting. Keep in mind, just using "astronomers" as input gives 233,549 results if MIN_WORD_SIZE
is 1. Perhaps you can find a shorter word list that only contains more common English words.
Also, the contraction "I'm" (from one of your examples) won't show up in the results unless you add "im" to the dictionary and set MIN_WORD_SIZE
to 2.
The trick to getting multiple words is to jump back to the root node in the trie whenever you encounter a complete word in the search. Then you keep traversing the trie until all letters have been used.
Solution 2:
For each word in the dictionary, sort the letters alphabetically. So "foobar" becomes "abfoor."
Then when the input anagram comes in, sort its letters too, then look it up. It's as fast as a hashtable lookup!
For multiple words, you could do combinations of the sorted letters, sorting as you go. Still much faster than generating all combinations.
(see comments for more optimizations and details)