Algorithm for classifying words for a hangman game into difficulty levels "Easy", "Medium", or "Hard"

What is a good algorithm to determine the "difficulty" of a word for a hangman game, so that the game can select words to match a specified difficulty level?

Difficulty would seem to be related to the number of guesses required, the relative frequency with which the word's letters are used (e.g. words with many uncommon letters may be harder to guess), and potentially the length of the word.

There are also some subjective factors to (attempt to) compensate for, such as the likelihood that a word is in the player's vocabulary and can be recognised, allowing the player to move from a guessing strategy based on letter frequencies alone to guessing based on a list of known matching words.

My attempt for now is below, in Ruby. Any suggestions on how to improve the categorisation?

def classify_word(w)
  n = w.chars.uniq.length # number of unique letters in w
  if n < 5 && w.length > 4
    WordDifficulty::Easy
  elsif n > w.length / 2
    WordDifficulty::Hard
  else
    WordDifficulty::Medium
  end
end

I am writing a hangman game that I would like my children to play; I am rather too old to be attempting "homework", which may be why the question is receiving so many downvotes... Words are drawn randomly from large word databases, which include many obscure words, and are filtered by the difficulty level determined for each word.


1. Introduction

Here's a way to approach this problem systematically: if you have an algorithm that plays hangman well, then you can take the difficulty of each word to be the number of wrong guesses that your program would take if guessing that word.

2. Aside on hangman strategy

There's an idea that's implicit in some of the other answers and comments: that the optimal strategy for the solver would be to base their decisions on the frequency of letters in English, or on the frequency of words in some corpus. This is a seductive idea, but it's not quite right. The solver does best if it accurately models the distribution of words chosen by the setter, and a human setter may well be choosing words based on their rarity or avoidance of frequently used letters. For example, although E is the most frequently used letter in English, if the setter always chooses from the words JUGFUL, RHYTHM, SYZYGY, and ZYTHUM, then a perfect solver does not start by guessing E!

The best approach to modelling the setter depends on the context, but I guess that some kind of Bayesian inductive inference would work well in a context where the solver plays many games against the same setter, or against a group of similar setters.

3. A hangman algorithm

Here I'll outline a solver that is pretty good (but far from perfect). It models the setter as choosing words uniformly from a fixed dictionary. It's a greedy algorithm: at each stage it guesses the letter that minimizes the number of misses, that is, the number of candidate words that do not contain the guess. For example, if no guesses have been made so far, and the possible words are DEED, DEAD and DARE, then:

  • if you guess D or E, there are no misses;
  • if you guess A, there's one miss (DEED);
  • if you guess R, there are two misses (DEED and DEAD);
  • if you guess any other letter, there are three misses.

So either D or E is a good guess in this situation.

(Thanks to Colonel Panic in comments for pointing out that correct guesses are free in hangman—I totally forgot this in my first attempt!)

4. Implementation

Here's an implementation of this algorithm in Python:

from collections import defaultdict
from string import ascii_lowercase

def partition(guess, words):
    """Apply the single letter 'guess' to the sequence 'words' and return
    a dictionary mapping the pattern of occurrences of 'guess' in a
    word to the list of words with that pattern.

    >>> words = 'deed even eyes mews peep star'.split()
    >>> sorted(list(partition('e', words).items()))
    [(0, ['star']), (2, ['mews']), (5, ['even', 'eyes']), (6, ['deed', 'peep'])]

    """
    result = defaultdict(list)
    for word in words:
        key = sum(1 << i for i, letter in enumerate(word) if letter == guess)
        result[key].append(word)
    return result

def guess_cost(guess, words):
    """Return the cost of a guess, namely the number of words that don't
    contain the guess.

    >>> words = 'deed even eyes mews peep star'.split()
    >>> guess_cost('e', words)
    1
    >>> guess_cost('s', words)
    3

    """
    return sum(guess not in word for word in words)

def word_guesses(words, wrong=0, letters=''):
    """Given the collection 'words' that match all letters guessed so far,
    generate tuples (wrong, nguesses, word, guesses) where
    'word' is the word that was guessed;
    'guesses' is the sequence of letters guessed;
    'wrong' is the number of these guesses that were wrong;
    'nguesses' is len(guesses).

    >>> words = 'deed even eyes heel mere peep star'.split()
    >>> from pprint import pprint
    >>> pprint(sorted(word_guesses(words)))
    [(0, 1, 'mere', 'e'),
     (0, 2, 'deed', 'ed'),
     (0, 2, 'even', 'en'),
     (1, 1, 'star', 'e'),
     (1, 2, 'eyes', 'en'),
     (1, 3, 'heel', 'edh'),
     (2, 3, 'peep', 'edh')]

    """
    if len(words) == 1:
        yield wrong, len(letters), words[0], letters
        return
    best_guess = min((g for g in ascii_lowercase if g not in letters),
                     key=lambda g: guess_cost(g, words))
    best_partition = partition(best_guess, words)
    letters += best_guess
    for pattern, subwords in best_partition.items():
        for result in word_guesses(subwords, wrong + (pattern == 0), letters):
            yield result
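
As a quick sanity check, the DEED/DEAD/DARE example from section 3 can be reproduced with guess_cost:

>>> words = 'deed dead dare'.split()
>>> [(g, guess_cost(g, words)) for g in 'dear']
[('d', 0), ('e', 0), ('a', 1), ('r', 2)]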

5. Example results

Using this strategy it's possible to evaluate the difficulty of guessing each word in a collection. Here I consider the six-letter words in my system dictionary:

>>> words = [w.strip() for w in open('/usr/share/dict/words') if w.lower() == w]
>>> six_letter_words = set(w for w in words if len(w) == 6)
>>> len(six_letter_words)
15066
>>> results = sorted(word_guesses(six_letter_words))

The easiest words to guess in this dictionary (together with the sequence of guesses needed for the solver to guess them) are as follows:

>>> from pprint import pprint
>>> pprint(results[:10])
[(0, 1, 'eelery', 'e'),
 (0, 2, 'coneen', 'en'),
 (0, 2, 'earlet', 'er'),
 (0, 2, 'earner', 'er'),
 (0, 2, 'edgrew', 'er'),
 (0, 2, 'eerily', 'el'),
 (0, 2, 'egence', 'eg'),
 (0, 2, 'eleven', 'el'),
 (0, 2, 'enaena', 'en'),
 (0, 2, 'ennead', 'en')]

and the hardest words are these:

>>> pprint(results[-10:])
[(12, 16, 'buzzer', 'eraoiutlnsmdbcfg'),
 (12, 16, 'cuffer', 'eraoiutlnsmdbpgc'),
 (12, 16, 'jugger', 'eraoiutlnsmdbpgh'),
 (12, 16, 'pugger', 'eraoiutlnsmdbpcf'),
 (12, 16, 'suddle', 'eaioulbrdcfghmnp'),
 (12, 16, 'yucker', 'eraoiutlnsmdbpgc'),
 (12, 16, 'zipper', 'eraoinltsdgcbpjk'),
 (12, 17, 'tuzzle', 'eaioulbrdcgszmnpt'),
 (13, 16, 'wuzzer', 'eraoiutlnsmdbpgc'),
 (13, 17, 'wuzzle', 'eaioulbrdcgszmnpt')]

The reason that these are hard is that after you've guessed -UZZLE, you still have seven possibilities left:

>>> ' '.join(sorted(w for w in six_letter_words if w.endswith('uzzle')))
'buzzle guzzle muzzle nuzzle puzzle tuzzle wuzzle'
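
To tie this back to the three levels in the question, you could bucket each word by the number of wrong guesses the solver needed. The cut-offs below are arbitrary placeholders, not something the algorithm prescribes; tune them against the distribution of wrong-guess counts in your own results:

def classify_by_misses(wrong):
    # Arbitrary illustrative cut-offs; inspect the distribution of
    # wrong-guess counts in 'results' and adjust to taste.
    if wrong <= 2:
        return 'Easy'
    elif wrong <= 6:
        return 'Medium'
    else:
        return 'Hard'

levels = {word: classify_by_misses(wrong)
          for wrong, nguesses, word, guesses in results}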

6. Choice of wordlist

Of course when preparing wordlists for your children you wouldn't start with your computer's system dictionary, you'd start with a list of words that you think they are likely to know. For example, you might have a look at Wiktionary's lists of the most frequently used words in various English corpora.
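
If you save such a list, one lower-case word per line, in a file (here hypothetically named common-words.txt), you can re-run the evaluation from section 5 against it:

>>> common = [w.strip() for w in open('common-words.txt')]
>>> common_six = set(w for w in common if len(w) == 6)
>>> results = sorted(word_guesses(common_six))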

For example, among the 1,700 six-letter words in the 10,000 most common words in Project Gutenberg as of 2006, the most difficult ten are these:

[(6, 10, 'losing', 'eaoignvwch'),
 (6, 10, 'monkey', 'erdstaoync'),
 (6, 10, 'pulled', 'erdaioupfh'),
 (6, 10, 'slaves', 'erdsacthkl'),
 (6, 10, 'supper', 'eriaoubsfm'),
 (6, 11, 'hunter', 'eriaoubshng'),
 (6, 11, 'nought', 'eaoiustghbf'),
 (6, 11, 'wounds', 'eaoiusdnhpr'),
 (6, 11, 'wright', 'eaoithglrbf'),
 (7, 10, 'soames', 'erdsacthkl')]

(Soames Forsyte is a character in The Forsyte Saga by John Galsworthy; the wordlist has been converted to lower case, so it wasn't possible for me to quickly remove proper names.)


A really simple way would be to compute a score based on the lack of vowels in the word, the number of unique letters, and the commonness of each letter:

letters = 'etaoinshrdlcumwfgypbvkjxqz'  # most to least common in English
vowels = set('aeiou')

def difficulty(word):
    unique = set(word)
    # Sum of each letter's rank in the frequency order above;
    # rarer letters contribute more to the score.
    positions = sum(letters.index(c) for c in word)

    return len(word) * len(unique) * (7 - len(unique & vowels)) * positions

words = ['the', 'potato', 'school', 'egypt', 'floccinaucinihilipilification']

for word in words:
    print(difficulty(word), word)

And the output:

432 the
3360 potato
7200 school
7800 egypt
194271 floccinaucinihilipilification

You could then classify the words with:

        score < 2000   # Easy
 2000 < score < 10000  # Medium
10000 < score          # Hard
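
A minimal sketch wrapping those thresholds in a function (the boundary values 2000 and 10000 were left unspecified above, so here they fall into the higher band):

def classify(word):
    score = difficulty(word)
    if score < 2000:
        return 'Easy'
    elif score < 10000:
        return 'Medium'
    else:
        return 'Hard'

With the example words above, classify('the') gives 'Easy', classify('potato') and classify('school') give 'Medium', and classify('floccinaucinihilipilification') gives 'Hard'.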

You can use the Monte Carlo method to estimate the difficulty of a word (a sketch of the simulation follows the list):

  • Simulate a game by guessing a random letter each time, weighted by the letter's frequency in your target language, and count how many guesses your randomized player needed to arrive at the solution. Note that since each guess eliminates a letter, this process is finite, and it returns a number from 1 to 26, inclusive.
  • Repeat this process 2*N times, where N is the number of unique letters in your word.
  • Calculate the score by averaging the results of the 2*N runs.
  • Determine the complexity level: scores less than ten indicate an easy word, and scores above sixteen indicate a hard word; everything else is medium.
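
Here's a minimal Python sketch of that simulation. The frequency table is a rough approximation of English letter frequencies (substitute any published table), and it assumes words contain only the 26 lower-case letters:

import random

# Approximate relative frequencies of letters in English text
# (rough percentages; any published table would do).
FREQ = {'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
        's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
        'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
        'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15,
        'q': 0.1, 'z': 0.07}

def simulate_game(word):
    """Guess frequency-weighted random letters without repetition until
    every letter of 'word' has been guessed; return the number of
    guesses taken (between 1 and 26)."""
    remaining = dict(FREQ)
    unseen = set(word)
    guesses = 0
    while unseen:
        population, weights = zip(*remaining.items())
        guess = random.choices(population, weights=weights)[0]
        del remaining[guess]   # each guess eliminates a letter
        unseen.discard(guess)
        guesses += 1
    return guesses

def monte_carlo_difficulty(word):
    """Average the guess count over 2*N simulated games (N = number of
    unique letters in 'word') and map the score to a difficulty level."""
    runs = 2 * len(set(word))
    score = sum(simulate_game(word) for _ in range(runs)) / runs
    if score < 10:
        return 'Easy'
    elif score > 16:
        return 'Hard'
    else:
        return 'Medium'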