Algorithm to implement a word cloud like Wordle

Context

  • Take a look at Wordle: http://www.wordle.net/
  • It's much better looking than any other word cloud generators I've seen
  • Note: the source is not available - read the FAQ: http://www.wordle.net/faq#code

My Questions

  • Is there an algorithm available that does what Wordle does?
  • If no, what are some alternatives that produces similar kinds of output?

Why I'm asking

  • just curious
  • want to learn

I'm the creator of Wordle. Here's how Wordle actually works:

Count the words, throw away boring words, and sort by the count, descending. Keep the top N words for some N. Assign each word a font size proportional to its count. Generate a Java2D Shape for each word, using the Java2D API.

Each word "wants" to be somewhere, such as "at some random x position in the vertical center". In decreasing order of frequency, do this for each word:

place the word where it wants to be
while it intersects any of the previously placed words
    move it one step along an ever-increasing spiral

That's it. The hard part is in doing the intersection-testing efficiently, for which I use last-hit caching, hierarchical bounding boxes, and a quadtree spatial index (all of which are things you can learn more about with some diligent googling).

Edit: As Reto Aebersold pointed out, there's now a book chapter, freely available, that covers this same territory: Beautiful Visualization, Chapter 3: Wordle


Here's a really nice javascript one from Jason Davies that uses d3. You can even use webfonts with it.

Demo: http://www.jasondavies.com/wordcloud/

Github: https://github.com/jasondavies/d3-cloud


I've implemented an algorithm as described by Jonathan Feinberg using python to create a tag cloud. It is far away from the beautiful clouds of wordle.net but it gives you an idea how it could be done.

You can find the project here.