Algorithm to implement a word cloud like Wordle
Context
- Take a look at Wordle: http://www.wordle.net/
- It's much better looking than any other word cloud generator I've seen
- Note: the source is not available - read the FAQ: http://www.wordle.net/faq#code
My Questions
- Is there an algorithm available that does what Wordle does?
- If not, what are some alternatives that produce similar kinds of output?
Why I'm asking
- just curious
- want to learn
I'm the creator of Wordle. Here's how Wordle actually works:
Count the words, throw away boring words, and sort by the count, descending. Keep the top N words for some N. Assign each word a font size proportional to its count. Generate a Java2D Shape for each word, using the Java2D API.
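As a rough illustration of the counting and sizing steps, here is a minimal Python sketch (not Wordle's Java code). The stop-word list, the `weighted_words` name, and the `top_n` and `max_font_size` parameters are all assumptions made for the example; shape generation is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of "boring" words; a real one is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def weighted_words(text, top_n=100, max_font_size=96.0):
    """Return (word, font_size) pairs for the top_n words, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)  # already sorted by count, descending
    if not top:
        return []
    biggest = top[0][1]
    # Font size is proportional to count; the most frequent word gets max_font_size.
    return [(word, max_font_size * count / biggest) for word, count in top]
```

Calling `weighted_words(text, top_n=150)` on a document gives the list that the layout step then works through in order.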
Each word "wants" to be somewhere, such as "at some random x position in the vertical center". In decreasing order of frequency, do this for each word:
    place the word where it wants to be
    while it intersects any of the previously placed words
        move it one step along an ever-increasing spiral
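A minimal Python sketch of that loop, under simplifying assumptions: each word is reduced to its bounding rectangle (Wordle tests the actual glyph shapes), the spiral is Archimedean, and the names `Box` and `place_all` and the `step` and `growth` parameters are made up for the example.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other):
        # Axis-aligned rectangle overlap test.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

def place_all(sizes, width=800.0, height=600.0, step=0.1, growth=1.0, seed=0):
    """Place (w, h) rectangles, given in decreasing order of word frequency."""
    rng = random.Random(seed)
    placed = []
    for w, h in sizes:
        # Where the word "wants" to be: random x, vertical center.
        x0 = rng.uniform(0, width - w)
        y0 = (height - h) / 2
        box = Box(x0, y0, w, h)
        t = 0.0
        while any(box.intersects(p) for p in placed):
            # One step along a spiral whose radius grows with the angle.
            t += step
            r = growth * t
            box = Box(x0 + r * math.cos(t), y0 + r * math.sin(t), w, h)
        placed.append(box)
    return placed
```

The loop always ends because the spiral radius grows without bound, but this sketch makes no attempt to keep words inside the canvas, and it checks every placed word on every step.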
That's it. The hard part is in doing the intersection-testing efficiently, for which I use last-hit caching, hierarchical bounding boxes, and a quadtree spatial index (all of which are things you can learn more about with some diligent googling).
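Of those three techniques, last-hit caching is the easiest to show in a few lines. The sketch below is an illustration in Python, not Wordle's implementation: it assumes words are plain `(x, y, w, h)` rectangles, and the `CollisionChecker` class and its `tests` counter are hypothetical names. The idea is that a word that has just been nudged one spiral step most likely still overlaps the word it hit last time, so that word is tested first.

```python
class CollisionChecker:
    def __init__(self):
        self.placed = []      # rectangles already on the canvas, as (x, y, w, h)
        self.last_hit = None  # the rectangle that caused the most recent collision
        self.tests = 0        # number of pairwise overlap tests performed

    @staticmethod
    def _overlap(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def collides(self, box):
        # Try the cached rectangle first; a hit here skips the full scan.
        if self.last_hit is not None:
            self.tests += 1
            if self._overlap(box, self.last_hit):
                return True
        for other in self.placed:
            if other is self.last_hit:
                continue  # already tested above
            self.tests += 1
            if self._overlap(box, other):
                self.last_hit = other
                return True
        return False

    def add(self, box):
        self.placed.append(box)
```

The linear scan in `collides` is the part that hierarchical bounding boxes and a quadtree would replace in a fuller implementation.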
Edit: As Reto Aebersold pointed out, there's now a book chapter, freely available, that covers this same territory: Beautiful Visualization, Chapter 3: Wordle
Here's a really nice JavaScript one from Jason Davies that uses D3. You can even use web fonts with it.
Demo: http://www.jasondavies.com/wordcloud/
GitHub: https://github.com/jasondavies/d3-cloud
I've implemented the algorithm described by Jonathan Feinberg in Python to create a tag cloud. It is far from the beautiful clouds of wordle.net, but it gives you an idea of how it could be done.
You can find the project here.