Comparison-based ranking algorithm

I would like to rank or sort a collection of items (with size potentially greater than 100,000) where items in the collection have no intrinsic (comparable) value, instead all I have is the comparisons between any two items which have been provided by users in a subjective manner.

Example: Consider a collection with elements [a, b, c, d] and comparisons by users b > a, a > d, d > c. The correct order of this collection would be [b, a, d, c].

This example is simple, however there could be more complicated cases:

  • Since the comparisons are subjective, a user could also say that c > b. In which case that would cause a conflict with the ordering above.
  • Also you may not have comparisons that “connects” all the items, i.e. b > a, d > c. In which case the ordering is ambiguous. It could be [b, a, d, c] or [d, c, b, a]. In this case either ordering is acceptable.

If possible it would be nice to somehow take into account multiple instances of the same comparison and give those with higher occurrences more weight. But a solution without this condition would still be acceptable.

A similar application of this algorithm was used by Zuckerberg's FaceMash application where he ranked people based on comparisons (if I understood it correctly), but I have not been able to find what that algorithm actually was.

Is there an algorithm which already exists that can solve the problem above? I would not like to spend effort trying to come up with one if that is the case. If there is no specific algorithm, is there perhaps certain types of algorithms or techniques which you can point me to?


Solution 1:

This is a problem that has already occurred in another arena: competitive games! Here, too, the goal is to assign each player a global "rank" on the basis of a series of 1 vs. 1 comparisons. The difficulty, of course, is that the comparisons are not transitive (I take "subjective" to mean "provided by a human being" in your question). Kasparov beats Fischer beats (don't know another chess player!) Bob beats Kasparov, potentially.

This renders useless algorithms that rely on transitivity (i.e. a > b and b > c => a > c) as you end up with (likely) a highly cyclic graph.

Several rating systems have been devised to tackle this problem.

The most well-known system is probably the Elo algorithm/score for competitive chess players. Its descendants (for instance, the Glicko rating system) are more sophisticated and take into account statistical properties of the win/loss record---in other words, how reliable is a rating? This is similar to your idea of weighting more heavily records with more "games" played. Glicko also forms the basis for the TrueSkill system used on Xbox Live for multiplayer video games.

Solution 2:

You may be interested in the minimum feedback arc set problem. Essentially the problem is to find the minimum number of comparisons that "go the wrong way" if the elements are linearly ordered in some ordering. This is the same as finding the minimum number of edges that must be removed to make the graph acyclic. Unfortunately, solving the problem exactly is NP-hard.

A couple of links that discuss the problem:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.8157&rep=rep1&type=pdf

http://en.wikipedia.org/wiki/Feedback_arc_set