Is There Something Called a Weighted Median?

Another way of thinking about this is the concept of a multiset median, where elements can appear multiple times. It doesn't generalize to arbitrary real weights as Ilmari's answer does, but it applies to your particular problem and has some history behind it. For most statistical applications this is the traditional notion of median, since there's no guarantee that the results of statistical sampling will be distinct from each other (and often there are guarantees of just the opposite, that not all samples will be unique). Viewed from this perspective, your data isn't the set $\{133, 135, 137, 139, 141\}$ at all, but rather the multiset $\{133, 133, \ldots, 133, 135, 135, \ldots,141\}$, with each element having a multiplicity equal to the count associated with it in your table. With your data defined this way, the 'classical' notion of the median (the middle element of this multiset once sorted) gives the same value that you report, $139$.
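To make the multiset view concrete, here is a minimal Python sketch. The value-to-count table in it is made up for illustration (it is not your table), and the names `counts` and `multiset` are my own.

```python
from statistics import median_low

# Hypothetical value -> count table (NOT the asker's data, purely illustrative).
counts = {10: 1, 20: 1, 30: 5}

# Expand the table into the sorted multiset 10, 20, 30, 30, 30, 30, 30.
multiset = sorted(value for value, n in counts.items() for _ in range(n))

# Middle element of the multiset (the lower middle if the length is even).
multiset_median = median_low(multiset)

# Middle element of the set of distinct values, ignoring multiplicities.
set_median = median_low(sorted(counts))

print(multiset_median, set_median)  # prints: 30 20
```

Expanding the counts is fine for small integer multiplicities; it obviously doesn't work for non-integer weights, which is where the weighted definition comes in.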

As a side note (and wholly unrelated to statistical applications), these sorts of multiset medians also show up in the study of identities in median algebras (algebras equipped with a ternary operator representing the median); I heartily recommend checking out Volume 4 of Knuth's The Art of Computer Programming if you're interested in finding out more about a curious little nook of computer science.


I can't really recall having heard the term "weighted median" before, but it makes perfect sense to me (and Google has heard of it).

One common way to define the ordinary median of a set of samples is as (the/a) value such that the number of samples above it and the number of samples below it are each less than or equal to half the total number of samples. A natural extension of this definition is to define the median of a set of weighted samples as (the/a) value such that the total weight of the samples above it and the total weight of the samples below it are each less than or equal to half the total weight of all samples.

It is easy to check that this definition agrees with both:

  • the ordinary definition of the median of unweighted samples, when all the samples have the same weight, and
  • the textbook definition of the median of a probability distribution, when the probability of each sample is proportional to its weight.
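The definition above can be sketched in a few lines of Python. This is only one possible reading of it: the function name `weighted_median` is my own, weights are assumed to be non-negative with a positive total, and when more than one value satisfies the definition the sketch returns the smallest one.

```python
def weighted_median(values, weights):
    """Return a weighted median of `values` (the smallest one if several qualify).

    The result v satisfies: total weight of samples below v <= half the total,
    and total weight of samples above v <= half the total.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equally long")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")

    running = 0
    # Walk through the samples in increasing order of value, accumulating weight.
    for value, weight in sorted(zip(values, weights)):
        running += weight
        # First value whose cumulative weight reaches half the total:
        # everything below it weighs less than half, everything above at most half.
        if 2 * running >= total:
            return value
```

With equal weights this gives the ordinary (lower) median, e.g. `weighted_median([3, 1, 2], [1, 1, 1])` returns `2`, and with integer weights it matches the median of the expanded multiset.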

You are looking for the median speed (line count per hour, LCPH) and your results are ordered in terms of speed, which is a sensible start.

  • The median speed by document is 137, happening in the third of five documents.
  • The median speed by line is 139, happening in the 153rd or 154th of 306 lines.
  • The median speed by hour is 137, happening at 1.11433... hours of 2.22866...

In a sense, each result is weighted, and you should describe what you are using to calculate the median.
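To see how the choice of weight changes the answer, here is a small Python sketch. The three documents in it are invented (they are not your data), and `weighted_median` is a hypothetical helper that returns the smallest value whose cumulative weight reaches half the total.

```python
from bisect import bisect_left
from itertools import accumulate

def weighted_median(pairs):
    """pairs: iterable of (value, weight) with non-negative weights."""
    ordered = sorted(pairs)                            # sort by value
    running = list(accumulate(w for _, w in ordered))  # cumulative weights
    # Index of the first sample whose cumulative weight reaches half the total.
    k = bisect_left(running, running[-1] / 2)
    return ordered[k][0]

# Hypothetical documents as (lines, hours); speeds come out as 40, 60, 100 lines/hour.
docs = [(20, 0.5), (60, 1.0), (100, 1.0)]

by_document = weighted_median((l / h, 1) for l, h in docs)  # every document counts once
by_line = weighted_median((l / h, l) for l, h in docs)      # every line counts once
by_hour = weighted_median((l / h, h) for l, h in docs)      # every hour counts equally

print(by_document, by_line, by_hour)  # prints: 60.0 100.0 60.0
```

The same speeds give different medians depending on whether a document, a line, or an hour is treated as the unit being counted.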

Incidentally the mean speed (total lines divided by total hours) is 137.3... and you could calculate this by weighting the LCPH by hours, suggesting that perhaps doing this by hours is a sensible approach for the median too.
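To spell out why the hours-weighted mean of the speeds equals total lines over total hours, write $\ell_i$ for the line count of document $i$, $h_i$ for the hours spent on it, and $s_i = \ell_i / h_i$ for its speed (these symbols are my own notation, not from your table). Then

$$\frac{\sum_i h_i s_i}{\sum_i h_i} = \frac{\sum_i h_i \cdot (\ell_i / h_i)}{\sum_i h_i} = \frac{\sum_i \ell_i}{\sum_i h_i},$$

which with the totals quoted above is $306 / 2.22866\ldots \approx 137.3$.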