How to make rounded percentages add up to 100%
Consider the four percentages below, represented as floating-point numbers:
13.626332%
47.989636%
9.596008%
28.788024%
-----------
100.000000%
I need to represent these percentages as whole numbers. If I simply use Math.round(), I end up with a total of 101%.
14 + 48 + 10 + 29 = 101
If I use parseInt(), I end up with a total of 97%.
13 + 47 + 9 + 28 = 97
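Both totals can be reproduced with a minimal sketch like the one below. The names `values`, `sum`, `rounded` and `truncated` are only illustrative. Note that parseInt() converts a number to a string before parsing it, so Math.trunc() or Math.floor() would be the more usual way to drop the decimals.

```javascript
const values = [13.626332, 47.989636, 9.596008, 28.788024];

// Helper (illustrative): add up an array of numbers.
const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// Round each percentage to the nearest integer.
const rounded = values.map((v) => Math.round(v));

// Drop the decimal part of each percentage.
const truncated = values.map((v) => parseInt(v, 10));

console.log(rounded, sum(rounded));     // [ 14, 48, 10, 29 ] 101
console.log(truncated, sum(truncated)); // [ 13, 47, 9, 28 ] 97
```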
What's a good algorithm to represent any number of percentages as whole numbers while still maintaining a total of 100%?
Edit: After reading some of the comments and answers, there are clearly many ways to go about solving this.
In my mind, to remain true to the numbers, the "right" result is the one that minimizes the overall error, defined by how much error rounding would introduce relative to the actual value:
value        rounded   error   decision
----------------------------------------------------
13.626332    14        2.7%    round up (14)
47.989636    48        0.0%    round up (48)
 9.596008    10        4.2%    don't round up (9)
28.788024    29        0.7%    round up (29)
In case of a tie (3.33, 3.33, 3.33) an arbitrary decision can be made (e.g. 3, 4, 3).
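One possible reading of that criterion, as a rough sketch: compute the relative error that rounding introduces for each value, and pick the value with the largest error as the one to round the other way. The names `relativeError` and `worstIndex` are made up for illustration, and the sketch assumes every value is greater than zero.

```javascript
// Relative error introduced by rounding v to the nearest integer.
// Assumes v > 0 (otherwise the division is meaningless).
const relativeError = (v) => Math.abs(Math.round(v) - v) / v;

const values = [13.626332, 47.989636, 9.596008, 28.788024];

// Index of the value that rounding distorts the most, relative to its size.
const worstIndex = values.reduce(
  (worst, v, i) => (relativeError(v) > relativeError(values[worst]) ? i : worst),
  0
);

console.log(values[worstIndex]); // 9.596008
```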
There are many ways to do just this, provided you are not concerned about reliance on the original decimal data.
The first and perhaps most popular method is the Largest Remainder Method, which is basically:
- Rounding everything down
- Getting the difference between the sum and 100
- Distributing the difference by adding 1 to items in decreasing order of their decimal parts
In your case, it would go like this:
13.626332%
47.989636%
9.596008%
28.788024%
If you take the integer parts, you get
13
47
9
28
which adds up to 97, and you want to add three more. Now, you look at the decimal parts, which are
.626332%
.989636%
.596008%
.788024%
and take the largest ones until the total reaches 100. So you would get:
14
48
9
29
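The steps above could be sketched roughly as follows. This is only one way to write it: the function name `largestRemainder` and the `target` parameter are my own, and it assumes non-negative inputs whose sum is (approximately) the target. Ties between equal decimal parts are broken by original order, which relies on Array.prototype.sort being stable (ES2019 and later).

```javascript
// Sketch of the Largest Remainder Method.
// Assumes non-negative inputs that sum to roughly `target`.
function largestRemainder(values, target = 100) {
  // Step 1: round everything down.
  const result = values.map((v) => Math.floor(v));

  // Step 2: how many units are missing from the target?
  const shortfall = target - result.reduce((a, b) => a + b, 0);

  // Step 3: indices ordered by decreasing decimal part.
  const order = values
    .map((v, i) => ({ i, frac: v - Math.floor(v) }))
    .sort((a, b) => b.frac - a.frac)
    .map((entry) => entry.i);

  // Hand out the missing units, one per item, largest remainder first.
  for (let k = 0; k < shortfall && k < order.length; k++) {
    result[order[k]] += 1;
  }
  return result;
}

console.log(largestRemainder([13.626332, 47.989636, 9.596008, 28.788024]));
// [ 14, 48, 9, 29 ]
```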
Alternatively, you can simply choose to show one decimal place instead of integer values. The numbers would then be 13.6, 48.0, 9.6 and 28.8, which here happen to add up to exactly 100.0. In general, this greatly reduces how far the total can drift from 100.
Probably the "best" way to do this (quoted since "best" is a subjective term) is to keep a running (non-integral) tally of where you are, and round that value.
Then use that along with the history to work out what value should be used. For example, using the values you gave:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
13.626332   13.626332            14             0    14 ( 14 -  0)
47.989636   61.615968            62            14    48 ( 62 - 14)
 9.596008   71.211976            71            62     9 ( 71 - 62)
28.788024  100.000000           100            71    29 (100 - 71)
                                                    ---
                                                    100
At each stage, you don't round the number itself. Instead, you round the accumulated value and work out the best integer that reaches that value from the previous baseline - that baseline is the cumulative value (rounded) of the previous row.
This works because you're not losing information at each stage but rather using the information more intelligently. The 'correct' rounded values are in the final column and you can see that they sum to 100.
You can see the difference between this and blindly rounding each value in the third value above. While 9.596008 would normally round up to 10, the accumulated 71.211976 correctly rounds down to 71, which means that only 9 needs to be added to the previous baseline of 62.
This also works for "problematic" sequences like three roughly-1/3 values, where one of them should be rounded up:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
33.333333   33.333333            33             0    33 ( 33 -  0)
33.333333   66.666666            67            33    34 ( 67 - 33)
33.333333   99.999999           100            67    33 (100 - 67)
                                                    ---
                                                    100
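Both tables can be reproduced with a short sketch along these lines. The function name `cumulativeRound` and the variable names are just for illustration. Each output is the rounded running total minus the previous rounded running total.

```javascript
// Sketch of the running-tally approach: round the cumulative value,
// not the individual values.
function cumulativeRound(values) {
  let cumulative = 0; // running non-integral tally (CumulValue)
  let baseline = 0;   // rounded tally from the previous row (PrevBaseline)

  return values.map((v) => {
    cumulative += v;
    const cumulativeRounded = Math.round(cumulative);
    const need = cumulativeRounded - baseline; // integer to emit for this row
    baseline = cumulativeRounded;
    return need;
  });
}

console.log(cumulativeRound([13.626332, 47.989636, 9.596008, 28.788024]));
// [ 14, 48, 9, 29 ]
console.log(cumulativeRound([33.333333, 33.333333, 33.333333]));
// [ 33, 34, 33 ]
```

Unlike sorting by decimal parts, this needs only a single pass, but the result depends on the order of the input values.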