Format numbers with million (M) and billion (B) suffixes
Obviously you first need to get rid of the commas in the formatted numbers, and gsub("\\,", ...) is the way to go. This answer uses findInterval to select the appropriate suffix for labeling and to determine the denominator for a more compact display. It can easily be extended in either direction if you want to go below 1.0 or above 1 trillion:
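comprss <- function(tx) {
  # strip the thousands separators once and convert to numbers
  x <- as.numeric(gsub("\\,", "", tx))
  # 1 = units, 2 = thousands, 3 = millions, 4 = billions, 5 = trillions
  div <- findInterval(x, c(0, 1e3, 1e6, 1e9, 1e12))  # modify this if negative numbers are possible
  # divide by the matching power of 1000 and append the suffix
  paste(round(x / 10^(3 * (div - 1)), 2),
        c("", "K", "M", "B", "T")[div])
}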
If the input is already numeric, you don't need to remove the as.numeric or gsub calls: they are superfluous in that case, but the function still works. This is the result with Gregor's example (the big_x vector):
> comprss (big_x)
[1] "123 " "500 " "999 " "1.05 K" "9 K"
[6] "49 K" "105.4 K" "998 K" "1.5 M" "20 M"
[11] "313.4 M" "453.12 B"
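To see how the suffix is chosen, here is the findInterval() step from comprss() on its own, applied to a few values of big_x. The names breaks, vals, scaled and labs are just illustrative.

```r
# Breakpoints used by comprss(): units, thousands, millions, billions, trillions
breaks <- c(0, 1e3, 1e6, 1e9, 1e12)
vals <- c(123, 1050, 1.5e6, 453123634432)

# findInterval() returns i such that breaks[i] <= x < breaks[i + 1]
div <- findInterval(vals, breaks)
div
# [1] 1 2 3 4

# the same index picks both the suffix and the power of 1000 to divide by
suffix <- c("", "K", "M", "B", "T")[div]
scaled <- vals / 10^(3 * (div - 1))
labs <- paste(round(scaled, 2), suffix)
labs
# returns "123 ", "1.05 K", "1.5 M", "453.12 B"
```

The trailing space on "123 " comes from paste() joining the number with the empty suffix.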
And with the original input (which was probably a factor, or a character vector in R 4.0.0 and later, if it was read with read.table or read.csv or created with data.frame):
comprss (dat$V2)
[1] "6 M" "75 M" "743.45 M" "340 K" "4.3 M"
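The comment in comprss() flags negative numbers. A minimal sketch of one way to handle them, assuming you want the suffix chosen by magnitude and the minus sign kept: comprss_signed is a hypothetical name, and the only real change is passing abs() to findInterval().

```r
# Hypothetical variant of comprss() that also accepts negative numbers:
# the magnitude picks the suffix, and the sign survives the division.
comprss_signed <- function(tx) {
  x <- as.numeric(gsub(",", "", tx))                       # drop thousands separators
  div <- findInterval(abs(x), c(0, 1e3, 1e6, 1e9, 1e12))   # tier from |x|
  paste(round(x / 10^(3 * (div - 1)), 2),
        c("", "K", "M", "B", "T")[div])
}

comprss_signed(c("-6,000,000", "340,000", "-1,050"))
# returns "-6 M", "340 K", "-1.05 K"
```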
And of course these can be printed without the quotes, either with an explicit print call using quote = FALSE or with cat.
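For example, using the labels from the dat$V2 output above (hard-coded here so the snippet runs on its own). Besides print() and cat(), base R's noquote() is a third option.

```r
labs <- c("6 M", "75 M", "743.45 M", "340 K", "4.3 M")

print(labs, quote = FALSE)   # the argument is 'quote', not 'quotes'
noquote(labs)                # wrapper class whose print method omits quotes
cat(labs, sep = "\n")        # one label per line, no quotes and no [1] index
```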
If you begin with this numeric vector x,
x <- c(6e+06, 75000400, 743450000, 340000, 4300000)
you could do the following:
paste(format(round(x / 1e6, 1), trim = TRUE), "M")
# [1] "6.0 M" "75.0 M" "743.5 M" "0.3 M" "4.3 M"
And if you don't want the trailing zeros, just remove the format() call:
paste(round(x / 1e6, 1), "M")
# [1] "6 M" "75 M" "743.5 M" "0.3 M" "4.3 M"
Alternatively, you could assign an S3 class with a print method and keep x numeric underneath. Here I use paste0() to drop the space between the number and the suffix, which makes the result a bit more legible.
print.million <- function(x, quote = FALSE, ...) {
  x <- paste0(round(x / 1e6, 1), "M")
  ## NextMethod() hands the modified x (and the original arguments) to print.default()
  NextMethod(quote = quote)
}
## assign the 'million' class to 'x'
class(x) <- "million"
x
# [1] 6M 75M 743.5M 0.3M 4.3M
## the underlying values are unchanged
x[]
# [1] 6000000 75000400 743450000 340000 4300000
You could do the same for billions and trillions as well. For information on how to put this into a data frame, see this answer, as you'll need both a format() and an as.data.frame() method.
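A minimal sketch of those two methods, assuming the million class from above. Defining format.million and reusing base R's as.data.frame.vector for as.data.frame.million are my own choices, not necessarily what the linked answer does. The block is self-contained, so it also redefines print.million in terms of format(). A `[.million` method would likely also be needed if the class should survive subsetting.

```r
x <- c(6e+06, 75000400, 743450000, 340000, 4300000)

# format() method: printing a data frame calls format() on each column
format.million <- function(x, ...) {
  paste0(round(unclass(x) / 1e6, 1), "M")
}

# print() method built on format(), so the two stay consistent
print.million <- function(x, quote = FALSE, ...) {
  print(format(x), quote = quote, ...)
  invisible(x)
}

# data.frame() calls as.data.frame() on each argument; reusing the
# plain-vector method lets the column keep its 'million' class
as.data.frame.million <- as.data.frame.vector

class(x) <- "million"
x
dat <- data.frame(id = 1:5, x = x)
dat          # the x column is displayed with the M suffix
class(dat$x) # the column is still of class "million"
```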
Recent versions of the scales package can produce readable labels. If you're using ggplot2 or the tidyverse, scales is probably already installed, although you might have to update it. In this case, label_number_si can be used (from scales 1.2.0 onward it is deprecated in favour of label_number(scale_cut = cut_short_scale())):
> library(scales)
> inp <- c(6000000, 75000400, 743450000, 340000, 4300000)
> label_number_si(accuracy=0.1)(inp)
[1] "6.0M" "75.0M" "743.4M" "340.0K" "4.3M"
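A sketch of the newer interface, assuming scales 1.2.0 or later is installed. The rounding of the labels may differ between scales versions, so no output is shown here.

```r
# Assumes scales >= 1.2.0, where label_number() has the scale_cut argument
library(scales)

inp <- c(6000000, 75000400, 743450000, 340000, 4300000)

# cut_short_scale() supplies the K / M / B / T breakpoints
lab <- label_number(accuracy = 0.1, scale_cut = cut_short_scale())
lab(inp)
```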
Another option starts with numeric (rather than character) numbers and works for millions, billions, and smaller values. You could pass more arguments to formatC to customize the output, and extend it to trillions if need be.
m_b_format = function(x) {
  b.index = x >= 1e9             # billions
  m.index = x >= 1e5 & x < 1e9   # millions, from 100,000 up (shown as e.g. "0.1 M")
  # everything else: whole number with thousands separators
  output = formatC(x, format = "d", big.mark = ",")
  output[b.index] = paste(formatC(x[b.index] / 1e9, digits = 1, format = "f"), "B")
  output[m.index] = paste(formatC(x[m.index] / 1e6, digits = 1, format = "f"), "M")
  return(output)
}
your_x = c(6e6, 75e6 + 400, 743450000, 340000, 4.3e6)
> m_b_format(your_x)
[1] "6.0 M" "75.0 M" "743.5 M" "0.3 M" "4.3 M"
big_x = c(123, 500, 999, 1050, 9000, 49000, 105400, 998000,
1.5e6, 2e7, 313402182, 453123634432)
> m_b_format(big_x)
[1] "123" "500" "999" "1,050" "9,000" "49,000"
[7] "0.1 M" "1.0 M" "1.5 M" "20.0 M" "313.4 M" "453.1 B"
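Extending m_b_format() to trillions could look like the following sketch. m_b_t_format is a hypothetical name; rather than adding one more index vector per tier, it loops over small vectors of thresholds, so each larger tier overwrites the smaller one. As in m_b_format(), values from 100,000 upward are shown in millions, and negative numbers are not handled.

```r
# Hypothetical extension of m_b_format() with a trillions tier.
m_b_t_format <- function(x) {
  lower   <- c(1e5, 1e9, 1e12)   # smallest value shown in each tier
  divisor <- c(1e6, 1e9, 1e12)
  suffix  <- c("M", "B", "T")
  # default: whole number with thousands separators
  output <- formatC(x, format = "d", big.mark = ",")
  # tiers are ordered by size, so later (larger) tiers overwrite earlier ones
  for (i in seq_along(lower)) {
    idx <- x >= lower[i]
    output[idx] <- paste(formatC(x[idx] / divisor[i], digits = 1, format = "f"),
                         suffix[i])
  }
  output
}

m_b_t_format(c(999, 105400, 313402182, 453123634432, 2.5e12))
# returns "999", "0.1 M", "313.4 M", "453.1 B", "2.5 T"
```

Adding a further tier only means appending one entry to each of lower, divisor and suffix.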