Metric prefixes for ggplot2 scales
ggplot2 is one of the best R packages for visualizing data, and it offers many ways to customize the output to match your needs. When I was using R to present disk usage figures, I needed the axis labels to use metric (SI) prefixes (kB, MB, GB, TB, ...). Here is the helper function I wrote for that.
In ggplot2, functions like scale_y_continuous() set the graphical parameters of a scale. These functions accept an optional formatter parameter: a user-defined function that formats the labels on that axis. Currently the functions comma(), dollar(), percent() and scientific() are part of the ggplot2 package.
This is my R function FormatSI() to format with SI unit prefixes:
# Format a vector of numeric values according to the
# International System of Units.
# http://en.wikipedia.org/wiki/SI_prefix
#
# Args:
# x : A vector of numeric values
# ...: Remaining args passed to format()
#
# Returns:
# A vector of strings using SI prefix notation
#
# Bugs:
# Does not (yet) work with small (<1) numbers
#
FormatSI <- function(x, ...) {
  scale.frac <- 1000
  scale.unit <- c("k", "M", "G", "T", "P", "E", "Z", "Y")
  # Start with empty prefixes
  p <- rep(" ", length(x))
  # Divide by scale.frac and store scale.unit if value is
  # large enough. Repeat for all units.
  for (i in seq_along(scale.unit)) {
    p[x >= scale.frac] <- scale.unit[i]
    x[x >= scale.frac] <- x[x >= scale.frac] / scale.frac
  }
  return(paste(format(round(x, 1), trim=TRUE, scientific=FALSE, ...), p))
}
The function takes a vector of numeric values and returns a vector of strings containing the formatted values:
[1] "0.1 " "13.0 k" "1.4 M"
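A self-contained sketch of such a call is shown here. The input values are illustrative (they are not taken from the original post); the function body repeats the definition above so the snippet runs on its own.

```r
# FormatSI() as defined above, repeated so this snippet is self-contained
FormatSI <- function(x, ...) {
  scale.frac <- 1000
  scale.unit <- c("k", "M", "G", "T", "P", "E", "Z", "Y")
  p <- rep(" ", length(x))
  for (i in seq_along(scale.unit)) {
    p[x >= scale.frac] <- scale.unit[i]
    x[x >= scale.frac] <- x[x >= scale.frac] / scale.frac
  }
  paste(format(round(x, 1), trim = TRUE, scientific = FALSE, ...), p)
}

# Illustrative inputs: one value below 1000, one in the kilo range,
# one in the mega range
sizes <- c(0.1, 13000, 1400000)
labels <- FormatSI(sizes)
print(labels)
```

Values below 1000 keep the blank prefix, so their label ends in whitespace; trimws() removes it if that matters for your output.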
This function can be used with ggplot2 to provide the formatting for the scale labels:
The code for the function itself plus an example is available for download as FormatSI.R. The example code was used to generate the following graph:
As you can see the labels on the y-axis are formatted with SI unit prefixes to reflect megabytes and gigabytes even though the data contains the size in bytes.
Update 2012-03-21:
With the release of ggplot2 0.9.0, the way label formatting is specified changed: the formatter parameter has been replaced by the labels parameter, which expects a function.
I also found a discussion on R-help where Ben Tupper showed how to use findInterval to do the job.
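To illustrate the idea, here is a minimal sketch of how findInterval() can pick the right prefix without a loop. The shortened limits and prefix vectors and the sample values are my own simplification for illustration, not Ben Tupper's original code.

```r
# Lower bounds of each prefix range, in ascending order
limits <- c(1e0, 1e3, 1e6, 1e9)
prefix <- c("", "k", "M", "G")

# Illustrative values
x <- c(5, 1500, 2.5e6, 7e9)

# For each value, findInterval() returns the index of the largest
# limit that is less than or equal to it
i <- findInterval(x, limits)

# Scale each value by its limit and look up the matching prefix
scaled <- x / limits[i]
print(data.frame(x = x, scaled = scaled, prefix = prefix[i]))
```

Because findInterval() is vectorized, the whole input is classified in a single call, and the same index vector serves both for scaling and for the prefix lookup.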
So here is the updated code:
# Format a vector of numeric values according
# to the International System of Units.
# http://en.wikipedia.org/wiki/SI_prefix
#
# Based on code by Ben Tupper
# https://stat.ethz.ch/pipermail/r-help/2012-January/299804.html
#
# Args:
# ...: Args passed to format()
#
# Returns:
# A function that formats a vector of numeric values
# as strings using SI prefix notation
#
format_si <- function(...) {
  function(x) {
    limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
                1e-9,  1e-6,  1e-3,  1e0,   1e3,
                1e6,   1e9,   1e12,  1e15,  1e18,
                1e21,  1e24)
    prefix <- c("y", "z", "a", "f", "p",
                "n", "µ", "m", " ", "k",
                "M", "G", "T", "P", "E",
                "Z", "Y")
    # Vector with array indices according to position in intervals
    i <- findInterval(abs(x), limits)
    # Set prefix to " " for very small values < 1e-24
    i <- ifelse(i==0, which(limits == 1e0), i)
    paste(format(round(x/limits[i], 1),
                 trim=TRUE, scientific=FALSE, ...),
          prefix[i])
  }
}
With ggplot2 0.9.0 you will need to use the following call:
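A minimal sketch of such a call follows. It assumes the formatter factory is named format_si, and the data frame disk with its columns day and bytes is made up for illustration. The micro sign is written as an escape here purely for portability across encodings.

```r
library(ggplot2)

# Formatter factory (name assumed): returns a function that ggplot2
# can call on the axis breaks
format_si <- function(...) {
  function(x) {
    limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
                1e-9,  1e-6,  1e-3,  1e0,   1e3,
                1e6,   1e9,   1e12,  1e15,  1e18,
                1e21,  1e24)
    prefix <- c("y", "z", "a", "f", "p",
                "n", "\u00b5", "m", " ", "k",
                "M", "G", "T", "P", "E",
                "Z", "Y")
    i <- findInterval(abs(x), limits)
    i <- ifelse(i == 0, which(limits == 1e0), i)
    paste(format(round(x / limits[i], 1),
                 trim = TRUE, scientific = FALSE, ...),
          prefix[i])
  }
}

# Hypothetical disk usage data, sizes in bytes
disk <- data.frame(day   = 1:5,
                   bytes = c(2.5e8, 9e8, 1.6e9, 3.2e9, 4.1e9))

# Pass the function returned by format_si() to the labels parameter
p <- ggplot(disk, aes(x = day, y = bytes)) +
  geom_line() +
  scale_y_continuous(name = "Size (bytes)", labels = format_si())
```

Note the parentheses in labels = format_si(): the outer call returns the actual formatting function, and any arguments given there are forwarded to format().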

