Metric prefixes for ggplot2 scales

2011-10-07

ggplot2 is one of the great R packages for visualizing data, and it offers many ways to customize the output to match your needs. When I used R to present disk usage figures, I needed the axis labels to use metric (SI) prefixes (kB, MB, GB, TB, ...). Here is the helper function I wrote for that.

In ggplot2, functions like scale_y_continuous() set the graphic parameters for the scales. These functions take an optional parameter formatter: a user-defined function that formats the labels on that axis. At the time of writing, the functions comma(), dollar(), percent() and scientific() are part of the ggplot2 package.
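
To make the contract concrete: a formatter is simply a function that receives the numeric break values of an axis and returns one label per break. The following is a minimal base R sketch of that idea; the name format_with_unit and its unit argument are made up for illustration and are not part of ggplot2.

```r
# Hypothetical formatter (not part of ggplot2):
# numeric breaks in, character labels out.
format_with_unit <- function(x, unit = "B") {
  paste(format(x, trim = TRUE, scientific = FALSE), unit)
}

format_with_unit(c(0, 500, 1000))
# [1] "0 B"    "500 B"  "1000 B"
```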

This is my R function FormatSI() to format with SI unit prefixes:

FormatSI <- function(x, ...) {
  # Format a vector of numeric values according to the
  # International System of Units.
  # http://en.wikipedia.org/wiki/SI_prefix
  #
  # Args:
  #   x  : A vector of numeric values
  #   ...: Remaining args passed to format()
  #
  # Returns:
  #   A vector of strings using SI prefix notation
  #
  # Bugs:
  #   Does not (yet) scale small (<1) or negative numbers
  #
  scale.frac <- 1000
  scale.unit <- c("k", "M", "G", "T", "P", "E", "Z", "Y")

  # Start with empty prefixes
  p <- rep(" ", length(x))

  # Divide by scale.frac and store scale.unit if value is
  # large enough. Repeat for all units.
  for (i in seq_along(scale.unit)) {
    p[x >= scale.frac] <- scale.unit[i]
    x[x >= scale.frac] <- x[x >= scale.frac] / scale.frac
  }

  return(paste(format(round(x,1), trim=TRUE, scientific=FALSE, ...), p))
}

The function takes a vector of numeric values and returns a vector of strings containing the formatted values:

> FormatSI(c(0.12, 13000, 1400000))
[1] "0.1  "  "13.0 k" "1.4 M"
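
The loop inside FormatSI() is easier to follow when traced for a single value. This sketch only illustrates the divide-and-relabel idea; it is not a replacement for the function, and it uses a shortened prefix list and a scalar instead of a vector.

```r
# Trace the divide-and-relabel loop for one value (illustration only).
x <- 1400000
prefix <- " "
for (unit in c("k", "M", "G")) {
  if (x >= 1000) {
    prefix <- unit   # remember the prefix for this magnitude
    x <- x / 1000    # and scale the value down by one step
  }
}
paste(x, prefix)
# [1] "1.4 M"
```

The value is divided twice (1400000 -> 1400 -> 1.4), so the last prefix stored is "M"; the third pass changes nothing because 1.4 is below 1000.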

This function can be used with ggplot2 to provide the formatting for the scale labels:

ggplot() + ... + scale_y_continuous(formatter="FormatSI")

The code for the function itself plus an example is available for download as FormatSI.R. The example code was used to generate the following graph:

[Figure: Digital Equipment Corporation disk drives (SCSI)]

As you can see, the labels on the y-axis are formatted with SI unit prefixes to show megabytes and gigabytes, even though the data contains the size in bytes.

Update 2012-03-21:

With the release of ggplot2 0.9.0 the way label formatting is specified has changed. The formatter parameter has been replaced by the labels parameter, which expects a function rather than the name of a function as a string.

I also found a discussion on R-help where Ben Tupper showed how to use findInterval to do the job.
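
For readers who have not used findInterval() before, here is a small base R sketch of what it returns. The limits and prefix vectors are shortened for illustration. For each value it gives the index of the largest limit that is not greater than the value, or 0 if the value lies below the first limit.

```r
limits <- c(1e0, 1e3, 1e6)
prefix <- c(" ", "k", "M")
x <- c(0.5, 13000, 1400000)

# Index of the interval each value falls into; 0 means "below limits[1]"
i <- findInterval(x, limits)
i
# [1] 0 2 3

# Indexing with 0 silently drops that element, so the result is too short
prefix[i]
# [1] "k" "M"
```

The dropped element is the reason an index of 0 has to be remapped to a valid position before it is used for subsetting.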

So here is the updated code:

format_si <- function(...) {
  # Format a vector of numeric values according
  # to the International System of Units.
  # http://en.wikipedia.org/wiki/SI_prefix
  #
  # Based on code by Ben Tupper
  # https://stat.ethz.ch/pipermail/r-help/2012-January/299804.html
  # Args:
  #   ...: Args passed to format()
  #
  # Returns:
  #   A function that formats a vector of numeric values
  #   as strings using SI prefix notation
  #
 
  function(x) {
    limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
                1e-9,  1e-6,  1e-3,  1e0,   1e3,
                1e6,   1e9,   1e12,  1e15,  1e18,
                1e21,  1e24)
    prefix <- c("y",   "z",   "a",   "f",   "p",
                "n",   "µ",   "m",   " ",   "k",
                "M",   "G",   "T",   "P",   "E",
                "Z",   "Y")
 
    # Vector with array indices according to position in intervals
    i <- findInterval(abs(x), limits)
 
    # Set prefix to " " for very small values < 1e-24
    i <- ifelse(i==0, which(limits == 1e0), i)

    paste(format(round(x/limits[i], 1),
                 trim=TRUE, scientific=FALSE, ...),
          prefix[i])
  }
}

With ggplot2 0.9.0 you will need to use the following call:

ggplot() + ... + scale_y_continuous(labels=format_si())
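
Because format_si() returns a function, that function can also be called directly, which is handy for checking labels without drawing a plot. The sketch below uses a trimmed-down copy with only a few prefixes so that it runs on its own; the name format_si_demo is made up for this example.

```r
# Trimmed-down copy of format_si() (prefixes m, none, k, M, G only)
# so that the snippet is self-contained. Hypothetical name.
format_si_demo <- function(...) {
  function(x) {
    limits <- c(1e-3, 1e0, 1e3, 1e6, 1e9)
    prefix <- c("m",  " ", "k", "M", "G")
    i <- findInterval(abs(x), limits)
    # Values below the smallest limit get no prefix and no scaling
    i <- ifelse(i == 0, which(limits == 1e0), i)
    paste(format(round(x / limits[i], 1),
                 trim = TRUE, scientific = FALSE, ...),
          prefix[i])
  }
}

labeller <- format_si_demo()        # no extra arguments for format()
labeller(c(0.12, 13000, 1400000))
# [1] "120.0 m" "13.0 k"  "1.4 M"

# Extra arguments are forwarded to format(), e.g. a thousands separator
format_si_demo(big.mark = ",")(c(999, 1e12))
```

Note that, unlike the first version of the function, values below 1 are now scaled as well: 0.12 is shown as 120.0 m.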

Posted by Stefan in Software
Tags: R, Tools

BlogNotes to Myself by Stefan Möding is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.