Subscript letters in ggplot axis label

I'm trying to work out how to have subscript letters in an axis label.

library(ggplot2)

dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x=x,y=y)) +
    geom_point() +
    labs(y=expression(Blah[1]))

dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x=x,y=y)) +
    geom_point() +
    labs(y=expression(Blah[1d]))

The first example works because the subscript is just a number, but the second fails once the subscript is 1d. Something like Blah[subscript(1d)] is essentially what I need, but I can't work out how to get a subscript that mixes digits and letters. I have tried several variations, including paste().

The following examples show the strange behavior:

labs(y=expression(Blah[12])) # this works
labs(y=expression(Blah[d])) # this works
labs(y=expression(Blah[d1])) # this works
labs(y=expression(Blah[1d])) # this fails

Thoughts?


Solution 1:

The last one fails because the arguments to expression are run through the R parser, which returns an error when they cannot possibly be correct R syntax. The string 1d is not a valid R token (or symbol). You can break it into valid R tokens and "connect" them with a non-space operator, wrap it in backticks, or use ordinary quotes. I think any of these is a better way than using paste:

 ggplot(dat, aes(x=x,y=y)) +
     geom_point() +
     labs(y=expression(Blah[1*d]))
 ggplot(dat, aes(x=x,y=y)) +
     geom_point() +
     labs(y=expression(Blah["1d"]))

Names (also called "symbols") in R are not supposed to start with a digit. You get around that limitation either by quoting or by separating 1 and d with a non-space separator, the * operator. In plotmath, * "joins" or "ligates" a pure numeric literal with a legal R symbol and draws nothing between them.
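
To see the parser's verdict without drawing a plot, here is a minimal sketch in base R. The helper name parses is made up for this illustration; it only reports whether a piece of text is syntactically valid R.

```r
# Hypothetical helper (not part of R or ggplot2): TRUE if `code` is valid R syntax.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses("expression(Blah[1d])")     # FALSE: the digit 1 is followed directly by d
parses("expression(Blah[d1])")     # TRUE: d1 is a legal name
parses("expression(Blah[1*d])")    # TRUE: numeric literal, operator, name
parses('expression(Blah["1d"])')   # TRUE: quoted string
parses("expression(Blah[`1d`])")   # TRUE: backticked name
```

Any of the forms that parse can be passed to labs(y = ...) as in the examples above.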

To get an unsubscripted percent sign after the subscript, just append it as a quoted string:

 ggplot(dat, aes(x=x,y=y)) +
    geom_point() +
    labs(y=expression(Blah[1*d]*"%"))

To put parentheses around the percent sign:

expression(Blah[1*d]*"(%)")

The % character has special meaning to the R parser, since it signals the beginning of a user-defined infix operator. Using it as a literal therefore requires that it be quoted. The same reasoning requires that "for" and "in" be quoted, because they are in R's group of reserved words. There are other reserved words (but for and in are the ones that trip me up most often). To see the full list, type:

 ?Reserved
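
As a rough check of the quoting rule, the sketch below asks the parser about the unquoted and quoted forms. The helper name parses and the label text (including the name range) are invented for this illustration.

```r
# Hypothetical helper: TRUE if `code` parses as R, FALSE on a syntax error.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses('expression(Blah[1*d]*%)')               # FALSE: a bare % opens an infix operator
parses('expression(Blah[1*d]~in~range)')        # FALSE: in is a reserved word
parses('expression(Blah[1*d]*"%"~"in"~range)')  # TRUE once both are quoted
```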

Another "trick" is to put quotation marks around digits within italic() if you need them italicized. Unquoted digits do not get italicized inside that function.
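
A small sketch of the difference, assuming the behavior described above: the two labels differ only in whether the digits are stored as a number or as a string, and that is what italic() reacts to. The variable names are made up for illustration.

```r
# Digits as a numeric constant: per the note above, these stay upright.
unquoted_digits <- expression(italic(12))
# Digits as a character string: these are drawn in italics.
quoted_digits <- expression(italic("12"))

# The only difference is the type of the argument stored in the call.
is.numeric(unquoted_digits[[1]][[2]])   # TRUE
is.character(quoted_digits[[1]][[2]])   # TRUE
```

Either object can be passed to labs(y = ...) to compare the rendered result.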

Caveat: inside expression(), paste is a plotmath function with different semantics from base::paste. In particular, it has no 'sep' argument, so it never puts a space between the printed arguments. If you supply a non-space value as sep, it is treated as just another argument, and a single instance of it appears after all the other arguments.

paste0 is not a plotmath function, so it is not interpreted. Instead it appears "unprocessed" in the label, with its arguments inside parentheses.
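
To make the caveat concrete, here is a sketch that builds one label three ways and inspects what expression() actually stores. The label text "Hb", A[1*c] and "(%)" is only an illustration. In plotmath, * juxtaposes its operands and ~ inserts a space, so no paste is needed at all.

```r
with_paste  <- expression(paste("Hb", A[1*c], " (%)"))   # plotmath's own paste
no_paste    <- expression("Hb" * A[1*c] ~ "(%)")         # * joins, ~ adds a space
with_paste0 <- expression(paste0("Hb", A[1*c], " (%)"))  # not a plotmath function

# expression() keeps each call unevaluated, so base::paste0 is never run;
# the renderer only sees a call to something named paste0.
class(with_paste0[[1]])               # "call"
as.character(with_paste0[[1]][[1]])   # "paste0"

# In the paste-free form the outermost operator is ~, with "Hb" * A[1*c] on its left.
as.character(no_paste[[1]][[1]])      # "~"
```

Passing with_paste or no_paste to labs(y = ...) should give the intended label, while with_paste0 is drawn literally as described above.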

Solution 2:

Okay. I swear I didn't post this just to answer it myself, despite how quickly I got it (always the way when you ask a question!)

Here it is:

ggplot(dat, aes(x=x,y=y)) +
    geom_point() +
    labs(y=expression(Blah[1][d]))

Thought it best to post the answer rather than remove the question as it may help someone else one day.

'Blahs' aside, what I actually wanted was expression(paste("Hb", A[1][c]," (%)",sep=""))

Why paste0() doesn't work here is beyond me.