reshape2 melt warning message
Solution 1:
An explanation:
When you melt, you are combining multiple columns into one. In this case, you are combining factor columns, each of which has a levels attribute. These levels are not the same across columns because your factors are actually different. melt simply coerces each factor to character and drops the attributes when creating the value column in the result.
In this case the warning doesn't matter, but you need to be very careful when combining columns that are not of the same "type", where "type" means not just the vector type but, more generally, the kind of thing the values describe. For example, I would not want to melt a column containing speeds in mph with one containing weights in lb.
One way to confirm that it is okay to combine your factor columns is to ask yourself whether any possible value in one column would be a reasonable value to have in every other column. If that is the case, then likely the correct thing to do would be to ensure that every factor column has all the possible levels that it could accept (in the same order). If you do this, you will not get a warning when you melt the table.
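As a rough sketch of that check, the levels of the factor columns can be compared before melting. The data frame and the names lv and same_levels below are made up for illustration; only base R is used.

```r
# Hypothetical data; stringsAsFactors=TRUE makes x and y factors in any R version
DF <- data.frame(id = 1:3, x = letters[1:3], y = rev(letters)[1:3],
                 stringsAsFactors = TRUE)

# Collect the levels of each column we intend to melt
lv <- lapply(DF[c("x", "y")], levels)

# TRUE only if every column has exactly the same levels, in the same order
same_levels <- all(vapply(lv, identical, logical(1), lv[[1]]))
same_levels
```

If same_levels is FALSE, melting those columns drops the factor attributes, which is what the warning reports.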
An illustration:
library(reshape2)
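# stringsAsFactors=TRUE is needed in R >= 4.0.0, where the default became FALSE
DF <- data.frame(id=1:3, x=letters[1:3], y=rev(letters)[1:3], stringsAsFactors=TRUE)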
str(DF)
The levels for x and y are not the same:
'data.frame': 3 obs. of 3 variables:
$ id: int 1 2 3
$ x : Factor w/ 3 levels "a","b","c": 1 2 3
$ y : Factor w/ 3 levels "x","y","z": 3 2 1
Here we melt and look at the column that x and y were melted into (value):
melt(DF, id.vars="id")$value
We get a character vector and a warning:
[1] "a" "b" "c" "z" "y" "x"
Warning message:
attributes are not identical across measure variables; they will be dropped
If however we reset the factors to have the same levels and only then melt:
DF[2:3] <- lapply(DF[2:3], factor, levels=letters)
melt(DF, id.vars="id", factorsAsStrings=FALSE)$value
We get the correct factor and no warnings:
[1] a b c z y x
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
The default behavior of melt is to drop factor levels even when they are identical, which is why we set factorsAsStrings to false above. If you had not used that setting, you would have gotten a character vector, but no warning. I would argue the default behavior should be to keep the result as a factor, but that is not the case here.
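In the illustration, letters happened to cover every possible value. When the full set is not known in advance, one option is to use the union of the levels actually observed in the columns being melted. This is only a sketch: the names measure and all_levels are made up here, and sorting the union is an arbitrary choice of level order.

```r
library(reshape2)

# Hypothetical data; stringsAsFactors=TRUE makes x and y factors in any R version
DF <- data.frame(id = 1:3, x = letters[1:3], y = rev(letters)[1:3],
                 stringsAsFactors = TRUE)

measure <- c("x", "y")  # the factor columns to be melted

# Union of all observed levels, sorted so every column gets the same order
all_levels <- sort(unique(unlist(lapply(DF[measure], levels))))

# Rebuild each factor with the shared level set
DF[measure] <- lapply(DF[measure], factor, levels = all_levels)

value <- melt(DF, id.vars = "id", factorsAsStrings = FALSE)$value
value
```

Because every measure column now carries identical attributes, melt has nothing to drop.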
Solution 2:
BrodieG's answer is excellent; however, there are some cases where it is impractical to re-level the factor columns (for example, GHCN climate data with 128 fixed-width columns that I wanted to melt into a much smaller number of columns).
In that case, the simplest solution is to treat the data as characters rather than factors: for example, you can re-import the data using read.fwf(filename, widths, stringsAsFactors=FALSE) (the same idea works for read.csv). For a smaller number of columns, you can convert factors to strings using d$mystring <- as.character(d$myfactor).
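If re-importing is not convenient and there are too many columns to convert one by one, the same conversion can be applied to every factor column at once. This is a minimal sketch with made-up data; d and is_fac are illustrative names.

```r
library(reshape2)

# Hypothetical data standing in for a wide table with many factor columns
d <- data.frame(id = 1:3, x = letters[1:3], y = rev(letters)[1:3],
                stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character
is_fac <- vapply(d, is.factor, logical(1))
d[is_fac] <- lapply(d[is_fac], as.character)

# Character columns carry no levels attribute, so there is nothing to mismatch
melted <- melt(d, id.vars = "id")
str(melted)
```

Non-factor columns such as id are left untouched.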