Data Frame containing hyphens using R

I have created a list (based on items in a column) in order to subset my dataset into smaller datasets, each relating to a particular variable. The strings in this list contain hyphens.

dim.list <- c('Age_CareContactDate-Gender', 'Age_CareContactDate-Group',
         'Age_ServiceReferralReceivedDate-Gender',
         'Age_ServiceReferralReceivedDate-Gender-0-18',
         'Age_ServiceReferralReceivedDate-Group',
         'Age_ServiceReferralReceivedDate-Group-ReferralReason')

I have then written some code to loop through each item in this list, subsetting my main data.

for (i in dim.list) {assign(paste("df1.",i,sep=""),df[df$Dimension==i,])}

This works fine. However, when I come to aggregate these in order to get some summary statistics, I can't reference the datasets, as R stops reading after the hyphen (I assume the hyphen is a special character).

If I use a different list without hyphens, e.g.

dim.list.abr <- c('ACCD_Gen','ACCD_Grp',
              'ASRRD_Gen',
              'ASRRD_Gen_0_18',
              'ASRRD_Grp',
              'ASRRD_Grp_RefRsn')

then when my for loop above executes I get 6 data.frames with no observations.

Why is this happening?


Solution 1:

Comment to answer:

Hyphens aren't allowed in standard variable names. Think of a simple example: a-b. Is it a variable name with a hyphen or is it a minus b? The R interpreter assumes a minus b, because it doesn't require spaces for binary operations. You can force non-standard names to work using backticks, e.g.,

# terribly confusing names:
`a-b` <- 5
`x+y` <- 10
`mean(x^2)` <- "this is awful"

but you're better off following the rules and using standard names without special characters like + - * / % $ # @ ! & | ^ ( [ ' " in them. At ?Quotes there is a section on Names and Identifiers:

Identifiers consist of a sequence of letters, digits, the period (.) and the underscore. They must not start with a digit nor underscore, nor with a period followed by a digit. Reserved words are not valid identifiers.
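As a rough illustration of that rule, here is a minimal sketch that tests whether a string is a syntactically valid name by comparing it with what make.names() would turn it into. The candidate strings and the small data frame are made up for the example. It also shows that an object created by assign() under a hyphenated name does exist, and can be reached with backticks or get().

```r
# Hypothetical candidate names, chosen only for illustration
candidates <- c("df1.ACCD_Gen",
                "df1.Age_CareContactDate-Gender",
                "_x",
                ".2way")

# make.names() rewrites anything that is not a valid name,
# so a string is valid exactly when it comes back unchanged
is_syntactic <- make.names(candidates) == candidates
is_syntactic

# assign() accepts any string, so the object is created even with a hyphen
assign("df1.Age_CareContactDate-Gender", data.frame(x = 1:3))

# Typing the name bare would parse as a subtraction;
# backticks or get() reach the object itself
n_backtick <- nrow(`df1.Age_CareContactDate-Gender`)
n_get      <- nrow(get("df1.Age_CareContactDate-Gender"))
```

Only the first candidate passes the check; the others contain a hyphen, start with an underscore, or start with a period followed by a digit.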

So that's why you're getting an error, but what you're doing isn't good practice. I completely agree with Axeman's comments. Use split to divide up your data frame into a list, and keep it in a list rather than using assign; it will be much easier to loop over, or to use with lapply, that way. You might want to read my answer at How to make a list of data frames for a lot of discussion and examples.

Regarding your comment "dim.list is not the complete set of unique entries in the Dimensions column", that just means you need to subset before you split:

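# keep only the rows whose Dimension is in dim.list
nice_list = df[df$Dimension %in% dim.list, ]
# split those rows into a named list of data frames, one per Dimension
nice_list = split(nice_list, nice_list$Dimension)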
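To tie this together, here is a small end-to-end sketch on toy data. The Value column, the "SomethingElse" row, and the abbr lookup vector are assumptions made for the example, not part of the original data. It also suggests why the abbreviated list gave empty data frames: the loop compares df$Dimension with each abbreviation, and if those abbreviations never occur in the column, every comparison is FALSE.

```r
# Toy stand-in for the real data; Value is a hypothetical measure column
df <- data.frame(
  Dimension = c("Age_CareContactDate-Gender", "Age_CareContactDate-Gender",
                "Age_CareContactDate-Group", "SomethingElse"),
  Value = c(10, 20, 5, 99),
  stringsAsFactors = FALSE
)
dim.list <- c("Age_CareContactDate-Gender", "Age_CareContactDate-Group")

# The abbreviation is not a value in the column, so no rows match
n_abbr_rows <- nrow(df[df$Dimension == "ACCD_Gen", ])

# Hypothetical lookup from full Dimension values to short labels
abbr <- c("Age_CareContactDate-Gender" = "ACCD_Gen",
          "Age_CareContactDate-Group"  = "ACCD_Grp")

# Subset on the real values, split into a list, then relabel the elements
kept <- df[df$Dimension %in% dim.list, ]
df_list <- split(kept, kept$Dimension)
names(df_list) <- unname(abbr[names(df_list)])

# Summaries come from looping over the list, with no assign() or get()
totals <- sapply(df_list, function(d) sum(d$Value))
```

Individual pieces are then available as df_list$ACCD_Gen or df_list[["ACCD_Gen"]]. Because the hyphens only ever appear in the data values, never in object names, they no longer cause a problem.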