Split a column of concatenated comma-delimited data and recode output as factors

Solution 1:

You just need to write a function and use apply. First some dummy data:

##Make sure you're not using factors
dd = data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
                       "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"),
                stringsAsFactors = FALSE)

Next, create a function that takes a row and turns it into a 0/1 indicator vector:

make_row = function(i, ncol=5) {
  ##Could make the default NA if needed
  m = numeric(ncol)
  v = as.numeric(strsplit(i, ",")[[1]])
  m[v] = 1
  return(m)
}

Then use apply and transpose the result, since apply returns each row's output as a column:

t(apply(dd, 1, make_row))
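
Since the question also asks for the output recoded as factors, here is a minimal sketch of one way to finish the job in base R. The names `mat` and `out`, the `V1_` column prefix, and the use of vapply on the column instead of apply on the data frame are illustrative choices, not part of the answer above; the code assumes every value in the column is an integer between 1 and 5.

```r
# Same dummy data and helper as above, repeated so the sketch runs on its own
dd <- data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
                        "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"),
                 stringsAsFactors = FALSE)

make_row <- function(i, ncol = 5) {
  m <- numeric(ncol)                      # start with all zeros
  v <- as.numeric(strsplit(i, ",")[[1]])  # "1, 2, 3" -> c(1, 2, 3)
  m[v] <- 1                               # flag the positions that occur
  m
}

# vapply returns one column per input string, so transpose to get one row each
mat <- t(vapply(dd$V1, make_row, numeric(5), USE.NAMES = FALSE))

# Convert to a data frame and recode every indicator column as a factor
out <- as.data.frame(mat)
names(out) <- paste0("V1_", seq_len(ncol(mat)))
out[] <- lapply(out, factor, levels = c(0, 1))

str(out)
```

Assigning to `out[]` keeps the data frame structure while replacing each column, and fixing `levels = c(0, 1)` means a column that happens to contain only one of the two values still gets both levels.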

Solution 2:

A long time later, I finally got around to creating a package ("splitstackshape") that deals with this kind of data in an efficient manner. So, for the convenience of others (and some self-promotion, of course) here's a compact solution.

The relevant function for this problem is cSplit_e.

First, the default settings, which retain the original column and use NA as the fill:

library(splitstackshape)
cSplit_e(data, "V1")
#           V1 V1_1 V1_2 V1_3 V1_4 V1_5
# 1    1, 2, 3    1    1    1   NA   NA
# 2    1, 2, 4    1    1   NA    1   NA
# 3 2, 3, 4, 5   NA    1    1    1    1
# 4    1, 3, 4    1   NA    1    1   NA
# 5    1, 3, 5    1   NA    1   NA    1
# 6 2, 3, 4, 5   NA    1    1    1    1

Second, dropping the original column and using 0 as the fill:

cSplit_e(data, "V1", drop = TRUE, fill = 0)
#   V1_1 V1_2 V1_3 V1_4 V1_5
# 1    1    1    1    0    0
# 2    1    1    0    1    0
# 3    0    1    1    1    1
# 4    1    0    1    1    0
# 5    1    0    1    0    1
# 6    0    1    1    1    1
