Using gsub to extract character string before white space in R
No need for substring, just use gsub:
gsub( " .*$", "", dob )
# [1] "9/9/43" "9/17/88" "11/21/48"
A space ( ), then any character (.) any number of times (*) until the end of the string ($). See ?regex to learn regular expressions.
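To see the pattern at work, here is a minimal sketch. The dob vector is an assumption, since the question's data is not reproduced here: the date parts match the output above, and the time parts are invented so that there is something after the first space.

```r
# Hypothetical input: only the date parts come from the output shown above;
# the time-of-day parts are made up for illustration.
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# Remove everything from the first space to the end of each string.
first_field <- gsub(" .*$", "", dob)
print(first_field)

# The pattern is anchored at the end and can match at most once per string,
# so sub() gives the same result as gsub() here.
print(identical(sub(" .*$", "", dob), first_field))

# A string that contains no space is returned unchanged.
print(gsub(" .*$", "", "9/9/43"))
```

Because the regex engine takes the leftmost match, the removal always starts at the first space, even when the string contains several.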
I often use strsplit for these sorts of problems but liked how simple Romain's answer was, so I thought it would be interesting to compare his solution to a strsplit answer.
Here's a strsplit solution:
sapply(strsplit(dob, "\\s+"), "[", 1)
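For readers unfamiliar with the "[" idiom, the one-liner can be unpacked into steps. This is only a sketch: the dob vector is hypothetical (time parts invented), and the vapply variant is my own addition, not part of the original answer.

```r
# Hypothetical input; the time parts are invented for illustration.
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# strsplit() returns a list with one character vector per input string,
# split on runs of whitespace.
parts <- strsplit(dob, "\\s+")
print(parts[[1]])

# "[" is the subsetting function, so this takes element 1 of each vector.
first <- sapply(parts, "[", 1)
print(first)

# Same idea with vapply(), which checks that every result is a single string.
first_v <- vapply(parts, function(p) p[1], character(1))
print(identical(first, first_v))
```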
Using the microbenchmark package and dob <- rep(dob, 1000) with the original data:
Unit: milliseconds
                                   expr       min        lq    median        uq       max neval
                  gsub(" .*$", "", dob)  4.228843  4.247969  4.258232  4.268029  5.081608  1000
 sapply(strsplit(dob, "\\s+"), "[", 1) 14.438241 14.558832 14.634638 14.756628 53.344984  1000
The clear winner on a Windows 7 machine is the gsub regex from Romain. Thanks for the answer and explanation, Romain.
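A comparison along these lines could be set up roughly as follows. This is a sketch under assumptions: the dob vector is hypothetical (the original data is not shown), times = 100 is chosen to keep the run short rather than to match the 1000 evaluations reported above, and timings will differ by machine and R version.

```r
# Hypothetical input repeated 1000 times, mirroring dob <- rep(dob, 1000).
dob <- rep(c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM"),
           1000)

# Confirm both approaches agree before timing them.
res_gsub <- gsub(" .*$", "", dob)
res_split <- sapply(strsplit(dob, "\\s+"), "[", 1)
print(identical(res_gsub, res_split))

# Only run the timing if the microbenchmark package is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  timings <- microbenchmark::microbenchmark(
    gsub(" .*$", "", dob),
    sapply(strsplit(dob, "\\s+"), "[", 1),
    times = 100
  )
  print(timings)
}
```

Checking that the two expressions return identical results first makes sure the benchmark compares like with like.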
The library stringr contains a function tailored to this problem.
library(stringr)
word(dob,1)
# [1] "9/9/43" "9/17/88" "11/21/48"
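As a quick check that word() and the gsub approach agree, here is a small sketch. The dob vector is hypothetical (time parts invented), and the code only calls stringr if it is installed. As far as I know, word() splits on a single space by default through its sep argument, so strings with runs of several spaces may behave differently from the regex versions.

```r
# Hypothetical input; the time parts are invented for illustration.
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# Base R reference result.
first_base <- gsub(" .*$", "", dob)
print(first_base)

# Compare with stringr::word() when the package is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  first_word <- stringr::word(dob, 1)
  print(first_word)
  print(identical(first_word, first_base))
}
```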