Using gsub to extract character string before white space in R

No need for substring, just use gsub:

gsub( " .*$", "", dob )
# [1] "9/9/43"   "9/17/88"  "11/21/48"

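The pattern matches a space ( ), then any character (.) any number of times (*), up to the end of the string ($). gsub replaces that match with an empty string, leaving only the text before the first space. See ?regex to learn regular expressions.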
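As a minimal sketch of the idea, the snippet below runs the same pattern on made-up data. The dob values are hypothetical (the time part after the space is an assumption, chosen so the result matches the output shown above), and the sub() and "\\s" variants are illustrations rather than part of the original answer.

```r
# Hypothetical sample data; the time portion after the space is assumed.
dob <- c("9/9/43 12:00 AM", "9/17/88 12:00 AM", "11/21/48 12:00 AM")

# Drop everything from the first space to the end of each string.
first_gsub <- gsub(" .*$", "", dob)

# sub() replaces only the first match. This pattern can match at most once
# per string, so the result is the same as with gsub().
first_sub <- sub(" .*$", "", dob)

# "\\s" matches tabs and other whitespace as well as a literal space.
first_ws <- sub("\\s.*$", "", c("9/9/43\t12:00 AM", dob[2]))

print(first_gsub)
```

If the separator might be a tab or another whitespace character, the "\\s" form is probably the safer choice, since " " matches only a literal space.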


I often use strsplit for these sorts of problems, but I liked how simple Romain's answer was, so I thought it would be interesting to compare it with a strsplit approach.

Here's a strsplit solution:

sapply(strsplit(dob, "\\s+"), "[", 1)

Using the microbenchmark package and dob <- rep(dob, 1000) with the original data:

Unit: milliseconds
                                    expr       min        lq    median
                   gsub(" .*$", "", dob)  4.228843  4.247969  4.258232
 sapply(strsplit(dob, "\\\\s+"), "[", 1) 14.438241 14.558832 14.634638
        uq       max neval
  4.268029  5.081608  1000
 14.756628 53.344984  1000
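The timings above could be reproduced with something like the following sketch. The dob values are hypothetical stand-ins for the original data, and times is set to 100 here to keep the run short (the output above used 1000 evaluations). The benchmark step is skipped if the microbenchmark package is not installed.

```r
# Hypothetical sample data, repeated 1000 times as described above.
dob <- rep(c("9/9/43 12:00 AM", "9/17/88 12:00 AM", "11/21/48 12:00 AM"), 1000)

# Check that both approaches return the same vector before timing them.
res_gsub  <- gsub(" .*$", "", dob)
res_split <- sapply(strsplit(dob, "\\s+"), "[", 1)
stopifnot(identical(res_gsub, res_split))

# Time both expressions, but only if microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    gsub(" .*$", "", dob),
    sapply(strsplit(dob, "\\s+"), "[", 1),
    times = 100L
  ))
}
```

Absolute timings will differ by machine and R version, so only the relative difference is worth comparing.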

The clear winner on a Win 7 machine is the gsub regex from Romain, which is roughly 3.4 times faster by median (about 4.26 ms versus 14.63 ms). Thanks for the answer and explanation, Romain.


The stringr package contains a function, word, tailored to this problem.

library(stringr)
word(dob,1)
# [1] "9/9/43"   "9/17/88"  "11/21/48"