How to get the text between two words in R?
I am trying to get the text between two words in a sentence.
For example, given the sentence:
x <- "This is my first sentence"
Now I want the text between "This" and "first", which is "is my".
I have tried various R functions such as grep, grepl, pmatch, and str_split. However, I could not get exactly what I want.
This is the closest I have come, using gsub:
gsub(".*This\\s*|first*", "", x)
The output it gives is
[1] "is my sentence"
In reality, what I need is only
[1] "is my"
Any help would be appreciated.
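You need .* after 'first' so that the pattern also matches (and removes) everything that follows it. In your attempt, first* means "firs" followed by zero or more "t", so the rest of the sentence is left in place.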
gsub('^.*This\\s*|\\s*first.*$', '', x)
#[1] "is my"
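As a variation on the same idea, the wanted text can be captured directly instead of deleting both sides. This is only a sketch, not part of the answer above: it assumes perl = TRUE so that the lazy quantifier .*? behaves as in PCRE, and the variable name `between` is made up for illustration.

```r
x <- "This is my first sentence"

# Capture the text between the two words and replace the whole string with
# that capture group. \\s* on both sides trims the surrounding whitespace.
between <- sub(".*This\\s*(.*?)\\s*first.*", "\\1", x, perl = TRUE)
between
#> [1] "is my"

# Caveat: when the pattern does not match, sub() returns the input unchanged.
sub(".*This\\s*(.*?)\\s*first.*", "\\1", "no markers here", perl = TRUE)
#> [1] "no markers here"
```

Because an unmatched input comes back unchanged, it may be worth checking with grepl() first if the words are not guaranteed to be present.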
Another approach uses rm_between from the qdapRegex package:
library(qdapRegex)
rm_between(x, 'This', 'first', extract=TRUE)[[1]]
# [1] "is my"
Since this question is used as a reference, I'll add two more solutions for a more complete overview. Both are based on a look-ahead/look-behind regex pattern and return a list with one element per input string.
base R
regmatches( x, gregexpr("(?<=This ).*(?= first)", x, perl = TRUE ) )
stringr
stringr::str_extract_all( x, "(?<=This ).+(?= first)" )
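For reuse, the base R version could be wrapped in a small helper along these lines. This is a sketch under stated assumptions: the name text_between is hypothetical, the start and end words are assumed to contain no regex metacharacters, and the lazy .*? is my choice (the patterns above are greedy), so the match stops at the first occurrence of the end word rather than the last.

```r
# Hypothetical helper (base R only): text between two literal words.
# Assumes `start` and `end` contain no regex metacharacters and are
# separated from the wanted text by a single space.
text_between <- function(x, start, end) {
  pattern <- paste0("(?<=", start, " ).*?(?= ", end, ")")
  # regexpr() finds the first match per string; regmatches() extracts it
  # and drops strings that have no match.
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

x <- "This is my first sentence"
text_between(x, "This", "first")
#> [1] "is my"

# Lazy vs. greedy when the end word occurs twice
y <- "This is my first and not my first try"
text_between(y, "This", "first")
#> [1] "is my"
regmatches(y, regexpr("(?<=This ).*(?= first)", y, perl = TRUE))
#> [1] "is my first and not my"
```

Strings without a match are dropped from the result, so the output can be shorter than the input vector.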