Extract text after "/" in a data frame column
I have a data frame with two columns, Link and Value. The Link column holds URLs such as "abcd.com/efgh/ijkl/mnop". The frame has 10,000 rows, which I took as a sample from 100,000 rows.
Now I want to extract the text after the last "/" (that is, the first "/" when reading from right to left). For example, from the sample above I want to extract "mnop".
I want to do this for all 10,000 rows in the Link column, while the Value column should not be affected.
I was able to use
a = sapply(webdatatest, substring, 36)
but this is not dynamic, because the position of the last "/" changes from row to row. It also affected the second column.
So I need some help with this.
Solution 1:
Try basename(). It removes all of the path up to and including the last path separator (if any).
basename("abcd.com/efgh/ijkl/mnop")
# [1] "mnop"
It is vectorized, so you can just stick the whole column in there.
basename(rep("abcd.com/efgh/ijkl/mnop", 3))
# [1] "mnop" "mnop" "mnop"
So, to apply this to one column link of a data frame webdata, you can simply do
webdata$link <- basename(webdata$link)
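As a minimal sketch of that assignment on a whole data frame: the rows and the Value numbers below are made up for illustration, and the column is named Link as in the question. The as.character() call is an added precaution in case the column was read in as a factor, which basename() does not accept.

```r
# Hypothetical sample data; only the column names come from the question
webdata <- data.frame(
  Link  = c("abcd.com/efgh/ijkl/mnop", "abcd.com/qrst", "wxyz.org/a/b"),
  Value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Overwrite only the Link column; as.character() is a no-op for
# character input and converts a factor column if there is one
webdata$Link <- basename(as.character(webdata$Link))

webdata$Link
# [1] "mnop" "qrst" "b"

# The Value column is untouched
webdata$Value
# [1] 10 20 30
```

Because only webdata$Link is assigned to, nothing else in the data frame changes, unlike sapply() over the whole data frame, which applies the function to every column.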
The other obvious function would be sub(), but I think basename() will do the trick and it's easier.
sub(".*/", "", rep("abcd.com/efgh/ijkl/mnop", 3))
# [1] "mnop" "mnop" "mnop"
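The two approaches agree on the sample URL but can differ at the edges. The sketch below uses made-up inputs (a URL ending in "/" and one with no "/" at all) to show this: basename() drops trailing path separators before taking the last component, while the sub() pattern deletes everything through the last "/" and so leaves an empty string.

```r
# Hypothetical inputs chosen to show the edge cases
urls <- c("abcd.com/efgh/ijkl/mnop", "abcd.com/efgh/", "abcd.com")

base_result <- basename(urls)
sub_result  <- sub(".*/", "", urls)

base_result
# [1] "mnop"     "efgh"     "abcd.com"

sub_result
# [1] "mnop"     ""         "abcd.com"
```

Which behaviour is preferable depends on whether a trailing "/" in the data should count as "nothing after the last slash" or be ignored.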