How can I trim leading and trailing white space?
I am having some trouble with leading and trailing white space in a data.frame.
For example, I look at a specific row in a data.frame based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper country dummyLI dummyLMI dummyUMI
[6] dummyHInonOECD dummyHIOECD dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn't get the expected output, since the country Austria obviously exists in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
codeHelper country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18 AUT Austria 0 0 0 0 1
dummyOECD
18 1
The only thing I changed in the command is an additional white space after Austria.
Further annoying problems obviously arise, for example when I want to merge two frames based on the country column. One data.frame uses "Austria " while the other frame has "Austria", so the matching doesn't work.
- Is there a nice way to 'show' the white space on my screen so that I am aware of the problem?
- And can I remove the leading and trailing white space in R?
So far I have used a simple Perl script to remove the white space, but it would be nice if I could somehow do it inside R.
Solution 1:
As of R 3.2.0, a new function is available for removing leading/trailing white space:
trimws()
See: Remove Leading/Trailing Whitespace
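As a minimal sketch of how trimws() could be applied to the problem in the question (the vector country below is made-up illustration data, not the asker's data.frame), the which argument selects the side to trim:

```r
# Hypothetical example data with leading/trailing white space
country <- c("Austria ", " Belgium", "  Chile  ")

both  <- trimws(country)                   # both sides (the default)
left  <- trimws(country, which = "left")   # leading white space only
right <- trimws(country, which = "right")  # trailing white space only

# After trimming, the comparison from the question finds Austria
both == "Austria"
```

For the data.frame in the question, the equivalent would presumably be myDummy$country <- trimws(myDummy$country).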
Solution 2:
Probably the best way is to handle the trailing white space when you read your data file. If you use read.csv or read.table, you can set the parameter strip.white=TRUE.
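A small sketch of the effect of strip.white, using an inline string in place of a real file (the CSV content and the names csv_text, raw and clean are assumptions for illustration only):

```r
# Hypothetical CSV content; text= lets read.csv parse a string instead of a file
csv_text <- "codeHelper,country\nAUT,Austria \nBEL, Belgium"

raw   <- read.csv(text = csv_text, stringsAsFactors = FALSE)
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

raw$country == "Austria"    # no match: the stored value is "Austria "
clean$country == "Austria"  # the first row now matches
```

Note that, per the read.table documentation, strip.white only affects unquoted character fields and is only used when sep is specified (read.csv sets sep = "," for you).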
If you want to clean strings afterwards you could use one of these functions:
# Returns string without leading white space
trim.leading <- function (x) sub("^\\s+", "", x)
# Returns string without trailing white space
trim.trailing <- function (x) sub("\\s+$", "", x)
# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
To use one of these functions on myDummy$country:
myDummy$country <- trim(myDummy$country)
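As a rough sketch of how this addresses the merge problem from the question, the same helper can be applied to the key column of both frames before merging. The frames a and b are hypothetical stand-ins, not the asker's data:

```r
# Same helper as above, repeated so the sketch is self-contained
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical frames: one stores "Austria ", the other "Austria"
a <- data.frame(country = c("Austria ", "Belgium"), dummyOECD = c(1, 1),
                stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), codeHelper = c("AUT", "BEL"),
                stringsAsFactors = FALSE)

before <- merge(a, b, by = "country")  # only Belgium matches

a$country <- trim(a$country)
b$country <- trim(b$country)
after <- merge(a, b, by = "country")   # both countries now match
```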
To 'show' the white space you could use:
paste(myDummy$country)
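which will show you the strings surrounded by quotation marks ("), making white space easier to spot. This works because paste() returns a character vector, which R prints in quotes, whereas a factor column is printed without them.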
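Beyond eyeballing the quoted output, the affected values can also be flagged programmatically. The following is only a sketch on made-up data (country and has_ws are hypothetical names), using base R functions:

```r
# Hypothetical example values
country <- c("Austria ", "Belgium", " Chile")

print(country)           # character vectors print with surrounding quotes
lens <- nchar(country)   # unexpected lengths hint at hidden characters

# TRUE where a value starts or ends with white space
has_ws <- grepl("^\\s|\\s$", country)
country[has_ws]          # only the affected values
```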
Solution 3:
To manipulate the white space, use str_trim() in the stringr package. The package has a manual dated Feb 15, 2013 and is on CRAN. The function can also handle string vectors.
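install.packages("stringr", dependencies = TRUE)
library(stringr)
# Run the examples from the str_trim() help page
example(str_trim)
# d4 and V2 stand in for your own data.frame and column
d4$clean2 <- str_trim(d4$V2)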
(Credit goes to commenter: R. Cotton)
Solution 4:
A simple function to remove leading and trailing whitespace:
trim <- function(x) {
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}
Usage:
> text = " foo bar baz 3 "
> trim(text)
[1] "foo bar baz 3"