Ruby split by whitespace
This is the default behavior of String#split:
input = <<-TEXT
aa bbb
cc dd ee
TEXT
input.split
Result:
["aa", "bbb", "cc", "dd", "ee"]
This works in all versions of Ruby that I tested, including 1.8.7, 1.9.3, 2.0.0, and 2.1.2.
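As a quick check of that default behavior, here is a minimal sketch (the helper name `words` and the sample string are made up for illustration). It suggests that split with no argument also ignores leading and trailing whitespace, tabs, and blank lines:

```ruby
# Hypothetical helper: relies only on the default behavior of String#split,
# which cuts on runs of whitespace and ignores leading whitespace.
def words(text)
  text.split
end

messy = "\n  aa\tbbb \n\ncc dd   ee \n"
p words(messy)  # prints ["aa", "bbb", "cc", "dd", "ee"]
```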
The following should work for the example you gave:
str.gsub(/\s+/m, ' ').strip.split(" ")
It returns:
["aa", "bbb", "cc", "dd", "ee"]
Meaning of code:
/\s+/m is the more complicated part. \s matches a whitespace character (space, tab, or newline), so \s+ matches one or more whitespace characters in a row. The trailing m is a modifier. In Ruby it is called multiline mode, and it makes . match newlines as well; since \s already matches newlines, the modifier is not needed here and /\s+/ behaves the same.
So, /\s+/m means: find sequences of one or more whitespace characters.
gsub means replace all.
strip is the equivalent of trim in other languages, and removes whitespace from the front and end of the string.
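The three steps can be sketched one at a time. The helper name `collapse_and_split` and the sample string are assumptions for illustration; the last line checks that, for this input, the result is the same with and without the /m modifier, because \s matches newlines either way:

```ruby
# Hypothetical helper mirroring the one-liner, written without the /m modifier.
def collapse_and_split(text)
  collapsed = text.gsub(/\s+/, ' ')  # every run of whitespace becomes one space
  trimmed   = collapsed.strip        # drop the space left at either end
  trimmed.split(" ")                 # cut on the remaining single spaces
end

sample = "aa bbb\ncc dd ee\n"
p collapse_and_split(sample)                           # prints ["aa", "bbb", "cc", "dd", "ee"]
p sample.gsub(/\s+/m, ' ') == sample.gsub(/\s+/, ' ')  # prints true
```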
As I was writing the explanation, I realized you could end up with a newline character at the beginning or the end of the string.
To be safe, the code could be written as:
str.gsub(/\s+/m, ' ').gsub(/^\s+|\s+$/m, '').split(" ")
So if you had:
str = "\n aa bbb\n cc dd ee\n\n"
Then you'd get:
["aa", "bbb", "cc", "dd", "ee"]
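Meaning of new code:
^\s+ matches a sequence of whitespace at the beginning of a line.
\s+$ matches a sequence of whitespace at the end of a line.
In Ruby, ^ and $ anchor to lines rather than to the whole string (\A and \z anchor to the string). Here the first gsub has already replaced every newline with a space, so the string is a single line and the two are equivalent.
So gsub(/^\s+|\s+$/m, '') removes any sequence of whitespace at the beginning and at the end of the string.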
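In Ruby, ^ and $ match at line boundaries, while \A and \z match only at the start and end of the whole string. The sketch below (both helper names are made up for illustration) shows where the two differ on a string that still contains a newline, and that a string-anchored variant gives the same result on the example input:

```ruby
# Hypothetical helpers contrasting line anchors with string anchors.
def strip_line_edges(text)
  text.gsub(/^\s+|\s+$/, '')    # ^ and $ apply at every line boundary
end

def strip_string_edges(text)
  text.gsub(/\A\s+|\s+\z/, '')  # \A and \z apply only at the string's ends
end

p strip_line_edges("aa\n  bb")    # prints "aa\nbb" (indentation of line 2 removed)
p strip_string_edges("aa\n  bb")  # prints "aa\n  bb" (nothing at the string's ends)

str = "\n aa bbb\n cc dd ee\n\n"
p strip_string_edges(str.gsub(/\s+/, ' ')).split(" ")
```

Once every newline has been collapsed into a space, either pair of anchors trims the same characters, so the choice only matters when the pattern is reused on multi-line text.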
input = <<X
aa bbb
cc dd ee
X
input.strip.split(/\s+/)
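A short sketch of why the strip call matters in this variant (the helper name `strip_then_split` and the sample string are assumptions for illustration): when the separator is an explicit regex, leading whitespace produces an empty first element, which the no-argument form of split avoids on its own.

```ruby
# Hypothetical helper wrapping the expression above.
def strip_then_split(text)
  text.strip.split(/\s+/)
end

padded = "  aa bbb\ncc dd ee\n"
p padded.split(/\s+/)       # prints ["", "aa", "bbb", "cc", "dd", "ee"]
p strip_then_split(padded)  # prints ["aa", "bbb", "cc", "dd", "ee"]
```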