How do I remove repeated spaces in a string?

Solution 1:

String#squeeze has an optional parameter to specify characters to squeeze.

irb> "asd  asd asd   asd".squeeze(" ")
=> "asd asd asd asd"

Warning: calling it without a parameter will squeeze ALL repeated characters, not only spaces:

irb> 'aaa     bbbb     cccc 0000123'.squeeze
=> "a b c 0123"
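
To make the difference concrete, here is a minimal sketch. The helper name collapse_spaces is made up for illustration; String#squeeze itself is core Ruby. Note that only the characters you pass are collapsed, so tabs are left alone when you pass " ".

```ruby
# collapse_spaces is a hypothetical helper name, used only for this example.
def collapse_spaces(str)
  # Only runs of the space character are collapsed; other repeats are kept.
  str.squeeze(" ")
end

# Repeated letters and digits survive because they are not in the set.
p collapse_spaces("aaa     bbbb     cccc 0000123")  # => "aaa bbbb cccc 0000123"

# Tabs are not in the set either, so the doubled tab stays.
p collapse_spaces("a\t\tb  c")                      # => "a\t\tb c"
```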

Solution 2:

>> str = "foo  bar   bar      baaar"
=> "foo  bar   bar      baaar"
>> str.split.join(" ")
=> "foo bar bar baaar"
>>
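
One thing worth knowing about this approach: String#split with no argument splits on any whitespace and ignores leading and trailing whitespace, so the result is also trimmed, and tabs and newlines become single spaces. A small sketch (normalize_whitespace is a made-up name for illustration):

```ruby
# normalize_whitespace is a hypothetical helper name, used only for this example.
def normalize_whitespace(str)
  # split with no argument breaks on runs of whitespace and drops empty
  # leading/trailing fields; join then puts one space between the words.
  str.split.join(" ")
end

messy = "  foo \t bar\n\nbaz  "
p normalize_whitespace(messy)  # => "foo bar baz"

# For comparison, squeeze(" ") only touches runs of spaces:
p messy.squeeze(" ")           # => " foo \t bar\n\nbaz "
```

If you need to keep newlines or leading indentation, this is probably not the variant you want.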

Solution 3:

Updated benchmark from @zetetic's answer:

require 'benchmark'
include Benchmark

string = "foo  bar   bar      baaar"
n = 1_000_000
bm(12) do |x|
  x.report("gsub      ")   { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
  x.report("split/join")   { n.times { string.split.join(" ") } }
end

Running it twice on my desktop gives these values:

ruby test.rb; ruby test.rb
                  user     system      total        real
gsub          6.060000   0.000000   6.060000 (  6.061435)
squeeze(' ')  4.200000   0.010000   4.210000 (  4.201619)
split/join    3.620000   0.000000   3.620000 (  3.614499)
                  user     system      total        real
gsub          6.020000   0.000000   6.020000 (  6.023391)
squeeze(' ')  4.150000   0.010000   4.160000 (  4.153204)
split/join    3.590000   0.000000   3.590000 (  3.587590)

The issue is that a bare squeeze removes any repeated character, which produces a different output string and doesn't meet the OP's need. squeeze(' ') does meet the need, at some cost in speed.

string.squeeze
 => "fo bar bar bar"

I wondered how split.join could be the fastest, and it didn't seem like that would hold up on large strings, so I adjusted the benchmark to see what effect long strings would have:

require 'benchmark'
include Benchmark

string = (["foo  bar   bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(12) do |x|
  x.report("gsub      ")   { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
  x.report("split/join")   { n.times { string.split.join(" ") } }
end

ruby test.rb ; ruby test.rb

String length: 250000 characters
                  user     system      total        real
gsub          2.570000   0.010000   2.580000 (  2.576149)
squeeze(' ')  0.140000   0.000000   0.140000 (  0.150298)
split/join    1.400000   0.010000   1.410000 (  1.396078)

String length: 250000 characters
                  user     system      total        real
gsub          2.570000   0.010000   2.580000 (  2.573802)
squeeze(' ')  0.140000   0.000000   0.140000 (  0.150384)
split/join    1.400000   0.010000   1.410000 (  1.397748)

So, long lines do make a big difference.


If you do use gsub then gsub(/\s{2,}/, ' ') is slightly faster.

Not really. Here's a version of the benchmark to test just that assertion:

require 'benchmark'
include Benchmark

string = "foo  bar   bar      baaar"
puts string.gsub(/\s+/, " ")
puts string.gsub(/\s{2,}/, ' ')
puts string.gsub(/\s\s+/, " ")

string = (["foo  bar   bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(18) do |x|
  x.report("gsub")               { n.times { string.gsub(/\s+/, " ") } }
  x.report('gsub/\s{2,}/, "")')  { n.times { string.gsub(/\s{2,}/, ' ') } }
  x.report("gsub2")              { n.times { string.gsub(/\s\s+/, " ") } }
end
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> String length: 250000 characters
# >>                          user     system      total        real
# >> gsub                 1.380000   0.010000   1.390000 (  1.381276)
# >> gsub/\s{2,}/, "")    1.590000   0.000000   1.590000 (  1.609292)
# >> gsub2                1.050000   0.010000   1.060000 (  1.051005)

If you want speed from gsub, use the gsub2 pattern, /\s\s+/. squeeze(' ') will still run circles around any gsub implementation, though.
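
One caveat when choosing between these patterns: they are not interchangeable on every input. /\s+/ also replaces a single tab or newline with a space, while /\s\s+/ and /\s{2,}/ only touch runs of two or more whitespace characters. A quick sketch (the helper names are made up for illustration):

```ruby
# Hypothetical helper names, one per pattern from the benchmark.
def collapse_any(str)
  # Matches one or more whitespace characters, so a lone tab becomes a space.
  str.gsub(/\s+/, " ")
end

def collapse_runs(str)
  # Matches two or more whitespace characters, so a lone tab is left alone.
  str.gsub(/\s\s+/, " ")
end

sample = "foo\tbar  baz"
p collapse_any(sample)         # => "foo bar baz"
p collapse_runs(sample)        # => "foo\tbar baz"
p sample.gsub(/\s{2,}/, " ")   # => "foo\tbar baz"
```

On the benchmark string all three give the same result, since it contains only spaces.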

Solution 4:

Important note: this answer relies on extra libraries, not plain Ruby (ActiveSupport ships with Rails; Facets is a separate gem).

To complement the other answers, note that both ActiveSupport and Facets provide String#squish ([update] caveat: it also removes newlines within the string, and strips leading and trailing whitespace):

>> "foo  bar   bar      baaar".squish
=> "foo bar bar baaar"
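
If you don't want to pull in either library, something along these lines should behave similarly in plain Ruby. This is only an approximation under my own assumptions, not the actual library code, and squish_like is a made-up name:

```ruby
# squish_like is a hypothetical helper; it approximates squish and is not
# the ActiveSupport or Facets implementation.
def squish_like(str)
  # Collapse every run of whitespace (including newlines) to one space,
  # then trim both ends.
  str.gsub(/[[:space:]]+/, " ").strip
end

p squish_like("  foo \n bar\t\tbaz ")         # => "foo bar baz"
p squish_like("foo  bar   bar      baaar")    # => "foo bar bar baaar"
```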