How do I remove repeated spaces in a string?
Solution 1:
String#squeeze has an optional parameter to specify characters to squeeze.
irb> "asd  asd asd   asd".squeeze(" ")
=> "asd asd asd asd"
Warning: calling it without an argument squeezes ALL repeated characters, not only spaces:
irb> 'aaa bbbb cccc 0000123'.squeeze
=> "a b c 0123"
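Also note that squeeze(" ") only collapses runs of the literal space character; tabs and newlines are left alone. A minimal sketch of the difference, assuming the input may contain mixed whitespace (the sample string is made up for illustration):

```ruby
# Hypothetical input mixing runs of spaces with repeated tabs and newlines.
text = "foo   bar\t\tbaz\n\nqux"

# squeeze(" ") only collapses repeated space characters.
spaces_only = text.squeeze(" ")
puts spaces_only.inspect      # => "foo bar\t\tbaz\n\nqux"

# A regex over \s collapses any run of whitespace into a single space.
all_whitespace = text.gsub(/\s+/, " ")
puts all_whitespace.inspect   # => "foo bar baz qux"
```

If your input only ever contains spaces, squeeze(" ") is enough; otherwise a whitespace-aware approach is needed.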
Solution 2:
>> str = "foo  bar   bar      baaar"
=> "foo  bar   bar      baaar"
>> str.split.join(" ")
=> "foo bar bar baaar"
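Keep in mind that split with no argument splits on any run of whitespace and discards leading and trailing whitespace, so split.join(" ") also trims the string and turns tabs and newlines into spaces. A small comparison with squeeze(" "), using a made-up input:

```ruby
# Hypothetical input with leading/trailing spaces and a tab.
text = "  foo  bar\tbaz  "

# split (no argument) splits on whitespace runs and drops leading/trailing blanks.
joined = text.split.join(" ")
puts joined.inspect     # => "foo bar baz"

# squeeze(" ") keeps one leading and one trailing space and leaves the tab alone.
squeezed = text.squeeze(" ")
puts squeezed.inspect   # => " foo bar\tbaz "
```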
Solution 3:
Updated benchmark from @zetetic's answer:
require 'benchmark'
include Benchmark
string = "foo  bar   bar      baaar"
n = 1_000_000
bm(12) do |x|
x.report("gsub ") { n.times { string.gsub(/\s+/, " ") } }
x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
x.report("split/join") { n.times { string.split.join(" ") } }
end
Running it twice on my desktop gives these results:
ruby test.rb; ruby test.rb
                   user     system      total        real
gsub           6.060000   0.000000   6.060000 (  6.061435)
squeeze(' ')   4.200000   0.010000   4.210000 (  4.201619)
split/join     3.620000   0.000000   3.620000 (  3.614499)
                   user     system      total        real
gsub           6.020000   0.000000   6.020000 (  6.023391)
squeeze(' ')   4.150000   0.010000   4.160000 (  4.153204)
split/join     3.590000   0.000000   3.590000 (  3.587590)
The issue is that squeeze with no argument removes any repeated character, which produces a different output string and doesn't meet the OP's need. squeeze(' ') does meet the need, but passing the argument slows it down.
string.squeeze
=> "fo bar bar bar"
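Before trusting a benchmark, it helps to check that every candidate actually produces the same result. A quick sanity check along those lines, assuming a sample string with repeated spaces (the exact spacing here is illustrative):

```ruby
sample = "foo  bar   bar      baaar"

# The candidates from the benchmark, plus bare squeeze, keyed by label.
candidates = {
  "gsub"         => ->(s) { s.gsub(/\s+/, " ") },
  "squeeze(' ')" => ->(s) { s.squeeze(" ") },
  "split/join"   => ->(s) { s.split.join(" ") },
  "squeeze"      => ->(s) { s.squeeze }  # no argument: squeezes every repeated character
}

results = candidates.transform_values { |fn| fn.call(sample) }
results.each { |label, out| puts "#{label.ljust(12)} #{out.inspect}" }

# Labels whose output differs from the gsub reference.
mismatches = results.reject { |_, out| out == results["gsub"] }.keys
puts "Differs from gsub: #{mismatches.join(', ')}"
```

Only bare squeeze should be flagged, because it also collapses the "oo" in "foo" and the "aaa" in "baaar".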
I wondered why split.join came out fastest, and suspected that wouldn't hold up for large strings, so I adjusted the benchmark to see what effect long strings have:
require 'benchmark'
include Benchmark
string = (["foo  bar   bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(12) do |x|
x.report("gsub ") { n.times { string.gsub(/\s+/, " ") } }
x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
x.report("split/join") { n.times { string.split.join(" ") } }
end
ruby test.rb ; ruby test.rb
String length: 250000 characters
                   user     system      total        real
gsub           2.570000   0.010000   2.580000 (  2.576149)
squeeze(' ')   0.140000   0.000000   0.140000 (  0.150298)
split/join     1.400000   0.010000   1.410000 (  1.396078)
String length: 250000 characters
                   user     system      total        real
gsub           2.570000   0.010000   2.580000 (  2.573802)
squeeze(' ')   0.140000   0.000000   0.140000 (  0.150384)
split/join     1.400000   0.010000   1.410000 (  1.397748)
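So string length makes a big difference: on the long string, squeeze(' ') is roughly 10 times faster than split/join and about 18 times faster than gsub, overturning the short-string results.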
"If you do use gsub, then gsub(/\s{2,}/, ' ') is slightly faster."
Not really. Here's a version of the benchmark to test just that assertion:
require 'benchmark'
include Benchmark
string = "foo  bar   bar      baaar"
puts string.gsub(/\s+/, " ")
puts string.gsub(/\s{2,}/, ' ')
puts string.gsub(/\s\s+/, " ")
string = (["foo  bar   bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(18) do |x|
x.report("gsub") { n.times { string.gsub(/\s+/, " ") } }
x.report('gsub/\s{2,}/, "")') { n.times { string.gsub(/\s{2,}/, ' ') } }
x.report("gsub2") { n.times { string.gsub(/\s\s+/, " ") } }
end
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> String length: 250000 characters
# >>                         user     system      total        real
# >> gsub                1.380000   0.010000   1.390000 (  1.381276)
# >> gsub/\s{2,}/, "")   1.590000   0.000000   1.590000 (  1.609292)
# >> gsub2               1.050000   0.010000   1.060000 (  1.051005)
If you want speed from gsub, use the gsub2 pattern (/\s\s+/). squeeze(' ') will still run circles around any gsub implementation, though.
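One behavioral difference worth knowing: the gsub2 pattern /\s\s+/ only matches runs of two or more whitespace characters, so a lone tab or newline survives, whereas /\s+/ converts it to a space. A short sketch with an illustrative input:

```ruby
# Illustrative input: a single tab, then a run of three spaces.
text = "foo\tbar   baz"

plus  = text.gsub(/\s+/, " ")    # every whitespace run, even of length 1, becomes " "
twoup = text.gsub(/\s\s+/, " ")  # only runs of 2+ whitespace characters are replaced

puts plus.inspect    # => "foo bar baz"
puts twoup.inspect   # => "foo\tbar baz"
```

The same applies to /\s{2,}/, which matches exactly what /\s\s+/ matches.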
Solution 4:
Important note: this answer relies on library extensions, not plain Ruby. ActiveSupport is part of Rails (and can also be used on its own), while Facets is a separate gem.
To complement the other answers, note that both ActiveSupport and Facets provide String#squish. Caveat: besides collapsing repeated whitespace, it strips leading and trailing whitespace and turns newlines inside the string into spaces:
>> "foo  bar   bar      baaar".squish
=> "foo bar bar baaar"
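Without ActiveSupport or Facets, similar behavior can be approximated in plain Ruby. This is a sketch, not either library's actual implementation, and the method name squish_plain is hypothetical:

```ruby
# Hypothetical helper approximating String#squish in plain Ruby:
# collapse every whitespace run (spaces, tabs, newlines) to one space,
# then strip leading and trailing whitespace.
def squish_plain(str)
  str.gsub(/[[:space:]]+/, " ").strip
end

puts squish_plain("  foo  bar\n  bar      baaar  ").inspect  # => "foo bar bar baaar"
```

Using [[:space:]] rather than \s also catches Unicode whitespace such as non-breaking spaces in UTF-8 strings.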