How can I delete special characters?
I'm practicing with Ruby and regex to delete certain unwanted characters. For example:
input = input.gsub(/<\/?[^>]*>/, '')
and for special characters entered as HTML codes, for example &#9787; or &#8482;:
input = input.gsub('&#', '')
This leaves only the numbers, which is fine. But it only works if the user enters the special character as a code, like this:
&#8482;
My question: How can I delete special characters if the user enters them directly, without a code, like this:
™ ☻
Solution 1:
First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:
input = input.gsub(/[^0-9A-Za-z]/, '')
If that's not what you want (for example, you want to support non-Latin alphabets), then I think you should make a list of the glyphs you want to remove (like ™ or ☻) and remove them one by one, since it's hard to distinguish programmatically between a Chinese or Arabic character and a pictograph.
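A minimal sketch of the glyph-list idea, assuming the input is a UTF-8 string. The constant and method names here are made up for illustration. If a fixed list turns out to be too rigid, Ruby's Unicode character properties (`\p{L}` for letters, `\p{N}` for numbers) are one possible alternative, shown in the second method:

```ruby
# Hypothetical blocklist; extend it with whatever glyphs you want gone.
UNWANTED_GLYPHS = "™☻"

# Remove only the listed glyphs and leave every other character alone.
def strip_listed_glyphs(input)
  input.delete(UNWANTED_GLYPHS)
end

# Alternative: keep letters, digits, and whitespace from any script,
# and drop everything else (symbols, punctuation, pictographs).
def keep_letters_and_digits(input)
  input.gsub(/[^\p{L}\p{N}\s]/, '')
end

strip_listed_glyphs("Ruby™ 中文 ☻")       # => "Ruby 中文 "
keep_letters_and_digits("Ruby™ 中文 ☻!")  # => "Ruby 中文 "
```

The blocklist is predictable but has to be maintained by hand. The property-based filter needs no list, but it also removes punctuation, so adjust the character class to your needs.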
Finally, you might want to normalize your input by converting to or from HTML escape sequences.
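One way to do that normalization, as a sketch: decode the escape sequences first with `CGI.unescapeHTML` from the standard library, then run a single filter over the result. That way a typed code and a literal glyph are treated the same. The method name is hypothetical, and the ASCII-only filter is just the example from above:

```ruby
require 'cgi/escape' # standard library; provides CGI.unescapeHTML

# Decode numeric and named entities first, so the code for a glyph
# and the literal glyph end up identical, then filter once.
def normalize_and_clean(input)
  decoded = CGI.unescapeHTML(input)
  decoded.gsub(/[^0-9A-Za-z]/, '')
end

normalize_and_clean("abc&#8482;123") # => "abc123"
normalize_and_clean("abc™123")       # => "abc123"
```

Without the decoding step, the digits of the entity code would survive the filter and end up in the output.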
Solution 2:
If you only want to keep ASCII characters, you can use:
original = "aøbauhrhræoeuacå"
cleaned = ""
original.each_byte { |x| cleaned << x unless x > 127 }
cleaned # => "abauhrhroeuac"
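The same filtering can also be written without a byte loop. This is a sketch of two alternatives, assuming the input is valid UTF-8: a regex that removes everything outside the 7-bit range, and `String#encode` with replacement options:

```ruby
original = "aøbauhrhræoeuacå"

# Drop every character outside the 7-bit ASCII range.
ascii_only = original.gsub(/[^\x00-\x7F]/, '')

# Same result via transcoding: characters that have no ASCII
# equivalent are replaced with an empty string.
transcoded = original.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '')

ascii_only # => "abauhrhroeuac"
transcoded # => "abauhrhroeuac"
```

Note that the transcoded string is tagged as US-ASCII rather than UTF-8, which may matter if you concatenate it with other strings later.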
Solution 3:
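You can use parameterize. Note that it comes from ActiveSupport (part of Rails), not from plain Ruby, and that it also lowercases the string and joins the remaining words with hyphens:
require 'active_support/core_ext/string'
'@!#$%^&*()111'.parameterize
# => "111"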
Solution 4:
You can match all the characters you want, and then join them together, like this:
original = "aøbæcå"
stripped = original.scan(/[a-zA-Z]/).join
puts stripped
which outputs "abc"
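If building an intermediate array is not needed, `String#delete` with a negated character set is a possible shortcut. A small sketch using the same sample string:

```ruby
original = "aøbæcå"

# A leading ^ negates the set: delete everything that is NOT a-z or A-Z.
stripped = original.delete('^a-zA-Z')

puts stripped # prints "abc"
```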