How can I strip punctuation from a string?

For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuation in any language?

I should add: Ideally, the solutions won't require you to enumerate all the possible punctuation marks.

Related: Strip Punctuation in Python


Solution 1:

new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());

Solution 2:

Why not simply:

string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();

foreach (char c in s)
{
   if (!char.IsPunctuation(c))
      sb.Append(c);
}

s = sb.ToString();

Regex is usually slower than simple per-character operations, and the LINQ versions look like overkill to me. LINQ also isn't available in .NET 2.0.
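For the any-language part of the question, the same loop carries over to Java with a little extra work. Java has no direct counterpart to char.IsPunctuation, so this is only a sketch: it assumes that checking Character.getType against the seven Unicode punctuation categories is an acceptable stand-in, and the names PunctuationLoop, isPunctuation and strip are made up for illustration.

```java
public class PunctuationLoop {
    // Hypothetical helper (not a JDK method): true if the code point falls in
    // one of the Unicode punctuation general categories (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    public static boolean isPunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // Copies every non-punctuation code point into a new string.
    public static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isPunctuation(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("sxrdct?fvzguh,bij.")); // prints sxrdctfvzguhbij
    }
}
```

Characters such as $ and + are left in place, because Unicode classifies them as symbols rather than punctuation.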

Solution 3:

This describes intent, is the easiest to read (IMHO), and performs best:

 s = s.StripPunctuation();

To implement it:

using System.Text; // needed for StringBuilder

public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}

This uses Hades32's algorithm, which was the best performing of those posted.

Solution 4:

Assuming "best" means "simplest", I suggest using something like this:

String stripped = input.replaceAll("\\p{Punct}+", "");

This example is for Java, but any sufficiently modern regex engine should support this (or something similar).

Edit: the Unicode-aware version would be this:

String stripped = input.replaceAll("\\p{P}+", "");

The first version only looks at punctuation characters contained in ASCII.
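To make the difference concrete, here is a small sketch that runs both patterns over the same inputs. The class and method names (PunctuationRegex, stripAscii, stripUnicode) are illustrative only, and the non-ASCII characters are written as Unicode escapes so the example doesn't depend on the source file encoding.

```java
public class PunctuationRegex {
    // POSIX class: by default matches only ASCII punctuation and symbols,
    // i.e. !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    public static String stripAscii(String input) {
        return input.replaceAll("\\p{Punct}+", "");
    }

    // Unicode general category P (all punctuation subcategories).
    public static String stripUnicode(String input) {
        return input.replaceAll("\\p{P}+", "");
    }

    public static void main(String[] args) {
        String spanish = "\u00BFQu\u00E9?"; // inverted question mark, "Qu", e-acute, "?"
        System.out.println(stripAscii(spanish));   // only the trailing "?" is removed
        System.out.println(stripUnicode(spanish)); // both question marks are removed

        String price = "$5+3";
        System.out.println(stripAscii(price));   // prints 53
        System.out.println(stripUnicode(price)); // prints $5+3
    }
}
```

Note that neither class is a subset of the other: \p{Punct} also removes ASCII symbols such as $ and +, which Unicode classifies as symbols rather than punctuation, so \p{P} leaves them alone.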

Solution 5:

You can use the Regex.Replace method:

 replace(YourString, RegularExpressionWithPunctuationMarks, Empty String)

Since this returns a string, your method will look something like this:

 string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");

You can replace "[?!]" with something more sophisticated if you want:

(\p{P})

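This should match any Unicode punctuation character. In C#, write the pattern as a verbatim string, @"\p{P}", so the backslash is not treated as a string escape.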