Fastest way to remove white spaces in string

I'm trying to fetch multiple email addresses seperated by "," within string from database table, but it's also returning me whitespaces, and I want to remove the whitespace quickly.

The following code does remove whitespace, but it also becomes slow whenever I try to fetch large number email addresses in a string like to 30000, and then try to remove whitespace between them. It takes more than four to five minutes to remove those spaces.

 Regex Spaces =
        new Regex(@"\s+", RegexOptions.Compiled);
txtEmailID.Text = MultipleSpaces.Replace(emailaddress),"");

Could anyone please tell me how can I remove the whitespace within a second even for large number of email address?


Solution 1:

I would build a custom extension method using StringBuilder, like:

public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
    StringBuilder sb = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        if (!toExclude.Contains(c))
            sb.Append(c);
    }
    return sb.ToString();
}

Usage:

var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r' });

or to be even faster:

var str = s.ExceptChars(new HashSet<char>(new[] { ' ', '\t', '\n', '\r' }));

With the hashset version, a string of 11 millions of chars takes less than 700 ms (and I'm in debug mode)

EDIT :

Previous code is generic and allows to exclude any char, but if you want to remove just blanks in the fastest possible way you can use:

public static string ExceptBlanks(this string str)
{
    StringBuilder sb = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        switch (c)
        {
            case '\r':
            case '\n':
            case '\t':
            case ' ':
                continue;
            default:
                sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

EDIT 2 :

as correctly pointed out in the comments, the correct way to remove all the blanks is using char.IsWhiteSpace method :

public static string ExceptBlanks(this string str)
{
    StringBuilder sb = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        if(!char.IsWhiteSpace(c))
            sb.Append(c);
    }
    return sb.ToString();
}

Solution 2:

Given the implementation of string.Replaceis written in C++ and part of the CLR runtime I'm willing to bet

email.Replace(" ","").Replace("\t","").Replace("\n","").Replace("\r","");

will be the fastest implementation. If you need every type of whitespace, you can supply the hex value the of unicode equivalent.

Solution 3:

With linq you can do it simply:

emailaddress = new String(emailaddress
                                     .Where(x=>x!=' ' && x!='\r' && x!='\n')
                                     .ToArray());

I didn't compare it with stringbuilder approaches, but is much more faster than string based approaches. Because it does not create many copy of strings (string is immutable and using it directly causes to dramatically memory and speed problems), so it's not going to use very big memory and not going to slow down the speed (except one extra pass through the string at first).

Solution 4:

You should try String.Trim(). It will trim all spaces from start to end of a string

Or you can try this method from linked topic: [link]

    public static unsafe string StripTabsAndNewlines(string s)
    {
        int len = s.Length;
        char* newChars = stackalloc char[len];
        char* currentChar = newChars;

        for (int i = 0; i < len; ++i)
        {
            char c = s[i];
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                    continue;
                default:
                    *currentChar++ = c;
                    break;
            }
        }
        return new string(newChars, 0, (int)(currentChar - newChars));
    }