Split a string that has white spaces, unless they are enclosed within "quotes"?

To make things simple:

string streamR = sr.ReadLine();  // sr.Readline results in:
                                 //                         one "two two"

I want to be able to save them as two different strings, remove all spaces EXCEPT for the spaces found between quotation marks. Therefore, what I need is:

string 1 = one
string 2 = two two

So far what I have found that works is the following code, but it removes the spaces within the quotes.

//streamR.ReadLine only has two strings
  string[] splitter = streamR.Split(' ');
    str1 = splitter[0];
    // Only set str2 if the length is >1
    str2 = splitter.Length > 1 ? splitter[1] : string.Empty;

The output of this becomes

one
two

I have looked into Regular Expression to split on spaces unless in quotes however I can't seem to get regex to work/understand the code, especially how to split them so they are two different strings. All the codes there give me a compiling error (I am using System.Text.RegularExpressions)


Solution 1:

string input = "one \"two two\" three \"four four\" five six";
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();

Solution 2:

You can even do that without Regex: a LINQ expression with String.Split can do the job.

You can split your string before by " then split only the elements with even index in the resulting array by .

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

For the string:

This is a test for "Splitting a string" that has white spaces, unless they are "enclosed within quotes"

It gives the result:

This
is
a
test
for
Splitting a string
that
has
white
spaces,
unless
they
are
enclosed within quotes

UPDATE

string myString = "WordOne \"Word Two\"";
var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();

UPDATE 2

How do you define a quoted portion of the string?

We will assume that the string before the first " is non-quoted.

Then, the string placed between the first " and before the second " is quoted. The string between the second " and the third " is non-quoted. The string between the third and the fourth is quoted, ...

The general rule is: Each string between the (2*n-1)th (odd number) " and (2*n)th (even number) " is quoted. (1)

What is the relation with String.Split?

String.Split with the default StringSplitOption (define as StringSplitOption.None) creates an list of 1 string and then add a new string in the list for each splitting character found.

So, before the first ", the string is at index 0 in the splitted array, between the first and second ", the string is at index 1 in the array, between the third and fourth, index 2, ...

The general rule is: The string between the nth and (n+1)th " is at index n in the array. (2)

The given (1) and (2), we can conclude that: Quoted portion are at odd index in the splitted array.

Solution 3:

As custom parser might be more suitable for this.

This is something I wrote once when I had a specific (and very strange) parsing requirement that involved parenthesis and spaces, but it is generic enough that it should work with virtually any delimiter and text qualifier.

public static IEnumerable<String> ParseText(String line, Char delimiter, Char textQualifier)
{

    if (line == null)
        yield break;

    else
    {
        Char prevChar = '\0';
        Char nextChar = '\0';
        Char currentChar = '\0';

        Boolean inString = false;

        StringBuilder token = new StringBuilder();

        for (int i = 0; i < line.Length; i++)
        {
            currentChar = line[i];

            if (i > 0)
                prevChar = line[i - 1];
            else
                prevChar = '\0';

            if (i + 1 < line.Length)
                nextChar = line[i + 1];
            else
                nextChar = '\0';

            if (currentChar == textQualifier && (prevChar == '\0' || prevChar == delimiter) && !inString)
            {
                inString = true;
                continue;
            }

            if (currentChar == textQualifier && (nextChar == '\0' || nextChar == delimiter) && inString)
            {
                inString = false;
                continue;
            }

            if (currentChar == delimiter && !inString)
            {
                yield return token.ToString();
                token = token.Remove(0, token.Length);
                continue;
            }

            token = token.Append(currentChar);

        }

        yield return token.ToString();

    } 
}

The usage would be:

var parsedText = ParseText(streamR, ' ', '"');

Solution 4:

You can use the TextFieldParser class that is part of the Microsoft.VisualBasic.FileIO namespace. (You'll need to add a reference to Microsoft.VisualBasic to your project.):

string inputString = "This is \"a test\" of the parser.";

using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(inputString)))
{
    using (Microsoft.VisualBasic.FileIO.TextFieldParser tfp = new TextFieldParser(ms))
    {
        tfp.Delimiters = new string[] { " " };
        tfp.HasFieldsEnclosedInQuotes = true;
        string[] output = tfp.ReadFields();

        for (int i = 0; i < output.Length; i++)
        {
            Console.WriteLine("{0}:{1}", i, output[i]);
        }
    }
}

Which generates the output:

0:This
1:is
2:a test
3:of
4:the
5:parser.