C# Regex Split - commas outside quotes
Solution 1:
You could split on every comma that is followed by an even number of quotes, using the following regex to find them:
",(?=(?:[^']*'[^']*')*[^']*$)"
You'd use it like this:
var result = Regex.Split(samplestring, ",(?=(?:[^']*'[^']*')*[^']*$)");
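To see what that lookahead does in practice, here is a minimal sketch. It assumes a single-quoted sample string like the one used elsewhere in this thread; the class and method names (QuoteAwareSplit, SplitOutsideSingleQuotes) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Hypothetical helper: splits on commas that are followed by an even
    // number of single quotes, i.e. commas that are not inside a '...' pair.
    public static string[] SplitOutsideSingleQuotes(string input)
    {
        return Regex.Split(input, ",(?=(?:[^']*'[^']*')*[^']*$)");
    }

    public static void Main()
    {
        // Assumed sample input; the comma in 'XYZ 99,9' must not split.
        string sample = "'ABCDEFG', 123542, 'XYZ 99,9'";
        foreach (string part in SplitOutsideSingleQuotes(sample))
        {
            // Trim because the split keeps the space after each comma.
            Console.WriteLine(part.Trim());
        }
    }
}
```

Note that the quotes stay part of each field and the whitespace after a comma is kept, so you may want to Trim() the parts afterwards.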
Solution 2:
// This regular expression splits a string on the separator character when it is NOT inside double quotes.
// separatorChar can be any character, such as a comma or a semicolon.
// Single quotes inside a value are allowed, e.g. "Mike's Kitchen","Jane's Room"
// Regex.Escape keeps separators that are regex metacharacters (such as | or .) literal.
Regex regx = new Regex(Regex.Escape(separatorChar.ToString()) + "(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string[] line = regx.Split(stringToSplit);
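A small sketch of this approach with a concrete input may help. The helper name SplitOutsideDoubleQuotes and the sample line are assumptions for illustration, and the separator is passed through Regex.Escape so that a character like | is treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorSplit
{
    // Hypothetical helper: splits on separatorChar wherever it is
    // followed by an even number of double quotes (outside "..." pairs).
    public static string[] SplitOutsideDoubleQuotes(string input, char separatorChar)
    {
        string pattern = Regex.Escape(separatorChar.ToString())
            + "(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
        return Regex.Split(input, pattern);
    }

    public static void Main()
    {
        // Assumed sample line: a comma and an apostrophe inside the first value.
        string sample = "\"Mike's Kitchen, Bar\",\"Jane's Room\",42";
        foreach (string field in SplitOutsideDoubleQuotes(sample, ','))
        {
            Console.WriteLine(field);
        }
    }
}
```

The surrounding double quotes remain in each field, so strip them with Trim('"') if you only want the values.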
Solution 3:
Although I like a challenge now and then, this isn't really one. Please read this article http://secretgeek.net/csv_trouble.asp and then go on and use http://www.filehelpers.com/
[Edit1, 3]: Or maybe this article can help too (the link only shows VB.NET sample code, but you can use it from C# as well): http://msdn.microsoft.com/en-us/library/cakac7e6.aspx
I've ported the sample to C# (add a reference to Microsoft.VisualBasic to your project):
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Note: TextFieldParser treats only double quotes as field enclosures,
            // so the comma inside the single-quoted 'XYZ 99,9' still splits the field.
            using (TextReader reader = new StringReader("('ABCDEFG', 123542, 'XYZ 99,9')"))
            using (TextFieldParser fieldParser = new TextFieldParser(reader))
            {
                fieldParser.TextFieldType = FieldType.Delimited;
                fieldParser.SetDelimiters(",");
                while (!fieldParser.EndOfData)
                {
                    try
                    {
                        // Read one row and print each field on its own line.
                        string[] currentRow = fieldParser.ReadFields();
                        foreach (string currentField in currentRow)
                        {
                            Console.WriteLine(currentField);
                        }
                    }
                    catch (MalformedLineException e)
                    {
                        // Report the line number rather than the whole exception.
                        Console.WriteLine("Line {0} is not valid and will be skipped.", e.LineNumber);
                    }
                }
            }
        }
    }
}
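For comparison, here is a minimal sketch of the same parser on a double-quoted line, which is the quoting style TextFieldParser understands. The class name TextFieldParserSketch, the helper ParseLine, and the sample input are assumptions for illustration; the project still needs the reference to Microsoft.VisualBasic.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserSketch
{
    // Hypothetical helper: parses a single comma-delimited line and
    // returns its fields with the enclosing double quotes removed.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Already the default; set explicitly to make the intent visible.
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is no data left.
            return parser.ReadFields() ?? new string[0];
        }
    }

    public static void Main()
    {
        // Assumed sample: the comma inside "XYZ 99,9" should stay in one field.
        foreach (string field in ParseLine("\"ABCDEFG\",123542,\"XYZ 99,9\""))
        {
            Console.WriteLine(field);
        }
    }
}
```

Unlike the regex approaches, the parser should hand back the field values without the enclosing quotes, so no extra trimming step is needed.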
[Edit2]: Found another one that could be of help here: http://www.codeproject.com/KB/database/CsvReader.aspx
-- reinhard
Solution 4:
I had a problem where the regex wasn't capturing empty columns. I modified it as follows to get empty strings in the results:
var results = Regex.Split(source, "[,]{1}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
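Here is a quick sketch of how that behaves with empty columns. The sample line and the names EmptyColumnSplit and SplitKeepingEmpty are made up for illustration. Note that [,]{1} matches exactly one comma, the same as a plain comma would.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyColumnSplit
{
    // Hypothetical helper wrapping the pattern above: splits on commas
    // outside double quotes and keeps empty columns as empty strings.
    public static string[] SplitKeepingEmpty(string source)
    {
        return Regex.Split(source, "[,]{1}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
    }

    public static void Main()
    {
        // Assumed sample: an empty second column and an empty trailing column.
        string source = "a,,\"b,c\",";
        string[] results = SplitKeepingEmpty(source);
        for (int i = 0; i < results.Length; i++)
        {
            // Brackets make the empty strings visible in the output.
            Console.WriteLine("{0}: [{1}]", i, results[i]);
        }
    }
}
```

Regex.Split does not drop empty entries, so the empty columns come back as empty strings in their original positions.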