Extract all strings between two strings
I'm trying to develop a method that will match all strings between two strings:
I've tried this but it returns only the first match:
string ExtractString(string s, string start,string end)
{
// You should check for errors in real-world code, omitted for brevity
int startIndex = s.IndexOf(start) + start.Length;
int endIndex = s.IndexOf(end, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}
Let's suppose we have this string
String Text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2"
I would like a c# function doing the following :
public List<string> ExtractFromString(String Text,String Start, String End)
{
List<string> Matched = new List<string>();
.
.
.
return Matched;
}
// Example of use
ExtractFromString("A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2","A1","A2")
// Will return :
// FIRSTSTRING
// SECONDSTRING
// THIRDSTRING
Thank you for your help !
private static List<string> ExtractFromBody(string body, string start, string end)
{
List<string> matched = new List<string>();
int indexStart = 0;
int indexEnd = 0;
bool exit = false;
while (!exit)
{
indexStart = body.IndexOf(start);
if (indexStart != -1)
{
indexEnd = indexStart + body.Substring(indexStart).IndexOf(end);
matched.Add(body.Substring(indexStart + start.Length, indexEnd - indexStart - start.Length));
body = body.Substring(indexEnd + end.Length);
}
else
{
exit = true;
}
}
return matched;
}
Here is a solution using RegEx. Don't forget to include the following using statement.
using System.Text.RegularExpressions
It will correctly return only text between the start and end strings given.
Will not be returned:
akslakhflkshdflhksdf
Will be returned:
FIRSTSTRING
SECONDSTRING
THIRDSTRING
It uses the regular expression pattern [start string].+?[end string]
The start and end strings are escaped in case they contain regular expression special characters.
private static List<string> ExtractFromString(string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
You could make that into an extension method of String like this:
public static class StringExtensionMethods
{
public static List<string> EverythingBetween(this string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
}
Usage:
string source = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
string start = "A1";
string end = "A2";
List<string> results = source.EverythingBetween(start, end);
text.Split(new[] {"A1", "A2"}, StringSplitOptions.RemoveEmptyEntries);
You can split the string into an array using the start identifier in following code:
String str = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
String[] arr = str.Split("A1");
Then iterate through your array and remove the last 2 characters of each string (to remove the A2). You'll also need to discard the first array element as it will be empty assuming the string starts with A1.
Code is untested, currently on a mobile