How to split String with some separator but without removing that separator in Java? [duplicate]
I'm having trouble splitting a String. I want to split a String on some separator, but without losing that separator. The somestring.split(String separator) method in Java splits the String but removes the separator from the result. I don't want that to happen.
For example:
String string1="Ram-sita-laxman";
String separator="-";
string1.split(separator);
Output:
[Ram, sita, laxman]
but I want the result like the one below instead:
[Ram, -sita, -laxman]
Is there a way to get output like this?
string1.split("(?=-)");
This works because split actually takes a regular expression. The pattern (?=-) is a "zero-width positive lookahead".
I would love to explain more but my daughter wants to play tea party. :)
Edit: Back!
To explain this, I will first show you a different split operation:
"Ram-sita-laxman".split("");
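This splits your string on every zero-length string. There is a zero-length string between every pair of adjacent characters. Therefore, the result is the following (Java 7 and earlier also return an empty string "" as the first element; since Java 8, a zero-width match at the start of the input no longer produces one):
["R", "a", "m", "-", "s", "i", "t", "a", "-", "l", "a", "x", "m", "a", "n"]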
Now, I modify my regular expression ("") to only match zero-length strings if they are followed by a dash.
"Ram-sita-laxman".split("(?=-)");
["Ram", "-sita", "-laxman"]
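If the separator is held in a variable, as in the question, it may contain characters that are special in a regular expression (for example "." or "|"). Here is a minimal sketch of the lookahead split that quotes the separator first. The class name KeepSeparatorSplit and the helper splitBefore are made up for illustration; they are not part of any library.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class KeepSeparatorSplit {
    // Hypothetical helper: splits before each occurrence of a literal separator,
    // so every token after the first one starts with the separator.
    static String[] splitBefore(String input, String separator) {
        // Pattern.quote turns the separator into a literal, so "." or "|" are safe.
        return input.split("(?=" + Pattern.quote(separator) + ")");
    }

    public static void main(String[] args) {
        // prints [Ram, -sita, -laxman]
        System.out.println(Arrays.toString(splitBefore("Ram-sita-laxman", "-")));
    }
}
```

Without Pattern.quote, a separator such as "." would be read as "any character" and the lookahead would match at almost every position.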
In that example, the ?= means "lookahead". More specifically, it means "positive lookahead". Why the "positive"? Because you can also have negative lookahead (?!) which will split on every zero-length string that is not followed by a dash:
"Ram-sita-laxman".split("(?!-)");
["R", "a", "m-", "s", "i", "t", "a-", "l", "a", "x", "m", "a", "n"]
(Java 7 and earlier add a leading "" to this result.)
You can also have positive lookbehind (?<=) which will split on every zero-length string that is preceded by a dash:
"Ram-sita-laxman".split("(?<=-)");
["Ram-", "sita-", "laxman"]
Finally, you can also have negative lookbehind (?<!) which will split on every zero-length string that is not preceded by a dash:
"Ram-sita-laxman".split("(?<!-)");
["R", "a", "m", "-s", "i", "t", "a", "-l", "a", "x", "m", "a", "n"]
(Java 7 and earlier add a leading "" to this result.)
These four expressions are collectively known as the lookaround expressions.
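To compare the four lookarounds side by side, something like the following sketch should do. LookaroundDemo and splitAll are names I made up for this example. The two negative variants are the ones whose first element depends on the Java version, so I only state the output for the positive ones.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class LookaroundDemo {
    // Hypothetical helper: runs the same input through all four lookaround
    // patterns and keeps the results in the order the patterns are listed.
    static Map<String, String[]> splitAll(String input) {
        Map<String, String[]> results = new LinkedHashMap<>();
        for (String regex : new String[] {"(?=-)", "(?!-)", "(?<=-)", "(?<!-)"}) {
            results.put(regex, input.split(regex));
        }
        return results;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String[]> e : splitAll("Ram-sita-laxman").entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
        // The positive variants print:
        // (?=-) -> [Ram, -sita, -laxman]
        // (?<=-) -> [Ram-, sita-, laxman]
    }
}
```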
Bonus: Putting them together
I just wanted to show an example I encountered recently that combines two of the lookaround expressions. Suppose you wish to split a CapitalCase identifier up into its tokens:
"MyAwesomeClass" => ["My", "Awesome", "Class"]
You can accomplish this using this regular expression:
"MyAwesomeClass".split("(?<=[a-z])(?=[A-Z])");
This splits on every zero-length string that is preceded by a lower case letter ((?<=[a-z])) and followed by an upper case letter ((?=[A-Z])).
This technique also works with camelCase identifiers.
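Wrapped up as a small runnable sketch (the class name CamelCaseSplit and the method splitIdentifier are just illustrative names):

```java
import java.util.Arrays;

class CamelCaseSplit {
    // Splits at every position that has a lower case ASCII letter before it
    // and an upper case ASCII letter after it.
    static String[] splitIdentifier(String identifier) {
        return identifier.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        // prints [My, Awesome, Class]
        System.out.println(Arrays.toString(splitIdentifier("MyAwesomeClass")));
        // prints [camel, Case, Identifier]
        System.out.println(Arrays.toString(splitIdentifier("camelCaseIdentifier")));
    }
}
```

Note that this pattern only looks for a lower-to-upper boundary, so a run of capitals stays attached to the word that follows it: "parseHTTPResponse" comes out as [parse, HTTPResponse].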
It's a bit dodgy, but you could introduce a dummy separator using a replace function. I don't know the Java methods, but in C# it could be something like:
string1.Replace("-", "#-").Split("#");
Of course, you'd need to pick a dummy separator that's guaranteed not to be anywhere else in the string.
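For what it's worth, a rough Java equivalent of the same trick might look like this. The names DummySeparatorSplit and splitWithDummy are invented for the sketch, and the check that the dummy does not already occur in the input is my own addition.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DummySeparatorSplit {
    // Hypothetical helper: puts a dummy marker in front of every separator,
    // then splits on the marker so the separator itself survives.
    static String[] splitWithDummy(String input, String separator, String dummy) {
        if (input.contains(dummy)) {
            throw new IllegalArgumentException("dummy separator already occurs in the input");
        }
        // String.replace works on literal text; split takes a regex, hence Pattern.quote.
        return input.replace(separator, dummy + separator).split(Pattern.quote(dummy));
    }

    public static void main(String[] args) {
        // prints [Ram, -sita, -laxman]
        System.out.println(Arrays.toString(splitWithDummy("Ram-sita-laxman", "-", "#")));
    }
}
```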
A way to do this is to split your string, then add your separator at the beginning of each extracted string except the first one.
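A minimal sketch of that idea in Java, assuming the separator is a literal string. SplitThenPrepend and splitAndRestore are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitThenPrepend {
    // Hypothetical helper: does a normal split, then puts the separator back
    // in front of every piece except the first.
    static List<String> splitAndRestore(String input, String separator) {
        // Limit -1 keeps trailing empty pieces, so no separator is silently lost.
        String[] parts = input.split(Pattern.quote(separator), -1);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            result.add(i == 0 ? parts[i] : separator + parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Ram, -sita, -laxman]
        System.out.println(splitAndRestore("Ram-sita-laxman", "-"));
    }
}
```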
Adam hit the nail on the head! I used his answer to figure out how to insert file names from a file dialog into a rich text box. I wanted to start a new line after each "\" in the file path, but string.Split was splitting at the \ and deleting it. By adapting Adam's lookbehind approach, I was able to create a new line after each \ in the file name.
Here is the code I used:
OpenFileDialog fd = new OpenFileDialog();
fd.Multiselect = true;
fd.ShowDialog();
foreach (string filename in fd.FileNames)
{
    string currentfiles = uxFiles.Text;
    string value = "\r\n" + filename;
    //This line allows the Regex command to split after each \ in the filename.
    string[] lines = Regex.Split(value, @"(?<=\\)");
    foreach (string line in lines)
    {
        uxFiles.Text = uxFiles.Text + line + "\r\n";
    }
}
Enjoy!
Walrusking
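Since the question is about Java, here is a rough Java counterpart of the lookbehind split in the C# code above, leaving out the file dialog and the text box. PathLineBreaks, breakAfterBackslashes and the sample path are made up for illustration.

```java
class PathLineBreaks {
    // Hypothetical helper: splits after every backslash (positive lookbehind)
    // and joins the pieces with Windows line breaks.
    static String breakAfterBackslashes(String path) {
        // The regex is (?<=\\); each backslash is doubled again in the Java literal.
        return String.join("\r\n", path.split("(?<=\\\\)"));
    }

    public static void main(String[] args) {
        // Prints "C:\", "Users\", "me\" and "file.txt" on separate lines.
        System.out.println(breakAfterBackslashes("C:\\Users\\me\\file.txt"));
    }
}
```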