Java split is eating my characters

Use zero-width matching assertions:

    String str = "la$le\\$li$lo";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<!\\\\)\\$")
    )); // prints "[la, le\$li, lo]"

The regex is essentially

    (?<!\\)\$

It uses negative lookbehind to assert that there is not a preceding \.

See also

  • regular-expressions.info/Lookarounds

More examples of splitting on assertions

Simple sentence splitting, keeping punctuation marks:

    String str = "Really?Wow!This.Is.Awesome!";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=[.!?])")
    )); // prints "[Really?, Wow!, This., Is., Awesome!]"

Splitting a long string into fixed-length parts, using \G:

    String str = "012345678901234567890";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=\\G.{4})")
    )); // prints "[0123, 4567, 8901, 2345, 6789, 0]"

Using a lookbehind/lookahead combo:

    String str = "HelloThereHowAreYou";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=[a-z])(?=[A-Z])")
    )); // prints "[Hello, There, How, Are, You]"
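As a small follow-on to the lookbehind/lookahead combo, the pieces can be joined back with spaces to get a readable name. This is only a sketch: the class and method names are made up for illustration, and the pattern only splits between a lowercase and an uppercase letter, so acronyms such as "XMLFile" stay in one piece.

```java
class CamelCaseWords {
    // Split at every lowercase-to-uppercase boundary, then join with spaces.
    static String toWords(String camelCase) {
        return String.join(" ", camelCase.split("(?<=[a-z])(?=[A-Z])"));
    }

    public static void main(String[] args) {
        System.out.println(toWords("HelloThereHowAreYou")); // prints "Hello There How Are You"
    }
}
```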

Related questions

  • Can you use zero-width matching regex in String split?
  • Backreferences in lookbehind
  • How do I convert CamelCase into human-readable names in Java?

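The reason a$ and i$ are getting removed is that the regexp [^\\]\$ matches two characters: any character that is not '\', followed by '$'. Both characters are part of the delimiter, so split discards both. You need to use zero-width assertions, which check the surrounding text without consuming it.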
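To see the difference side by side, here is a minimal sketch that runs both patterns over the string from the question. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;

class SplitComparison {
    // Delimiter is "one non-backslash character plus $", so that character is lost.
    static String[] consumingSplit(String s) {
        return s.split("[^\\\\]\\$");
    }

    // Delimiter is just "$"; the lookbehind only checks that no backslash precedes it.
    static String[] lookbehindSplit(String s) {
        return s.split("(?<!\\\\)\\$");
    }

    public static void main(String[] args) {
        String str = "la$le\\$li$lo";
        System.out.println(Arrays.toString(consumingSplit(str)));  // prints "[l, le\$l, lo]"
        System.out.println(Arrays.toString(lookbehindSplit(str))); // prints "[la, le\$li, lo]"
    }
}
```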

This is the same problem people have trying to find q not followed by u.
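The q/u case can be sketched the same way. The negated class q[^u] needs an actual character after the q, so it misses a q at the end of the input, while the lookahead q(?!u) only asserts that a u does not follow. The class name and test words below are illustrative choices, not part of the original question.

```java
import java.util.regex.Pattern;

class QWithoutU {
    // Negated class: requires some non-u character after the q.
    static final Pattern CONSUMING = Pattern.compile("q[^u]");
    // Negative lookahead: only asserts that the next character, if any, is not u.
    static final Pattern LOOKAHEAD = Pattern.compile("q(?!u)");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found(CONSUMING, "Iraq")); // prints "false"
        System.out.println(found(LOOKAHEAD, "Iraq")); // prints "true"
        System.out.println(found(LOOKAHEAD, "quit")); // prints "false"
    }
}
```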

A first cut at the proper regexp is /(?<!\\)\$/, which is written "(?<!\\\\)\\$" as a Java string literal:

class Test {
 public static void main(String[] args) {
  String regexp = "(?<!\\\\)\\$";
  System.out.println( java.util.Arrays.toString( "1a$1e\\$li$lo".split(regexp) ) );
 }
}

Yields:
[1a, 1e\$li, lo]


You can try first replacing "\$" with another string, such as the URL Encoding for $ ("%24"), and then splitting:

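// Hide each escaped "\$" behind a placeholder, split on the remaining "$",
// then restore the placeholder in every part.
String[] splits = str.replace("\\$", "%24").split("\\$");
for (int i = 0; i < splits.length; i++) {
    splits[i] = splits[i].replace("%24", "\\$");
}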

More generally, if str is constructed by something like

str = a + "$" + b + "$" + c

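Then you can URL-encode a, b and c before appending them together:

import static java.net.URLEncoder.encode;
...
// encode(String, String) throws UnsupportedEncodingException, which the caller must handle
str = encode(a, "UTF-8") + "$" + encode(b, "UTF-8") + "$" + encode(c, "UTF-8");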
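For completeness, here is a rough round-trip sketch of that idea: encode each field, join with "$", then split on "$" and decode each part. The EncodedJoin class and its method names are hypothetical. It assumes UTF-8 and wraps the checked UnsupportedEncodingException, since UTF-8 is always available on a Java platform.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Arrays;

class EncodedJoin {
    // URL-encode each field (so "$" becomes "%24") and join the results with "$".
    static String join(String... fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) sb.append('$');
                sb.append(URLEncoder.encode(fields[i], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    // Every "$" left in the joined string is a delimiter; decode each part afterwards.
    static String[] split(String joined) {
        String[] parts = joined.split("\\$", -1); // -1 keeps trailing empty fields
        try {
            for (int i = 0; i < parts.length; i++) {
                parts[i] = URLDecoder.decode(parts[i], "UTF-8");
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return parts;
    }

    public static void main(String[] args) {
        String joined = join("la", "le$li", "lo");
        System.out.println(joined);                        // prints "la$le%24li$lo"
        System.out.println(Arrays.toString(split(joined))); // prints "[la, le$li, lo]"
    }
}
```

With this approach no escape character is needed in the fields at all, because a literal "$" inside a field never appears in the joined string.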