How can I iterate through the unicode codepoints of a Java String?

So I know about String#codePointAt(int), but it's indexed by the char offset, not by the codepoint offset.

I'm thinking about trying something like:

1. using String#charAt(int) to get the char at an index
2. testing whether the char is in the high-surrogates range
   - if so, use String#codePointAt(int) to get the codepoint, and increment the index by 2
   - if not, use the given char value as the codepoint, and increment the index by 1

But my concerns are:

- I'm not sure whether codepoints which are naturally in the high-surrogates range will be stored as two char values or one
- this seems like an awfully expensive way to iterate through characters
- someone must have come up with something better.

Yes, Java exposes Strings as sequences of UTF-16 code units, and, yes, it encodes characters outside the Basic Multilingual Plane (BMP) as surrogate pairs (a high surrogate followed by a low surrogate).

If you know you'll be dealing with characters outside the BMP, then here is the canonical way to iterate over the code points of a Java String:

final int length = s.length();
for (int offset = 0; offset < length; ) {
   final int codepoint = s.codePointAt(offset);

   // do something with the codepoint

   offset += Character.charCount(codepoint);
}
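
To see the loop above on a concrete input, here is a minimal, self-contained sketch. The class name CodePointLoop and the helper codePointsOf are made up for illustration; only String#codePointAt and Character#charCount come from the JDK. The sample string contains U+1F600, which lies outside the BMP and therefore occupies two char values.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointLoop {
    // Hypothetical helper: collects the code points of s using the offset loop.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        final int length = s.length();
        for (int offset = 0; offset < length; ) {
            // Reads one code point; a valid surrogate pair is combined into one value.
            final int codepoint = s.codePointAt(offset);
            result.add(codepoint);
            // Advances by 2 for a supplementary code point, otherwise by 1.
            offset += Character.charCount(codepoint);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (surrogate pair), 'b'
        System.out.println(s.length() + " chars");
        for (int cp : codePointsOf(s)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

An unpaired surrogate is returned by codePointAt as a single value with a char count of 1, so the loop still advances correctly on malformed input.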

Java 8 added CharSequence#codePoints which returns an IntStream containing the code points. You can use the stream directly to iterate over them:

string.codePoints().forEach(c -> ...);

or with a for loop by collecting the stream into an array:

for(int c : string.codePoints().toArray()){
    ...
}

These approaches are probably more expensive than Jonathan Feinberg's solution, but they are quicker to read and write, and the performance difference will usually be insignificant.
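
As a rough illustration of the stream-based approach, the sketch below formats each code point and counts them. The class CodePointStreamDemo and its methods describe and count are hypothetical names chosen for this example; it assumes Java 8 or later.

```java
import java.util.stream.Collectors;

class CodePointStreamDemo {
    // Hypothetical helper: renders every code point as U+XXXX, space-separated.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Hypothetical helper: number of code points, which can be less than s.length().
    static long count(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (surrogate pair), 'b'
        System.out.println(describe(s));
        System.out.println(s.length() + " chars, " + count(s) + " code points");
    }
}
```

Staying in the stream with mapToObj avoids the intermediate int[] that toArray() allocates, which may matter for very long strings.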
