What is the best way to get the first letter from a string in Java, returned as a string of length 1?
Assume the following:
String example = "something";
String firstLetter = "";
Are there differences to be aware of with the following ways of assigning firstLetter
that could impact performance; which would be best, and why?
firstLetter = String.valueOf(example.charAt(0));
firstLetter = Character.toString(example.charAt(0));
firstLetter = example.substring(0, 1);
The reason the first letter is being returned as a String
is that this is being run in Hadoop, and a string is required to assign to a Text
type, firstLetter
will be output as a key
from a map()
method, for example:
public class FirstLetterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
String line = new String();
Text firstLetter = new Text();
IntWritable wordLength = new IntWritable();
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
line = value.toString();
for (String word : line.split("\\W+")){
if (word.length() > 0) {
// ---------------------------------------------
// firstLetter assignment
firstLetter.set(String.valueOf(word.charAt(0)).toLowerCase());
// ---------------------------------------------
wordLength.set(word.length());
context.write(firstLetter, wordLength);
}
}
}
}
Solution 1:
Performance wise substring(0, 1)
is better as found by following:
String example = "something";
String firstLetter = "";
long l=System.nanoTime();
firstLetter = String.valueOf(example.charAt(0));
System.out.println("String.valueOf: "+ (System.nanoTime()-l));
l=System.nanoTime();
firstLetter = Character.toString(example.charAt(0));
System.out.println("Character.toString: "+ (System.nanoTime()-l));
l=System.nanoTime();
firstLetter = example.substring(0, 1);
System.out.println("substring: "+ (System.nanoTime()-l));
Output:
String.valueOf: 38553
Character.toString: 30451
substring: 8660
Solution 2:
Long story short, it probably doesn't matter. Use whichever you think looks nicest.
Longer answer, using Oracle's Java 7 JDK specifically, since this isn't defined at the JLS:
String.valueOf
or Character.toString
work the same way, so use whichever you feel looks nicer. In fact, Character.toString
simply calls String.valueOf
(source).
So the question is, should you use one of those or String.substring
. Here again it doesn't matter much. String.substring
uses the original string's char[]
and so allocates one object fewer than String.valueOf
. This also prevents the original string from being GC'ed until the one-character string is available for GC (which can be a memory leak), but in your example, they'll both be available for GC after each iteration, so that doesn't matter. The allocation you save also doesn't matter -- a char[1]
is cheap to allocate, and short-lived objects (as the one-char string will be) are cheap to GC, too.
If you have a large enough data set that the three are even measurable, substring
will probably give a slight edge. Like, really slight. But that "if... measurable" contains the real key to this answer: why don't you just try all three and measure which one is fastest?
Solution 3:
String whole = "something";
String first = whole.substring(0, 1);
System.out.println(first);
Solution 4:
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import java.util.concurrent.TimeUnit;
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Fork(value = 1)
@Measurement(iterations = 5, time = 1)
public class StringFirstCharBenchmark {
private String source;
@Setup
public void init() {
source = "MALE";
}
@Benchmark
public String substring() {
return source.substring(0, 1);
}
@Benchmark
public String indexOf() {
return String.valueOf(source.indexOf(0));
}
}
Results:
+----------------------------------------------------------------------+
| Benchmark Mode Cnt Score Error Units |
+----------------------------------------------------------------------+
| StringFirstCharBenchmark.indexOf avgt 5 23.777 ? 5.788 ns/op |
| StringFirstCharBenchmark.substring avgt 5 11.305 ? 1.411 ns/op |
+----------------------------------------------------------------------+