Checking for a not null, not blank String in Java
I am trying to check that a Java String is not null, not empty, and not whitespace-only.
I expected the following code to be up to the job.
public static boolean isEmpty(String s) {
    if ((s != null) && (s.trim().length() > 0))
        return false;
    else
        return true;
}
As per the documentation, String.trim() should work thus:
Returns a copy of the string, with leading and trailing whitespace omitted.
If this String object represents an empty character sequence, or the first and last characters of the character sequence represented by this String object both have codes greater than '\u0020' (the space character), then a reference to this String object is returned.
However, apache/commons/lang/StringUtils.java does it a little differently.
public static boolean isBlank(String str) {
    int strLen;
    if (str == null || (strLen = str.length()) == 0) {
        return true;
    }
    for (int i = 0; i < strLen; i++) {
        if ((Character.isWhitespace(str.charAt(i)) == false)) {
            return false;
        }
    }
    return true;
}
As per the documentation, Character.isWhitespace():
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
- It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
- It is '\t', U+0009 HORIZONTAL TABULATION.
- It is '\n', U+000A LINE FEED.
- It is '\u000B', U+000B VERTICAL TABULATION.
- It is '\f', U+000C FORM FEED.
- It is '\r', U+000D CARRIAGE RETURN.
- It is '\u001C', U+001C FILE SEPARATOR.
- It is '\u001D', U+001D GROUP SEPARATOR.
- It is '\u001E', U+001E RECORD SEPARATOR.
- It is '\u001F', U+001F UNIT SEPARATOR.
If I am not mistaken (or maybe I am just not reading it correctly), String.trim() should take away any of the characters that are being checked by Character.isWhitespace(). All of them seem to be at or below '\u0020'.
In this case, the simpler isEmpty function seems to cover all the scenarios that the lengthier isBlank covers.
- Is there a string that will make isEmpty and isBlank behave differently in a test case?
- Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
For those interested in actually running a test, here are the methods and unit tests.
public class StringUtil {
    public static boolean isEmpty(String s) {
        if ((s != null) && (s.trim().length() > 0))
            return false;
        else
            return true;
    }

    public static boolean isBlank(String str) {
        int strLen;
        if (str == null || (strLen = str.length()) == 0) {
            return true;
        }
        for (int i = 0; i < strLen; i++) {
            if ((Character.isWhitespace(str.charAt(i)) == false)) {
                return false;
            }
        }
        return true;
    }
}
And unit tests
@Test
public void test() {
    String s = null;
    assertTrue(StringUtil.isEmpty(s));
    assertTrue(StringUtil.isBlank(s));
    s = "";
    assertTrue(StringUtil.isEmpty(s));
    assertTrue(StringUtil.isBlank(s));
    s = " ";
    assertTrue(StringUtil.isEmpty(s));
    assertTrue(StringUtil.isBlank(s));
    s = " ";
    assertTrue(StringUtil.isEmpty(s));
    assertTrue(StringUtil.isBlank(s));
    s = " a ";
    assertFalse(StringUtil.isEmpty(s));
    assertFalse(StringUtil.isBlank(s));
}
Update: It was a really interesting discussion, and this is why I love Stack Overflow and the folks here. Coming back to the question, we got:
- A program showing which characters make the two methods behave differently. The code is at https://ideone.com/ELY5Wv. Thanks @Dukeling.
- A performance-related reason for choosing the standard isBlank(). Thanks @devconsole.
- A comprehensive explanation by @nhahtdh. Thanks mate.
Solution 1:
Is there a string that will make isEmpty and isBlank behave differently in a test case?
Note that Character.isWhitespace can recognize Unicode characters and return true for Unicode whitespace characters.
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
[...]
On the other hand, the trim() method trims the space character (U+0020) and all control characters whose code points are below U+0020.
Therefore, the two methods behave differently in the presence of a Unicode whitespace character, for example "\u2008", or when the string contains control characters that are not considered whitespace by the Character.isWhitespace method, for example "\002".
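The two counterexamples can be checked directly. This is a minimal sketch; the class name BlankVsTrimDemo is made up for illustration, and the two methods are compact rewrites of the question's isEmpty and the quoted isBlank rather than the originals verbatim.

```java
final class BlankVsTrimDemo {
    // trim()-based check, same behavior as the question's isEmpty
    static boolean isEmpty(String s) {
        return s == null || s.trim().length() == 0;
    }

    // Character.isWhitespace()-based check, same behavior as the quoted isBlank
    static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String punctuationSpace = "\u2008"; // Unicode space separator above U+0020
        String startOfText = "\002";        // octal escape for the control character U+0002

        // trim() leaves U+2008 alone, but isWhitespace() accepts it
        System.out.println(isEmpty(punctuationSpace) + " " + isBlank(punctuationSpace)); // false true
        // trim() strips U+0002, but isWhitespace() rejects it
        System.out.println(isEmpty(startOfText) + " " + isBlank(startOfText));           // true false
    }
}
```

Either string is enough to make a test that passes for one method fail for the other.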
If you were to write a regular expression to do this (which is slower than looping through the string and checking each character):
- isEmpty() would be equivalent to .matches("[\\x00-\\x20]*")
- isBlank() would be equivalent to .matches("\\p{javaWhitespace}*")
(The isEmpty() and isBlank() methods both allow for a null String reference, so they are not exactly equivalent to the regex solution, but putting that aside, they are equivalent.)
Note that \p{javaWhitespace}, as its name implies, is Java-specific syntax to access the character class defined by the Character.isWhitespace method.
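As a rough illustration of the regex forms above, the sketch below wraps each pattern in a helper and runs a few sample strings through both. The class and method names are hypothetical, and null is deliberately not handled, matching the caveat above.

```java
final class RegexEquivalenceDemo {
    // Regex counterpart of the trim()-based isEmpty (non-null input only)
    static boolean matchesTrimRange(String s) {
        return s.matches("[\\x00-\\x20]*");
    }

    // Regex counterpart of the Character.isWhitespace()-based isBlank (non-null input only)
    static boolean matchesJavaWhitespace(String s) {
        return s.matches("\\p{javaWhitespace}*");
    }

    public static void main(String[] args) {
        String[] samples = { "", " \t\n", "\u2008", "\002", " a " };
        for (String s : samples) {
            System.out.println(s.length() + " chars: trimRange=" + matchesTrimRange(s)
                    + ", javaWhitespace=" + matchesJavaWhitespace(s));
        }
    }
}
```

The two patterns agree on the empty string, on ordinary ASCII whitespace, and on " a ", and disagree on "\u2008" and "\002".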
Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
It depends. However, I think the explanation above should be sufficient for you to decide. To sum up the difference:
- isEmpty() considers the string empty if it contains only the space character (U+0020) and control characters [1] below U+0020.
- isBlank() considers the string empty if it contains only whitespace characters as defined by the Character.isWhitespace method, which includes Unicode whitespace characters.
[1] There is also the control character at U+007F DELETE, which is not trimmed by the trim() method.
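To see exactly which characters the two definitions disagree on, one option is to scan every char value and compare the two rules. This is a sketch with hypothetical names (WhitespaceDiffScan, removedByTrim, differs). The printed list is not reproduced here because it depends on the Unicode data shipped with the Java version in use.

```java
final class WhitespaceDiffScan {
    // True if trim() removes this character when it is the whole string
    static boolean removedByTrim(char c) {
        return String.valueOf(c).trim().length() == 0;
    }

    // True if the trim() rule and Character.isWhitespace() disagree on c
    static boolean differs(char c) {
        return removedByTrim(c) != Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // Use an int counter so the loop terminates after Character.MAX_VALUE
        for (int cp = Character.MIN_VALUE; cp <= Character.MAX_VALUE; cp++) {
            if (differs((char) cp)) {
                System.out.println(String.format("U+%04X", cp));
            }
        }
    }
}
```

U+007F DELETE does not show up in the scan: trim() leaves it in place and Character.isWhitespace rejects it, so both methods treat it as content.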
Solution 2:
The purpose of the two standard methods is to distinguish between these two cases:
org.apache.commons.lang.StringUtils.isBlank(" ") (will return true).
org.apache.commons.lang.StringUtils.isEmpty(" ") (will return false).
Your custom implementation of isEmpty() will return true.
UPDATE:
org.apache.commons.lang.StringUtils.isEmpty() is used to find out whether the String is null or has length 0.
org.apache.commons.lang.StringUtils.isBlank() takes it a step further. It not only checks whether the String is null or has length 0, but also checks whether it consists only of whitespace.
In your case, you're trimming the String in your isEmpty method. The one difference that could occur (the case where you give it " ") can't occur, because you're trimming the String (removing the leading and trailing whitespace, which in this case amounts to removing all the spaces).
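The distinction can be reproduced without the library on the classpath. The sketch below re-implements the semantics described above in a hypothetical class, EmptyVsBlankSketch. It is not the Commons Lang source.

```java
final class EmptyVsBlankSketch {
    // "Empty" in the library's sense: null or length 0, whitespace counts as content
    static boolean isEmpty(String s) {
        return s == null || s.length() == 0;
    }

    // "Blank": null, length 0, or nothing but Character.isWhitespace() characters
    static boolean isBlank(String s) {
        if (isEmpty(s)) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(" ")); // false: a single space has length 1
        System.out.println(isBlank(" ")); // true: the only character is whitespace
    }
}
```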
Solution 3:
I would choose isBlank() over isEmpty() because trim() creates a new String object whenever there is something to trim, and that object has to be garbage collected later. isBlank(), on the other hand, does not create any objects.
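A quick way to observe when trim() hands back a different object is to compare references. This is only a sketch with made-up names. It relies on the documented behavior that trim() returns a reference to the same String when there is no leading or trailing whitespace.

```java
final class TrimAllocationDemo {
    // True if trim() returned the very same object, i.e. no new String was produced
    static boolean trimReturnsSameInstance(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        System.out.println(trimReturnsSameInstance("abc"));     // true: nothing to trim
        System.out.println(trimReturnsSameInstance("  abc  ")); // false: a different String comes back
    }
}
```

So the extra object only appears for inputs that actually carry leading or trailing whitespace, which are the inputs a blank check is most likely to see.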
Solution 4:
You could take a look at JSR 303 Bean Validation, which contains the annotations @NotEmpty and @NotNull. Bean Validation is cool because you can separate validation concerns from the original intent of the method.