Checking for a not null, not blank String in Java

I am trying to check if a Java String is not null, not empty and not whitespace.

To my mind, this code should be up to the job.

public static boolean isEmpty(String s) {
    if ((s != null) && (s.trim().length() > 0))
        return false;
    else
        return true;
}

As per documentation, String.trim() should work thus:

Returns a copy of the string, with leading and trailing whitespace omitted.

If this String object represents an empty character sequence, or the first and last characters of character sequence represented by this String object both have codes greater than '\u0020' (the space character), then a reference to this String object is returned.

However, apache/commons/lang/StringUtils.java does it a little differently.

public static boolean isBlank(String str) {
    int strLen;
    if (str == null || (strLen = str.length()) == 0) {
        return true;
    }
    for (int i = 0; i < strLen; i++) {
        if ((Character.isWhitespace(str.charAt(i)) == false)) {
            return false;
        }
    }
    return true;
}

As per documentation, Character.isWhitespace():

Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:

  • It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
  • It is '\t', U+0009 HORIZONTAL TABULATION.
  • It is '\n', U+000A LINE FEED.
  • It is '\u000B', U+000B VERTICAL TABULATION.
  • It is '\f', U+000C FORM FEED.
  • It is '\r', U+000D CARRIAGE RETURN.
  • It is '\u001C', U+001C FILE SEPARATOR.
  • It is '\u001D', U+001D GROUP SEPARATOR.
  • It is '\u001E', U+001E RECORD SEPARATOR.
  • It is '\u001F', U+001F UNIT SEPARATOR.

If I am not mistaken (or perhaps I am just not reading it correctly), String.trim() should strip all of the characters that Character.isWhitespace() checks for. All of them seem to be at or below '\u0020'.

In this case, the simpler isEmpty function seems to cover all the scenarios that the lengthier isBlank covers.

  1. Is there a string that will make isEmpty and isBlank behave differently in a test case?
  2. Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?

For those interested in actually running a test, here are the methods and unit tests.

public class StringUtil {

    public static boolean isEmpty(String s) {
        if ((s != null) && (s.trim().length() > 0))
            return false;
        else
            return true;
    }

    public static boolean isBlank(String str) {
        int strLen;
        if (str == null || (strLen = str.length()) == 0) {
            return true;
        }
        for (int i = 0; i < strLen; i++) {
            if ((Character.isWhitespace(str.charAt(i)) == false)) {
                return false;
            }
        }
        return true;
    }
}

And the unit tests:

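// Assumes JUnit 4; the class name StringUtilTest is only illustrative.
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class StringUtilTest {

    @Test
    public void test() {
        // null reference
        String s = null;
        assertTrue(StringUtil.isEmpty(s));
        assertTrue(StringUtil.isBlank(s));

        // empty string
        s = "";
        assertTrue(StringUtil.isEmpty(s));
        assertTrue(StringUtil.isBlank(s));

        // a single space
        s = " ";
        assertTrue(StringUtil.isEmpty(s));
        assertTrue(StringUtil.isBlank(s));

        // several spaces
        s = "   ";
        assertTrue(StringUtil.isEmpty(s));
        assertTrue(StringUtil.isBlank(s));

        // a letter surrounded by spaces
        s = "   a     ";
        assertFalse(StringUtil.isEmpty(s));
        assertFalse(StringUtil.isBlank(s));
    }
}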

Update: It was a really interesting discussion - and this is why I love Stack Overflow and the folks here. By the way, coming back to the question, we got:

  • A program showing which characters make the two methods behave differently. The code is at https://ideone.com/ELY5Wv. Thanks @Dukeling.
  • A performance-related reason for choosing the standard isBlank(). Thanks @devconsole.
  • A comprehensive explanation by @nhahtdh. Thanks mate.

Solution 1:

Is there a string that will make isEmpty and isBlank behave differently in a test case?

Note that Character.isWhitespace can recognize Unicode characters and return true for Unicode whitespace characters.

Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:

  • It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').

  • [...]

On the other hand, the trim() method removes leading and trailing characters whose code points are at or below U+0020, that is, the control characters below U+0020 and the space character itself.

Therefore, the two methods behave differently in the presence of a Unicode whitespace character above U+0020, for example "\u2008", or when the string contains control characters that the Character.isWhitespace method does not consider whitespace, for example "\002".
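A minimal sketch of these two cases, assuming the isEmpty() and isBlank() logic from the question. The class name WhitespaceDifferenceDemo is hypothetical, and both methods are condensed re-implementations rather than the Apache source:

```java
public class WhitespaceDifferenceDemo {

    // Trim-based check from the question: null, or nothing left after trim().
    static boolean isEmpty(String s) {
        return s == null || s.trim().length() == 0;
    }

    // Loop-based check with the same logic as the quoted isBlank().
    static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String punctuationSpace = "\u2008"; // U+2008 PUNCTUATION SPACE
        String startOfText = "\u0002";      // U+0002, the same character as "\002"

        System.out.println(isEmpty(punctuationSpace)); // false: trim() leaves it alone
        System.out.println(isBlank(punctuationSpace)); // true: it is Java whitespace
        System.out.println(isEmpty(startOfText));      // true: trim() strips it
        System.out.println(isBlank(startOfText));      // false: not Java whitespace
    }
}
```

Each sample string gets opposite answers from the two checks, which is the kind of test case the question asks for.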

If you were to write a regular expression to do this (which is slower than looping through the string and checking each character):

  • isEmpty() would be equivalent to .matches("[\\x00-\\x20]*")
  • isBlank() would be equivalent to .matches("\\p{javaWhitespace}*")

(Both isEmpty() and isBlank() accept a null String reference, so they are not exactly equivalent to the regex solution; putting that aside, they are equivalent.)

Note that \p{javaWhitespace}, as its name implies, is Java-specific syntax for the character class defined by the Character.isWhitespace method.
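As a rough illustration, the two regular expressions can be wrapped in null-tolerant helpers and tried on a few samples. The class and method names below are hypothetical, and the explicit null check is an addition so that the helpers accept null the way isEmpty() and isBlank() do:

```java
public class RegexEquivalenceDemo {

    // Hypothetical helper: regex counterpart of the trim-based isEmpty().
    static boolean isEmptyRegex(String s) {
        return s == null || s.matches("[\\x00-\\x20]*");
    }

    // Hypothetical helper: regex counterpart of isBlank().
    static boolean isBlankRegex(String s) {
        return s == null || s.matches("\\p{javaWhitespace}*");
    }

    public static void main(String[] args) {
        // Empty, ASCII whitespace, a control character, a Unicode space, real text.
        String[] samples = { "", " \t\r\n", "\u0002", "\u2008", "  a  " };
        for (String s : samples) {
            System.out.println(isEmptyRegex(s) + " " + isBlankRegex(s));
        }
    }
}
```

The two helpers agree on the empty string, on ASCII whitespace, and on real text, and disagree on the control character and on the Unicode space.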

Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?

It depends. However, I think the explanation in the part above should be sufficient for you to decide. To sum up the difference:

  • isEmpty() will consider the string empty if it contains only control characters [1] below U+0020 and the space character (U+0020).

  • isBlank() will consider the string empty if it contains only whitespace characters as defined by the Character.isWhitespace method, which includes Unicode whitespace characters.

[1] There is also the control character at U+007F DELETE, which is not trimmed by the trim() method.
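To see the full set of differences on a given JVM, one option is to scan every UTF-16 code unit and list those on which trim() and Character.isWhitespace disagree. This is only a sketch with hypothetical names (WhitespaceScan, trimmedByTrim, disagreements); it is not the program linked in the question, and the exact list can vary with the Unicode version the JVM supports:

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceScan {

    // True if trim() strips this single code unit, leaving an empty string.
    static boolean trimmedByTrim(char c) {
        return String.valueOf(c).trim().isEmpty();
    }

    // Collects every UTF-16 code unit on which the two checks disagree.
    static List<Integer> disagreements() {
        List<Integer> result = new ArrayList<>();
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (trimmedByTrim((char) c) != Character.isWhitespace((char) c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c : disagreements()) {
            System.out.printf("U+%04X%n", c);
        }
    }
}
```

U+007F DELETE does not show up in the output: trim() leaves it in place and Character.isWhitespace rejects it, so both checks treat a string containing it as non-blank.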

Solution 2:

The purpose of the two standard methods is to distinguish between these two cases:

org.apache.commons.lang.StringUtils.isBlank(" ") (will return true).

org.apache.commons.lang.StringUtils.isEmpty(" ") (will return false).

Your custom implementation of isEmpty() will return true.


UPDATE:

  • org.apache.commons.lang.StringUtils.isEmpty() checks whether the String is null or has length 0.

  • org.apache.commons.lang.StringUtils.isBlank() takes it a step further. It not only checks whether the String is null or has length 0, but also whether it consists only of whitespace.

In your case, you are trimming the String in your isEmpty method. The one case where the two standard methods differ (when you pass " ") cannot occur with your version, because trimming removes the leading and trailing whitespace, which for a whitespace-only string means removing all of it.
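A small sketch of that distinction without the Commons Lang dependency. The method lengthBasedIsEmpty is a hypothetical stand-in written from the description above (null or length 0), not the library's source, and trimBasedIsEmpty condenses the isEmpty() from the question:

```java
public class EmptyVersusBlankDemo {

    // Stand-in for the described StringUtils.isEmpty(): null or length 0.
    static boolean lengthBasedIsEmpty(String s) {
        return s == null || s.length() == 0;
    }

    // Condensed form of the question's trim-based isEmpty().
    static boolean trimBasedIsEmpty(String s) {
        return s == null || s.trim().length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(lengthBasedIsEmpty(" ")); // false: length is 1
        System.out.println(trimBasedIsEmpty(" "));   // true: nothing left after trim()
    }
}
```

For a single space the trim-based version already behaves like a blank check, which is why it never shows the empty-versus-blank difference.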

Solution 3:

I would choose isBlank() over isEmpty() because trim() creates a new String object whenever there is something to trim, and that object has to be garbage collected later. isBlank(), on the other hand, does not create any objects.
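One way to observe when trim() returns a different object is to compare references. This is a sketch with a hypothetical class name, relying on the documented behaviour that trim() returns the original String when there is nothing to remove:

```java
public class TrimIdentityDemo {

    // True when trim() hands back the very same String instance.
    static boolean trimReturnsSameInstance(String s) {
        return s.trim() == s; // reference comparison is intentional here
    }

    public static void main(String[] args) {
        System.out.println(trimReturnsSameInstance("abc"));     // true: nothing to trim
        System.out.println(trimReturnsSameInstance("  abc  ")); // false: a different String
        System.out.println(trimReturnsSameInstance("   "));     // false: a different String
    }
}
```

How much this matters in practice depends on how often the check runs and on how many of the inputs actually carry leading or trailing whitespace.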

Solution 4:

You could take a look at JSR 303 Bean Validation, which contains the annotations @NotEmpty and @NotNull. Bean Validation is cool because you can separate validation concerns from the original intent of the method.