What are the different methods to parse strings in Java? [closed]

For parsing player commands, I've most often used the split method to split a string by delimiters and then to then just figure out the rest by a series of ifs or switches. What are some different ways of parsing strings in Java?


Solution 1:

I really like regular expressions. As long as the command strings are fairly simple, you can write a few regexes that could take a few pages of code to manually parse.

I would suggest you check out http://www.regular-expressions.info for a good intro to regexes, as well as specific examples for Java.

Solution 2:

I assume you're trying to make the command interface as forgiving as possible. If this is the case, I suggest you use an algorithm similar to this:

  1. Read in the string
    • Split the string into tokens
    • Use a dictionary to convert synonyms to a common form
    • For example, convert "hit", "punch", "strike", and "kick" all to "hit"
    • Perform actions on an unordered, inclusive base
    • Unordered - "punch the monkey in the face" is the same thing as "the face in the monkey punch"
    • Inclusive - If the command is supposed to be "punch the monkey in the face" and they supply "punch monkey", you should check how many commands this matches. If only one command, do this action. It might even be a good idea to have command priorities, and even if there were even matches, it would perform the top action.

Solution 3:

Parsing manually is a lot of fun... at the beginning:)

In practice if commands aren't very sophisticated you can treat them the same way as those used in command line interpreters. There's a list of libraries that you can use: http://java-source.net/open-source/command-line. I think you can start with apache commons CLI or args4j (uses annotations). They are well documented and really simple in use. They handle parsing automatically and the only thing you need to do is to read particular fields in an object.

If you have more sophisticated commands, then maybe creating a formal grammar would be a better idea. There is a very good library with graphical editor, debugger and interpreter for grammars. It's called ANTLR (and the editor ANTLRWorks) and it's free:) There are also some example grammars and tutorials.

Solution 4:

I would look at Java migrations of Zork, and lean towards a simple Natural Language Processor (driven either by tokenizing or regex) such as the following (from this link):

    public static boolean simpleNLP( String inputline, String keywords[])
    {
        int i;
        int maxToken = keywords.length;
        int to,from;
        if( inputline.length() = inputline.length()) return false; // check for blank and empty lines
        while( to >=0 )
        {
            to = inputline.indexOf(' ',from);
            if( to > 0){
                lexed.addElement(inputline.substring(from,to));
                from = to;
                while( inputline.charAt(from) == ' '
                && from = keywords.length) { status = true; break;}
            }
        }
        return status;
    }

...

Anything which gives a programmer a reason to look at Zork again is good in my book, just watch out for Grues.

...

Solution 5:

Sun itself recommends staying away from StringTokenizer and using the String.spilt method instead.

You'll also want to look at the Pattern class.