What's the best way to validate an XML file against an XSD file?

I'm generating some xml files that needs to conform to an xsd file that was given to me. What's the best way to verify they conform?


The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
//import java.io.File; // if you use File
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd: 
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
  Schema schema = schemaFactory.newSchema(schemaFile);
  Validator validator = schema.newValidator();
  validator.validate(xmlFile);
  System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
  System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {}

The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.

You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.


Here's how to do it using Xerces2. A tutorial for this, here (req. signup).

Original attribution: blatantly copied from here:

import org.apache.xerces.parsers.DOMParser;
import java.io.File;
import org.w3c.dom.Document;

public class SchemaTest {
  public static void main (String args[]) {
      File docFile = new File("memory.xml");
      try {
        DOMParser parser = new DOMParser();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        parser.setProperty(
             "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", 
             "memory.xsd");
        ErrorChecker errors = new ErrorChecker();
        parser.setErrorHandler(errors);
        parser.parse("memory.xml");
     } catch (Exception e) {
        System.out.print("Problem parsing the file.");
     }
  }
}

We build our project using ant, so we can use the schemavalidate task to check our config files:

<schemavalidate> 
    <fileset dir="${configdir}" includes="**/*.xml" />
</schemavalidate>

Now naughty config files will fail our build!

http://ant.apache.org/manual/Tasks/schemavalidate.html


Since this is a popular question, I will point out that java can also validate against "referred to" xsd's, for instance if the .xml file itself specifies XSD's in the header, using xsi:schemaLocation or xsi:noNamespaceSchemaLocation (or xsi for particular namespaces) ex:

<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
  ...

or schemaLocation (always a list of namespace to xsd mappings)

<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.example.com/my_namespace http://www.example.com/document.xsd">
  ...

The other answers work here as well, because the .xsd files "map" to the namespaces declared in the .xml file, because they declare a namespace, and if matches up with the namespace in the .xml file, you're good. But sometimes it's convenient to be able to have a custom resolver...

From the javadocs: "If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"

SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();

and this works for multiple namespaces, etc. The problem with this approach is that the xmlsns:xsi is probably a network location, so it'll by default go out and hit the network with each and every validation, not always optimal.

Here's an example that validates an XML file against any XSD's it references (even if it has to pull them from the network):

  public static void verifyValidatesInternalXsd(String filename) throws Exception {
    InputStream xmlStream = new new FileInputStream(filename);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setNamespaceAware(true);
    factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                 "http://www.w3.org/2001/XMLSchema");
    DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setErrorHandler(new RaiseOnErrorHandler());
    builder.parse(new InputSource(xmlStream));
    xmlStream.close();
  }

  public static class RaiseOnErrorHandler implements ErrorHandler {
    public void warning(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
    public void error(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
    public void fatalError(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
  }

You can avoid pulling referenced XSD's from the network, even though the xml files reference url's, by specifying the xsd manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently also can intercept the URL requests to serve local files for validations. Or you can set your own via setResourceResolver, ex:

Source xmlFile = new StreamSource(xmlFileLocation);
SchemaFactory schemaFactory = SchemaFactory
                                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setResourceResolver(new LSResourceResolver() {
  @Override
  public LSInput resolveResource(String type, String namespaceURI,
                                 String publicId, String systemId, String baseURI) {
    InputSource is = new InputSource(
                           getClass().getResourceAsStream(
                          "some_local_file_in_the_jar.xsd"));
                          // or lookup by URI, etc...
    return new Input(is); // for class Input see 
                          // https://stackoverflow.com/a/2342859/32453
  }
});
validator.validate(xmlFile);

See also here for another tutorial.

I believe the default is to use DOM parsing, you can do something similar with SAX parser that is validating as well saxReader.setEntityResolver(your_resolver_here);


Using Java 7 you can follow the documentation provided in package description.

// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);

// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();

// validate the DOM tree
try {
    validator.validate(new StreamSource(new File("instance.xml"));
} catch (SAXException e) {
    // instance document is invalid!
}