Is DocumentBuilder thread safe?
The code base that I am looking at uses the DOM parser. The following code fragment is duplicated in five methods:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
If a method that contains the above code is called in a loop, or is called many times in the application, we pay the overhead of creating a new DocumentBuilderFactory instance and a new DocumentBuilder instance on every call.
Would it be a good idea to create a singleton wrapper around the DocumentBuilderFactory and DocumentBuilder instances, as shown below:
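public final class DOMParser {
    // DocumentBuilderFactory is abstract, so it is obtained via newInstance()
    private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private final DocumentBuilder builder;
    private static final DOMParser instance = new DOMParser();
    private DOMParser() {
        try {
            builder = factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
    public static DOMParser getInstance() {
        return instance;
    }
    public Document parse(InputSource xml) throws SAXException, IOException {
        return builder.parse(xml);
    }
}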
Are there any problems that can arise if the above singleton is shared across multiple threads? If not, will there be any performance gain by using the above approach of creating the DocumentBuilderFactory and the DocumentBuilder instances only once throughout the lifetime of the application?
Edit:
The only time we can face a problem is if DocumentBuilder saves some state information while parsing an XML file which can affect the parsing of the next XML file.
See the comments section for other questions about the same matter. Short answer to your question: no, it's not OK to put these classes in a singleton. Neither DocumentBuilderFactory nor DocumentBuilder is guaranteed to be thread safe. If you have several threads parsing XML, make sure each thread has its own instance of DocumentBuilder. You only need one per thread, since you can reuse a DocumentBuilder after you reset it.
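One way to give each thread its own builder is a ThreadLocal. The following is only a minimal sketch: the class name PerThreadDomParser and its parse(String) method are made up for illustration, and it assumes the underlying implementation supports DocumentBuilder.reset() (the default Xerces-based one does).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; the name and API are illustrative only.
final class PerThreadDomParser {
    // Each thread lazily creates its own factory and builder on first use,
    // so no DocumentBuilder is ever shared between threads.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
        ThreadLocal.withInitial(() -> {
            try {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        });

    private PerThreadDomParser() {}

    static Document parse(String xml) {
        DocumentBuilder builder = BUILDER.get();
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Wrapped in an unchecked exception to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", e);
        } finally {
            // Return the builder to its initial state before the next parse.
            builder.reset();
        }
    }
}
```

Callers on any thread can then call PerThreadDomParser.parse(xml) without synchronization. The cost is one factory and one builder per thread that ever parses, which is usually acceptable with a bounded thread pool.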
EDIT: A small snippet to show that sharing the same DocumentBuilder is bad. With Java 1.6_u32 and 1.7_u05 this code fails with org.xml.sax.SAXException: FWK005 parse may not be called while parsing. Uncomment the synchronization on builder, and it works fine:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
final DocumentBuilder builder = factory.newDocumentBuilder();
ExecutorService exec = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
    exec.submit(new Runnable() {
        public void run() {
            try {
                // synchronized (builder) {
                InputSource is = new InputSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><俄语>данные</俄语>"));
                builder.parse(is);
                builder.reset();
                // }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
}
exec.shutdown();
So here's your answer: do not call DocumentBuilder.parse() from multiple threads. Yes, this behavior might be JRE specific. If you're using IBM Java or JRockit, or you plug in a different DocumentBuilderImpl, it might work fine, but with the default Xerces implementation it does not.
The JAXP Specification (V 1.4) says:
It is expected that the newSAXParser method of a SAXParserFactory implementation, the newDocumentBuilder method of a DocumentBuilderFactory and the newTransformer method of a TransformerFactory will be thread safe without side effects. This means that an application programmer should expect to be able to create transformer instances in multiple threads at once from a shared factory without side effects or problems.
https://jaxp.java.net/docs/spec/html/#plugabililty-thread-safety
So, for example, you should be able to create a single DocumentBuilderFactory instance via DocumentBuilderFactory.newInstance and then use that single factory to create a DocumentBuilder per thread via DocumentBuilderFactory.newDocumentBuilder. You could also create a pool of DocumentBuilders.
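The pooling idea could look roughly like the sketch below. It is only an illustration under stated assumptions: DocumentBuilderPool and its methods are hypothetical names, the pool has a fixed size, and the implementation is assumed to support DocumentBuilder.reset(). All builders are created up front from a single factory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical fixed-size pool; the name and API are illustrative only.
final class DocumentBuilderPool {
    private final BlockingQueue<DocumentBuilder> idle;

    DocumentBuilderPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        // One factory creates every builder in the pool.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            for (int i = 0; i < size; i++) {
                idle.add(factory.newDocumentBuilder());
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    Document parse(String xml) {
        DocumentBuilder builder;
        try {
            // Blocks until a builder is free, so no builder is used by two threads at once.
            builder = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a builder", e);
        }
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        } finally {
            builder.reset();   // clear any state left by this parse
            idle.add(builder); // hand the builder back to the pool
        }
    }
}
```

A pool caps the number of builders regardless of how many threads exist, at the price of blocking when all builders are busy. A builder per thread avoids that blocking but ties the builder count to the thread count.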
I can't find anything that says that, for example, the static method DocumentBuilderFactory.newInstance is thread-safe. The implementation appears thread-safe in that there is some method synchronization being done, but the only DocumentBuilderFactory method the spec explicitly describes as thread safe is newDocumentBuilder.