Is there any difference between 'valid xml' and 'well formed xml'?
There is a difference, yes.
XML that adheres to the XML standard is considered well-formed, while XML that adheres to a DTD is considered valid.
Well-formed vs Valid XML
Well-formed means that a textual object meets the W3C requirements for being XML.
Valid means that well-formed XML meets additional requirements given by a specified schema.
Official Definitions
Per the W3C Recommendation for XML:
[Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition, the XML document is valid if it meets certain further constraints.]
Observations:
- A document that is not well-formed is not XML. (Well-formed XML is commonly used but technically redundant.)
- Being valid implies being well-formed.
- Being well-formed does not imply being valid.
- Although the W3C Recommendation for XML defines validity to be against a DTD, conventional use allows the term to be applied for conformance to XML schemas specified via XSD, RELAX NG, Schematron, or other methods.
Examples of what causes a document to be...
Not well-formed:
- An element lacks a closing tag (and is not self-closing).
- Elements overlap without proper nesting:
<a><b></a></b>
- An attribute value is missing a closing quote that matches the opening quote.
- < or & is used in content rather than &lt; or &amp;.
- Multiple root elements exist.
- Multiple XML declarations exist, or an XML declaration appears other than at the top of the document.
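The cases above can be checked with any conforming parser. Here is a minimal sketch using Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative only and not part of any specification:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples mirroring the list above.
samples = {
    "ok": "<a><b/></a>",
    "missing closing tag": "<a><b></a>",
    "overlapping elements": "<a><b></a></b>",
    "unclosed attribute quote": '<a x="1></a>',
    "raw ampersand": "<a>fish & chips</a>",
    "multiple roots": "<a/><b/>",
    "declaration not at top": ' <?xml version="1.0"?><a/>',
}

results = {name: is_well_formed(xml) for name, xml in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
```

Note that the parser does not "repair" anything: a well-formedness error is fatal, which is why every failing sample raises instead of producing a partial tree.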
Invalid:
- An element or attribute is missing but required by the XML schema.
- An element or attribute is used but undefined by the XML schema.
- The content of an element does not match the content specified by the XML schema.
- The value of an attribute does not match the type specified by the XML schema.
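To make the distinction concrete, here is a rough sketch in which a couple of hand-written rules stand in for a schema. This is not how a real validator works; in practice the rules would come from a DTD, XSD, RELAX NG, or similar, and a validating parser would enforce them. The element names, the rule tables, and the function validation_errors are all hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a schema (assumption, for illustration only).
REQUIRED_ATTRS = {"book": {"isbn"}}
ALLOWED_CHILDREN = {"library": {"book"}, "book": {"title"}, "title": set()}


def validation_errors(text: str) -> list:
    """Parse the text (which must be well-formed) and list toy-schema violations."""
    root = ET.fromstring(text)  # raises ParseError if the text is not well-formed
    errors = []
    for elem in root.iter():
        if elem.tag not in ALLOWED_CHILDREN:
            errors.append(f"undefined element <{elem.tag}>")
            continue
        missing = REQUIRED_ATTRS.get(elem.tag, set()) - set(elem.attrib)
        for attr in sorted(missing):
            errors.append(f"<{elem.tag}> is missing required attribute '{attr}'")
        for child in elem:
            known = child.tag in ALLOWED_CHILDREN
            if known and child.tag not in ALLOWED_CHILDREN[elem.tag]:
                errors.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
    return errors


valid_doc = '<library><book isbn="123"><title>XML</title></book></library>'
invalid_doc = "<library><book><title>XML</title><price>9</price></book></library>"

print(validation_errors(valid_doc))
print(validation_errors(invalid_doc))
```

Both documents parse without error, so both are well-formed; only the first satisfies the stand-in rules. The second is the "well-formed but invalid" case.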
Namespace-Well-Formed
Technically, colon characters are permitted in component names in XML. However, colons should only be used in names for namespace purposes:
Note:
The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.
Therefore, another term, namespace-well-formed, is defined in the Namespaces in XML 1.0 W3C Recommendation that implies all of the XML rules for well-formedness plus those governing namespaces and namespace prefixes.
Colloquially, the term well-formed is often used where namespace-well-formed would be more precise. However, this is a minor technical matter of less practical consequence than the distinction between well-formed vs valid XML described in this answer.
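A small sketch of the difference, using Python's standard-library xml.parsers.expat, which only performs namespace processing when a namespace_separator is supplied. The helper name parses and the sample documents are illustrative assumptions:

```python
import xml.parsers.expat as expat


def parses(text: str, namespaces: bool) -> bool:
    """Parse text with or without namespace processing; report success."""
    # Passing a namespace_separator turns on expat's namespace processing.
    parser = expat.ParserCreate(namespace_separator=" " if namespaces else None)
    try:
        parser.Parse(text, True)
        return True
    except expat.ExpatError:
        return False


undeclared = "<a:b/>"                       # prefix "a" is never declared
declared = '<a:b xmlns:a="urn:example"/>'   # prefix "a" is bound to a namespace

print(parses(undeclared, namespaces=False))  # plain XML rules only
print(parses(undeclared, namespaces=True))   # namespace rules also applied
print(parses(declared, namespaces=True))
```

The undeclared-prefix document is accepted under the plain XML rules but rejected once namespace rules are applied, so it is well-formed without being namespace-well-formed. Many everyday parsers (including xml.etree.ElementTree) always apply the namespace rules, which is one reason the two terms get conflated.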
Valid XML is XML that succeeds validation against a DTD.
Well-formed XML is XML that has all tags closed in the proper order and, if it has an XML declaration, has it at the very start of the file with the proper attributes.
In other words, validity refers to semantics, well-formedness refers to syntax.
So you can have XML that is well-formed but invalid.
As others have said, well-formed XML conforms to the XML spec, and valid XML conforms to a given schema.
Another way to put it is that well-formed XML is lexically correct (it can be parsed), while valid XML is grammatically correct (it can be matched to a known vocabulary and grammar).
An XML document cannot be valid until it is well-formed. All XML documents are held to the same standard for well-formedness (the XML Recommendation published by the W3C). One XML document can be valid against some schemas, and invalid against others. There are a number of schema languages, many of which are themselves XML-based.
Well-formed XML is XML that meets the syntactic requirements of the language: not missing any closing tags, having all your singleton (empty-element) tags use <whatever /> instead of just <whatever>, and having your closing tags in the right order.
Valid XML is XML that uses a DTD and complies with all its requirements. So if you use an attribute improperly, you violate the DTD and aren't valid.
All valid XML is well-formed, but not all well-formed XML is valid.