Recursion in an XML schema?
I need to create an XML schema that validates a tree-structured XML document. I don't know in advance how many times each element occurs or how deep the tree goes.
XML example:
<?xml version="1.0" encoding="utf-8"?>
<node>
  <attribute/>
  <node>
    <attribute/>
    <node/>
  </node>
</node>
What is the best way to validate it? Recursion?
If you need a recursive type declaration, here is an example that might help:
<xs:schema id="XMLSchema1"
    targetNamespace="http://tempuri.org/XMLSchema1.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="node" type="nodeType"/>
  <xs:complexType name="nodeType">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="node" type="nodeType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
As you can see, this defines a recursive schema with a single element, "node", which can be nested as deeply as desired.
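The schema above declares a target namespace (http://tempuri.org/XMLSchema1.xsd) and does not declare the <attribute> child, so the unqualified sample document in the question would not validate against it as written. A minimal sketch of a no-namespace variant that also accepts the <attribute> element might look like this. Treating <attribute> as optional, empty, and appearing at most once before any child <node> elements is an assumption read off the sample document, not something the question states:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch: no targetNamespace, so it matches the unqualified sample document. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The only global element, so <node> is the only permitted document root. -->
  <xs:element name="node" type="nodeType"/>

  <xs:complexType name="nodeType">
    <xs:sequence>
      <!-- Assumption: <attribute> is optional, empty, and comes first. -->
      <xs:element name="attribute" minOccurs="0">
        <xs:complexType/>
      </xs:element>
      <!-- Recursion: a node may contain any number of child nodes of the same type. -->
      <xs:element name="node" type="nodeType" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Because the recursion goes through the named type nodeType, no depth limit is needed: each nested <node> is validated against the same type as its parent.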
XSD does indeed allow recursive elements. Here is an example:
<xsd:element name="section">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="title"/>
      <xsd:element ref="para" maxOccurs="unbounded"/>
      <xsd:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
As you can see, the section element can contain child section elements, because its content model refers back to its own global declaration via ref="section".
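The fragment above refers to title and para elements that it does not declare. A self-contained sketch, assuming both are plain string elements and that the schema has no target namespace, might be:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Assumption: title and para are simple text elements. -->
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="para" type="xsd:string"/>

  <xsd:element name="section">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="title"/>
        <xsd:element ref="para" maxOccurs="unbounded"/>
        <!-- ref="section" points back at this same global declaration. -->
        <xsd:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Since title, para and section are all declared globally here, each of them is also accepted as a document root.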
The other solutions work well for making root elements recursive. However, because any globally declared element is accepted as a document root, making a non-root element recursive without also turning it into a valid root element requires a slightly different approach.
Let's say you want to define an XML message format for exchanging structured data between nodes in a distributed application. It contains the following elements:
- <message>: the root element;
- <from>: the message's origin;
- <to>: the message's destination;
- <type>: the data structure type encoded in the message;
- <data>: the data contained in the message.
To support complex data types, <data> is a recursive element. This makes it possible to write messages like the one below, which sends a geometry_msgs/TwistStamped message to a flying drone, specifying its linear and angular (i.e. rotational) speeds:
<?xml version="1.0" encoding="utf-8"?>
<message xmlns="https://stackoverflow.com/message/1.0.0">
  <from>controller:8080</from>
  <to>drone:8080</to>
  <type>geometry_msgs/TwistStamped</type>
  <data name="header">
    <data name="seq">0</data>
    <data name="stamp">
      <data name="sec">1</data>
      <data name="nsec">0</data>
    </data>
    <data name="frame_id">base_link</data>
  </data>
  <data name="twist">
    <data name="linear">
      <data name="x">1.0</data>
      <data name="y">0</data>
      <data name="z">1.0</data>
    </data>
    <data name="angular">
      <data name="x">0.3</data>
      <data name="y">0</data>
      <data name="z">0</data>
    </data>
  </data>
</message>
We can easily write an XML schema to validate this format:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="https://stackoverflow.com/message/1.0.0"
    elementFormDefault="qualified"
    xmlns="https://stackoverflow.com/message/1.0.0">
  <xs:element name="data">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="data" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="type" type="xs:string"/>
        <xs:element ref="data" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The problem with the schema above is that declaring <data> globally also makes it a valid root element, so the schema also validates the document below:
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="https://stackoverflow.com/message/1.0.0" name="twist">
  <data name="header">
    <data name="seq">0</data>
    <data name="stamp">
      <data name="sec">1</data>
      <data name="nsec">0</data>
    </data>
    <data name="frame_id">base_link</data>
  </data>
  <data name="twist">
    <data name="linear">
      <data name="x">1.0</data>
      <data name="y">0</data>
      <data name="z">1.0</data>
    </data>
    <data name="angular">
      <data name="x">0.3</data>
      <data name="y">0</data>
      <data name="z">0</data>
    </data>
  </data>
</data>
To avoid this side effect, instead of declaring the <data> element at the global level, we first define a named complex type called data, then declare a local data element of that type inside <message>:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="https://stackoverflow.com/message/1.0.0"
    elementFormDefault="qualified"
    xmlns="https://stackoverflow.com/message/1.0.0">
  <xs:complexType name="data" mixed="true">
    <xs:sequence>
      <xs:element name="data" type="data" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="type" type="xs:string"/>
        <xs:element name="data" type="data" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Notice that we end up declaring the <data> element twice, once inside the data type and again inside <message>, but apart from a little duplicated work this is of no consequence.
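If the duplicated declaration bothers you, one option is to move the single local declaration of <data> into a named model group and reference that group from both places. This is a different technique from the answer's named-type approach, sketched here as an alternative. Like named types, named groups cannot be used as a document root, so <data> still is not accepted as a root element. The group name dataElements is hypothetical, chosen only for this sketch:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="https://stackoverflow.com/message/1.0.0"
    elementFormDefault="qualified"
    xmlns="https://stackoverflow.com/message/1.0.0">

  <!-- Hypothetical group name; holds the only declaration of <data>. -->
  <xs:group name="dataElements">
    <xs:sequence>
      <xs:element name="data" type="data"/>
    </xs:sequence>
  </xs:group>

  <xs:complexType name="data" mixed="true">
    <xs:sequence>
      <!-- Recursion: zero or more nested <data> children via the group. -->
      <xs:group ref="dataElements" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="type" type="xs:string"/>
        <!-- One or more top-level <data> elements, as in the schema above. -->
        <xs:group ref="dataElements" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The occurrence constraints move from the element declaration to each xs:group reference, which is why the two references can still differ in minOccurs. The recursion passes through the element's type rather than the group referencing itself directly, so it does not form a circular group.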