Recursion in an XML schema?

I need to create an XML schema that validates a tree-structured XML document. I don't know in advance how many times a node occurs or how deep the tree goes.

XML example:

<?xml version="1.0" encoding="utf-8"?>
<node>
  <attribute/>
  <node>
    <attribute/>
    <node/>      
  </node>
</node> 

What is the best way to validate it? Recursion?


If you need a recursive type declaration, here is an example that might help:

<xs:schema id="XMLSchema1"
    targetNamespace="http://tempuri.org/XMLSchema1.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <xs:element name="node" type="nodeType"></xs:element>

  <xs:complexType name="nodeType">    
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="node" type="nodeType"></xs:element>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

This declares a single element, "node", whose type refers to itself, so node elements can be repeated any number of times and nested as deep as desired.
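
The example in the question also has an <attribute/> child and uses no namespace, which a schema with a target namespace and only node children would not accept. A minimal sketch that covers it follows, assuming <attribute> is an empty element that appears at most once before any child nodes (the question does not say what it may contain):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- No targetNamespace: the question's sample document is not namespaced. -->
  <xs:element name="node" type="nodeType"/>

  <xs:complexType name="nodeType">
    <xs:sequence>
      <!-- Assumption: attribute is optional and has empty content. -->
      <xs:element name="attribute" minOccurs="0">
        <xs:complexType/>
      </xs:element>
      <!-- The recursion: a node may hold any number of nodes of the same type. -->
      <xs:element name="node" type="nodeType" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

With these assumptions the sample from the question validates, including the innermost empty <node/>.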


XSD does indeed allow for recursion of elements. Here is a sample:

<xsd:element name="section">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="title"/>
      <xsd:element ref="para" maxOccurs="unbounded"/>
      <xsd:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

As you can see, the section element's content model refers back to the section element itself, so sections can contain further sections.
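
The fragment refers to title and para without showing their declarations. A self-contained version might look like the sketch below, assuming both are plain strings and that no target namespace is used (neither is stated in the sample):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed declarations: the sample does not show title or para. -->
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="para" type="xsd:string"/>

  <xsd:element name="section">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="title"/>
        <xsd:element ref="para" maxOccurs="unbounded"/>
        <!-- The recursion: zero or more nested sections after the paragraphs. -->
        <xsd:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under this sketch, a section with a title, one para, and a nested section that has its own title and para is valid; a section with no para is not, because para is required at least once.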


The other solutions work great for making root elements recursive. However, in order to make a non-root element recursive without turning it into a valid root element in the process, a slightly different approach is needed.

Let's say you want to define an XML message format for exchanging structured data between nodes in a distributed application. It contains the following elements:

  • <message> - the root element;
  • <from> - the message's origin;
  • <to> - the message's destination;
  • <type> - the data structure type encoded in the message;
  • <data> - the data contained in the message.

In order to support complex data types, <data> is a recursive element. This makes it possible to write messages like the one below, which sends a geometry_msgs/TwistStamped message to a flying drone, specifying its linear and angular (i.e. rotating) speeds:

<?xml version="1.0" encoding="utf-8"?>

<message xmlns="https://stackoverflow.com/message/1.0.0">
  <from>controller:8080</from>
  <to>drone:8080</to>
  <type>geometry_msgs/TwistStamped</type>
  <data name="header">
    <data name="seq">0</data>
    <data name="stamp">
      <data name="sec">1</data>
      <data name="nsec">0</data>
    </data>
    <data name="frame_id">base_link</data>
  </data>
  <data name="twist">
    <data name="linear">
      <data name="x">1.0</data>
      <data name="y">0</data>
      <data name="z">1.0</data>
    </data>
    <data name="angular">
      <data name="x">0.3</data>
      <data name="y">0</data>
      <data name="z">0</data>
    </data>
  </data>
</message>

We can easily write an XML schema to validate this format:

<?xml version="1.0" encoding="utf-8"?>

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="https://stackoverflow.com/message/1.0.0"
  elementFormDefault="qualified"
  xmlns="https://stackoverflow.com/message/1.0.0"
>
  <xs:element name="data">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="data" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="type" type="xs:string"/>
        <xs:element ref="data" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

The problem with the schema above is that it declares <data> at the global level, which makes it a valid root element, so the schema also validates the document below:

<?xml version="1.0" encoding="utf-8"?>

<data xmlns="https://stackoverflow.com/message/1.0.0" name="twist">
  <data name="header">
    <data name="seq">0</data>
    <data name="stamp">
      <data name="sec">1</data>
      <data name="nsec">0</data>
    </data>
    <data name="frame_id">base_link</data>
  </data>
  <data name="twist">
    <data name="linear">
      <data name="x">1.0</data>
      <data name="y">0</data>
      <data name="z">1.0</data>
    </data>
    <data name="angular">
      <data name="x">0.3</data>
      <data name="y">0</data>
      <data name="z">0</data>
    </data>
  </data>
</data>

In order to avoid this side-effect, instead of defining the <data> element directly at the global level, we first define a data type, then define a data element of that type inside message:

<?xml version="1.0" encoding="utf-8"?>

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="https://stackoverflow.com/message/1.0.0"
  elementFormDefault="qualified"
  xmlns="https://stackoverflow.com/message/1.0.0"
>
  <xs:complexType name="data" mixed="true">
    <xs:sequence>
      <xs:element name="data" type="data" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="type" type="xs:string"/>
        <xs:element name="data" type="data" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Notice that we end up having to declare the <data> element twice, once inside the data type and again inside <message>, but apart from a little duplicated work this is of no consequence.
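
If the duplication matters, one possible way to factor it out is a named model group that holds the local <data> declaration and is referenced from both places. This is only a sketch: the group name dataItem is hypothetical, and it has not been checked against any particular validator.

```xml
<?xml version="1.0" encoding="utf-8"?>

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="https://stackoverflow.com/message/1.0.0"
  elementFormDefault="qualified"
  xmlns="https://stackoverflow.com/message/1.0.0"
>
  <!-- Hypothetical group: the only place where the local data element is declared. -->
  <xs:group name="dataItem">
    <xs:sequence>
      <xs:element name="data" type="data"/>
    </xs:sequence>
  </xs:group>

  <xs:complexType name="data" mixed="true">
    <!-- Recursion goes through the type, not through a global element. -->
    <xs:group ref="dataItem" minOccurs="0" maxOccurs="unbounded"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="type" type="xs:string"/>
        <xs:group ref="dataItem" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because a group is not an element declaration, <data> still cannot appear as a document root. Whether the extra indirection is worth it for a single duplicated line is a judgment call.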