XML Attributes vs Elements [duplicate]

There is an article titled "Principles of XML design: When to use elements versus attributes" on IBM's website.

Though there doesn't appear to be many hard and fast rules, there are some good guidelines mentioned in the posting. For instance, one of the recommendations is to use elements when your data must not be normalized for white space as XML processors can normalize data within an attribute thereby modifying the raw text.

I find myself referring to this article from time to time as I develop various XML structures. Hopefully this will be helpful to others as well.

edit - From the site:

Principle of core content

If you consider the information in question to be part of the essential material that is being expressed or communicated in the XML, put it in an element. For human-readable documents this generally means the core content that is being communicated to the reader. For machine-oriented records formats this generally means the data that comes directly from the problem domain. If you consider the information to be peripheral or incidental to the main communication, or purely intended to help applications process the main communication, use attributes. This avoids cluttering up the core content with auxiliary material. For machine-oriented records formats, this generally means application-specific notations on the main data from the problem-domain.

As an example, I have seen many XML formats, usually home-grown in businesses, where document titles were placed in an attribute. I think a title is such a fundamental part of the communication of a document that it should always be in element content. On the other hand, I have often seen cases where internal product identifiers were thrown as elements into descriptive records of the product. In some of these cases, attributes were more appropriate because the specific internal product code would not be of primary interest to most readers or processors of the document, especially when the ID was of a very long or inscrutable format.

You might have heard the principle data goes in elements, metadata in attributes. The above two paragraphs really express the same principle, but in more deliberate and less fuzzy language.

Principle of structured information

If the information is expressed in a structured form, especially if the structure may be extensible, use elements. On the other hand: If the information is expressed as an atomic token, use attributes. Elements are the extensible engine for expressing structure in XML. Almost all XML processing tools are designed around this fact, and if you break down structured information properly into elements, you'll find that your processing tools complement your design, and that you thereby gain productivity and maintainability. Attributes are designed for expressing simple properties of the information represented in an element. If you work against the basic architecture of XML by shoehorning structured information into attributes you may gain some specious terseness and convenience, but you will probably pay in maintenance costs.

Dates are a good example: A date has fixed structure and generally acts as a single token, so it makes sense as an attribute (preferably expressed in ISO-8601). Representing personal names, on the other hand, is a case where I've seen this principle surprise designers. I see names in attributes a lot, but I have always argued that personal names should be in element content. A personal name has surprisingly variable structure (in some cultures you can cause confusion or offense by omitting honorifics or assuming an order of parts of names). A personal name is also rarely an atomic token. As an example, sometimes you may want to search or sort by a forename and sometimes by a surname. I should point out that it is just as problematic to shoehorn a full name into the content of a single element as it is to put it in an attribute.


Personally I like using attributes for simple single-valued properties. Elements are (obviously) more suitable for complex types or repeated values.

For single-valued properties, attributes lead to more compact XML and simpler addressing in most APIs.


One of the better thought-out element vs attribute arguments comes from the UK GovTalk guidelines. This defines the modelling techniques used for government-related XML exchanges, but it stands on its own merits and is worth considering.

Schemas MUST be designed so that elements are the main holders of information content in the XML instances. Attributes are more suited to holding ancillary metadata – simple items providing more information about the element content. Attributes MUST NOT be used to qualify other attributes where this could cause ambiguity.

Unlike elements, attributes cannot hold structured data. For this reason, elements are preferred as the principal holders of information content. However, allowing the use of attributes to hold metadata about an element's content (for example, the format of a date, a unit of measure or the identification of a value set) can make an instance document simpler and easier to understand.

A date of birth might be represented in a message as:

 <DateOfBirth>1975-06-03</DateOfBirth> 

However, more information might be required, such as how that date of birth has been verified. This could be defined as an attribute, making the element in a message look like:

<DateOfBirth VerifiedBy="View of Birth Certificate">1975-06-03</DateOfBirth> 

The following would be inappropriate:

<DateOfBirth VerifiedBy="View of Birth Certificate" ValueSet="ISO 8601" Code="2">1975-06-03</DateOfBirth>   

It is not clear here whether the Code is qualifying the VerifiedBy or the ValueSet attribute. A more appropriate rendition would be:

 <DateOfBirth>    
   <VerifiedBy Code="2">View of Birth Certificate</VerifiedBy>     
   <Value ValueSet="ISO 8601">1975-06-03</Value>
 </DateOfBirth>

It's largely a matter of preference. I use Elements for grouping and attributes for data where possible as I see this as more compact than the alternative.

For example I prefer.....

<?xml version="1.0" encoding="utf-8"?>
<data>
    <people>
        <person name="Rory" surname="Becker" age="30" />
        <person name="Travis" surname="Illig" age="32" />
        <person name="Scott" surname="Hanselman" age="34" />
    </people>
</data>

...Instead of....

<?xml version="1.0" encoding="utf-8"?>
<data>
    <people>
        <person>
            <name>Rory</name>
            <surname>Becker</surname>
            <age>30</age>
        </person>
        <person>
            <name>Travis</name>
            <surname>Illig</surname>
            <age>32</age>
        </person>
        <person>
            <name>Scott</name>
            <surname>Hanselman</surname>
            <age>34</age>
        </person>
    </people>
</data>

However if I have data which does not represent easily inside of say 20-30 characters or contains many quotes or other characters that need escaping then I'd say it's time to break out the elements... possibly with CData blocks.

<?xml version="1.0" encoding="utf-8"?>
<data>
    <people>
        <person name="Rory" surname="Becker" age="30" >
            <comment>A programmer whose interested in all sorts of misc stuff. His Blog can be found at http://rorybecker.blogspot.com and he's on twitter as @RoryBecker</comment>
        </person>
        <person name="Travis" surname="Illig" age="32" >
            <comment>A cool guy for who has helped me out with all sorts of SVn information</comment>
        </person>
        <person name="Scott" surname="Hanselman" age="34" >
            <comment>Scott works for MS and has a great podcast available at http://www.hanselminutes.com </comment>
        </person>
    </people>
</data>

As a general rule, I avoid attributes altogether. Yes, attributes are more compact, but elements are more flexible, and flexibility is one of the most important advantages of using a data format like XML. What is a single value today can become a list of values tomorrow.

Also, if everything's an element, you never have to remember how you modeled any particular bit of information. Not using attributes means you have one less thing to think about.