If YAML ain't markup language, what is it?

Here's the real story... :)

Clark, Oren and I started working on YAML in April 2001. Oren and Clark were part of the SML mailing list, which was trying to make XML simpler. I had just written a data serialization language for Perl called Data::Denter. Clark contacted me to tell me about an idea they had called YAML, which looked similar to Data::Denter syntax. Clark had already acquired yaml.org.

After a few months of us working together, I pointed out that YAML (which most definitely stood for Yet Another Markup Language at that time) was not really a markup language (marking up various elements of a text document) but a serialization language (textual representation of typed/cyclical data graphs). We all liked the name YAML, so we backronymed it to mean YAML Ain't Markup Language.

http://yaml.org/spec/ starts with:

YAML™ (rhymes with “camel”) is a human-friendly, cross language, Unicode based data serialization language designed around the common native data structures of agile programming languages.

I couldn't have said it better myself… :)


So, a markup language presumes a base text, typically human readable, plus special indicators or "markup" which direct processing. The idea comes from an editor, who would take a printed version of someone's manuscript and "mark it up" to show where new lines should go, what edits to make, and so on.

In this manner, SGML is a meta-language for declaring markup languages, and HTML is a markup language. In 1996-7, when XML came on the scene, it was sold as a simplified SGML meta-language for creating markup languages. In XML (and SGML), you have elements to "mark" a portion of text, and then attributes that modify the marking. Over time, though, XML was used for much more than document markup. People used it for data serialization, even though it was never designed to do such a thing. Of course, data serialization was the big problem to be solved.

YAML and JSON appeared on the scene and focused on data serialization, not document markup. In these languages, there simply isn't a core document text.
Hence, YAML Ain't Markup Language is quite an accurate differentiator from XML.


XML inherited the "ML" part of its name from HTML and SGML, which are "markup" languages in that what they describe is a stream of plain text together with markup instructions such as "this piece of the text should be bold" or "this piece of the text is a heading". That is, those particular parts of the text are marked up as being bold, or a heading.

Later, some people took to writing XML that consisted only of tags and attributes, with no plain text for those tags to mark up. (Opinions and styles differ as to whether that is an appropriate use of XML.) When used that way, XML becomes a language for writing down tree-structured data.

YAML is "not a" markup language because its data model contains only the tree structure, with no notion of an underlying linear text that the tree structure applies to. There's nothing to mark up there -- or put differently, the data represented by a YAML stream is not markup. In contrast, the data represented by an XML tag is markup, or at least according to some points of view is supposed to be. (In both cases, the representation of said data contains some markup, such as colons and indentations in YAML or '=' and quotes in XML, but that is not the point).
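To make the distinction concrete, here is a minimal sketch in Python using only the standard library. JSON stands in for YAML because Python ships no YAML parser, and both sample strings are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Markup: a linear base text with tags wrapped around parts of it.
doc = ET.fromstring("<p>YAML is <b>not</b> a markup language.</p>")

# Stripping the markup still leaves the underlying text intact.
base_text = "".join(doc.itertext())

# Serialization: the parsed result is just a data structure.
# There is no underlying text that the structure was applied to.
record = json.loads('{"name": "YAML", "started": 2001, "markup": false}')

print(base_text)      # the text that the <b> tag was marking up
print(type(record))   # a plain dict: keys and typed values only
```

In the XML case you can throw the tags away and still have a sentence. In the second case there is nothing comparable to recover, only keys and typed values.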


Here's a quote from a page about YAML:

I suppose the very first question on readers' minds has to be, "why the name YAML?" There are a number of tools that have cutely adopted acronyms of the form "YA*", to mean "Yet Another XXX." In the arms race of open-source wit, YAML eschews its implied acronym, instead settling on the recursive "YAML Ain't Markup Language." There is a certain sense to this, however: YAML does what markup languages do, but without requiring any, well, markup.

The name was chosen because YAML requires much less markup than traditional languages such as XML. The name distinguishes YAML as data-oriented rather than markup-oriented.