What characters are allowed in an HTML attribute name?

In HTML attribute name=value pairs, which characters are allowed in the 'name' portion? Looking at some common attributes, it appears that only letters (a-z and A-Z) are used, but what other characters are allowed? Maybe digits (0-9), hyphens (-), and periods (.)? Is there a spec for this?


Solution 1:

It depends what you mean by "allowed". Each tag has a fixed list of valid attribute names, and in HTML they are case-insensitive. In one important sense, only those names, spelled correctly, are "allowed".

Another way of looking at it is to ask which characters browsers will treat as part of an attribute name. The best guidance here comes from the parser spec of HTML 5, which can be found here: https://html.spec.whatwg.org/multipage/syntax.html#attributes-2

It says that all characters except tab, line feed, form feed, space, solidus, greater-than sign, quotation mark, apostrophe and equals sign will be treated as part of the attribute name. Personally, I wouldn't push the edge cases of this, though.
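
To make that concrete, here is a minimal sketch that scans a string and returns the leading run of characters that would count as the attribute name, using only the list of excluded characters given above. The names `TERMINATORS` and `scan_attr_name` are made up for illustration, and this is not a real HTML tokenizer (a real one does more, such as folding ASCII case).

```python
# Characters that, per the list above, are not treated as part of an
# attribute name: tab, LF, FF, space, "/", ">", '"', "'", "=".
# Hypothetical names for illustration; this is not an HTML parser.
TERMINATORS = set("\t\n\f />\"'=")


def scan_attr_name(text: str) -> str:
    """Return the leading run of characters that would form the attribute name."""
    end = 0
    while end < len(text) and text[end] not in TERMINATORS:
        end += 1
    return text[:end]


print(scan_attr_name('data-foo="bar"'))     # data-foo
print(scan_attr_name("@click.prevent=go"))  # @click.prevent
print(scan_attr_name("x:y z"))              # x:y
```

Note how characters such as "@", "." and ":" simply become part of the name, which is why the parser-level rule is so much looser than the list of attributes that are actually valid on a given tag.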

Solution 2:

Assuming you're talking about XHTML, the XML rules apply.

See http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name

Names and Tokens

[4]     NameStartChar      ::=      ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]    NameChar       ::=      NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]     Name       ::=      NameStartChar (NameChar)*
[6]     Names      ::=      Name (#x20 Name)*
[7]     Nmtoken    ::=      (NameChar)+
[8]     Nmtokens       ::=      Nmtoken (#x20 Nmtoken)*
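
If you want to test a candidate name against productions [4], [4a] and [5], the grammar translates fairly directly into a regular expression. The following is only a sketch: the character ranges are copied from the productions above, while the names `NAME_START`, `NAME_CHAR`, `XML_NAME` and `is_xml_name` are my own.

```python
import re

# Character ranges copied from production [4] NameStartChar.
NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)

# Production [4a] NameChar: NameStartChar plus "-", ".", digits and a few more.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Production [5] Name: one NameStartChar followed by zero or more NameChar.
XML_NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if the whole string matches the XML Name production."""
    return XML_NAME.fullmatch(candidate) is not None


print(is_xml_name("xlink:href"))  # True
print(is_xml_name("data-foo.1"))  # True
print(is_xml_name("1st"))         # False: digits cannot start a name
print(is_xml_name("-foo"))        # False: "-" cannot start a name
```

The last two cases show the practical difference between NameStartChar and NameChar: digits, hyphens and periods are fine anywhere except the first position.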

Solution 3:

Since this question was asked, the web has evolved quite a bit. It's likely that authors of Web Components (custom elements) are landing here trying to learn what valid names can be used when defining attributes on custom elements.

There are several answers here that are partially correct, so I'm going to try to aggregate them and update them based on recent specs.

First, in HTML5, attribute names can start with most characters, and the rules are much more permissive than in previous versions of HTML. @S.Lott's answer is correct for HTML 2 and XHTML, but not for HTML5.

For HTML5 (quoting the spec):

Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that are an ASCII case-insensitive match for the attribute's name.
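
As a rough illustration of the rule quoted above, the check below rejects the listed characters plus control and unassigned characters. Treat it as a sketch under a few assumptions: the function name `is_html5_attr_name` is mine, "space characters" is taken to mean space, tab, LF, FF and CR, and "not defined by Unicode" is approximated by the `Cn` category in Python's `unicodedata` module, which depends on the Unicode version your Python ships with.

```python
import unicodedata

# Assumption: HTML's "space characters" are space, tab, LF, FF and CR.
SPACE_CHARS = set(" \t\n\f\r")

# NULL, quotation mark, apostrophe, ">", "/" and "=" from the quoted rule.
FORBIDDEN = SPACE_CHARS | set("\x00\"'>/=")


def is_html5_attr_name(name: str) -> bool:
    """Approximate the HTML5 authoring rule for attribute names."""
    if not name:
        return False  # "one or more characters"
    for ch in name:
        if ch in FORBIDDEN:
            return False
        # Cc = control characters; Cn = unassigned in Python's Unicode data.
        if unicodedata.category(ch) in ("Cc", "Cn"):
            return False
    return True


print(is_html5_attr_name("@click"))   # True
print(is_html5_attr_name("[value]"))  # True
print(is_html5_attr_name("a=b"))      # False
print(is_html5_attr_name("on load"))  # False
```

Names like `@click` and `[value]` pass, which matches how some template libraries get away with such attributes in ordinary HTML.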

That being said, other commenters here are correct: when you use an attribute on a built-in element that's not in its list of valid attributes, you're technically violating the spec. Browser authors have a lot of tolerance for this though, so in practice it doesn't do (much?) harm. A lot of libraries exploit this to enhance regular HTML tags, which causes some confusion, since it's technically not valid HTML. HTML5 provides a mechanism for custom data in attributes through the data- attribute naming convention.

These rules are different for custom elements.

Custom element authors are welcome to implement any sort of attribute they like on their elements, but the names of those attributes are more restricted than in HTML5. In fact, the spec requires that the attribute name follow the XML Name restrictions:

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.

Names and Tokens

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

[5] Name ::= NameStartChar (NameChar)*

[6] Names ::= Name (#x20 Name)*

[7] Nmtoken ::= (NameChar)+

[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*

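So, for attribute names on custom elements, you can start with an upper- or lowercase letter, an underscore "_", a colon ":", or any of the Unicode characters called out in NameStartChar. After the start character you can use any of those again, plus dashes "-", dots ".", digits and the other NameChar additions. Note that a digit, dash or dot cannot be the first character.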