What <html lang=""> attribute value should I use for a mixed-language page?
I usually use this: <html lang="en">.
However, I am working on a website that will use two languages and sometimes mix them in the same sentence or heading.
How would the above code look in this case? Can I use <html lang="lang1 lang2">?
Solution 1:
As far as I can tell from reading the HTML5 spec, the value of the lang attribute "must be a valid BCP 47 language tag, or the empty string".
Source: http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes
There's no mention in the spec of an array of language strings and every example I've found uses a single language string.
This makes sense: a given section can really only be in one language, unless we're creating a new hybrid language.
Because the lang attribute is valid on all HTML elements, you can wrap your language-specific content in a new element to indicate its language.
<html lang="en">
[...]
<body>
<h1>I am a heading <span lang="de-DE">Eine Überschrift</span></h1>
</body>
</html>
Solution 2:
As I understand it, you should be able to use <html lang="mul"> to indicate multiple languages.
Choose subtags from the IANA Language Subtag Registry.
Source: https://www.w3.org/TR/2007/NOTE-i18n-html-tech-lang-20070412/#ri20030112.224623362
The registry contains an entry for it (Subtag: mul).
Source: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
However, I don't think you can specify exactly which languages you're mixing on the html element. As Jamie wrote, though, you can set different lang attributes on different elements in the page.
ISO 639-3 also defines four special language codes (mis, mul, und and zxx), and all of them are valid in the IANA subtag registry as well: https://en.wikipedia.org/wiki/ISO_639-3#Special_codes
However, I doubt that search engines such as Google support this well.
Solution 3:
I'm adding this answer in April 2020 to provide the latest guidance from the W3C (W3.org).
Firstly, no, you cannot use <html lang="lang1 lang2">, because it will not validate. Here is the result of validating a page with more than one language (English and Swahili) in the lang attribute of the html tag using the W3C's Nu Html Checker. The same error occurs with or without commas:
Error: Bad value en swh for attribute lang on element html: The language subtag en swh is not a valid language subtag.
<html lang="en swh">
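The checker rejects the value because a lang attribute holds a single language tag, and a space or comma cannot appear inside one. As a rough illustration only, here is a Python sketch that accepts a small, simplified subset of BCP 47 (language, optional script, optional region). The pattern and the function name looks_like_single_tag are hypothetical; this is not the Nu Html Checker's actual logic, it does not look subtags up in the IANA registry, and real-world tags with variants or extensions would need a full BCP 47 parser.

```python
import re

# Hypothetical, deliberately simplified subset of the BCP 47 grammar:
# primary language (2-3 letters), optional script (4 letters),
# optional region (2 letters or 3 digits). Variants, extensions,
# private-use and grandfathered tags are NOT covered, and subtags are
# not checked against the IANA registry.
SIMPLE_TAG = re.compile(
    r"[A-Za-z]{2,3}"                    # language, e.g. en, de, swh, mul
    r"(?:-[A-Za-z]{4})?"                # script, e.g. Hans
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"   # region, e.g. DE, 419
)


def looks_like_single_tag(value: str) -> bool:
    """True if value is exactly one tag in the simplified grammar above."""
    # fullmatch: any space or comma (i.e. a list of tags) makes it fail.
    return SIMPLE_TAG.fullmatch(value) is not None


for value in ["en", "de-DE", "zh-Hans", "mul", "es-419", "en swh", "en,fr", "en, fr"]:
    print(f"{value!r:10} {looks_like_single_tag(value)}")
```

Values such as "en", "de-DE" and "mul" pass, while "en swh" and "en,fr" fail for the same reason the checker complains: they are lists, not a tag.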
If you want to declare the language of a page that contains more than one language, here is the latest guidance, based on the W3C's Declaring language in HTML:
QUICK ANSWER
Always use a language attribute on the html tag to declare the default language of the text in the page. When the page contains content in another language, add a language attribute to an element surrounding that content.
Use the lang attribute for pages served as HTML, and the xml:lang attribute for pages served as XML. For XHTML 1.x and HTML5 polyglot documents, use both together.
Use language tags from the IANA Language Subtag Registry. You can find subtags using the unofficial Language Subtag Lookup tool.
Use nested elements to take care of content and attribute values on the same element that are in different languages.
What if element content and attribute values are in different languages?
In the example from the W3C's site, the link text shows the language of the target page (Spanish) using the language of the target page ("Español"), but an associated title attribute contains a hint in the language of the current page ("Spanish" in English):
The markup for this should look as follows, where the span element inherits the default en setting of the html element:
<span title="Spanish"><a lang="es" href="qa-html-language-declarations.es">Español</a></span>
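The inheritance described here can be illustrated with a small Python sketch using the standard library's xml.etree.ElementTree: it walks the tree and reports the language each element ends up with. The sample markup and the function effective_languages are hypothetical. The sample has to be well-formed XML because ElementTree is an XML parser, not an HTML parser, and browsers handle further cases (for example, fallbacks when the root element has no lang) that this sketch ignores. When an element carries both xml:lang and lang, as in polyglot documents, the sketch uses xml:lang, which is what the HTML specification tells user agents to do.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample page, written as well-formed XML so the
# standard-library parser can read it.
SAMPLE = """\
<html lang="en">
  <body>
    <p><span title="Spanish"><a lang="es" href="qa-html-language-declarations.es">Español</a></span></p>
    <p lang="de" xml:lang="de">Eine Überschrift</p>
  </body>
</html>"""


def effective_languages(markup: str) -> list[tuple[str, str]]:
    """Return (tag, language) for every element, in document order."""
    results: list[tuple[str, str]] = []

    def walk(elem: ET.Element, inherited: str) -> None:
        # An explicit xml:lang wins, then lang; otherwise inherit from the parent.
        lang = elem.get(XML_LANG, elem.get("lang", inherited))
        results.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(markup), "")
    return results


for tag, lang in effective_languages(SAMPLE):
    print(f"{tag:5} {lang}")
```

Only the a element and the second p declare their own language; the span, like everything else, inherits en from the html element.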
What if there's no element to hang your attribute on?
If you want to specify the language of some content but there is no markup around it, use an element such as span or div around the content. Here is an example:
<p>You'd say that in Chinese as <span lang="zh-Hans">中国科学院文献情报中心</span>.</p>
How can you specify metadata for more than one audience language?
Get the server to send the information in the HTTP Content-Language header. If your intended audience speaks more than one language, the HTTP header allows you to use a comma-separated list of languages.
Here is an example of an HTTP header that declares the resource to be a mixture of English, Hindi and Punjabi:
Content-Language: en, hi, pa
Note that this approach is not effective if your page is accessed from a hard drive, disk or other non-server based location. There is currently no widely recognized way of using this kind of metadata inside the page.
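How the header gets sent depends on your server. As one hedged illustration, here is a minimal Python sketch using the standard library's http.server, which is fine for local testing but not meant for production. The page content, the handler name MixedLanguageHandler, the serve helper and the port are all assumptions; in practice you would set the equivalent header in your own web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mixed-language page; its default language is still
# declared with a single tag on the html element.
PAGE = (
    "<!DOCTYPE html>\n"
    '<html lang="en">\n'
    '<head><meta charset="utf-8"><title>Demo</title></head>\n'
    '<body><p>Hello, <span lang="hi">नमस्ते</span>.</p></body>\n'
    "</html>\n"
).encode("utf-8")


class MixedLanguageHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # Audience languages: unlike the lang attribute, this header
        # accepts a comma-separated list.
        self.send_header("Content-Language", "en, hi, pa")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)


def serve(port: int = 8000) -> None:
    """Serve PAGE on http://127.0.0.1:<port>/ until interrupted."""
    HTTPServer(("127.0.0.1", port), MixedLanguageHandler).serve_forever()
```

After calling serve(), a request such as curl -i http://127.0.0.1:8000/ should show the Content-Language: en, hi, pa header, while the document itself keeps lang="en" on the html element.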
In the past, many people used a meta element with the http-equiv attribute set to Content-Language. Due to long-standing confusion and inconsistent implementations of this element, the HTML5 specification made this non-conforming in HTML, so you should no longer use it.
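If you are cleaning up older pages, a short Python sketch built on the standard library's html.parser can flag any remaining Content-Language meta elements, so they can be replaced by a lang attribute on the html element (plus the HTTP header, if you need audience metadata). The class ContentLanguageMetaFinder and the function find_content_language_meta are hypothetical names for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ContentLanguageMetaFinder(HTMLParser):
    """Collect the content values of <meta http-equiv="Content-Language"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Optional[str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        http_equiv = (attributes.get("http-equiv") or "").strip().lower()
        if http_equiv == "content-language":
            self.found.append(attributes.get("content"))


def find_content_language_meta(html: str) -> List[Optional[str]]:
    finder = ContentLanguageMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.found


legacy = '<head><meta http-equiv="Content-Language" content="en, hi, pa"><meta charset="utf-8"></head>'
print(find_content_language_meta(legacy))
```

Self-closing tags such as <meta ... /> are also caught, because HTMLParser's default handle_startendtag calls handle_starttag.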
See these links for the details:
- Declaring language in HTML
- Authoring HTML: Language declarations
- Internationalization techniques: Authoring HTML & CSS