<!doctype html public '-//W3C//DTD HTML 4.0 Transitional//EN'>



<HTML>

<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META HTTP-EQUIV="Content-Language" CONTENT="en">
   <META NAME="Author" LANG="es" CONTENT="Manuel Tomas CARRASCO BENITEZ">
   <TITLE>Primary Language in HTML</TITLE>
</HEAD>

<BODY BGCOLOR="#FFFFFF">

<H3 ALIGN='RIGHT'>
 <A href='http://www.w3.org/'>
 <img border='0' align='left' alt='W3C' src='http://www.w3.org/Icons/WWW/w3c_home'>
 </A>
</H3>

<H1 ALIGN=CENTER>Primary Language in HTML</H1>

<H3 ALIGN=CENTER>World Wide Web Consortium Note 13-March-1998</H3>

<DL>
<DT>This version:</DT>
<DD>
<A HREF="http://www.w3.org/TR/1998/NOTE-html-lan-19980313.html">http://www.w3.org/TR/1998/NOTE-html-lan-19980313.html</A></DD>

<DT>Latest Version:</DT>
<DD>
<A HREF="http://www.w3.org/TR/NOTE-html-lan">http://www.w3.org/TR/NOTE-html-lan</A></DD>
</DL>

<DL>
<DT>Editor:</DT>
<DD>
M.T. Carrasco Benitez
<A HREF="#CAR">[CAR]</A>
<A HREF="mailto:manuel.carrasco@emea.eudra.org">&lt;manuel.carrasco@emea.eudra.org></A></DD>
</DL>

<H2>Status of this document</H2>
This document is a NOTE made available by the W3 Consortium for discussion
only. This indicates no endorsement of its content, nor that the Consortium
has had any editorial control in its preparation, nor that the Consortium
has, is, or will be allocating any resources to the issues addressed by
the NOTE.
<P>
This document recommends how to mark the <EM>primary language(s)</EM>
in an HTML document.
It can be considered a clarification of the
<EM>HTML 4.0 Specification</EM>
<A HREF="#HTML40">[HTML40]</A>;
in particular,
it does not contradict the HTML 4.0 Specification.
The objective is to establish a
<EM>best practice</EM> in this field, where there is at present some confusion.

<H2>Abstract</H2>
In HTML elements,
the <CODE>lang</CODE> attribute specifies the natural language of the content.
This document is mostly concerned with how to specify the primary language(s)
(there could be more than one)
and the <EM>base language</EM>
(there is only one)
in HTML documents.

<H2>Overview</H2>
Most existing documents are monolingual.
<EM>Linguistic versions</EM>
(e.g., translations) of the same text are often kept as separate documents.
This is indeed the most sensible approach.
<P>
Some documents are bilingual and a few are trilingual or n-lingual.
Bilingual documents are usually short;
i.e., a few paragraphs.
N-lingual documents are usually very short; i.e., a few sentences.
<P>
The main reason for the existence of n-lingual documents is political;
i.e., in certain situations it is not politically correct to assume a base
language. A common practice is to have one small document
that is a menu of languages.
An example is
the Europa server of the European Commission
<A HREF="#EUR">[EUR]</A>.
<P>
Another approach to choosing the language is to set the client (e.g.,
the browser) to the preferred language(s).
The client transmits the language(s) in the Accept-Language field of HTTP,
and the server responds with an appropriate document.
For example, the Spanish version will
be presented if the language preferences (in the browser) are Spanish and
French and the document is available (on the server)
in French, German and Spanish.
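<P>
A hypothetical exchange for this case might look like the sketch below.
The host name and path are placeholders, the <CODE>q</CODE> value is an
assumed way of ranking Spanish above French, and only the headers relevant
to language selection are shown.

<PRE>
GET /document HTTP/1.1
Host: www.example.org
Accept-Language: es, fr;q=0.8

HTTP/1.1 200 OK
Content-Language: es
Vary: Accept-Language
</PRE>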

<H2>Where to specify the primary language(s)</H2>
There should be <STRONG>one</STRONG> recommended place
to specify the primary language(s).
It is recommended that the primary language(s) be specified in a META element.
For example:

<PRE>
&lt;HTML>
&lt;HEAD>
&lt;META HTTP-EQUIV="Content-Language" Content="fr">
&lt;TITLE><SPAN lang=fr>Mon doc</SPAN>&lt;/TITLE>
&lt;/HEAD>
&lt;BODY>
<SPAN lang=fr>Je suis un Berlinois</span>.
&lt;/BODY>
&lt;/HTML>
</PRE>

<P>
The value of the <CODE>Content</CODE> attribute
of the META element is the same as the
value of the <CODE>Content-Language</CODE> header in HTTP;
i.e.,
a comma-separated list of language codes.
For example:
<P>
<CODE>
&lt;META HTTP-EQUIV="Content-Language" Content="fr,en">
</CODE>
<P>
These language codes are the same as those used in the <CODE>lang</CODE>
attribute of some HTML elements.
For example:
<P>
<CODE>
&lt;BODY LANG=fr&gt;
</CODE>
<P>
The language codes are defined in
<A HREF="#RFC1766">[RFC1766]</A>.
See also
<A HREF="http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1.1">
<EM>8.1.1 Language codes</EM>
</A>
of the HTML 4.0 Specification <A HREF="#HTML40">[HTML40]</A>
and
<A HREF="#RFC2068">[RFC2068]</A>.
<P>
The order of the languages in the Content-Language value is significant.
The first language in the list is the base language of the document;
i.e., any text not re-specified with the <CODE>lang</CODE> attribute is in
the base language.
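<P>
As an illustration of this rule, the following sketch shows a bilingual
document whose base language is French.
The text and the choice of elements are invented for the example.

<PRE>
&lt;HTML>
&lt;HEAD>
&lt;META HTTP-EQUIV="Content-Language" CONTENT="fr,en">
&lt;TITLE>Avis&lt;/TITLE>
&lt;/HEAD>
&lt;BODY>
&lt;!-- No lang attribute: this paragraph is in the base language, fr -->
&lt;P>Le bureau ouvre le mardi.&lt;/P>
&lt;!-- Text in the second primary language is marked explicitly -->
&lt;P LANG="en">The office opens on Tuesday.&lt;/P>
&lt;/BODY>
&lt;/HTML>
</PRE>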
<P>
A document with only minor fragments in other languages should not be
marked with more than one language in the META element.
The rules for classifying a document as
monolingual, bilingual or n-lingual are the same as for printed books.
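<P>
For instance, an English document that quotes a single French word
would still be declared monolingual.
A minimal sketch, with invented text:

<PRE>
&lt;HTML>
&lt;HEAD>
&lt;META HTTP-EQUIV="Content-Language" CONTENT="en">
&lt;TITLE>Travel notes&lt;/TITLE>
&lt;/HEAD>
&lt;BODY>
&lt;!-- The fragment is marked locally; the document stays monolingual -->
&lt;P>The waiter greeted us with &lt;SPAN LANG="fr">bonjour&lt;/SPAN>.&lt;/P>
&lt;/BODY>
&lt;/HTML>
</PRE>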
<P>
The reasons for recommending the META element, as opposed to the HTML element with
the <CODE>lang</CODE> attribute, are:
<UL>
<LI>
N-lingual documents can be specified.
For example, a bilingual French/Spanish document can be specified.
</LI>
<LI>
The language(s) would be transmitted in the Content-Language field of the HTTP
header.
</LI>
</UL>
<P>
A <CODE>lang</CODE> attribute in the HTML element overrides the language
specified in the META element.
The inheritance rules are in
<A HREF="http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1.2">
<EM>
8.1.2 Inheritance of language codes
</EM>
</A>
of the HTML 4.0 Specification
<A HREF="#HTML40">[HTML40]</A>.
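<P>
The override can be sketched as follows, assuming the precedence described
above is applied: the <CODE>lang</CODE> attribute on the HTML element,
rather than the first language listed in the META element, determines the
language of unmarked text.
The document content is invented for illustration.

<PRE>
&lt;HTML LANG="es">
&lt;HEAD>
&lt;META HTTP-EQUIV="Content-Language" CONTENT="fr,es">
&lt;TITLE>Aviso&lt;/TITLE>
&lt;/HEAD>
&lt;BODY>
&lt;!-- Inherits es from the HTML element, not fr from the META element -->
&lt;P>La oficina abre el martes.&lt;/P>
&lt;!-- French text must therefore be marked explicitly -->
&lt;P LANG="fr">Le bureau ouvre le mardi.&lt;/P>
&lt;/BODY>
&lt;/HTML>
</PRE>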

<H2>Acknowledgment</H2>
The recommendations reflect the rough consensus of the mailing list
www-international@w3.org
<A HREF="#LIST">[LIST]</A>
of the W3C and of a meeting held during the Unicode Conference in Mainz in March 1997.
<P>
In particular, thanks to:
<UL>
<LI> Bert Bos from the W3C,
<A HREF="http://www.w3.org/People/Bos/">
http://www.w3.org/People/Bos/
</A>

<LI> Martin D&uuml;rst from the W3C,
<A HREF="http://www.w3.org/People/W3Cpeople.html#Durst">
http://www.w3.org/People/W3Cpeople.html#Durst
</A>
</UL>

<H2>References</H2>

<DL>
<DT>
<A NAME="CAR"></A>[CAR]
<DD>
M.T. Carrasco Benitez.
<A HREF="http://dragoman.org/">http://dragoman.org/</A>

<DT>
<A NAME="EUR"></A>[EUR]
<DD>
Europa. <A HREF="http://europa.eu.int">http://europa.eu.int/</A>

<DT><A NAME="HTML40"></A>[HTML40]
<DD>
HTML 4.0 Specification.
<A HREF="http://www.w3.org/TR/REC-html40/">http://www.w3.org/TR/REC-html40/</A>
<BR>
In particular:
<BR>
<A HREF="http://www.w3.org/TR/REC-html40/intro/intro.html#h-2.3.1">2.3.1
Internationalization</A>
<BR>
<A HREF="http://www.w3.org/TR/REC-html40/charset.html#h-5.1">5.1 The
Document Character Set</A>
<BR>
<A HREF="http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4">7.4.4
Meta data</A>
<BR>
<A HREF="http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8">8
Language information and text direction</A>

<DT>
<A NAME="LIST">[LIST]</A>
<DD>
<A HREF="http://www.w3.org/International/O-misc-mlists.html">
http://www.w3.org/International/O-misc-mlists.html
</A>

<DT>
<A NAME="RFC1766">[RFC1766]</A>
<DD>
<EM>Tags for the Identification of Languages</EM>, H. Alvestrand, March 1995.
<BR>
Available at
<A HREF="http://ds.internic.net/rfc/rfc1766.txt">http://ds.internic.net/rfc/rfc1766.txt</A>

<DT>
<A NAME="RFC2068">[RFC2068]</A>
<DD>
<EM>Hypertext Transfer Protocol -- HTTP/1.1</EM>,
R. Fielding, J. Gettys, J. Mogul, H. Frystyk Nielsen and T. Berners-Lee,
January 1997.
<BR>
Available at
<A HREF="http://ds.internic.net/rfc/rfc2068.txt">http://ds.internic.net/rfc/rfc2068.txt</A>
<BR>
In particular:
<BR>
3.10 Language Tags
<BR>
12 Content Negotiation
<BR>
12.3 Transparent Negotiation
<BR>
14.4 Accept-Language
<BR>
14.13 Content-Language
<BR>
14.43 Vary
<BR>
15.7 Privacy Issues Connected to Accept Headers
<BR>
</DL>

</BODY>
</HTML>