Rev 361 → Rev 362

1.0/tags/Draft_01/openid-value-lang.xml New file
0,0 → 1,565
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- ***** File Inclusion ***** -->
<!-- The parameter value is the name of the file to be included which can also be a URI.
In the case of local files the XML_LIBRARY environment variable provides a search
path of directories in which the file may be located. See section 4.1.2 of README -->
<!-- include="n/a" -->
 
<!-- ***** Rigor Control ***** -->
<!-- Try to enforce the ID-nits conventions and DTD validity -->
<?rfc strict="no" ?>
 
<!-- ***** Rendering Control ***** -->
<!-- Put the famous header block on the first page -->
<?rfc topblock="yes" ?>
<!-- Include boilerplate from Section 10.4(d) of [1] (Bradner, S., "The Internet Standards
Process - Revision 3," October 1996.) -->
<?rfc iprnotified="no" ?>
 
<!-- Use anchors as symbolic tags rather than numbers for references -->
<?rfc symrefs="yes" ?>
<!-- Sort references according to symbolic tags - irrelevant if symrefs="no" -->
<?rfc sortrefs="yes" ?>
 
<!-- Items useful for reviewing document -->
<!-- Render <cref> information -->
<?rfc comments="no" ?>
<!-- If comments is "yes", then render comments inline; otherwise render them in an
"Editorial Comments" section -->
<?rfc inline="no" ?>
<!-- Insert editing marks for ease of discussing draft versions.
Editing marks are strings such as <29> printed at the beginning of the blank line before
each paragraph of text. -->
<?rfc editing="no" ?>
 
<!-- Items useful when using xml2rfc to produce technical documents other than RFCs and I-Ds -->
<!-- Produce a private memo rather than an RFC or Internet-Draft.
The value of the PI is used as the title of the document.
Omits the topblock and standard boiler plate when . -->
<?rfc private="Draft" ?>
<!-- Override the center footer string -->
<?rfc footer="" ?>
<!-- Override the leftmost header string -->
<?rfc header="" ?>
 
<!-- ***** Table of Contents Control ***** -->
<!-- Generate a table-of-contents -->
<?rfc toc="yes" ?>
<!-- Control whether the word "Appendix" appears in the table of contents. -->
<?rfc tocappendix="yes" ?>
<!-- If toc is "yes", then this determines the depth of the table of contents. -->
<?rfc tocdepth="3" ?>
<!-- If toc is "yes", then setting this to "yes" will indent subsections in
the table-of-contents. -->
<?rfc tocindent="yes" ?>
<!-- If toc is "yes", then setting this to "no" will make it a little less compact. -->
<?rfc tocompact="yes" ?>
<!-- Affects horizontal spacing in the table-of-content. -->
<?rfc tocnarrow="yes" ?>
 
<!-- ***** Format Control ***** -->
<!-- Automatically force page breaks to avoid widows and orphans (not perfect). -->
<?rfc autobreaks="yes" ?>
<!-- Put two spaces instead of one after each colon (":") in txt or nroff files. -->
<?rfc colonspace="no" ?>
<!-- When producing a txt/nroff file, try to conserve vertical whitespace
(the default value was "no" up to v1.30; from v1.31 the default is the current value
of the rfcedstyle PI).
Will default to (rfcedstyle) in future. -->
<?rfc compact="no" ?>
<!-- If compact is "yes", then you can make things a little less compact by setting this
to "no" (the default value is the current value of the compact PI). -->
<?rfc subcompact="no" ?>
<!-- An integer hint indicating how many contiguous lines are needed at this point in
the output.
Can appear as many times as necessary in the source. -->
<!-- needLines="0" -->
 
<!-- ***** HTML Specials ***** -->
<!-- When producing a html file, use the image in this file. -->
<?rfc background="" ?>
<!-- Automatically replaces input sequences such as |*text| by,
e.g., <strong>text</strong> in html output. -->
<?rfc emoticonic="no" ?>
<!-- Generate mailto: URL, as appropriate. -->
<?rfc linkmailto="yes" ?>
<!-- When producing a html file, produce multiple files for a slide show. -->
<?rfc slides="no" ?>
<!-- When producing a html file, use the <object> html element with inner replacement content
instead of the <img> html element, when a source xml element includes an src attribute. -->
<?rfc useobject="no" ?>
 
<!-- ***** Debugging ***** -->
<!-- Value is a string like "35:file.xml" or just "35" (file name then defaults to the
containing file's real name or to the latest linefile specification that changed it) that
will be used to override xml2rfc's reckoning of the current input position
(right after this PI) for warning and error reporting purposes
(line numbers are 1-based) -->
<!-- linefile="n/a" -->
<!-- During processing pass 2, print the value to standard output at that point
in processing -->
<!-- typeout="n/a" -->
 
 
 
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY version "1.0">
<!ENTITY axns "http://openid.net/srv/ax/1.0">
]>
 
<rfc category="std" ipr="none" docName="openid-value-lang-1_0-01.xml">
<front>
<title>Language Tags for OpenID Values</title>
<author initials = "M." surname="Wahl" fullname="Mark Wahl">
<organization>Informed Control Inc.</organization>
<address>
<postal>
<street>PO Box 90626</street>
<city>Austin</city><region>TX</region>
<code>78709</code>
<country>US</country>
</postal>
<email>mark.wahl@informed-control.com</email>
</address>
</author>
<date month="September" year="2007"/>
<area>Applications</area>
<keyword>identity</keyword>
<keyword>OpenID</keyword>
<keyword>schema</keyword>
<abstract>
<t>
This document defines a mechanism for transferring language tags
associated with UTF-8 string values in OpenID protocols, in
particular representing languages of attribute values in the
OpenID Attribute Exchange protocol.
</t>
</abstract>
</front>
<middle>
 
<section title="Introduction" toc='include'><t>
It is often desirable to be able to indicate the (human) language
associated with protocol elements exchanged in an identity system.
Language tags are especially useful when they can be associated with
specific values that are part of a set in order to provide the
receiver with a choice of values depending on the language of the
user; in particular, language tags can be associated with values
of an attribute.
 
For example, LDAP implementations use the mechanism described in
<xref target="RFC3866">RFC 3866</xref> to transfer language tags with
values in the LDAP protocol. For another example, protocols based on XML can encode language tags using the xml:lang encoding, e.g.
<figure>
<artwork>
 
&lt;attribute name="SecretQuestion"&gt;
&lt;value xml:lang="en-GB"&gt;What colour is your hair?&lt;/value&gt;
&lt;value xml:lang="en-US"&gt;What color is your hair?&lt;/value&gt;
&lt;/attribute&gt;
 
</artwork>
</figure>
</t>
 
<t>As OpenID is neither an XML-based protocol nor uses LDAP attribute descriptions,
a new mechanism is needed to associate language tags with values in
OpenID protocols.</t>
<t>
This document defines a mechanism by which a party in an identity system
using the OpenID protocols can associate a language tag with a string.
The input to the mechanism is a language tag and a string value.
The output from the mechanism is a <xref target="RFC3629">UTF-8</xref> encoding
of a combination of the language tag and the value.
</t>
<t>The initial use for this mechanism is in associating language tags
with string-valued attribute values in the
<xref target="OpenID.attribute-1.0">OpenID Attribute Exchange protocol</xref>.
</t>
 
<t>
This document does not specify mechanisms for
 
<list style="symbols">
<t>expressing or determining the set of languages or locales recognized by the user,</t>
<t>allowing an RP to request that only values with a particular language tag be returned from a fetch request, or</t>
<t>associating language tags with binary or XML-valued attributes</t>
</list>
</t>
 
<t>
The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in
<xref target="RFC2119">RFC 2119</xref>.
</t>
</section>
 
<section title="Language Tag" toc='include'>
 
<t>A language tag is a string of characters that represents the name of a language. </t>
 
<t>Language tags are described in <xref target="RFC3066">RFC 3066 (BCP 47)</xref>. Language tags are written using the characters LATIN SMALL LETTER a-z, DIGIT 0-9, and HYPHEN, e.g. fr, en-us or i-klingon, and are two or more characters long.</t>
 
<t>The encoding of a language tag uses Plane 14 characters as defined in <xref target="RFC2482">RFC 2482</xref>. The characters described in RFC 2482 were subsequently integrated into Unicode and ISO 10646. The language tag is encoded as a string of two or more Plane 14 characters taken from the set {U+E002D,U+E0030,U+E0031,...,U+E0039,U+E0061,U+E0062,...,U+E007A}. There is one Plane 14 character for each DIGIT character and one for each LATIN SMALL LETTER character. The number of a Plane 14 character is 0xE0000 plus the number of the HYPHEN character, the DIGIT character or the LATIN SMALL LETTER character. </t>
 
<figure>
<preamble>The Unicode plane 14 characters used in this encoding are:
</preamble>
<artwork>
U+E0001 LANGUAGE TAG
U+E002D TAG HYPHEN-MINUS
U+E0030 TAG DIGIT ZERO
...
U+E0039 TAG DIGIT NINE
U+E0061 TAG LATIN SMALL LETTER A
...
U+E007A TAG LATIN SMALL LETTER Z
U+E007F CANCEL TAG
 
</artwork>
</figure>
 
<t>For example, a language tag of "EN" would be converted to lowercase Latin characters ("en"), and 0xE0000 would then be added to each character, to form the two characters
<figure>
<artwork>
U+E0065 # TAG LATIN SMALL LETTER E
U+E006E # TAG LATIN SMALL LETTER N
</artwork>
</figure>
 
<figure>
<preamble>
The bytes of the UTF-8 encoding of this two character long language tag are
</preamble>
<artwork>
F3 A0 81 A5 # TAG LATIN SMALL LETTER E
F3 A0 81 AE # TAG LATIN SMALL LETTER N
</artwork>
</figure>
</t>
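<t>The conversion just described can be sketched in Java as follows. This is a minimal illustration and not part of the mechanism itself: the class name TagChars and its methods are hypothetical, and rejecting characters outside a-z, 0-9 and HYPHEN is an assumption of the sketch rather than a requirement stated here.</t>
<figure>
<artwork><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative helper only; the class and method names are hypothetical.
final class TagChars {
    // Offset of the Plane 14 tag characters relative to their ASCII counterparts.
    static final int TAG_BASE = 0xE0000;

    // Maps each character of a language tag to its Plane 14 tag character.
    static int[] toTagCodePoints(String tag) {
        String lower = tag.toLowerCase(Locale.ROOT);
        int[] out = new int[lower.length()];
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-';
            if (!ok) {
                // Assumption of this sketch: reject anything outside the tag alphabet.
                throw new IllegalArgumentException("not a language tag character: " + c);
            }
            out[i] = TAG_BASE + c;
        }
        return out;
    }

    // Returns the UTF-8 bytes of a sequence of Unicode code points.
    static byte[] utf8(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length).getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: F3 A0 81 A5 F3 A0 81 AE
        System.out.println(hex(utf8(toTagCodePoints("EN"))));
    }
}
```
]]></artwork>
</figure>
<t>Working on code points rather than UTF-16 code units keeps the arithmetic identical to the "0xE0000 plus the character number" rule given in this section.</t>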
 
</section>
 
<section title="OpenID AX Attribute Value Transfer Encoding" toc='include'>
<t>It is assumed that most values of attributes transferred using OpenID
AX will not have language tags, but that a few attributes will have values
with language tags. It is assumed that the values of these attributes are strings
in the Unicode/ISO 10646 character set.
</t>
 
<t>
A value without a language tag is transferred as described in
<xref target="OpenID.attribute-1.0" />.
</t>
 
<t>
A combination of an attribute value with a language tag is transferred in OpenID AX with the value parameter consisting of the concatenation of the following bytes:
<list style="symbols">
<t>the bytes of the UTF-8 encoding of the Unicode character LANGUAGE TAG (0xF3 0xA0 0x80 0x81)</t>
<t>the bytes of the UTF-8 encoding of two or more plane 14 characters encoding the language</t>
<t>the bytes of the UTF-8 encoding of the underlying attribute value</t>
<t>the bytes of the UTF-8 encoding of the Unicode character CANCEL TAG (0xF3 0xA0 0x81 0xBF)</t>
</list>
 
</t>
 
<t>A CANCEL TAG character is explicitly included after the value so that, if an implementation that is not language tag aware concatenates a language-tagged value with other strings, the language tag does not inadvertently apply to text that is not in that language.</t>
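<t>A receiver that is language tag aware might reverse the concatenation defined in this section roughly as sketched below. This is only an illustration: the class TaggedValue, its fields and the parse method are hypothetical names, and the lenient handling of malformed input (for example a missing CANCEL TAG) is an assumption of the sketch, since this document does not define error handling.</t>
<figure>
<artwork><![CDATA[
```java
import java.nio.charset.StandardCharsets;

// Illustrative decoder only; class, field and method names are hypothetical.
final class TaggedValue {
    static final int TAG_BASE = 0xE0000;
    static final int LANGUAGE_TAG = 0xE0001;
    static final int CANCEL_TAG = 0xE007F;

    final String language; // null when the value carries no language tag
    final String value;    // the underlying attribute value

    TaggedValue(String language, String value) {
        this.language = language;
        this.value = value;
    }

    static TaggedValue parse(byte[] wire) {
        String s = new String(wire, StandardCharsets.UTF_8);
        if (s.isEmpty() || s.codePointAt(0) != LANGUAGE_TAG) {
            return new TaggedValue(null, s); // untagged value, passed through
        }
        int i = Character.charCount(LANGUAGE_TAG);
        StringBuilder lang = new StringBuilder();
        // Collect the Plane 14 tag characters (hyphen, digits, small letters).
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int ascii = cp - TAG_BASE;
            boolean tagChar = ascii == '-'
                    || (ascii >= '0' && ascii <= '9')
                    || (ascii >= 'a' && ascii <= 'z');
            if (!tagChar) break;
            lang.append((char) ascii);
            i += Character.charCount(cp);
        }
        int end = s.length();
        // Drop the trailing CANCEL TAG when present (assumption: tolerate its absence).
        if (end - i >= 2 && s.codePointBefore(end) == CANCEL_TAG) {
            end -= Character.charCount(CANCEL_TAG);
        }
        return new TaggedValue(lang.toString(), s.substring(i, end));
    }

    public static void main(String[] args) {
        String tagged = "\uDB40\uDC01" + "\uDB40\uDC65\uDB40\uDC6E"
                + "Delicatessen" + "\uDB40\uDC7F";
        TaggedValue tv = parse(tagged.getBytes(StandardCharsets.UTF_8));
        // Prints: en: Delicatessen
        System.out.println(tv.language + ": " + tv.value);
    }
}
```
]]></artwork>
</figure>
<t>A stricter receiver could instead reject values whose tag is shorter than two characters or that lack the trailing CANCEL TAG; that choice is left open here.</t>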
 
<t>Implementations of the OpenID Attribute Exchange protocol which
accept Store requests MUST allow the values being stored to have associated
language tags, when permitted by the attribute definition.
</t>
 
<section title="Example" toc='include'><t>
Suppose a user has an attribute representing their "film préféré"
(favorite movie), which has four values:
 
<list style="hanging">
<t hangText="(no language)">
2001
</t>
<t hangText="(French)">
Amélie
</t>
<t hangText="(English)">
Delicatessen
</t>
<t hangText="(no language)">
M
</t>
 
</list>
 
</t>
 
<t>
<figure>
<preamble>A fetch would return the following bytes for this attribute:
</preamble>
 
<artwork>
openid.ax.type.fav_movie=http://example.fr/schema#FilmPr??f??r??
openid.ax.count.fav_movie=4
openid.ax.value.fav_movie.1=2001
openid.ax.value.fav_movie.2=????????????Am??lie????
openid.ax.value.fav_movie.3=????????????Delicatessen????
openid.ax.value.fav_movie.4=M
 
</artwork>
<postamble>
(In the preceding figure, a question mark indicates a byte for which there
is not a printing ASCII character.)
</postamble>
</figure>
 
</t>
 
<t>
<figure>
<preamble>The value of openid.ax.value.fav_movie.2 ("Amélie" in French), the UTF-8 encoding of ten Unicode characters, is the following 23 bytes:
</preamble>
<artwork>
f3 a0 80 81 # LANGUAGE TAG
f3 a0 81 a6 # TAG LATIN SMALL LETTER F
f3 a0 81 b2 # TAG LATIN SMALL LETTER R
41 # A
6d # m
c3 a9 # eacute
6c # l
69 # i
65 # e
f3 a0 81 bf # CANCEL TAG
</artwork>
</figure>
</t>
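<t>The character and byte counts for this value can be checked with a short Java sketch. It is an illustration only: the class TaggedValueEncoder and its methods are hypothetical names, and the sketch assumes the language argument already contains only valid tag characters.</t>
<figure>
<artwork><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative encoder only; class and method names are hypothetical.
final class TaggedValueEncoder {
    // LANGUAGE TAG, tag characters, the value, then CANCEL TAG, as UTF-8.
    static byte[] encode(String lang, String value) {
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(0xE0001); // LANGUAGE TAG
        // Assumption: lang holds only letters, digits and hyphen.
        for (char c : lang.toLowerCase(Locale.ROOT).toCharArray()) {
            sb.appendCodePoint(0xE0000 + c);
        }
        sb.append(value);
        sb.appendCodePoint(0xE007F); // CANCEL TAG
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated lower-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Am\u00e9lie" is Amelie with U+00E9 (e acute), which is c3 a9 in UTF-8.
        byte[] wire = encode("fr", "Am\u00e9lie");
        String s = new String(wire, StandardCharsets.UTF_8);
        System.out.println(s.codePointCount(0, s.length()) + " characters, "
                + wire.length + " bytes");
        System.out.println(hex(wire));
    }
}
```
]]></artwork>
</figure>
<t>The sketch reports 10 characters and 23 bytes, with U+00E9 (e acute) encoded as the two bytes c3 a9.</t>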
 
<t>
<figure>
<preamble>The value of openid.ax.value.fav_movie.3 ("Delicatessen" in English) is the following bytes:
</preamble>
<artwork>
f3 a0 80 81 # LANGUAGE TAG
f3 a0 81 a5 # TAG LATIN SMALL LETTER E
f3 a0 81 ae # TAG LATIN SMALL LETTER N
44 65 6c 69 63 61 74 65 73 73 65 6e
f3 a0 81 bf # CANCEL TAG
</artwork>
</figure>
</t>
 
</section>
</section>
<section title="Use with other OpenID specifications" toc='include'>
<t>
This encoding mechanism is currently not defined to operate with values transferred in the OpenID Simple Registration Extension.
 
</t>
</section>
<section title="Sample Implementation" toc='include'>
 
<figure>
<preamble>
The following Java class attaches a language tag to a value. It builds each Plane 14 character as a UTF-16 surrogate pair.
</preamble>
<artwork>
 
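import java.io.UnsupportedEncodingException;

public class LanguageTag {
// UTF-16 surrogate pair for U+E0000, the base of the Plane 14 tag characters
public static final char SUPP_CHAR_0 = 0xDB40; // high surrogate
public static final char SUPP_CHAR_1 = 0xDC00; // low surrogate for offset 0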
 
public static final char TAG_MAX = 0x7F;
public static final char TAG_LANG = 0x01;
public static final char TAG_CANCEL = 0x7F;
 
/**
* @param tagInAscii the ASCII letters of the language tag, e.g. "fr"
* @param rest the value being tagged
*/
public static byte[] getBytes(String tagInAscii,String rest)
throws UnsupportedEncodingException {
String s = add(tagInAscii,rest);
return s.getBytes("UTF-8");
}
 
/**
* returns a new String consisting of the language tag wrapping the string rest.
* @param tagInAscii the ASCII letters of the language tag, e.g. "fr"
* @param rest the value being tagged
*/
public static String add(String tagInAscii,String rest) {
if (tagInAscii == null || tagInAscii.length() == 0) return rest;
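// lower-case independently of the default locale, then encode the lower-cased tag
String tl = tagInAscii.toLowerCase(java.util.Locale.ENGLISH);

StringBuffer sb = new StringBuffer();
addLeadingTag(sb,tl);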
sb.append(rest);
addTrailingTag(sb);
return sb.toString();
}
 
private static void addLeadingTag(StringBuffer sb,String tagInAscii) {
char c0 = SUPP_CHAR_0;
char c1 = SUPP_CHAR_1 + TAG_LANG;
sb.append(c0);
sb.append(c1);
int tl = tagInAscii.length();
for (int i = 0; i &lt; tl;i++) {
char cx = tagInAscii.charAt(i);
if (cx &lt;= 0x20 || cx &gt;= 0x7F) continue;
c1 = (char)(SUPP_CHAR_1 + cx);
sb.append(c0);
sb.append(c1);
}
}
 
private static void addTrailingTag(StringBuffer sb) {
char c0 = SUPP_CHAR_0;
char c1 = SUPP_CHAR_1 + TAG_CANCEL;
sb.append(c0);
sb.append(c1);
}
 
}
 
</artwork>
</figure>
 
</section>
<section title="Security Considerations" toc='include'>
<t>The language tag representation mechanism used in this document is not known to raise any additional security concerns beyond those discussed in RFC 3066.
</t>
</section>
</middle>
<back>
<references title='Normative References'>
 
<reference anchor='RFC2482'>
<front><title>Language Tagging in Unicode Plain Text</title>
<author initials='K.' surname='Whistler'><organization>Sybase</organization></author>
<author initials='G.' surname='Adams'><organization>Spyglass</organization></author>
</front>
<seriesInfo name='RFC' value='2482' />
</reference>
 
<reference anchor='RFC2119'>
<front>
<title abbrev='RFC Key Words'>Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials='S.' surname='Bradner' fullname='Scott Bradner'>
<organization>Harvard University</organization>
<address>
<postal>
<street>1350 Mass. Ave.</street>
<street>Cambridge</street>
<street>MA 02138</street>
</postal>
<phone>+1 617 495 3864</phone>
<email>sob@harvard.edu</email></address></author>
<date year='1997' month='March' />
<area>General</area>
<keyword>keyword</keyword>
<abstract>
<t>
In many standards track documents several words are used
to signify the requirements in the specification. These
words are often capitalized. This document defines
these words as they should be interpreted in IETF
documents. Authors who follow these guidelines should
incorporate this phrase near the beginning of their
document:
<list>
<t>
The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC
2119.
</t></list></t>
<t>
Note that the force of these words is modified by the
requirement level of the document in which they are
used.
</t>
</abstract></front>
<seriesInfo name='BCP' value='14' />
<seriesInfo name='RFC' value='2119' />
<format type='TXT' octets='4723' target='ftp://ftp.isi.edu/in-notes/rfc2119.txt' />
<format type='HTML' octets='16553' target='http://xml.resource.org/public/rfc/html/rfc2119.html' />
<format type='XML' octets='5703' target='http://xml.resource.org/public/rfc/xml/rfc2119.xml' />
</reference>
<reference anchor='RFC3629'>
<front>
<title>UTF-8, a transformation format of ISO 10646</title>
<author initials='F.' surname='Yergeau' fullname='F. Yergeau'>
<organization /></author>
<date year='2003' month='November' />
<abstract>
<t>
ISO/IEC 10646-1 defines a large character set called the
Universal Character Set (UCS) which encompasses most of
the world's writing systems. The originally proposed
encodings of the UCS, however, were not compatible with
many current applications and protocols, and this has
led to the development of UTF-8, the object of this
memo. UTF-8 has the characteristic of preserving the
full US-ASCII range, providing compatibility with file
systems, parsers and other software that rely on
US-ASCII values but are transparent to other
values. This memo obsoletes and replaces RFC 2279.
</t>
</abstract>
</front>
<seriesInfo name='STD' value='63' />
<seriesInfo name='RFC' value='3629' />
<format type='TXT' octets='33856' target='ftp://ftp.isi.edu/in-notes/rfc3629.txt' />
</reference>
<reference anchor="OpenID.attribute-1.0">
<front>
<title>OpenID Attribute Exchange 1.0 - Draft 07</title>
<author initials='D.' surname='Hardt'>
<organization>Sxip Identity</organization></author>
<author initials='J.' surname='Bufu' >
<organization>Sxip Identity</organization></author>
<author initials='J.' surname='Hoyt' >
<organization>JanRain</organization></author>
<date year='2007' month='August' />
</front>
<format type='HTML' target='http://openid.net/specs/openid-attribute-exchange-1_0-07.html'/></reference>
 
<reference anchor='RFC3066'>
<front><title>Tags for the Identification of Languages</title><author initials='H' surname='Alvestrand'><organization>Cisco Systems</organization></author>
</front>
<seriesInfo name='RFC' value='3066' />
<seriesInfo name='BCP' value='47' />
</reference>
 
</references>
<references title='Informative References'>
 
<reference anchor='RFC3866'>
<front><title>Language Tags and Ranges in the Lightweight Directory Access Protocol (LDAP)</title>
<author initials="K" surname="Zeilenga"><organization>OpenLDAP Foundation</organization></author>
</front>
<seriesInfo name='RFC' value='3866' />
</reference>
 
</references>
 
<section title='Copyright'><t>
Copyright (C) Informed Control Inc. (2007).
 
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
A PARTICULAR PURPOSE.
</t></section>
 
</back>
</rfc>
 
 
 
1.0/tags/Draft_01/openid-value-lang.html New file
0,0 → 1,576
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en"><head><title>Draft: Language Tags for OpenID Values</title>
<meta http-equiv="Expires" content="Thu, 06 Sep 2007 16:04:48 +0000">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="description" content="Language Tags for OpenID Values">
<meta name="keywords" content="identity, OpenID, schema">
<meta name="generator" content="xml2rfc v1.32 (http://xml.resource.org/)">
<style type='text/css'><!--
body {
font-family: verdana, charcoal, helvetica, arial, sans-serif;
font-size: small; color: #000; background-color: #FFF;
margin: 2em;
}
h1, h2, h3, h4, h5, h6 {
font-family: helvetica, monaco, "MS Sans Serif", arial, sans-serif;
font-weight: bold; font-style: normal;
}
h1 { color: #900; background-color: transparent; text-align: right; }
h3 { color: #333; background-color: transparent; }
 
td.RFCbug {
font-size: x-small; text-decoration: none;
width: 30px; height: 30px; padding-top: 2px;
text-align: justify; vertical-align: middle;
background-color: #000;
}
td.RFCbug span.RFC {
font-family: monaco, charcoal, geneva, "MS Sans Serif", helvetica, verdana, sans-serif;
font-weight: bold; color: #666;
}
td.RFCbug span.hotText {
font-family: charcoal, monaco, geneva, "MS Sans Serif", helvetica, verdana, sans-serif;
font-weight: normal; text-align: center; color: #FFF;
}
 
table.TOCbug { width: 30px; height: 15px; }
td.TOCbug {
text-align: center; width: 30px; height: 15px;
color: #FFF; background-color: #900;
}
td.TOCbug a {
font-family: monaco, charcoal, geneva, "MS Sans Serif", helvetica, sans-serif;
font-weight: bold; font-size: x-small; text-decoration: none;
color: #FFF; background-color: transparent;
}
 
td.header {
font-family: arial, helvetica, sans-serif; font-size: x-small;
vertical-align: top; width: 33%;
color: #FFF; background-color: #666;
}
td.author { font-weight: bold; font-size: x-small; margin-left: 4em; }
td.author-text { font-size: x-small; }
 
/* info code from SantaKlauss at http://www.madaboutstyle.com/tooltip2.html */
a.info {
/* This is the key. */
position: relative;
z-index: 24;
text-decoration: none;
}
a.info:hover {
z-index: 25;
color: #FFF; background-color: #900;
}
a.info span { display: none; }
a.info:hover span.info {
/* The span will display just on :hover state. */
display: block;
position: absolute;
font-size: smaller;
top: 2em; left: -5em; width: 15em;
padding: 2px; border: 1px solid #333;
color: #900; background-color: #EEE;
text-align: left;
}
 
a { font-weight: bold; }
a:link { color: #900; background-color: transparent; }
a:visited { color: #633; background-color: transparent; }
a:active { color: #633; background-color: transparent; }
 
p { margin-left: 2em; margin-right: 2em; }
p.copyright { font-size: x-small; }
p.toc { font-size: small; font-weight: bold; margin-left: 3em; }
table.toc { margin: 0 0 0 3em; padding: 0; border: 0; vertical-align: text-top; }
td.toc { font-size: small; font-weight: bold; vertical-align: text-top; }
 
ol.text { margin-left: 2em; margin-right: 2em; }
ul.text { margin-left: 2em; margin-right: 2em; }
li { margin-left: 3em; }
 
/* RFC-2629 <spanx>s and <artwork>s. */
em { font-style: italic; }
strong { font-weight: bold; }
dfn { font-weight: bold; font-style: normal; }
cite { font-weight: normal; font-style: normal; }
tt { color: #036; }
tt, pre, pre dfn, pre em, pre cite, pre span {
font-family: "Courier New", Courier, monospace; font-size: small;
}
pre {
text-align: left; padding: 4px;
color: #000; background-color: #CCC;
}
pre dfn { color: #900; }
pre em { color: #66F; background-color: #FFC; font-weight: normal; }
pre .key { color: #33C; font-weight: bold; }
pre .id { color: #900; }
pre .str { color: #000; background-color: #CFF; }
pre .val { color: #066; }
pre .rep { color: #909; }
pre .oth { color: #000; background-color: #FCF; }
pre .err { background-color: #FCC; }
 
/* RFC-2629 <texttable>s. */
table.all, table.full, table.headers, table.none {
font-size: small; text-align: center; border-width: 2px;
vertical-align: top; border-collapse: collapse;
}
table.all, table.full { border-style: solid; border-color: black; }
table.headers, table.none { border-style: none; }
th {
font-weight: bold; border-color: black;
border-width: 2px 2px 3px 2px;
}
table.all th, table.full th { border-style: solid; }
table.headers th { border-style: none none solid none; }
table.none th { border-style: none; }
table.all td {
border-style: solid; border-color: #333;
border-width: 1px 2px;
}
table.full td, table.headers td, table.none td { border-style: none; }
 
hr { height: 1px; }
hr.insert {
width: 80%; border-style: none; border-width: 0;
color: #CCC; background-color: #CCC;
}
--></style>
</head>
<body>
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<table summary="layout" width="66%" border="0" cellpadding="0" cellspacing="0"><tr><td><table summary="layout" width="100%" border="0" cellpadding="2" cellspacing="1">
<tr><td class="header">Draft</td><td class="header">M. Wahl</td></tr>
<tr><td class="header">&nbsp;</td><td class="header">Informed Control Inc.</td></tr>
<tr><td class="header">&nbsp;</td><td class="header">September 6, 2007</td></tr>
</table></td></tr></table>
<h1><br />Language Tags for OpenID Values</h1>
 
<h3>Abstract</h3>
 
<p>
This document defines a mechanism for transferring language tags
associated with UTF-8 string values in OpenID protocols, in
particular representing languages of attribute values in the
OpenID Attribute Exchange protocol.
 
</p><a name="toc"></a><br /><hr />
<h3>Table of Contents</h3>
<p class="toc">
<a href="#anchor1">1.</a>&nbsp;
Introduction<br />
<a href="#anchor2">2.</a>&nbsp;
Language Tag<br />
<a href="#anchor3">3.</a>&nbsp;
OpenID AX Attribute Value Transfer Encoding<br />
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#anchor4">3.1.</a>&nbsp;
Example<br />
<a href="#anchor5">4.</a>&nbsp;
Use with other OpenID specifications<br />
<a href="#anchor6">5.</a>&nbsp;
Sample Implementation<br />
<a href="#anchor7">6.</a>&nbsp;
Security Considerations<br />
<a href="#rfc.references1">7.</a>&nbsp;
References<br />
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#rfc.references1">7.1.</a>&nbsp;
Normative References<br />
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#rfc.references2">7.2.</a>&nbsp;
Informative References<br />
<a href="#anchor10">Appendix&nbsp;A.</a>&nbsp;
Copyright<br />
<a href="#rfc.authors">&#167;</a>&nbsp;
Author's Address<br />
</p>
<br clear="all" />
 
<a name="anchor1"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.1"></a><h3>1.&nbsp;
Introduction</h3>
 
<p>
It is often desirable to be able to indicate the (human) language
associated with protocol elements exchanged in an identity system.
Language tags are especially useful when they can be associated with
specific values that are part of a set in order to provide the
receiver with a choice of values depending on the language of the
user; in particular, language tags can be associated with values
of an attribute.
 
For example, LDAP implementations use the mechanism described in
<a class='info' href='#RFC3866'>RFC 3866<span> (</span><span class='info'>Zeilenga, K., &ldquo;Language Tags and Ranges in the Lightweight Directory Access Protocol (LDAP),&rdquo; .</span><span>)</span></a> [RFC3866] to transfer language tags with
values in the LDAP protocol. For another example, protocols based on XML can encode language tags using the xml:lang encoding, e.g.
</p>
<div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
 
&lt;attribute name="SecretQuestion"&gt;
&lt;value xml:lang="en-GB"&gt;What colour is your hair?&lt;/value&gt;
&lt;value xml:lang="en-US"&gt;What color is your hair?&lt;/value&gt;
&lt;/attribute&gt;
 
</pre></div><p>
 
 
</p>
<p>As OpenID is neither an XML-based protocol nor uses LDAP attribute descriptions,
a new mechanism is needed to associate language tags with values in
OpenID protocols.
</p>
<p>
This document defines a mechanism by which a party in an identity system
using the OpenID protocols can associate a language tag with a string.
The input to the mechanism is a language tag and a string value.
The output from the mechanism is a <a class='info' href='#RFC3629'>UTF-8<span> (</span><span class='info'>Yergeau, F., &ldquo;UTF-8, a transformation format of ISO 10646,&rdquo; November&nbsp;2003.</span><span>)</span></a> [RFC3629] encoding
of a combination of the language tag and the value.
 
</p>
<p>The initial use for this mechanism is in associating language tags
with string-valued attribute values in the
<a class='info' href='#OpenID.attribute-1.0'>OpenID Attribute Exchange protocol<span> (</span><span class='info'>Hardt, D., Bufu, J., and J. Hoyt, &ldquo;OpenID Attribute Exchange 1.0 - Draft 07,&rdquo; August&nbsp;2007.</span><span>)</span></a> [OpenID.attribute&#8209;1.0].
 
</p>
<p>
This document does not specify mechanisms for
 
</p>
<ul class="text">
<li>expressing or determining the set of languages or locales recognized by the user,
</li>
<li>allowing an RP to request that only values with a particular language tag be returned from a fetch request, or
</li>
<li>associating language tags with binary or XML-valued attributes
</li>
</ul><p>
 
</p>
<p>
The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in
<a class='info' href='#RFC2119'>RFC 2119<span> (</span><span class='info'>Bradner, S., &ldquo;Key words for use in RFCs to Indicate Requirement Levels,&rdquo; March&nbsp;1997.</span><span>)</span></a> [RFC2119].
 
</p>
<a name="anchor2"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.2"></a><h3>2.&nbsp;
Language Tag</h3>
 
<p>A language tag is a string of characters that represents the name of a language.
</p>
<p>Language tags are described in <a class='info' href='#RFC3066'>RFC 3066 (BCP 47)<span> (</span><span class='info'>Alvestrand, H., &ldquo;Tags for the Identification of Languages,&rdquo; .</span><span>)</span></a> [RFC3066]. Language tags are written using the characters LATIN SMALL LETTER a-z, DIGIT 0-9, and HYPHEN, e.g. fr, en-us or i-klingon, and are two or more characters long.
</p>
<p>The encoding of a language tag uses Plane 14 characters as defined in <a class='info' href='#RFC2482'>RFC 2482<span> (</span><span class='info'>Whistler, K. and G. Adams, &ldquo;Language Tagging in Unicode Plain Text,&rdquo; .</span><span>)</span></a> [RFC2482]. The characters described in RFC 2482 were subsequently integrated into Unicode and ISO 10646. The language tag is encoded as a string of two or more Plane 14 characters taken from the set {U+E002D,U+E0030,U+E0031,...,U+E0039,U+E0061,U+E0062,...,U+E007A}. There is one Plane 14 character for each DIGIT character and one for each LATIN SMALL LETTER character. The number of a Plane 14 character is 0xE0000 plus the number of the HYPHEN character, the DIGIT character or the LATIN SMALL LETTER character.
</p>
<p>The Unicode plane 14 characters used in this encoding are:
 
</p><div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
U+E0001 LANGUAGE TAG
U+E002D TAG HYPHEN-MINUS
U+E0030 TAG DIGIT ZERO
...
U+E0039 TAG DIGIT NINE
U+E0061 TAG LATIN SMALL LETTER A
...
U+E007A TAG LATIN SMALL LETTER Z
U+E007F CANCEL TAG
 
</pre></div>
<p>For example, a language tag of "EN" would be converted to lowercase Latin characters ("en"), and 0xE0000 would then be added to each character, to form the two characters
</p>
<div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
U+E0065 # TAG LATIN SMALL LETTER E
U+E006E # TAG LATIN SMALL LETTER N
</pre></div>
<p>
The bytes of the UTF-8 encoding of this two-character language tag are:
 
</p><div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
F3 A0 81 A5 # TAG LATIN SMALL LETTER E
F3 A0 81 AE # TAG LATIN SMALL LETTER N
</pre></div>
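<p>The conversion described in this section could be sketched in Java as follows. This is only an illustration: the class name TagCharacters and its methods are hypothetical and are not defined by this specification, and the sketch assumes that a tag containing any character outside a-z, 0-9 and HYPHEN should be rejected.
</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper; the names used here are not part of the specification. */
final class TagCharacters {
    /** Offset between an ASCII character and its Plane 14 tag character. */
    static final int TAG_BASE = 0xE0000;

    /** Maps a language tag such as "EN" to the corresponding Plane 14 characters. */
    static String toTagCharacters(String languageTag) {
        String lower = languageTag.toLowerCase(Locale.ROOT);
        // Two or more characters drawn from a-z, 0-9 and hyphen.
        if (!lower.matches("[a-z0-9-]{2,}")) {
            throw new IllegalArgumentException("not an encodable language tag: " + languageTag);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : lower.toCharArray()) {
            sb.appendCodePoint(TAG_BASE + c);
        }
        return sb.toString();
    }

    /** Formats bytes as lower-case hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() != 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = toTagCharacters("EN").getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // f3 a0 81 a5 f3 a0 81 ae
    }
}
```

<p>For the tag "EN" the sketch yields the same eight bytes as the example above, written in lower-case hex.
</p>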
 
 
<a name="anchor3"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.3"></a><h3>3.&nbsp;
OpenID AX Attribute Value Transfer Encoding</h3>
 
<p>It is assumed that most attribute values transferred using OpenID
AX will not have language tags, but that a few attributes will have values
with language tags. It is also assumed that these values are strings
in the Unicode/ISO 10646 character set.
 
</p>
<p>
A value without a language tag is transferred as described in
<a class='info' href='#OpenID.attribute-1.0'>[OpenID.attribute&#8209;1.0]<span> (</span><span class='info'>Hardt, D., Bufu, J., and J. Hoyt, &ldquo;OpenID Attribute Exchange 1.0 - Draft 07,&rdquo; August&nbsp;2007.</span><span>)</span></a>.
 
</p>
<p>
A combination of an attribute value with a language tag is transferred in OpenID AX with the value parameter consisting of the concatenation of the following bytes:
</p>
<ul class="text">
<li>the bytes of the UTF-8 encoding of the Unicode character LANGUAGE TAG (0xF3 0xA0 0x80 0x81)
</li>
<li>the bytes of the UTF-8 encoding of the two or more Plane 14 characters encoding the language tag
</li>
<li>the bytes of the UTF-8 encoding of the underlying attribute value
</li>
<li>the bytes of the UTF-8 encoding of the Unicode character CANCEL TAG (0xF3 0xA0 0x81 0xBF)
</li>
</ul><p>
 
 
</p>
<p>A CANCEL TAG character is explicitly included after the value so that implementations that are not language tag aware do not inadvertently extend the language tag to other strings, not in that language, that they concatenate after a language-tagged value.
</p>
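<p>A receiving implementation might separate the language tag from the underlying value along the following lines. This is a minimal sketch under stated assumptions: the class name TaggedValue and the method parse are hypothetical, the input is assumed to have already been decoded from UTF-8 into a Java String, and the sketch is lenient in that it neither rejects a missing CANCEL TAG nor checks that the tag has at least two characters.
</p>

```java
/** Hypothetical decoder sketch; the names used here are not part of the specification. */
final class TaggedValue {
    static final int TAG_BASE = 0xE0000;
    static final int LANGUAGE_TAG = 0xE0001;
    static final int CANCEL_TAG = 0xE007F;
    /** ASCII characters that have a tag character in this encoding. */
    static final String TAG_ALPHABET = "-0123456789abcdefghijklmnopqrstuvwxyz";

    final String language; // null when the value carries no language tag
    final String value;

    TaggedValue(String language, String value) {
        this.language = language;
        this.value = value;
    }

    /** Splits a received value into its language tag and underlying value. */
    static TaggedValue parse(String received) {
        if (received.isEmpty() || received.codePointAt(0) != LANGUAGE_TAG) {
            return new TaggedValue(null, received);
        }
        int pos = Character.charCount(LANGUAGE_TAG);
        StringBuilder language = new StringBuilder();
        while (pos != received.length()) {
            int codePoint = received.codePointAt(pos);
            int ascii = codePoint - TAG_BASE;
            if (TAG_ALPHABET.indexOf(ascii) == -1) {
                break; // first character that is not part of the language tag
            }
            language.append((char) ascii);
            pos += Character.charCount(codePoint);
        }
        String rest = received.substring(pos);
        String cancel = new String(Character.toChars(CANCEL_TAG));
        if (rest.endsWith(cancel)) {
            rest = rest.substring(0, rest.length() - cancel.length());
        }
        return new TaggedValue(language.toString(), rest);
    }

    public static void main(String[] args) {
        String wire = new StringBuilder()
                .appendCodePoint(0xE0001)   // LANGUAGE TAG
                .appendCodePoint(0xE0066)   // TAG LATIN SMALL LETTER F
                .appendCodePoint(0xE0072)   // TAG LATIN SMALL LETTER R
                .append("Am\u00e9lie")
                .appendCodePoint(0xE007F)   // CANCEL TAG
                .toString();
        TaggedValue parsed = parse(wire);
        System.out.println(parsed.language);       // fr
        System.out.println(parsed.value.length()); // 6
    }
}
```

<p>A value that does not begin with LANGUAGE TAG is returned unchanged with a null language, which corresponds to a value without a language tag.
</p>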
<p>Implementations of the OpenID Attribute Exchange protocol which
accept Store requests MUST allow the values being stored to have associated
language tags, when permitted by the attribute definition.
 
</p>
<a name="anchor4"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.3.1"></a><h3>3.1.&nbsp;
Example</h3>
 
<p>
Suppose a user had an attribute representing their "film préféré"
(favorite movie), which has four values:
 
</p>
<blockquote class="text"><dl>
<dt>(no language)</dt>
<dd>
2001
 
</dd>
<dt>(French)</dt>
<dd>
Amélie
 
</dd>
<dt>(English)</dt>
<dd>
Delicatessen
 
</dd>
<dt>(no language)</dt>
<dd>
M
 
</dd>
</dl></blockquote><p>
 
 
</p>
<p>A fetch would return the following bytes for this attribute:
 
</p><div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
openid.ax.type.fav_movie=http://example.fr/schema#FilmPr??f??r??
openid.ax.count.fav_movie=4
openid.ax.value.fav_movie.1=2001
openid.ax.value.fav_movie.2=????????????Am??lie????
openid.ax.value.fav_movie.3=????????????Delicatessen????
openid.ax.value.fav_movie.4=M
 
</pre></div>
<p>
(In the preceding figure, a question mark indicates a byte for which there
is no printable ASCII character.)
 
</p>
 
 
 
<p>The value of openid.ax.value.fav_movie.2 ("Amélie" in French), the UTF-8 encoding of ten Unicode characters, is the following 23 bytes:
 
</p><div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
f3 a0 80 81 # LANGUAGE TAG
f3 a0 81 a6 # TAG LATIN SMALL LETTER F
f3 a0 81 b2 # TAG LATIN SMALL LETTER R
41 # A
6d # m
c3 a9 # eacute
6c # l
69 # i
65 # e
f3 a0 81 bf # CANCEL TAG
</pre></div>
 
 
<p>The value of openid.ax.value.fav_movie.3 ("Delicatessen" in English) is the following bytes:
 
</p><div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
f3 a0 80 81 # LANGUAGE TAG
f3 a0 81 a5 # TAG LATIN SMALL LETTER E
f3 a0 81 ae # TAG LATIN SMALL LETTER N
44 65 6c 69 63 61 74 65 73 73 65 6e # Delicatessen
f3 a0 81 bf # CANCEL TAG
</pre></div>
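<p>The character and byte counts in this example can be checked with a short Java sketch. The class name FavMovieExample and the method tagged are hypothetical, and the sketch assumes that the language tag passed in is already lower-case ASCII.
</p>

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical check of the example values; not part of the specification. */
final class FavMovieExample {
    /** Wraps value in LANGUAGE TAG, the tag characters for lang, and CANCEL TAG. */
    static String tagged(String lang, String value) {
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(0xE0001);          // LANGUAGE TAG
        for (char c : lang.toCharArray()) {
            sb.appendCodePoint(0xE0000 + c);  // tag character for each ASCII character
        }
        sb.append(value);
        sb.appendCodePoint(0xE007F);          // CANCEL TAG
        return sb.toString();
    }

    public static void main(String[] args) {
        String amelie = tagged("fr", "Am\u00e9lie");
        // Unicode characters in the tagged value
        System.out.println(amelie.codePointCount(0, amelie.length()));      // 10
        // Bytes in its UTF-8 encoding
        System.out.println(amelie.getBytes(StandardCharsets.UTF_8).length); // 23
    }
}
```

<p>Each of the four Plane 14 characters occupies four bytes in UTF-8 and the eacute occupies two, which accounts for the difference between the ten characters and the 23 bytes.
</p>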
 
 
<a name="anchor5"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.4"></a><h3>4.&nbsp;
Use with other OpenID specifications</h3>
 
<p>
This encoding mechanism is not currently defined for values transferred using the OpenID Simple Registration Extension.
 
 
</p>
<a name="anchor6"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.5"></a><h3>5.&nbsp;
Sample Implementation</h3>
 
<p>
The following Java class attaches a language tag to a value. It builds each Plane 14 character as a UTF-16 surrogate pair.
 
</p><div style='display: table; width: 0; margin-left: 3em; margin-right: auto'><pre>
 
import java.io.UnsupportedEncodingException;
import java.util.Locale;

public class LanguageTag {
    // UTF-16 surrogate pair for U+E0000: high surrogate 0xDB40, low surrogate 0xDC00.
    // Adding an offset of at most 0x7F to the low surrogate selects U+E0000 + offset.
    public static final char SUPP_CHAR_0 = 0xDB40;
    public static final char SUPP_CHAR_1 = 0xDC00;

    public static final char TAG_MAX = 0x7F;
    public static final char TAG_LANG = 0x01;    // U+E0001 LANGUAGE TAG
    public static final char TAG_CANCEL = 0x7F;  // U+E007F CANCEL TAG

    /**
     * Returns the UTF-8 bytes of the string rest wrapped in the language tag.
     * @param tagInAscii the ASCII letters of the language tag, e.g. "fr"
     * @param rest the value being tagged
     */
    public static byte[] getBytes(String tagInAscii, String rest)
            throws UnsupportedEncodingException {
        String s = add(tagInAscii, rest);
        return s.getBytes("UTF-8");
    }

    /**
     * Returns a new String consisting of the language tag wrapping the string rest.
     * @param tagInAscii the ASCII letters of the language tag, e.g. "fr"
     * @param rest the value being tagged
     */
    public static String add(String tagInAscii, String rest) {
        if (tagInAscii == null || tagInAscii.length() == 0) return rest;
        // This encoding uses only the lower-case tag letters, so normalize first.
        String tl = tagInAscii.toLowerCase(Locale.ROOT);

        StringBuffer sb = new StringBuffer();
        addLeadingTag(sb, tl);
        sb.append(rest);
        addTrailingTag(sb);
        return sb.toString();
    }

    // Appends LANGUAGE TAG followed by one tag character per character of the tag.
    private static void addLeadingTag(StringBuffer sb, String tagInAscii) {
        char c0 = SUPP_CHAR_0;
        char c1 = SUPP_CHAR_1 + TAG_LANG;
        sb.append(c0);
        sb.append(c1);
        int tl = tagInAscii.length();
        for (int i = 0; i &lt; tl; i++) {
            char cx = tagInAscii.charAt(i);
            // Skip spaces, control characters and anything outside ASCII.
            if (cx &lt;= 0x20 || cx &gt;= 0x7F) continue;
            c1 = (char)(SUPP_CHAR_1 + cx);
            sb.append(c0);
            sb.append(c1);
        }
    }

    // Appends CANCEL TAG.
    private static void addTrailingTag(StringBuffer sb) {
        char c0 = SUPP_CHAR_0;
        char c1 = SUPP_CHAR_1 + TAG_CANCEL;
        sb.append(c0);
        sb.append(c1);
    }

}
 
</pre></div>
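<p>The surrogate constants used in the sample follow from the UTF-16 encoding of the Plane 14 code points, which can be checked with the standard method Character.toChars. The class name SurrogateCheck and the method tagSurrogates in this sketch are hypothetical and are not part of the sample implementation.
</p>

```java
/** Hypothetical check of the surrogate arithmetic; not part of the sample above. */
final class SurrogateCheck {
    /** Returns the UTF-16 code units of the tag character for one ASCII character. */
    static char[] tagSurrogates(char ascii) {
        return Character.toChars(0xE0000 + ascii);
    }

    public static void main(String[] args) {
        char[] pair = tagSurrogates('e');
        // High surrogate is 0xDB40; low surrogate is 0xDC00 plus the ASCII value 0x65.
        System.out.println(String.format("%04X %04X", (int) pair[0], (int) pair[1])); // DB40 DC65
    }
}
```

<p>Because every character used by this encoding lies in the range U+E0000 to U+E007F, the high surrogate is always 0xDB40 and only the low surrogate varies.
</p>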
<a name="anchor7"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.6"></a><h3>6.&nbsp;
Security Considerations</h3>
 
<p>The language tag representation mechanism used in this document is not known to raise any security concerns beyond those discussed in RFC 3066.
 
</p>
<a name="rfc.references"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.7"></a><h3>7.&nbsp;
References</h3>
 
<a name="rfc.references1"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<h3>7.1.&nbsp;Normative References</h3>
<table width="99%" border="0">
<tr><td class="author-text" valign="top"><a name="OpenID.attribute-1.0">[OpenID.attribute-1.0]</a></td>
<td class="author-text">Hardt, D., Bufu, J., and J. Hoyt, &ldquo;<a href="http://openid.net/specs/openid-attribute-exchange-1_0-07.html">OpenID Attribute Exchange 1.0 - Draft 07</a>,&rdquo; August&nbsp;2007.</td></tr>
<tr><td class="author-text" valign="top"><a name="RFC2119">[RFC2119]</a></td>
<td class="author-text"><a href="mailto:sob@harvard.edu">Bradner, S.</a>, &ldquo;<a href="ftp://ftp.isi.edu/in-notes/rfc2119.txt">Key words for use in RFCs to Indicate Requirement Levels</a>,&rdquo; BCP&nbsp;14, RFC&nbsp;2119, March&nbsp;1997 (<a href="ftp://ftp.isi.edu/in-notes/rfc2119.txt">TXT</a>, <a href="http://xml.resource.org/public/rfc/html/rfc2119.html">HTML</a>, <a href="http://xml.resource.org/public/rfc/xml/rfc2119.xml">XML</a>).</td></tr>
<tr><td class="author-text" valign="top"><a name="RFC2482">[RFC2482]</a></td>
<td class="author-text">Whistler, K. and G. Adams, &ldquo;<a href="ftp://ftp.isi.edu/in-notes/rfc2482.txt">Language Tagging in Unicode Plain Text</a>,&rdquo; RFC&nbsp;2482.</td></tr>
<tr><td class="author-text" valign="top"><a name="RFC3066">[RFC3066]</a></td>
<td class="author-text">Alvestrand, H., &ldquo;<a href="ftp://ftp.isi.edu/in-notes/rfc3066.txt">Tags for the Identification of Languages</a>,&rdquo; RFC&nbsp;3066, BCP&nbsp;47.</td></tr>
<tr><td class="author-text" valign="top"><a name="RFC3629">[RFC3629]</a></td>
<td class="author-text">Yergeau, F., &ldquo;<a href="ftp://ftp.isi.edu/in-notes/rfc3629.txt">UTF-8, a transformation format of ISO 10646</a>,&rdquo; STD&nbsp;63, RFC&nbsp;3629, November&nbsp;2003.</td></tr>
</table>
 
<a name="rfc.references2"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<h3>7.2.&nbsp;Informative References</h3>
<table width="99%" border="0">
<tr><td class="author-text" valign="top"><a name="RFC3866">[RFC3866]</a></td>
<td class="author-text">Zeilenga, K., &ldquo;<a href="ftp://ftp.isi.edu/in-notes/rfc3866.txt">Language Tags and Ranges in the Lightweight Directory Access Protocol (LDAP)</a>,&rdquo; RFC&nbsp;3866.</td></tr>
</table>
 
<a name="anchor10"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<a name="rfc.section.A"></a><h3>Appendix A.&nbsp;
Copyright</h3>
 
<p>
Copyright (C) Informed Control Inc. (2007).
 
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
A PARTICULAR PURPOSE.
 
</p>
<a name="rfc.authors"></a><br /><hr />
<table summary="layout" cellpadding="0" cellspacing="2" class="TOCbug" align="right"><tr><td class="TOCbug"><a href="#toc">&nbsp;TOC&nbsp;</a></td></tr></table>
<h3>Author's Address</h3>
<table width="99%" border="0" cellpadding="0" cellspacing="0">
<tr><td class="author-text">&nbsp;</td>
<td class="author-text">Mark Wahl</td></tr>
<tr><td class="author-text">&nbsp;</td>
<td class="author-text">Informed Control Inc.</td></tr>
<tr><td class="author-text">&nbsp;</td>
<td class="author-text">PO Box 90626</td></tr>
<tr><td class="author-text">&nbsp;</td>
<td class="author-text">Austin, TX 78709</td></tr>
<tr><td class="author-text">&nbsp;</td>
<td class="author-text">US</td></tr>
<tr><td class="author" align="right">Email:&nbsp;</td>
<td class="author-text"><a href="mailto:mark.wahl@informed-control.com">mark.wahl@informed-control.com</a></td></tr>
</table>
</body></html>