===============
1. What is IDN
===============
IDN stands for Internationalized Domain Names.
An IDN is an Internet domain name that contains non-ASCII characters. Such domain names could
contain letters with diacritics, as required by many European languages, or characters from
non-Latin scripts such as Arabic or Chinese. However, the standard for domain names does not
allow such characters, nor could such domain names be handled by the existing DNS and name
resolver infrastructure.
http://www.icann.org/topics/idn.html describes Internationalized Domain Names in Applications (IDNA),
under which applications convert non-ASCII domain names into standard ASCII domain names
while preserving the stability of the domain name system. IDNA specifies how this conversion is
to be done.
Per http://www.icann.org/topics/idn.html, three RFCs (3490, 3491, and 3492) together define the
IDNA protocol and behavior.
The three RFCs boil down to the following:
(1) Applications should convert non-ASCII domain names to a standard ASCII-based form, known as
ASCII Compatible Encoding (ACE), before handing them to DNS and name
resolver systems.
(2) An IDNA-enabled application should be able to convert between the restricted-ASCII
and non-ASCII representations of a domain, using the ASCII form in cases where
it is needed (such as for DNS lookup), but being able to present the more readable
non-ASCII form to users.
(3) Support for the two conversion functions:
A. ToASCII:
Used before sending an IDN to something that expects ASCII names (such as a resolver) or
writing an IDN into a place that expects ASCII names (such as a DNS master file).
B. ToUnicode:
Used when displaying names to users.
For example:
Unicode name              ACE name
----------------------------------------------------------------
my.中文.com               my.xn--fiq228c.com
my.xyz中文abc.com         my.xn--xyzabc-dw7i870n.com
中文.xyz中文abc.com       xn--fiq228c.xn--xyzabc-dw7i870n.com
Note:
- The conversion is done per label (dot-separated segment), not per character and not on the entire domain name.
- If {X}, {Y}, and {Z} are non-ASCII characters, the ACE for {X} is not necessarily a substring of the ACE for {X}{Y} or of the ACE for {X}{Y}{Z}.
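The two conversions and the notes above can be tried out with a minimal sketch. It assumes Python's built-in "idna" codec (which implements the RFC 3490 operations) as a stand-in for a conversion library; the helper names to_ascii and to_unicode are illustrative only.

```python
# Sketch only: Python's built-in "idna" codec stands in for an IDNA library.

def to_ascii(name: str) -> str:
    """ToASCII: convert a domain name to its ACE form, label by label."""
    return name.encode("idna").decode("ascii")


def to_unicode(name: str) -> str:
    """ToUnicode: convert an ACE domain name back to its Unicode form."""
    return name.encode("ascii").decode("idna")


ace = to_ascii("my.中文.com")
print(ace)               # my.xn--fiq228c.com
print(to_unicode(ace))   # my.中文.com

# ASCII-only labels ("my", "com") are left alone; only the IDN label changes.
# The ACE of a shorter label is not a substring of the ACE of a longer label
# that contains it:
print(to_ascii("中文") in to_ascii("xyz中文abc"))   # False
```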
links:
http://www.icann.org/topics/idn.html
http://en.wikipedia.org/wiki/Internationalized_domain_name
================================
2. IDN Support in Zimbra Server
================================
http://bugzilla.zimbra.com/show_bug.cgi?id=14225
2.1 Support for ToASCII and ToUnicode Conversions
The third-party GNU IDN Java library (http://www.gnu.org/software/libidn) was adapted for the two
conversion functions.
2.2 Stored values in LDAP
IDN domain names and email addresses are converted to ACE and stored in LDAP in ACE.
(Note: for domain names and email addresses that contain no non-ASCII characters, i.e.,
that are not IDNs, the ACE value is identical to the Unicode value.)
This is done for the following reasons:
1. Each domain label is stored in the dc attribute in LDAP. dc has syntax IA5String,
which is the 7-bit ASCII character set and does not allow non-ASCII characters.
2. The ACE forms of domain names and email addresses are the real routable names; they need to be
seen by components like Postfix when accessed from LDAP.
3. For consistency, IDNs are converted to ACE and stored in ACE across the board for all
attributes that contain domain names or email addresses.
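As a rough illustration of the storage rule, the sketch below converts a domain name or email address to ACE before it would be written to LDAP. It is not the server's code: the function name address_to_ace is hypothetical, Python's built-in "idna" codec stands in for the conversion library, and it assumes the local part of an address is left untouched, as in the examples in this document.

```python
# Sketch only; address_to_ace is a hypothetical helper, not a Zimbra API.

def address_to_ace(value: str) -> str:
    """Return the ACE form of a domain name or of an email address.

    Assumption: only the domain part (after the last '@') is converted;
    the local part is passed through unchanged.
    """
    local, sep, domain = value.rpartition("@")
    return local + sep + domain.encode("idna").decode("ascii")


for name in ("my.中文.cn", "user1@my.中文.cn", "ooo@yahoo.com"):
    stored = address_to_ace(name)
    # The stored value is 7-bit ASCII, so it fits an IA5String attribute.
    print(name, "->", stored, stored.isascii())
```

A name without non-ASCII characters comes back unchanged, which matches the note that its ACE value is identical to its Unicode value.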
2.3 Support in account/admin SOAP Request/Response
1. SOAP Request
Both Unicode names and ACE names are honored by all SOAP requests that
identify accounts, calendar resources, distribution lists, aliases, and domains by
name. (Note: this is supported in LdapProvisioning via the Provisioning interface.
The names are opaque to callers of Provisioning.)
e.g.
(a) my.中文.cn is equivalent to
my.xn--fiq228c.cn
(b) user1@my.中文.cn is equivalent to
user1@my.xn--fiq228c.cn
2. SOAP Response
(A) The name attribute and element in all SOAP responses are always returned in Unicode (in UTF-8 encoding).
e.g.
(a)
...
(b)
...
(c)
...
user1@my.中文.cn
...
(B) Attributes containing IDN domain names or email addresses are always returned in Unicode (in UTF-8 encoding).
The server converts the values from ACE to Unicode and returns Unicode values in SOAP responses for
attributes that either:
- are of type email or emailp (declared in zimbra-attrs.xml), or
- have the idn flag (in zimbra-attrs.xml)
e.g.
(a)
my.xn--fiq228c.cn domain <=== not an idn flagged or email attribute
dcObject
organization
zimbraDomain
my.中文.cn <=== an idn flagged attribute, stored LDAP value is my.xn--fiq228c.cn
0eef5297-b212-4990-aad0-79e30825bf83
local
my
enabled
active
(b)
FALSE
...
user1@my.中文.cn <=== an email attribute, stored LDAP value is user1@my.xn--fiq228c.cn
alias-of-user1@my.中文.cn <=== an email attribute, stored LDAP value is alias-of-user1@my.xn--fiq228c.cn
ooo@yahoo.com <=== an email attribute, stored LDAP value is ooo@yahoo.com
(c)
mydomain.local <=== an idn flagged attribute, stored LDAP value is mydomain.local
(d)
my.中文.cn <=== an idn flagged attribute, stored LDAP value is my.xn--fiq228c.cn
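The attribute rule in (B) can be sketched as follows. Everything named here is an assumption for illustration: ATTR_INFO is a hypothetical stand-in for the metadata declared in zimbra-attrs.xml, the attribute names are placeholders, and Python's built-in "idna" codec stands in for the conversion library. The sketch handles plain addresses only, not addresses carrying a personal part.

```python
# Sketch only: hypothetical attribute metadata and placeholder names.
ATTR_INFO = {
    "exampleDomainAttr": {"type": "string", "idn": True},
    "exampleEmailAttr": {"type": "email", "idn": False},
    "exampleOtherAttr": {"type": "string", "idn": False},
}


def needs_conversion(attr: str) -> bool:
    """True for attributes of type email/emailp or carrying the idn flag."""
    info = ATTR_INFO.get(attr, {})
    return bool(info.get("idn")) or info.get("type") in ("email", "emailp")


def ace_to_unicode(value: str) -> str:
    """Convert the domain part of a stored ACE value back to Unicode."""
    local, sep, domain = value.rpartition("@")
    return local + sep + domain.encode("ascii").decode("idna")


def encode_response_attrs(stored: dict) -> dict:
    """Build the attribute values that would go into a SOAP response."""
    return {
        name: ace_to_unicode(value) if needs_conversion(name) else value
        for name, value in stored.items()
    }


stored = {
    "exampleDomainAttr": "my.xn--fiq228c.cn",
    "exampleEmailAttr": "user1@my.xn--fiq228c.cn",
    "exampleOtherAttr": "my.xn--fiq228c.cn domain",
}
print(encode_response_attrs(stored))
```

Attributes that are neither email-typed nor idn-flagged are returned exactly as stored, which is why an unflagged attribute can still show an ACE value in a response.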
2.4 Support in mailing (messages, calendar invites, etc) SOAP Request/Response
1. SOAP Request
Both Unicode email addresses and ACE email addresses are honored by all SOAP requests
that take an email address in:
{content}
The server converts the value specified by the a attribute to ACE before handing the
email address to any downstream code or components, since ACE is the encoding that can be resolved
and routed.
e.g.
(a)
...
...
2. SOAP Response
Email addresses in the a attribute in
{content}
are always returned in Unicode (in UTF-8 encoding).
The server converts email values from ACE to Unicode and returns Unicode in the SOAP response.
This conversion is done at the SOAP response boundary, after the request is handled and while
the server encodes the SOAP response.
e.g.
(a)
A subject
good day
2.5 LDAP Search Filters
For the SOAP SearchDirectory and zmprov searchAccounts (sa) commands, the server
converts assertion values in the query to ACE, then assembles the converted
query and hands it to LDAP search.
Note: for IDN, there is a limitation: the assertion value must contain
at least the entire label if the label is not ASCII. This is because,
per the RFCs, the ToASCII and ToUnicode algorithms are applied neither to the
domain name as a whole nor to individual characters, but rather to
individual labels. For example, if the domain name is
www.example.com, then the labels are www, example and com, and
ToASCII or ToUnicode would be applied to each of these three
separately.
For example, for domain my.中文.cn
- examples of good queries:
(zimbraDomainName=*中文*)
- examples of bad queries:
(zimbraDomainName=*文*)
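The good and bad queries above can be reproduced with a small sketch. It is not the server's filter code: the helper names are hypothetical, only the simple (attr=value) filter form is handled, Python's built-in "idna" codec stands in for the conversion library, and fnmatchcase is used merely to imitate an LDAP substring match.

```python
import re
from fnmatch import fnmatchcase


def label_to_ace(piece: str) -> str:
    """Convert one label to ACE; ASCII pieces and separators pass through."""
    if piece.isascii():
        return piece
    return piece.encode("idna").decode("ascii")


def assertion_to_ace(value: str) -> str:
    """Convert each non-ASCII label, keeping '*', '.' and '@' in place."""
    pieces = re.split(r"([*.@])", value)
    return "".join(label_to_ace(p) for p in pieces)


def filter_to_ace(ldap_filter: str) -> str:
    """Handle only the simple form (attr=value); return anything else as-is."""
    match = re.fullmatch(r"\((\w+)=([^()]*)\)", ldap_filter)
    if match is None:
        return ldap_filter
    attr, value = match.groups()
    return f"({attr}={assertion_to_ace(value)})"


def matches(ldap_filter: str, stored_value: str) -> bool:
    """Imitate a substring match of the filter's value against a stored value."""
    pattern = ldap_filter[ldap_filter.index("=") + 1:-1]
    return fnmatchcase(stored_value, pattern)


stored = "my.xn--fiq228c.cn"   # value held in LDAP for my.中文.cn

good = filter_to_ace("(zimbraDomainName=*中文*)")
print(good)                    # (zimbraDomainName=*xn--fiq228c*)
print(matches(good, stored))   # True

bad = filter_to_ace("(zimbraDomainName=*文*)")
print(matches(bad, stored))    # False
```

The bad query fails because the partial label is converted as if it were a whole label, and the resulting ACE string does not occur inside the ACE of the full label.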
2.6 Address Book Lookup and Display
Address book SOAP commands do not apply any ACE <-> Unicode conversion.
Email addresses in address book contacts are stored and returned in SOAP responses exactly as
they were entered in the address book, whether typed by the user or added automatically.
That is, if ACE is entered, it will be stored and returned in ACE; if Unicode is
entered, it will be stored and returned in Unicode.
When a contact's email address is used to address mail, it is converted to ACE by
the server as described in 2.4.
2.7 Filter (Sieve) Rules
We use jSieve to filter mail for designated actions. During filter processing
of headers, if the header is From, To, CC, Bcc, Reply-To, or Sender, we convert domains
into Unicode and return both Unicode and ACE names for each IDN domain.
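One way to picture this behavior is the sketch below, which returns both forms of an address for the listed headers so that a rule written in either form can match. It is only an illustration under stated assumptions: header_match_values is a hypothetical name, not a jSieve or Zimbra API, and Python's built-in "idna" codec stands in for the conversion library.

```python
# Sketch only; header_match_values is a hypothetical helper.
ADDRESS_HEADERS = {"from", "to", "cc", "bcc", "reply-to", "sender"}


def header_match_values(header_name: str, address: str) -> list:
    """Return the values a filter rule could be compared against."""
    if header_name.lower() not in ADDRESS_HEADERS:
        return [address]
    local, sep, domain = address.rpartition("@")
    ace = domain.encode("idna").decode("ascii")
    uni = ace.encode("ascii").decode("idna")
    values = [local + sep + uni]
    if ace != uni:
        # IDN domain: offer the ACE form as well as the Unicode form.
        values.append(local + sep + ace)
    return values


print(header_match_values("From", "user1@my.xn--fiq228c.cn"))
print(header_match_values("To", "ooo@yahoo.com"))
```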
===================
3. Client Behavior
===================
In the web client and admin console:
- Input: all email addresses and/or domain names can be entered and recognized
in either Unicode or ACE (ASCII Compatible Encoding).
- Output: all email addresses and domain names should be displayed in Unicode, except for
address book contacts. Address book contacts are displayed as the original text
that was entered. See 2.6.
3.1 Web Client
http://bugzilla.zimbra.com/show_bug.cgi?id=20428
3.2 Admin Console
3.3 zmprov
- Input: all email addresses and/or domain names can be entered and recognized
in either Unicode or ACE (ASCII Compatible Encoding).
- Output:
- For the SOAP (zmprov -s) interface, attribute values are displayed in Unicode.
- For the LDAP (zmprov -l) interface, attribute values are displayed as stored in LDAP, that is, in ACE.