=============== 1. What is IDN =============== IDN stands for Internationalized Domain Names. An IDN is an Internet domain name that contains non-ASCII characters. Such domain names could contain letters with diacritics, as required by many European languages, or characters from non-Latin scripts such as Arabic or Chinese. However, the standard for domain names does not allow such characters, nor could such domain names be handled by the existing DNS and name resolver infrastructure. http://www.icann.org/topics/idn.html defines Internationalized Domain Names in Applications (IDNA), in that non-ASCII domain names should be converted into standard ASCII domain names by applications while preserving the stability of the domain name systems. IDNA specifies how this conversion is to be done. Per http://www.icann.org/topics/idn.html, the three RFCs: 3490, 3491, and 3492 together define the IDNA protocol/behavior. The 3 RFCs boil down to the following summaries: (1) Non-ASCII domain names should be converted to a standard ASCII-based form, aka, ASCII Compatible Encoding(ACE) by applications before handing them to DNS and name resolver systems. (2) An IDNA-enabled application should be able to convert between the restricted-ASCII and non-ASCII representations of a domain, using the ASCII form in cases where it is needed (such as for DNS lookup), but being able to present the more readable non-ASCII form to users. (3) Support for the two conversion functions: A. ToASCII: Used before sending an IDN to something that expects ASCII names (such as a resolver) or writing an IDN into a place that expects ASCII names (such as a DNS master file). B. ToUnicode: Used when displaying names to users. For example: Unicode name ACE name --------------------------------------- my.中文.com my.xn--fiq228c.com my.xyz中文abc.com my.xn--xyzabc-dw7i870n.com 中文.xyz中文abc.com xn--fiq228c.xn--xyzabc-dw7i870n.com Note: - The conversion is on "label(dot separated segments) basis", not on individual characters nor the entire domain name. - Let's say {X}{Y}{Z} are non-ASCII characters, ACE for {X} is not necessarily a substring of ACE for {X}{Y} or ACE for {X}{Y}{Z}. links: http://www.icann.org/topics/idn.html http://en.wikipedia.org/wiki/Internationalized_domain_name ================================ 2. IDN Support in Zimbra Server ================================ http://bugzilla.zimbra.com/show_bug.cgi?id=14225 2.1 Support for ToASCII and ToUnicode Conversions Adapted third party GNU IDN Java lib (http://www.gnu.org/software/libidn) for the two conversion functions. 2.2 Stored values in LDAP IDN domain names and email addresses are converted to ACE and stored in LDAP in ACE. (Note, for domain names and email addresses that do not contain non-ASCII characters, i.e., not containing IDN, their ACE value is identical to the unicode value.) For the following reasons: 1. Each domain label are stored in the dc attribute in LDAP. dc is of syntax IA5String, which is "ASCII character set 7-bit" and does not allow non-ASCII characters. 2. ACE in domain names and email addresses are the real routable names, they need to be seen by components like Postfix when accessed from LDAP. 3. For consistency, IDNs are converted to ACE and stored in ACE across the board for all attributes that contain domain names or email addresses. 2.3 Support in account/admin SOAP Request/Response 1. SOAP Request Both Unicode names and ACE names will be honored by all our SOAP requests that identify accounts, calendar resources, distribution lists, aliases, and domains by name. (Note, this is supported in LdapProvisioning via the Provisioning interface. The names be opaque to callers of Provisioning.) e.g. (a) my.中文.cn equals to my.xn--fiq228c.cn (b) user1@my.中文.cn equals to user1@my.xn--fiq228c.cn 2. SOAP Response (A) name attribute and element in all SOAP responses are always returned in unicode(in utf8 encoding) e.g. (a) ... (b) ... (c) ... user1@my.中文.cn ... (B) attributes containing IDN domain names or email addresses are always returned in unicode(in utf8 encoding) Server will convert the values from ACE to unicode and return unicode values in SOAP responses for attributes that are either: - of type email or emailp, (declared in zimbra-attrs.xml) or - has idn flag (in zimbra-attrs.xml) e.g. (a) my.xn--fiq228c.cn domain <=== not an idn flagged or email attribute dcObject organization zimbraDomain my.中文.cn <=== an idn flagged attribute, stored LDAP value is my.xn--fiq228c.cn 0eef5297-b212-4990-aad0-79e30825bf83 local my enabled active (b) FALSE ... user1@my.中文.cn <=== an email attribute, stored LDAP value is user1@my.xn--fiq228c.cn alias-of-user1@my.中文.cn <=== an email attribute, stored LDAP value is alias-of-user1@my.xn--fiq228c.cn ooo@yahoo.com <=== an email attribute, stored LDAP value is ooo@yahoo.com (c) mydomain.local <=== an idn flagged attribute, stored LDAP value is mydomain.local (d) my.中文.cn <=== an idn flagged attribute, stored LDAP value is my.xn--fiq228c.cn 2.4 Support in mailing (messages, calendar invites, etc) SOAP Request/Response 1. SOAP Request Both Unicode email addresses and ACE email addresses will be honored by all our SOAP requests that takes email address in: {content} Server will convert the value specified by the a attribute and convert it to ACE before handing the email address to any down stream code/components, since ACE is the encoding that can be resolved and routed. e.g. (a) ... ... 2. SOAP Response Email addresses in the a attribute in {content} are always returned in Unicode(in utf8 encoding). Server will convert email values from ACE to unicode and return unicode in SOAP response. This conversion is done at the SOAP response boundary, after the request is handled and while it encodes the SOAP response. e.g. (a) A subject good day 2.5 LDAP Search Filters For SOAP SearchDirectory and zmprov searchAccounts(sa) commands, server converts assertion values in the query to ACE, then put together the massaged query and hand it to LDAP search. Note: for IDN, there is a limitation that the assertion value has to be at least the entire label if the label is not ASCII. This is because by RFC the ToASCII and ToUnicode algorithms are not applied to the domain name as a whole or as individual characters, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are www, example and com, and ToASCII or ToUnicode would be applied to each of these three separately. For example, for domain my.中文.cn - examples of good queries: (zimbraDomainName=*中文*) - examples of bad queries: (zimbraDomainName=*文*) 2.6 Address Book Lookup and Display Address book SOAP commands are not treated with any ACE <-> Unicode conversion. Emails in address book contact will be stored and returned in SOAP responses as how they are entered in the address book - either typed by user or added automatically. That is, if ACE is entered, it will be stored and returned in ACE; if Unicode is entered, it will be stored and returned in Unicode. When email address of contact is used to address mails, it will be converted to ACE by the server as described in 2.4. 2.6 Filter(sieve) Rules We use jSieve for filtering mails for designated actions. During filter processing of of headers, if the header is From, To, CC, Bcc, Reply-To, or Sender, we convert domains into Unicode and return both Unicode and ACE names for each IDN domain. =================== 3. Client Behavior =================== In Web client and admin console - Input: all email addresses and/or domain names can be entered and recognized in both unicode or ACE(ASCII Compatible Encoding). - Output: all email addresses and domain names should be displayed in unicode, except for address book contacts. Address book contacts are displayed in the original text as when they are entered. See 2.6. 3.1 Web Client http://bugzilla.zimbra.com/show_bug.cgi?id=20428 3.2 Admin Console 3.3 zmprov - Input: all email addresses and/or domain names can be entered and recognized in both unicode or ACE(ASCII Compatible Encoding). - Output: - For SOAP (zmprov -s) interface, attribute values are displayed in Unicode. - FOr LDAP (zmprov -l) interface, attribute values are displayed in whatever is stored in LDAP, that is, ACE.