Adam M. Costello

Internationalized Domain Names (IDN)

As a participant in the IDN working group, I contributed Punycode (RFC 3492, or a slightly clarified version of it), formerly known by its working name AMC-ACE-Z, and some material in the IDNA specification (RFC 3490), mainly in sections 2 through 5.

Currently, hostnames may contain only ASCII letters, digits, and hyphens, but the goal of IDN is to allow characters from a much larger set (Unicode). The working group has defined an architecture called IDNA (IDN in Applications), in which each non-ASCII domain label is represented by an ASCII domain label in a format called ACE (ASCII-Compatible Encoding). An ACE label begins with a fixed prefix, which RFC 3490 sets to "xn--". One of the building blocks of this scheme is a transformation between Unicode and ASCII, and Punycode was chosen to fill this role.
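To make the ACE idea concrete, here is a minimal Python sketch of the Punycode encoder described in RFC 3492 (section 6.3), plus a hypothetical helper, label_to_ace, that adds the "xn--" prefix used by RFC 3490. It is an illustration under stated assumptions, not a conforming ToASCII: it skips Nameprep (so it assumes the label is already normalized and lowercased), skips the STD3 and length checks, and relies on Python's unbounded integers instead of the overflow checks a fixed-width implementation needs.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80
ACE_PREFIX = "xn--"  # the IDNA ACE prefix from RFC 3490


def adapt(delta, numpoints, firsttime):
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def threshold(k, bias):
    """Threshold t for digit position k, clamped to [TMIN, TMAX]."""
    if k <= bias:
        return TMIN
    if k >= bias + TMAX:
        return TMAX
    return k - bias


def encode_digit(d):
    """Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label):
    """Encode one label as in RFC 3492, section 6.3 (Python ints cannot overflow)."""
    code_points = [ord(c) for c in label]
    output = [c for c in label if ord(c) < 0x80]  # basic code points come first
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter separates basic code points from deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        m = min(c for c in code_points if c >= n)  # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for c in code_points:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = threshold(k, bias)
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


def label_to_ace(label):
    """Hypothetical helper: ASCII labels pass through, others get the ACE prefix.

    Assumes the label was already Nameprep-processed; real ToASCII also
    applies Nameprep, the STD3 rules, and a 63-octet length limit.
    """
    if all(ord(c) < 0x80 for c in label):
        return label
    return ACE_PREFIX + punycode_encode(label)


print(label_to_ace("bücher"))  # xn--bcher-kva
```

The basic code points ("bcher") are copied through unchanged, and only the position and value of the "ü" are packed into the short digit string "kva" after the delimiter.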

Implementations

I have not implemented IDN myself (except for the Punycode sample implementation contained in the Punycode spec), but I know of six implementations by others.

GNU libidn
Written in pure C (C89) and released under the GNU Lesser General Public License (LGPL). Simon Josefsson has provided an online demo. This library provides functions to help applications support IDN, and it can also be combined with GNU libc, in which case it extends getaddrinfo() to be IDN-aware when a new AI_IDN flag is passed in.

JPNIC idnkit
Released under a BSD-like license. Includes not only an IDN library for applications, but also an IDN-aware resolver library, command-line utilities for converting DNS zone files and config files, and patches for the BIND 9 command-line lookup utilities.

Verisign IDN SDK
Released under the BSD license. Includes an IDN library in C and Java, and some command-line conversion tools.

IBM ICU
Released under the X license (as in the X Window System), which I believe is equivalent to the post-1999 BSD license. This is a large set of libraries containing a wide array of Unicode text processing tools. A future version will include an IDN implementation, of which there is already an online demo.

IMC IDNA test tool
A Perl implementation intended for testing rather than performance. The online demo includes debugging output. The Punycode implementation neglects to ensure that its integer arithmetic yields correct results.
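One way Punycode arithmetic can go wrong is overflow: RFC 3492 requires a decoder to reject inputs whose values exceed the range of its integer type. The Python sketch below decodes a Punycode label while emulating a 32-bit unsigned limit (MAXINT is an assumption chosen for illustration) and performs the same three overflow checks as the RFC's sample C code. Python integers never overflow on their own, so the checks are there only to mirror what a fixed-width implementation has to do.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80
MAXINT = 2**32 - 1  # assumption: emulate a 32-bit unsigned integer type


def adapt(delta, numpoints, firsttime):
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def threshold(k, bias):
    """Threshold t for digit position k, clamped to [TMIN, TMAX]."""
    if k <= bias:
        return TMIN
    if k >= bias + TMAX:
        return TMAX
    return k - bias


def decode_digit(ch):
    """Map 'a'..'z' and 'A'..'Z' to 0..25, and '0'..'9' to 26..35."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit {ch!r}")


def punycode_decode(s):
    """Decode one Punycode label, rejecting any value that would exceed MAXINT."""
    b = max(s.rfind("-"), 0)  # basic code points precede the last delimiter
    if any(ord(c) >= 0x80 for c in s[:b]):
        raise ValueError("non-basic code point before the delimiter")
    output = [ord(c) for c in s[:b]]
    pos = b + 1 if b > 0 else 0
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    while pos < len(s):
        oldi, w, k = i, 1, BASE
        while True:
            if pos >= len(s):
                raise ValueError("truncated input")
            digit = decode_digit(s[pos])
            pos += 1
            if digit > (MAXINT - i) // w:  # i + digit * w would overflow
                raise OverflowError("Punycode overflow")
            i += digit * w
            t = threshold(k, bias)
            if digit < t:
                break
            if w > MAXINT // (BASE - t):  # w * (BASE - t) would overflow
                raise OverflowError("Punycode overflow")
            w *= BASE - t
            k += BASE
        length = len(output) + 1
        bias = adapt(i - oldi, length, oldi == 0)
        if i // length > MAXINT - n:  # n + i // length would overflow
            raise OverflowError("Punycode overflow")
        n += i // length
        i %= length
        output.insert(i, n)
        i += 1
    return "".join(chr(c) for c in output)


print(punycode_decode("bcher-kva"))  # bücher
```

Feeding this sketch a long run of high-valued digits, such as "9" * 20, pushes the intermediate value i past 2**32 - 1 within the first few digits, so it raises OverflowError instead of silently producing a wrong code point.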

Python
Starting with version 2.3, the Python interpreter includes built-in IDNA support, written in Python.
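As a quick illustration of that built-in support, the standard-library "idna" codec (which implements RFC 3490) converts whole domain names, and the encodings.idna module exposes the per-label ToASCII and ToUnicode operations. The sketch uses Python 3 syntax; the domain name is only an example.

```python
import encodings.idna

# The "idna" codec applies ToASCII / ToUnicode to each dot-separated label.
ace = "Bücher.example".encode("idna")
print(ace)                 # b'xn--bcher-kva.example'
print(ace.decode("idna"))  # bücher.example  (Nameprep lowercased the "B")

# The per-label operations from RFC 3490 are also available directly.
print(encodings.idna.ToASCII("bücher"))            # b'xn--bcher-kva'
print(encodings.idna.ToUnicode(b"xn--bcher-kva"))  # bücher
```

Note that the round trip does not restore the original capital letter: ToASCII applies Nameprep before Punycode, and Nameprep case-folds the label.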

Related

Internationalized Mail Addresses in Applications (IMAA) was a proposal to extend the IDNA architecture to cover the local part (left side) of email addresses. Discussion has petered out, but you can still read the spec.

Obsolete

An earlier transformation under serious consideration for IDNA was DUDE (designed by others and tweaked by me), which is quite a bit simpler than Punycode, but its encoded strings are not nearly as compact. I also devised some older ACEs, all of which are more complex and less efficient than Punycode and thus not worth considering: AMC-ACE-W, AMC-ACE-V, AMC-ACE-R, AMC-ACE-O, AMC-ACE-M, and BRACE. I wrote a comparative evaluation of these and several other ACEs. (There is also FACE, which was never implemented or evaluated, but is probably close to DUDE in efficiency and close to Punycode in complexity, and thus not worth considering.)


Prepared by Adam M. Costello
 Last modified: 2006-Mar-24-Fri 06:29:44 GMT