Adam M. Costello
Internationalized Domain Names (IDN)
As a participant in the IDN working group, I contributed Punycode
(RFC 3492, or a slightly clarified version), formerly known by its
working name AMC-ACE-Z, and some material in the IDNA specification
(RFC 3490), mainly in sections 2-5.
Currently, hostnames may contain only ASCII letters, digits, and
hyphens, but the goal of IDN is to allow characters from a much larger
set (Unicode). The working group
has defined an architecture called IDNA (IDN in Applications), in which
non-ASCII domain labels are represented by ASCII domain labels in a
format called ACE (ASCII-Compatible Encoding), which begins with a
particular prefix ("xn--", as specified in RFC 3490). One of the building blocks of
this scheme is a transformation between Unicode and ASCII, and Punycode
was chosen to fill this role.
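To make the transformation concrete, here is a minimal Python sketch of a Punycode encoder that follows the parameter values and encoding procedure in RFC 3492 (sections 5, 6.1 and 6.3). It also includes a hypothetical `to_ace_label` helper that prepends the ACE prefix. This is an illustration, not a conforming IDNA implementation. It assumes the label has already been through nameprep (case folding and normalization) and it omits the other checks that ToASCII performs.

```python
# Bootstring parameter values for Punycode (RFC 3492, section 5)
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
ACE_PREFIX = "xn--"  # IDNA's ACE prefix (RFC 3490, section 5)


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_digit(d: int) -> str:
    """Map digit values 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(d + ord("a")) if d < 26 else chr(d - 26 + ord("0"))


def punycode_encode(label: str) -> str:
    """Encode a Unicode string as Punycode (RFC 3492, section 6.3)."""
    codepoints = [ord(c) for c in label]
    # Basic (ASCII) code points are copied to the output unchanged.
    output = [c for c in label if ord(c) < 0x80]
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter between basic code points and deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        # Next smallest code point not yet handled.
        m = min(cp for cp in codepoints if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in codepoints:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


def to_ace_label(label: str) -> str:
    """Hypothetical helper: ASCII labels pass through, others get xn-- + Punycode.

    Assumes the label is already nameprepped; real ToASCII (RFC 3490,
    section 4.1) also does that and enforces the 63-octet length limit.
    """
    if all(ord(c) < 0x80 for c in label):
        return label
    return ACE_PREFIX + punycode_encode(label)
```

For example, `to_ace_label("bücher")` returns `"xn--bcher-kva"`. Because Python integers are unbounded, this sketch cannot overflow. An implementation in a language with fixed-width integers needs the overflow checks described in RFC 3492, section 6.4.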
Implementations
I have not implemented IDN myself (except for the Punycode sample
implementation contained in the Punycode spec), but I know of six
implementations by others.
- GNU libidn
- Written in pure C (C89) and released under the GNU Lesser
General Public License (LGPL). Simon Josefsson has provided an online demo. This library
provides functions to help applications support IDN, but it can
also be combined with GNU libc, in which case it extends getaddrinfo()
to be IDN-aware if a new AI_IDN flag is passed in.
- JPNIC idnkit
- Released under a BSD-like license. Includes not only an IDN library
for applications, but also an IDN-aware resolver library, command-line
utilities for converting DNS zone files and config files, and patches
for the BIND 9 command-line lookup utilities.
- Verisign IDN SDK
- Released under the BSD license. Includes an IDN library in C and
Java, and some command-line conversion tools.
- IBM ICU
- Released under the X license (as in the X Window System; I think
this license is equivalent to the post-1999 BSD
license). This is a large set of libraries containing a wide
array of Unicode text processing tools. A future version will
include an IDN implementation, of which there is already an online demo.
- IMC IDNA test tool
- A Perl implementation intended for testing, not performance. The
online demo includes debugging output. The Punycode implementation
neglects to ensure that integer arithmetic yields correct results.
- Python
- Starting with version 2.3, the Python interpreter includes built-in IDNA support, written in Python.
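As an illustration of the built-in Python support, the standard `idna` codec and the `ToASCII` and `ToUnicode` functions in the `encodings.idna` module can be used as shown below. This is a minimal sketch. The hostname is made up, and the codec implements the IDNA 2003 rules of RFC 3490, not any later revision.

```python
import encodings.idna

# A made-up hostname with one non-ASCII label.
host = "bücher.example"

# The "idna" codec applies ToASCII to each dot-separated label:
# nameprep, then Punycode, then the "xn--" prefix for non-ASCII labels.
ace = host.encode("idna")
print(ace)  # b'xn--bcher-kva.example'

# Decoding applies ToUnicode to each label.
print(ace.decode("idna"))  # bücher.example

# Per-label conversions; nameprep case-folds "B" to "b" before encoding.
print(encodings.idna.ToASCII("Bücher"))            # b'xn--bcher-kva'
print(encodings.idna.ToUnicode(b"xn--bcher-kva"))  # bücher
```

Python's socket module encodes non-ASCII `str` hostnames with this same codec. As a result, a call such as `socket.getaddrinfo()` is IDN-aware without an extra flag.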
Related
Internationalized Mail Addresses in Applications (IMAA) was a proposal
to extend the IDNA architecture to cover the local part (left side) of
email addresses. Discussion has petered out, but you can still read the
spec.
Obsolete
An earlier transformation under serious consideration for IDNA
was DUDE (designed by others and tweaked by
me), which is quite a bit simpler than Punycode, but the encoded
strings are not nearly as compact. Some older ACEs I devised, which
are all more complex and less efficient than Punycode and thus not
worth considering, are AMC-ACE-W, AMC-ACE-V, AMC-ACE-R,
AMC-ACE-O, AMC-ACE-M, and BRACE. I
wrote a comparative evaluation of these and
several other ACEs. (There is also FACE, which
was never implemented or evaluated, but is probably close to DUDE in
efficiency and close to Punycode in complexity, and thus not worth
considering.)