Handling Internationalized Domain Names (IDN) in scripting
IDNs help many people, but they initially caused a lot of trouble for many programming languages, especially when parsing URLs and reading their contents.
The focus of this article is on scripting languages (bash, Perl, Python and Ruby) reading content from an IDN, or simply put, from a URL with Unicode characters in it.
ICANN has approved many internationalized TLDs, but for this scenario I shall use an IDN under .net: 'http://☃.net' (snowman!)
curl 'http://☃.net' # Awesomeness of curl!
Well, it's pretty easy to handle in Perl. Don't forget to add 'use utf8'.
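    #!/usr/bin/env perl
    use strict;
    use warnings FATAL => 'all';
    use feature ":5.10";
    use utf8;                 # the source contains a non-ASCII literal
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua  = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => 'http://☃.net');
    say $ua->request($req)->content;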
In Python (2.x here) it's not as easy as in Perl, but still manageable. Do check this bug.
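    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import urllib2

    domain = '☃.net'
    # Encode the host name to its ASCII (Punycode) form before building the URL
    url = 'http://' + unicode(domain, "utf8").encode("idna")
    print urllib2.urlopen(url).read()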
Before we proceed, it's a must to know about Punycode.
Punycode is an instance of Bootstring that uses particular parameter values specified by RFC 3492; it serves as the transfer encoding for Internationalized Domain Names in Applications (IDNA). It uniquely and reversibly transforms a Unicode string into an ASCII string. ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens).
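To make the "literal ASCII, encoded non-ASCII" behaviour concrete, here is a minimal sketch using Python 3's built-in "punycode" codec. The helper names punycode_label and unpunycode_label are made up for this illustration, and 'bücher' is just an assumed extra sample label. Note that the raw codec works on a single label and does not add the 'xn--' prefix that IDNA uses.

```python
# Sketch: raw Punycode on single labels with Python 3's built-in codec.
# The helper names below are hypothetical, chosen only for this example.

def punycode_label(label):
    """Encode one Unicode label with raw Punycode (no 'xn--' prefix)."""
    return label.encode("punycode").decode("ascii")


def unpunycode_label(encoded):
    """Decode a raw Punycode label back to Unicode."""
    return encoded.encode("ascii").decode("punycode")


print(punycode_label("☃"))        # non-ASCII only: everything is encoded
print(punycode_label("bücher"))   # ASCII letters are kept literally, up front
print(unpunycode_label("n3h"))    # the transformation is reversible
```

The ASCII part of a mixed label comes first, followed by a hyphen and the encoded non-ASCII part, which is why the result is still a valid host name label.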
In the Python script, unicode(domain, "utf8").encode("idna") produces the Punycode form 'xn--n3h.net'.
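For readers on Python 3, where unicode() and urllib2 no longer exist, the same conversion might look like the sketch below. It only rewrites the URL and does not fetch anything. The function name idn_url_to_ascii is hypothetical, and the sketch assumes the URL has a host and that any user:password part can be dropped.

```python
# Sketch: convert the host of a URL to its ASCII (IDNA) form in Python 3.
# idn_url_to_ascii is a hypothetical helper, not part of any library.
from urllib.parse import urlsplit, urlunsplit


def idn_url_to_ascii(url):
    parts = urlsplit(url)
    # The built-in "idna" codec applies Punycode per label and adds "xn--"
    host = parts.hostname.encode("idna").decode("ascii")
    # Keep an explicit port if present; userinfo is intentionally dropped
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


print(idn_url_to_ascii("http://☃.net"))
```

The resulting string is plain ASCII, so it can be handed to urllib.request.urlopen or any other HTTP client that does not understand Unicode host names.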
Ruby: URI doesn't implement Unicode domains! (Shame?) (BUG)
Workaround? Yes, you guessed it right! Install the addressable gem:
    $ sudo gem install addressable
and don't forget the '# encoding: utf-8' comment.
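    #!/usr/bin/env ruby
    # encoding: utf-8
    require "rubygems"
    require "addressable/uri"
    require "open-uri"

    # normalize converts the host to Punycode; site returns scheme plus authority
    url = Addressable::URI.parse('http://☃.net').normalize.site
    # On Ruby 2.5 and later, prefer URI.open(url).read
    puts open(url).read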
That's it from me. Do let me know if you find better ways of handling this! Happy Hacking!