Other resources worth adding to this package:
* http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/
* Full Unicode diacritics removal
This is more complicated. We need the four normalization forms (NFC, NFD,
NFKC, NFKD) described in the Wikipedia article; see also the makeucnid.c source:
- http://en.wikipedia.org/wiki/Unicode_equivalence
- http://www.opensource.apple.com/source/gcc/gcc-5646/libcpp/makeucnid.c
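A minimal sketch of what diacritics removal built on those normalization forms could look like, using Python's standard unicodedata module. The function name strip_diacritics and its form parameter are hypothetical and not part of this package. The approach assumed here is: decompose the text, drop nonspacing marks (category Mn), then recompose with NFC. Letters that have no decomposition, such as "ø", are left unchanged, so this is not a complete solution.

```python
import unicodedata


def strip_diacritics(text: str, form: str = "NFD") -> str:
    """Hypothetical helper: remove nonspacing combining marks from text.

    form is the decomposition used before stripping:
      "NFD"  - canonical decomposition only
      "NFKD" - compatibility decomposition (also expands ligatures etc.)
    """
    if form not in ("NFD", "NFKD"):
        raise ValueError("form must be 'NFD' or 'NFKD'")
    # Split precomposed characters into base character + combining marks.
    decomposed = unicodedata.normalize(form, text)
    # Drop nonspacing marks (general category Mn), keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever can still be composed.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_diacritics("café"))
    print(strip_diacritics("Ångström"))
    print(strip_diacritics("\ufb01ancé", form="NFKD"))
```

Choosing NFD keeps compatibility characters such as ligatures intact, while NFKD also folds them into their plain equivalents; which one is wanted depends on how aggressive the removal should be.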