first cut at testing Unicode normalization