A blog by Gary Bernhardt, Creator & Destroyer of Software

Unicode Weirdness

14 Feb 2007

I'm writing some tests to verify that BitBacker doesn't explode if it sees Unicode filenames. For a while, I thought that OS X's terminal wasn't Unicode-aware, because non-ASCII characters just showed up as "?":

grbmbp:~ grb$ ls z*
z???       z??????    z????????? z???       z???

Then I happened to pipe the ls output through grep, and the Unicode characters printed correctly:

grbmbp:~ grb$ ls z* | grep '.*'
z໐
z두
z툃
z䌨
z冕

What? Well, I guess I'll take it...
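
A possible explanation, offered as a sketch rather than a confirmed diagnosis: BSD ls replaces characters it considers non-printable with "?" when its output is a terminal (the -q behavior), and writes the raw bytes when its output is a pipe. If the shell's locale is not a UTF-8 one, every byte of a multi-byte character counts as non-printable. The Python below imitates that. It assumes the file system stores names in decomposed form (NFD), as HFS+ does, which would account for the runs of six and nine question marks for the two Hangul names. The function ls_style and its to_terminal flag are hypothetical names for illustration, not anything ls exposes.

```python
import unicodedata

# The five file names from the listings above: "z" plus one character each.
names = ["z໐", "z두", "z툃", "z䌨", "z冕"]


def ls_style(name, to_terminal):
    """Imitate how a byte-oriented ls in a non-UTF-8 locale might print a name."""
    # Assumption: the file system hands back the name decomposed (NFD),
    # so each Hangul syllable becomes two or three jamo of 3 bytes each.
    raw = unicodedata.normalize("NFD", name).encode("utf-8")
    if not to_terminal:
        # Piped output: the bytes pass through untouched.
        return raw.decode("utf-8")
    # Terminal output: keep printable ASCII, replace every other byte with "?".
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


for name in names:
    print(ls_style(name, to_terminal=True))
# Prints z???, z??????, z?????????, z???, z??? (one per line)
```

Under these assumptions the question-mark counts match the first listing exactly, and the piped case hands the terminal intact UTF-8, which it can draw.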

While I was posting this, it got even more fun. All five of the characters above print normally in the terminal and Finder, but only two print normally in Opera's text edit control. I wonder how many will show up once this is published. Can you see them in your RSS reader and/or browser?

Update: After publishing, I viewed the page in Safari and the characters displayed exactly as they did in the terminal and Finder. So at least Opera didn't mangle the bytes. However, Firefox draws the first three incorrectly (the same three that Opera couldn't draw at all). Unsurprisingly, IE7 can't draw any of them.