Friday, April 25, 2014

Unicode anyone?

As you may have guessed from my previous post I've been digging into Unicode. One may think it odd that I haven't done this sooner, but when working on legacy applications it is hard to justify supporting more modern things when you spend all your time trying to make legacy software better.

Fortunately, I'm now in a position of writing fresh new code that can be built on a strong basis. As I thought through many of the things I wanted to do I fairly quickly came to the conclusion that moving forward I needed to write things using Unicode.

Should be easy. Unicode has been around for 20 years now so, unlike the mere 3 years since adoption of C++ 11, everything should work and the skids should be smooth and well greased.

Of course not!

Windows seems to like UTF-16. That's fine. Other operating systems seem to like UTF-8 or UTF-16.

Out of the box on Mac and Ubuntu I can do something like the following:

    std::cout << u8"Α-Ωα-ω\n";

Of course, on Windows not only is the u8" literal prefix not yet supported by the compiler, but even if you build a std::string with a proper UTF-8 encoding, sending it to std::cout won't display correctly either.

Turns out that I can set the cmd.exe console encoding to use UTF-8, and it works great if I use printf for my string, but std::cout doesn't work. To top things off Microsoft decided to explicitly disallow UTF-8 in their std::locale implementation. So I can't tell std::cout to send things to the console as UTF-8. Instead it appears that I will need to use printf or find another obscure way of outputting Unicode in my unit test console based application.
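For the record, here is a minimal sketch of the printf-style route described above. It assumes the console accepts UTF-8 once its output code page is switched; on Windows that is done here with SetConsoleOutputCP(CP_UTF8), which I'm treating as the programmatic equivalent of changing the cmd.exe encoding by hand. The helper names (enable_utf8_console, greek_sample, print_utf8) are made up for illustration, and whether the glyphs actually render will still depend on the console font.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>   // SetConsoleOutputCP, CP_UTF8
#endif

// Hypothetical helper: switch the Windows console output code page to UTF-8.
// On other platforms this does nothing, on the assumption that the terminal
// already expects UTF-8.
inline void enable_utf8_console() {
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
}

// The same Greek sample as above, spelled out as raw UTF-8 bytes so it
// compiles even where the u8"" prefix is unavailable.
inline std::string greek_sample() {
    return "\xCE\x91-\xCE\xA9\xCE\xB1-\xCF\x89\n";
}

// Write the bytes through C stdio rather than std::cout.
// Returns true if every byte was written.
inline bool print_utf8(const std::string& s) {
    return std::fwrite(s.data(), 1, s.size(), stdout) == s.size();
}
```

Calling enable_utf8_console() once at startup and then print_utf8(greek_sample()) is the whole idea; the bytes go straight to stdout, so std::locale never gets involved.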

I'm not sure what this means, but it does give hope to those worried that the machines will take over. It will likely take them several decades to figure out the mess we have made with software.


g++ is dead

Lately I've been working on a project that involves using C++ to develop libraries for Windows, Mac, Ubuntu, iOS and Android, with an eye toward quality and portability.

Recently I decided that I needed to build a Unicode class to support a forward thinking basis for things. Of course C++ is generally not Unicode friendly, but it isn't all that unfriendly either. Especially with C++ 11.

So I dug in, started learning and coding, and came up with a first pass of my class on Windows using Visual Studio 2013. So far so good.

Now let's go over to Ubuntu, where we have compilers like g++ 4.8 and Clang 3.3 that are reportedly more compliant than Visual Studio.

The first thing I notice is a missing include. The next thing I notice is that the API for std::basic_string that I was trying to emulate is, well, just plain wrong! Not even close!

I start digging and find that the GNU standard C++ library is about 6 sigmas off of supporting C++ 11. How can you claim your compiler is C++ 11 compliant but the library is not?

Fortunately, I eventually got to the point of using the new libc++ library, which Apple switched to a while ago, and things work in Clang. But when I tried to build for Android I couldn't get the experimental Clang support to work. Probably because of 2 or 3 really important steps that I missed, but nonetheless I gave up.

Working through the problem I managed to hack around the deficiencies in the GNU standard library with regard to std::basic_string, but got nowhere with trying to get the codecvt stuff working. Turns out that the other option, using iconv, isn't built into Android. I would have needed to jump through several major hoops to compile it and hook it in.

I eventually wrote my own conversion routines between UTF-8, UTF-16 and UTF-32.
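To give a feel for what such routines involve, here is a minimal sketch. It is not the code from my class, just the standard bit-shuffling written out with UTF-32 as the pivot format. The function names are made up for illustration, and the choice to throw std::invalid_argument on malformed input (instead of substituting U+FFFD) is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Reject surrogates and values beyond U+10FFFF.
inline void check_scalar(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
}

inline std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        check_scalar(cp);
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

inline std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0, min = 0;              // min catches overlong forms
        std::size_t extra = 0;                 // continuation bytes expected
        if (b < 0x80)                { cp = b; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) throw std::invalid_argument("overlong UTF-8 sequence");
        check_scalar(cp);
        out += cp;
        i += extra + 1;
    }
    return out;
}

inline std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        check_scalar(cp);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                               // split into a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

inline std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {      // high surrogate needs a low one
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            char32_t lo = in[++i];
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        out += u;
    }
    return out;
}
```

Going through UTF-32 means UTF-8 to UTF-16 is just utf32_to_utf16(utf8_to_utf32(s)). That costs an extra temporary string, but it keeps each routine small enough to check by hand.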

So to the point of my post.

It is 2014. C++ 11 was ratified in March of 2011 and formally adopted in August of 2011. So why, 3 years later, is the GNU library languishing? I would not have expected full support on day 1 or even 1 year after adoption, but 3 years? And the thing that is particularly annoying is that some of the stuff that isn't supported is trivial to correct.

My sense is that the ideas behind GNU and the "Free" software thing are fallacious, which I speculate is a major reason that Apple decided to ditch GNU, and why a more open "open source" implementation in Clang and libc++ is now displacing it.

While I will likely continue to keep an eye on the GNU C++ compiler for a while, it is now a distant #3 in my compiler recommendation list after Visual Studio and Clang. This leads me to the prediction that g++ will be a footnote in the history of C++ compilers. They could turn this around, but I'm not going to hold my breath.