Saturday, December 4, 2010

Boost: Not Ready for Professional Projects?

Over the last several weeks I’ve been doing some of the general code maintenance that is a necessary part of software development. One of the problems with C++, or any include-based language for that matter, is the geometric growth in the number of lines of code that can get pulled in.

If an include file includes 2 other files, and each of those in turn includes 2 others, you very quickly reach a point where the total number of lines included runs into the hundreds of thousands.
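To put rough numbers on that growth, here is a small sketch of the worst-case arithmetic. The function name `worst_case_files` is made up for illustration, and the depth of 8 and the 500 lines per file mentioned below are assumed figures, not measurements.

```cpp
#include <cstdint>

// Number of files pulled in when every file includes `fanout` others,
// `depth` levels deep, and no header is shared between branches.
// This is a worst-case estimate; real include trees overlap heavily.
std::uint64_t worst_case_files(std::uint64_t fanout, unsigned depth) {
    std::uint64_t total = 0;
    std::uint64_t level = 1;  // level 0 is the root file itself
    for (unsigned d = 0; d <= depth; ++d) {
        total += level;
        level *= fanout;
    }
    return total;
}
```

With a fan-out of 2, two levels give 7 files and eight levels give 511. At an assumed 500 lines per file, those 511 files come to roughly 255,000 lines.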

I got curious about just how big a hit this was and decided to write a simple include file analyzer to measure it. My analyzer is pretty brain dead: all it looks for is #include statements. It doesn’t respect the preprocessor, so the numbers it spits back are pretty much worst case. But it does give some very useful results.

For example, many of the STL containers, like map and vector, tend to run about 20,000 lines and a couple of hundred includes. This was surprisingly large to me. The good news is that there is a lot of overlap, so if I include both map and vector the total is only incrementally larger.

In our work we use many of the Boost constructs, like Boost shared_ptr. These are very nice, useful constructs and an important conceptual addition to the language.

When I analyzed the Boost headers I was very unpleasantly surprised at how big some things were. For example, Boost shared_ptr was running about 200,000 lines. Of course some of this is platform-specific code, but the point is that it is a massive hit.

As a sanity check, one of my colleagues ran all of our code through the preprocessor and dumped the output into a single file. It was about 25 gigabytes of data!

While it is obvious that we should be aware of these issues, in general it is really easy to miss the cost of including header files.

Boost provides a great service to the C++ community, but it too has fallen victim to the geometric growth of include-based libraries.

Since shared_ptr is now part of TR1, we have moved over to using that version. By my analysis, the TR1 memory header pulls in an order of magnitude less than the Boost implementation.

I write this not to say that Boost is bad, but to say that you need to be aware of the cost of using it. To that end I suggest the following:

  • Don’t include any Boost header in any commonly used header file. Use the Pointer to Implementation (PIMPL) idiom or forward declarations, or simply don’t use Boost.
  • Prefer STL solutions. You still carry a massively large header load, but 20,000 lines is nothing compared to 200,000 lines.
  • Always check the cost of your includes. It is a bit like boiling a frog: a single change is hardly noticeable, but then one day you look up and your compile time is 30 minutes.
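As an illustration of the first suggestion, here is a minimal PIMPL sketch. Everything in it is hypothetical: the `Widget` class, its members, and the split into `widget.h` and `widget.cpp` (shown as one listing for brevity). `<map>` stands in for whatever expensive header you want to hide, and `std::unique_ptr` assumes a C++11 or later compiler.

```cpp
// ---- widget.h (hypothetical header, included all over the project) ----
#include <memory>   // std::unique_ptr
#include <string>

class Widget {
public:
    Widget();
    ~Widget();  // defined in the .cpp, where Impl is a complete type
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;

    void set(const std::string& key, int value);
    int get(const std::string& key) const;  // returns 0 if key is absent

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // clients never see what is inside
};

// ---- widget.cpp (the only file that pays for the heavy include) ----
#include <map>  // stand-in for an expensive header, e.g. a Boost one

struct Widget::Impl {
    std::map<std::string, int> values;
};

Widget::Widget() : impl_(new Impl) {}
Widget::~Widget() = default;

void Widget::set(const std::string& key, int value) {
    impl_->values[key] = value;
}

int Widget::get(const std::string& key) const {
    auto it = impl_->values.find(key);
    return it == impl_->values.end() ? 0 : it->second;
}
```

Code that includes only the header part never sees `<map>`, so changing the implementation's includes recompiles one source file instead of everything that uses `Widget`. The trade-off is an extra heap allocation and a pointer hop on each call.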

In any case, it is my hope that the writers of Boost take a look at putting it on a diet. It has become a massively bloated beast.
