Sunday, December 12, 2010

Code Invariance and C++ Templates

I have been working to improve compilation times recently, using all the usual techniques: interfaces, pointer-to-implementation (PIMPL) wrappers, forward declarations, and so on.

My biggest target in doing this is the boost libraries, which are far larger than you might think. Analyzing what a typical boost header pulls in, I've found the total to be 2-10 times the size of a typical STL header.

The STL headers are already bloated; <string>, for example, typically runs to about 30,000 lines.

First of all, std::string is one instantiation of a template, std::basic_string, which produces the two common variants std::string and std::wstring. While it could produce more, I can't think of any that I would typically need or want.
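To make that concrete, here is a minimal sketch showing that the two variants are just instantiations of one template. The typedef names narrow_string and wide_string are mine, for illustration only:

```cpp
#include <string>

// std::string and std::wstring are nothing more than the two common
// instantiations of the std::basic_string template. The typedef names
// below are made up for illustration; they denote the same types as
// the standard ones.
typedef std::basic_string<char>    narrow_string;  // same type as std::string
typedef std::basic_string<wchar_t> wide_string;    // same type as std::wstring
```

Since narrow_string and std::string name the same type, values convert freely between them.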

Design Guideline:

Don't use templates when a non-template implementation is just as good or better.

If you look at the std::string API you discover that it could be prototyped in fewer than 200 lines of code. There are performance arguments that make inlining it a nice option.

However, most of the time I don't need the performance benefits. It is also very difficult to forward declare a string yourself, and the STL implementors don't give us one that we can use: they provide no forward declarations for any of the container classes.

The result is that any file, even a 10-line file, that wants to use a string effectively becomes a 30,000-line file.

This is a huge price to pay when the code it generates is invariant. I know compilers are smart and can cache some of this, and some do a pretty good job; computers are also a lot faster today. But when I end up spending several hours a day waiting for compiles, productivity suffers.

The point here is that templates are effectively programs that generate code. That code only needs to change when the template changes, and the generation only needs to be done once for each combination of types that you use.

With the design of C++ this is instead done once for each compilation unit, and redone any time a compilation unit is invalidated. This leads to roughly N^2 work across N compilation units, on top of the need to parse thousands of lines of code in each one.

There are other invariants in compilation as well. In fact any code that hasn't changed and won't generate different machine code is invariant. Compilers and programmers move invariant code out of repetitive tasks.

We need a compiler design that does the same thing.

Today we see typical compilation times of minutes or hours when in reality they could often be milliseconds. Currently, if I change one character in a header that is included by all of my code, then all of it must be recompiled. If that character results in a one-byte difference in the generated machine code, the rebuild should be nearly instantaneous.

Effectively this is the same problem as we used to have with typewriters. If I add 1 character I may need to retype the whole page. If I add a paragraph I may need to retype all the following pages. This is why we used to have many secretaries in businesses. Now we have word processors and we are far more efficient.

Language designers need to apply the lessons of complexity to the design of the language. Library writers need to apply the lessons of complexity to the design of libraries.

A small amount of work on the part of the STL library designers would allow us to have massive savings in the time it takes to compile.

The C++ language already gives us options for making the use of templates less invariant.

Let's take std::string as an example. While I don't think std::string as a template is even marginally a good idea, let's assume for a minute that it is. To give C++ users the most flexibility we need the following:

<string_fwd> - Provides a forward declaration of std::string in as few lines as possible so it can be used to reduce the header load in header files that only need a forward declaration.

<string_proto> - Provides the prototype for std::string, without implementing any methods, in as few lines of code as possible. This lets me use std::string in compilation units without paying the compile-time price of compiling all of string's methods, optimizing their inline uses, and emitting the template code.

<string> - Provides the full std::string implementation as it stands now so that I can use it in cases where performance is critical and can also explicitly instantiate it where needed.

This pattern applied to all of the STL would be easy to implement and give us more ability to control how long compiles take.
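As a sketch, a hypothetical <string_fwd> could be as small as the block below. One hedge up front: only the implementation may legally add declarations to namespace std, and real implementations bury basic_string inside versioning namespaces, which is exactly why users can't portably write this themselves and the library would have to ship it:

```cpp
// Sketch of a hypothetical <string_fwd>. Illustrative only: adding
// declarations to namespace std is reserved to the implementation, so a
// real header like this must come from the library itself.
namespace std {
    template <class CharT> struct char_traits;
    template <class T> class allocator;
    template <class CharT,
              class Traits = char_traits<CharT>,
              class Alloc  = allocator<CharT> >
    class basic_string;
    typedef basic_string<char> string;
}

// A header that only passes strings around can now get by with the
// forward declaration alone:
void logMessage(const std::string& msg);  // no <string> include needed
```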

While I suppose I could create a wrapper class for std::string, containers like std::vector are much more compelling as templates. I have been experimenting with writing the forward declaration myself; that seems feasible, but carries some extra maintenance burden.

With any luck someone will start listening and fix some of these issues.

Boost Filesystem Version 3

I've been looking at the boost filesystem as a possible replacement for file manipulation and directory parsing.

In doing so I discovered that the comparison operator for boost::filesystem3::path is horribly slow. I'm not entirely sure why; it may be the iterator.

In doing this I implemented a simplistic wrapper to avoid the massive include load. The filesystem v3 path.hpp header looks like it pulls in well over 100,000 lines; my wrapper runs only about 30,000 lines, almost all of which comes from including <string>.

In my operator< comparison function I replaced the compare with a .native() string comparison and found what appears to be about a 10x performance improvement on the Mac.
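A sketch of that replacement comparison, templated so the snippet itself stays boost-free. It assumes only a native() member returning the path's string form, as boost::filesystem3::path provides; NativeLess and StubPath are made-up names:

```cpp
#include <string>

// Compare paths by their native string form instead of path::compare().
// Templated so this sketch pulls in no boost header; it assumes only a
// native() member returning a string, as boost::filesystem3::path has.
struct NativeLess {
    template <class Path>
    bool operator()(const Path& a, const Path& b) const {
        return a.native() < b.native();
    }
};

// Minimal stand-in with the same native() signature, so the idea can be
// exercised without boost:
struct StubPath {
    std::string value;
    explicit StubPath(const char* v) : value(v) {}
    const std::string& native() const { return value; }
};
```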

The more I dig into boost the more I'm convinced that it must be used with extreme caution.

Saturday, December 4, 2010

The Include Model: Yet Another Bad Idea

It's probably been at least 20 years now since I first recognized that the include model for programming languages was a bad idea. Others have been saying this for much longer.

First let me explain the model. The idea is that you have a set of items that you want to be the same in many contexts. For example, in C++ you have class prototypes, defines, and so on; in HTML you may have style sheets and other shared items.

So what happens is you code up these items in a file that gets included at the point they are needed. You write it once and then use it many times. So far so good.

Now the not so obvious problem is that each include may include several others. And those in turn may include more. The result is an explosion of code. In C++ we put include guards in to prevent multiple inclusion and this helps a lot, but even so a single include can lead to thousands of lines of code.
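For readers who haven't seen one, a typical include guard looks like the sketch below (MYLIB_WIDGET_H and widgetVersion are hypothetical names). Note what it does and doesn't buy: second and later inclusions expand to nothing, but the preprocessor still opens and scans the whole file each time.

```cpp
// A classic include guard. Repeat #includes of this header contribute no
// code, but the file is still opened and scanned on every inclusion --
// the guard caps code growth, not I/O or parsing.
#ifndef MYLIB_WIDGET_H   // hypothetical header name
#define MYLIB_WIDGET_H

inline int widgetVersion() { return 1; }

#endif // MYLIB_WIDGET_H
```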

In the C++ standard template library the inclusion of something like <vector> is about 20,000 lines of code.

So why is it that we keep making the same language design mistake?

Probably because the model is so simple.

The result is that we can end up with compilers taking several minutes or even hours to compile code.

Why should it take more than a few milliseconds to update my executable when all I did was change one line of code?

My challenge to the C++ and other standardization committees is to once and for all expunge the idea of an inclusion model for the language from all standards.

In reality all that is needed is to move to a database model of compilation. Conversion from the current text based model would be required, but in reality this is already done by compilers now.

I'm sure there are challenges, like what to do with preprocessing, but those can be overcome.

While I'm not an expert on compilers, I do know something about them, having written a couple in my career. I've also talked to people who write compilers who agree that this is feasible; in fact, behind the scenes a lot of what compilers already do is very close to what we need.

All that we are missing is for the standardization committees to move forward.

Let's not waste time on more language niceties until we fix the fundamental flaws.

The include model is a fatal fundamental flaw of many languages that collectively costs software companies billions of dollars in lost productivity.

As I write this I am also reminded that char * is another bad idea from the same guys who brought us #include. As a C and C++ language construct, char * is likewise responsible for billions of dollars of lost productivity over the years. Think about how much time used to be wasted hunting for that off-by-one error, or the security holes caused by buffer overruns in char * arrays...
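The root of many of those off-by-one bugs is a single byte of invisible state; a tiny illustration (the function names are mine):

```cpp
#include <cstring>
#include <cstddef>

// A string literal's storage is one byte longer than its visible length
// because of the hidden terminating NUL -- the byte every manually sized
// char buffer has to account for, and that overruns forget.
std::size_t visibleLength() { return std::strlen("hello"); }  // 5 characters
std::size_t storageNeeded() { return sizeof("hello"); }       // 6 bytes: 5 + '\0'
```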

It is unfortunate that some of these ideas don't get fixed sooner. I sometimes wonder if the language creators are writing code. If they were, it seems they would be far more motivated to fix these issues.

I would love to hear some counter point arguments as to why the include model is still with us.

Unexpected Technology

It is the technology that is unanticipated that excites us the most. For example we have recently seen the ability to scan credit cards on a smartphone. This is probably something that many have thought of over the years, but it is something that I personally didn’t think of before it happened.

When people see technology that they didn’t anticipate it’s like a magic trick where the outcome is a total surprise to the audience.

The real trick is coming up with a bit of technological magic that is unanticipated.

Blindly Following the Advice of Respected Writers

Over the last couple of weeks I have had two occasions where I found fatally flawed arguments in the writings of very respected writers in the C++ community.

In both cases my gut told me when I first read these things that they were wrong. In one case I, along with others, took the advice and ran with it for several years, only to be burned later by what should have been obvious.

Case 1: In his book Effective STL, Scott Meyers advises using std::for_each for three reasons: he contends that it is faster, less prone to error, and more readable. However, after following this advice and seeing how these things are implemented, especially in conjunction with boost::bind or tr1::bind, it is clear to me that all three arguments are simply wrong.

First, it is not faster. If you analyze how an STL iterator translates into generated code, you find that iterator loops are in fact almost identical to the fastest hand-coded C++ loop you can write. Meyers also contends that implementors may not write the fastest possible iterators, and then in almost the same breath says that implementors know their implementation better than we do, so they can write a faster for_each.

To which I say: if they can't write a fast iterator, what makes you think they will write a faster for_each?

Analyzing the for_each implementations, they are a thin wrapper over a for loop, but with one side effect: end() is evaluated only once, before the loop. Most of the time this isn't an issue, but sometimes it is.
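A sketch of what a typical for_each boils down to (forEachSketch is my illustrative name, not the real library code; library versions differ in detail but not in shape). The end-evaluated-once behavior falls out of the interface, since the caller computes end() before the call:

```cpp
// What a typical std::for_each amounts to: a thin wrapper over a for
// loop. The caller evaluates end() exactly once, before the call -- the
// side effect noted above. (forEachSketch is an illustrative name.)
template <class InputIt, class UnaryFunction>
UnaryFunction forEachSketch(InputIt first, InputIt last, UnaryFunction f) {
    for (; first != last; ++first)
        f(*first);
    return f;
}

// Tiny functor for demonstration; like std::for_each, the sketch returns
// the functor so accumulated state can be read back.
struct Sum {
    int total;
    Sum() : total(0) {}
    void operator()(int x) { total += x; }
};
```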

The second contention is that it is less prone to error. Unfortunately, this is simply not true. The syntax is harder to decipher, and because of that it is very easy to write a functor that makes extra copies, so in the end the code is both easier to get wrong and slower.

Finally, he contends that it is more readable, which in my experience is clearly false. Stuffing too much information onto one line is bad.

Consider

for(int i = 0; i < n; i++) { temp[i] = b[i]; b[i] = a[i]; a[i] = temp[i]; }

vs.

for(int i = 0; i < n; i++) {
    temp[i] = b[i];
    b[i] = a[i];
    a[i] = temp[i];
}

Which is easier to read? To me the second is the clear choice, and most people I know would agree.

Stuffing this into a std::for_each would mean writing a functor, using something like the lambda libraries, and so on. The more obvious way to make it more readable, though, is simply to write a function.
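To show what that functor overhead looks like in practice, here is a hypothetical example (Scale, scaleAll, and scaleAllLoop are made-up names) in the pre-lambda C++ of the time:

```cpp
#include <algorithm>
#include <vector>

// Even a trivial per-element operation needs a hand-written functor to
// go through std::for_each in pre-lambda C++. Scale is a made-up example.
struct Scale {
    int factor;
    explicit Scale(int f) : factor(f) {}
    void operator()(int& x) const { x *= factor; }
};

void scaleAll(std::vector<int>& v, int f) {
    std::for_each(v.begin(), v.end(), Scale(f));
}

// The plain-loop version needs no helper type at all:
void scaleAllLoop(std::vector<int>& v, int f) {
    for (std::vector<int>::size_type i = 0; i < v.size(); ++i)
        v[i] *= f;
}
```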

So in the end, in the light of day, the analysis, my gut, and actual experience all say that it is bad advice.

Case 2: In the book C++ Coding Standards the authors state that you shouldn't have a standard for things that don't affect the readability of the code, such as the placement of curly braces. The interesting thing is that, if you read carefully, they say they have chosen a consistent formatting for the book itself to enhance readability, which on its face should be enough to call their logic into question.

They also suggest that you be consistent and that you use the standard in the file you are in.

The problem is that if you work with a team of people in C++, you are often adding virtual functions across a class hierarchy. That may be a symptom of bad design, but more often than not it is just part of extending the capability of the software, and the hierarchy will usually be maintained by more than one person.

This leads to a clear problem. Say I am adding a virtual function. Do I write it like this:

bool T::supportsMyNewFeature() const {
    return true;
}


or

bool T::supportsMyNewFeature() const
{
    return true;
}

If one file uses one style of curly-brace formatting and another file the other, then I'm constantly switching between styles depending on the file.

If I impose my own style instead of the team standard, files quickly become inconsistent; once that happens there is no file-specific style left to follow, and the only reasonable fallback is the team standard.

If I follow each file's existing style instead, I am forever switching conventions, and mixed files still drift back toward needing the team standard.

If everyone simply uses the team standard, there is no problem.

There are cases where imposing a team style may not be needed, such as when one person is responsible for an entire module. While this works, over time that module will eventually be taken over by someone else, or by several people, who may not share your view on formatting. They either have to adopt your formatting, or reformat to theirs or to the team standard.

Team standards make these decisions easier.

So back to the main point of this post: don't believe everything you read. The writers of these books are very smart people, but they do make mistakes. Some mistakes are obvious when you examine them; others may not be, and can lead you down a path that causes problems. When you realize you have been misled, it is time to let everyone know and change how you do things.

BOOST_FOREACH: A Bad Implementation of a Good Idea

I am more and more often running across constructs, especially from boost, that I find to be intuitively poor.

BOOST_FOREACH is the latest one that I've looked at.

First let me state that I like the concept of for each as implemented in many languages. C++ doesn't happen to have one, and some really smart people decided it would be cool to create something with the expressive niceties of for each.

One result is BOOST_FOREACH. However, its include file expands to over 67,000 lines of code, and in my opinion the readability is actually worse.

Why?

Consider the following snippet.

BOOST_FOREACH(T& a, myClass.getAVector()) {
    foo(a);
}

Here is another way to write it:

std::vector<T>& aVec = myClass.getAVector();
for (std::vector<T>::iterator it = aVec.begin(); it != aVec.end(); ++it) {
    foo(*it);
}

And another which is more like std::for_each:

std::vector<T>& aVec = myClass.getAVector();
std::vector<T>::iterator begin = aVec.begin();
std::vector<T>::iterator end = aVec.end();
for (std::vector<T>::iterator it = begin; it != end; ++it) {
    foo(*it);
}

Which is easier to understand?

Is the invariant getAVector() hoisted out of the loop?

Is end() called on every iteration, or once up front as std::for_each does?

Did you know that std::for_each is more like my second example than the first?

If there is a side-effect of calling getAVector() what is the behavior?

What happens if the size of the container changes during the loop?

Is the expanded macro efficient?

Can everyone on your team answer these questions?

Can new developers that may come into your team answer these questions?

Should we pay 67,000 lines of extra header in every compilation unit that uses BOOST_FOREACH just to save a few dozen characters?

My conclusion is that BOOST_FOREACH is not more readable and the cost is not worth the header load even if it were.

I will not be using it.

Boost: Not Ready for Professional Projects?

Over the last several weeks I’ve been doing some general code maintenance that is a necessary part of software development. One of the problems with using C++, or any include-based language for that matter, is the geometric growth in the number of lines of code that can potentially get included.

If you have an include file that includes 2 other files, and each of those in turn includes 2 more, the count doubles at every level; very quickly the total number of lines of code included runs into the hundreds of thousands.

I got curious about just how big a hit this was and decided to write a simple include file analyzer. My analyzer is pretty brain-dead: all it looks for is #include statements. It doesn’t respect the preprocessor, so the numbers it reports are pretty much worst case. But it does give some very useful results.

For example, many of the STL containers like map and vector tend to run about 20,000 lines and a couple of hundred includes each. This was surprisingly large to me. The good news is that there is a lot of overlap, so including both map and vector is only incrementally larger than including one.

In our work we use many of the boost constructs, like boost shared_ptr. These are some very nice and useful constructs that are an important conceptual addition to the language.

When I analyzed the boost headers I was very unpleasantly surprised at how big some things were. For example the boost shared_ptr was running about 200,000 lines. Of course some of this is platform specific code, but the point is that it is a massive hit.

As a sanity check, one of my colleagues ran all of our code through the preprocessor and dumped the output into a single file. It was about 25 gigabytes of data!

While it is obvious that we should be aware of these issues, in general it is really easy to miss the cost of including header files.

Boost is providing a great service to the C++ community, but they too have fallen victim to the geometric growth of include based libraries.

Since shared_ptr is now part of TR1, we have moved over to using that version. Analyzing the tr1 memory header, its inclusion load is an order of magnitude smaller than the boost implementation's.
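The switch is mostly mechanical; tr1::shared_ptr later became std::shared_ptr in C++11, so in today's terms it looks like this (makeShared is a made-up helper name):

```cpp
#include <memory>

// The ownership semantics that used to cost a boost/shared_ptr.hpp
// include need only <memory> once you move to the standard version
// (std::tr1::shared_ptr then, std::shared_ptr since C++11).
std::shared_ptr<int> makeShared(int v) {
    return std::shared_ptr<int>(new int(v));
}
```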

I write this not to say that boost is bad, but to say that you need to be aware of the cost of using it. To that end I suggest the following:

  • Don’t include any boost header in any commonly used header file. Use the pointer-to-implementation (PIMPL) idiom, forward declarations, or simply don’t use boost.
  • Prefer STL solutions. You still carry a massively large header load, but 20,000 lines is nothing compared to 200,000.
  • Always check the cost of your includes. It is a bit like boiling a frog: any single change is hardly noticeable, but one day you look up and your compile time is 30 minutes.
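The PIMPL suggestion in the first bullet looks like this in miniature (FileList and its members are hypothetical; the two halves would normally live in the header and the .cpp respectively):

```cpp
#include <cstddef>
#include <string>

// --- header side: no heavy includes, only an opaque pointer ---
class FileList {
public:
    FileList();
    ~FileList();
    void add(const std::string& path);
    std::size_t count() const;
private:
    struct Impl;   // defined only in the .cpp
    Impl* impl_;   // (copy control omitted to keep the sketch short)
};

// --- .cpp side: the heavy includes live here, out of clients' sight ---
#include <vector>  // stands in for an expensive boost header

struct FileList::Impl {
    std::vector<std::string> paths;
};

FileList::FileList() : impl_(new Impl) {}
FileList::~FileList() { delete impl_; }
void FileList::add(const std::string& path) { impl_->paths.push_back(path); }
std::size_t FileList::count() const { return impl_->paths.size(); }
```

Clients of the header never see <vector> (or boost), so changing the implementation's dependencies no longer recompiles the world.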

In any case it is my hope that the writers of boost take a look at putting it on a diet. It has become a massively bloated beast.