Saturday, December 4, 2010

The Include Model: Yet Another Bad Idea

It's probably been at least 20 years now since I first recognized that the include model for programming languages was a bad idea. Others have been saying this for much longer.

First, let me explain the model. The idea is that you have a set of items that you want to be the same in many contexts. In C++, for example, these are class declarations, #define macros, and so on. In HTML you may have style sheets and other shared fragments.

So you code up these items in a file that gets included wherever they are needed. You write them once and use them many times. So far, so good.

The not-so-obvious problem is that each included file may include several others, and those in turn may include more. The result is an explosion of code. In C++ we add include guards to keep the same file from being included more than once, and this helps a lot, but even so a single include can pull in thousands of lines of code.
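To make the mechanism concrete, here is roughly what the compiler sees when a hypothetical header "point.h" is reached twice in one translation unit. The file name, the POINT_H macro, and the manhattan_length function are made up for illustration; the point is that #include pastes text, and the guard makes the second copy disappear.

```cpp
// What the preprocessor hands to the compiler when a hypothetical header
// "point.h" is reached twice in one translation unit: once directly and
// once through some other header. Each #include pastes the file's text in.

// First copy: POINT_H is not defined yet, so this body is kept.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif  // POINT_H

// Second copy: POINT_H is now defined, so everything up to the matching
// #endif is discarded. Without the guard, struct Point would be defined
// twice and compilation would fail.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif  // POINT_H

// Code after both copies sees exactly one definition of Point.
constexpr int manhattan_length(Point p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```

The guard only stops duplicate definitions. Every header reached for the first time is still pasted in and compiled in full, which is why one include can still expand into thousands of lines.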

In the C++ Standard Template Library, including a header such as <vector> pulls in about 20,000 lines of code.

So why is it that we keep making the same language design mistake?

Probably because the model is so simple.

The result is that we can end up with compilers taking several minutes, or even hours, to build our code.

Why should it take more than a few milliseconds to update my executable when all I did was change one line of code?

My challenge to the C++ and other standards committees is to expunge the inclusion model from their language standards once and for all.

All that is really needed is to move to a database model of compilation. That would require converting from the current text-based model, but compilers already perform that conversion internally today.

I'm sure there are challenges, such as what to do with the preprocessor, but those can be overcome.

While I'm not an expert on compilers, I do know something about them, having written a couple in my career. I've also talked to people who write compilers, and they agree that this is feasible; in fact, a lot of what we need is already being done behind the scenes.

All that we are missing is for the standardization committees to move forward.

Let's not waste time on more language niceties until we fix the fundamental flaws.

The include model is a fatal fundamental flaw of many languages that collectively costs software companies billions of dollars in lost productivity.

As I write this, I am also reminded that char * is another bad idea from the same people who brought us #include. As a C and C++ language construct, char * has also cost billions of dollars in lost productivity over the years. Think about how much time used to be wasted hunting for off-by-one errors, or fixing security holes caused by buffer overruns in char * arrays.
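The classic char * off-by-one comes from the terminating '\0': a C string of n characters needs n + 1 bytes. The sketch below contrasts manual char * copying with std::string, which tracks its own length and storage. The function names copy_c_string and copy_std_string are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// The classic char * mistake: sizing the buffer as strlen(src) bytes forgets
// the terminating '\0', so strcpy writes one byte past the end:
//   char* bad = new char[std::strlen(src)];  // off by one
//   std::strcpy(bad, src);                   // buffer overrun

// Correct manual version: every caller must remember the "+ 1".
std::unique_ptr<char[]> copy_c_string(const char* src) {
    assert(src != nullptr);                    // char * can also be null
    const std::size_t len = std::strlen(src);  // length excludes the '\0'
    std::unique_ptr<char[]> dst(new char[len + 1]);
    std::memcpy(dst.get(), src, len + 1);      // copy the '\0' as well
    return dst;
}

// std::string manages its own size and terminator; nothing to count by hand.
std::string copy_std_string(const std::string& src) {
    return src;
}
```

With std::string there is no "+ 1" for the caller to remember, which removes that particular source of overruns from ordinary string handling.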

It is unfortunate that some of these ideas don't get fixed sooner. I sometimes wonder whether the language designers are still writing code. If they were, it seems they would be a lot more motivated to fix these issues.

I would love to hear some counterarguments as to why the include model is still with us.
