Saturday, August 29, 2009

Converting a Monolithic Application to DLLs

Over the last several months I have been working on the rather difficult task of converting a monolithic application into several DLLs. The application has needed this for about 10 years, but we kept putting it off because it was hard and adding new features always seemed more pressing.

Plan from the beginning to use DLLs. It is easy to do at the outset, leads to better organization, and pays for itself many times over in time saved.

The hard part in all of this is determining what to move first. This is where being familiar with your code base pays off. We started by moving some of the core items. As it turned out, IO code and error handling came first for us. Then we were able to tackle some of the core classes.

While working on this project I kept thinking that I would hit a point where things would become easier to move. However, there seems to be a never-ending supply of couplings caused by poor choices about where code was placed.

There are several design choices that make moving code to a DLL hard. One is when base classes rely on derived classes. Intuitively this seems like a bad idea, yet it is amazing how easily it occurs. The reasons for doing it often don't seem horrible at the time, and there often aren't any short-term repercussions to make the problem obvious.
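As a minimal sketch of this kind of coupling and one way out of it, consider the following. The class names (Shape, Circle) and the methods are hypothetical and are not taken from our application. The commented-out version shows a base class that names a derived class; the live version replaces that with a virtual hook, so the base class could move into a DLL on its own.

```cpp
#include <string>

// Problem shape (left as a comment): the base class names a derived class,
// so the base cannot move to a DLL without dragging Circle along with it.
//
//   std::string Shape::describe() const {
//       if (dynamic_cast<const Circle*>(this)) return "shape: circle";
//       return "shape: unknown";
//   }

// Decoupled version: the base class knows only its own virtual hook.
class Shape {
public:
    virtual ~Shape() = default;

    // Shared behaviour lives in the base and calls the hook below.
    std::string describe() const { return "shape: " + kindName(); }

private:
    // Each derived class supplies its own name; the base never mentions them.
    virtual std::string kindName() const = 0;
};

// The derived class can stay in the application (or move to another DLL).
class Circle : public Shape {
private:
    std::string kindName() const override { return "circle"; }
};
```

With this arrangement, adding another derived class does not require touching or recompiling the base class.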

About all you have to prevent this sort of thing is peer code reviews and developing better coding habits.

Another problem is collections of classes that are all stored in a single file. Adding a new file when you create a new class isn't difficult, but it does take a couple of minutes to do and it is awfully tempting to save those couple of minutes for that little class you are creating.

Yet another problem is methods that don't really belong in a class. It is a bit of a trap to think that every method should belong to a class. There are cases where you have what I would call a bridging routine, one that deals with two or more classes but doesn't clearly belong in any of them. Often what I see is the method put into one of the classes, taking the other class or classes as arguments.

This creates a coupling between the classes. It is very easy to end up with couplings like this that end up linking loosely related classes. In a social network it doesn't take very many hops until you find a group of people that you don't know. It is the same with classes. These loose couplings create a network of interdependency. Social networks also often lead back to you. The same can happen with classes.

When trying to move a set of classes that are coupled like this, it is very easy for a few loose couplings to spread out and pull in very large chunks of your application. A few cuts here and there are often all that is required to break a single class out. The more couplings there are, the harder it is to break out.
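To illustrate the bridging routine mentioned above, here is a small sketch in which the bridge is a free function rather than a member of either class. Document, Printer, and printDocument are hypothetical names chosen for the example. Neither class mentions the other, so either can be moved to a DLL without pulling the other along; only the bridge depends on both.

```cpp
#include <string>
#include <utility>
#include <vector>

// Knows nothing about Printer.
class Document {
public:
    explicit Document(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }

private:
    std::string text_;
};

// Knows nothing about Document.
class Printer {
public:
    void write(const std::string& line) { lines_.push_back(line); }
    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::vector<std::string> lines_;
};

// The bridge. In a real code base this would sit in its own header and
// source file, and it is the only place that includes both classes.
void printDocument(const Document& doc, Printer& printer) {
    printer.write(doc.text());
}
```

The alternative, a Document::printTo(Printer&) member, would force every user of Document to depend on Printer as well.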

Obviously there need to be couplings between classes, otherwise you can't have an application. In observing these couplings it occurred to me that, just as in other networks, nodes naturally form among classes. Some classes have very few links and can be thought of as leaves. They are referred to by other classes but don't refer to any other classes.

I've observed two categories of nodes: dispatch classes, whose methods dispatch to many other classes, and base classes, from which many other classes are derived.

When moving to a DLL the base class nodes need to move early. The dispatch nodes end up needing to move late.

I've been thinking about this observation of node classes and have come to the conclusion that we should design node classes to do little more than be dispatchers to other classes. They shouldn't do any work beyond what is required to do the dispatching.
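A minimal sketch of what a dispatch-only node class might look like follows. CommandDispatcher and its methods are hypothetical, and using std::function for the handlers is just one possible choice. The point is that the class holds a table of handlers and forwards calls to them, and does no other work.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

class CommandDispatcher {
public:
    using Handler = std::function<int(int)>;

    // Other classes register themselves; the dispatcher never names them.
    void registerHandler(const std::string& name, Handler handler) {
        handlers_[name] = std::move(handler);
    }

    // Looks up the handler and forwards the call, nothing more.
    // Returns -1 when no handler is registered under that name.
    int dispatch(const std::string& name, int arg) const {
        auto it = handlers_.find(name);
        return it == handlers_.end() ? -1 : it->second(arg);
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

Because the handlers are registered from outside, the dispatcher's own implementation file needs no headers from the classes it dispatches to.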

These node classes will form naturally as your classes form and start interacting. If you don't recognize the formation of one of these classes you can easily find that a node class has turned into a monolithic class that does too much. Since all roads tend to lead to these classes it is a natural reaction to put more and more stuff into them.

In object-oriented design there are guidelines, such as that a base class should be purely abstract, with only pure virtual methods. This idea is supported by the observation of node classes. Make your base classes as simple as possible, possibly without any data or implemented methods. When you add methods, they should only support the required communication between classes. If you need more functionality in a derived class, create a new class that provides it.

The second form of node class is the dispatcher. It is easily identified because its implementation will include many header files. These tend to be more problematic to move into a DLL because they have many couplings. They are also problematic for other reasons.

The problem I've observed is that these classes were often designed to perform a specific task and then grew into dispatchers, pulling in a lot of other classes because of poor design choices.

One trick for dealing with these classes when you port is to create a pure virtual mix-in class that declares, as pure virtual methods, the methods the code in the DLL needs to call. The class that stays behind in the application inherits from the mix-in, and the DLL code sees only the mix-in. In the end, the set of methods that accumulates this way can indicate which functions should be pulled into a mix-in class that defines the communication interface.
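Here is a rough sketch of the mix-in trick, under the assumption that some code moving into a DLL needs to report progress to a large application class. All of the names (IProgressSink, loadFile, Window, MainWindow) are hypothetical. The interface and the routine would live in the DLL; the class that implements the interface stays in the application.

```cpp
// --- Would live in the DLL ---------------------------------------------

// Pure virtual mix-in: only the methods the DLL code needs to call.
class IProgressSink {
public:
    virtual ~IProgressSink() = default;
    virtual void onProgress(int percent) = 0;
};

// DLL-side routine. It sees the interface, never the application class.
// Reports progress once per chunk and returns the number of chunks read.
inline int loadFile(int chunkCount, IProgressSink& sink) {
    for (int i = 1; i <= chunkCount; ++i) {
        sink.onProgress(i * 100 / chunkCount);
    }
    return chunkCount;
}

// --- Stays in the application ------------------------------------------

// Stand-in for whatever base class the application class already has.
class Window {
public:
    virtual ~Window() = default;
};

// The big application class mixes in the interface alongside its base.
class MainWindow : public Window, public IProgressSink {
public:
    void onProgress(int percent) override {
        lastPercent_ = percent;
        ++updates_;
    }
    int lastPercent() const { return lastPercent_; }
    int updates() const { return updates_; }

private:
    int lastPercent_ = 0;
    int updates_ = 0;
};
```

The DLL now compiles without any header from the application, and the list of methods on the interface documents exactly what the moved code needs from the outside.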

After working on this project for a while I'm wholly convinced that all applications should be built as libraries from the very start, with a very minimal main executable to kick things off. Besides enforcing better modularity in design, this also helps cut down build and link times, which can easily get out of hand.
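For readers who have not built a DLL before, the mechanical part on Windows is small. The sketch below shows a common export-macro pattern; CORE_API, CORE_BUILDING_DLL, and ErrorCounter are hypothetical names, while __declspec(dllexport) and __declspec(dllimport) are the actual compiler keywords. The method bodies are written inline only to keep the sketch self-contained; in a real DLL they would normally live in a source file compiled into the library.

```cpp
// core_api.h (hypothetical): one macro decides between export and import.
#if defined(_WIN32)
#  if defined(CORE_BUILDING_DLL)   // defined only when compiling the DLL itself
#    define CORE_API __declspec(dllexport)
#  else                            // everyone else imports the symbols
#    define CORE_API __declspec(dllimport)
#  endif
#else
#  define CORE_API                 // on other platforms it expands to nothing
#endif

// A class the library makes available to the executable and to other DLLs.
class CORE_API ErrorCounter {
public:
    void report() { ++count_; }
    int count() const { return count_; }

private:
    int count_ = 0;
};
```

The main executable then only includes the header, links against the library, and calls into it.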

We use a build tool called IncrediBuild to speed up the compile process. Without it we would be waiting hours to build the entire application. With it the build can complete in minutes. With good modularization you should be able to treat your DLLs like third-party products that only get updated infrequently. The more of these you have, the less you have to recompile when making changes.

Hopefully you find this information useful. Even with my many years of experience I was unprepared for the cost of moving things to a DLL. Hopefully you can prevent your projects from getting into this state, or justify starting this work sooner. It will cost more and more the longer you put it off, so get started early.
