Tuesday, November 3, 2009

Module isolation

Henry Ford was the first to carry the concept of modularization through the full production cycle. He brought the idea of building pieces of a whole into mass production and showed that by dividing the work into smaller, specialized tasks performed by different people, quality improved, productivity increased, and slowdowns in production could easily be dealt with. This also dramatically increased profitability.

Module isolation fits under this concept of building parts that you later assemble. But what is a module? It is a chunk of code that implements a single, cohesive set of behaviors. In C, this is usually a single .c file and its matching header file. C++ follows the same model but may place several classes in a single header or .cpp file. In Java or C#, a module is usually a single source file containing one class.

So, what's the point of modules? Some people I have worked with wonder why modularization matters at all: "You don't really need that; just make all objects part of the same system so each object has ready access to the others."

In one of the code bases I have recently worked in, around 35% of the code lives directly inside a global object; let's call that object global_app. This object controls the flow of all processing: updates, rendering, which entities are displayed, GUI changes, and so on. It also contains hooks into most other systems, such as rendering, GUI, and animation, in the form of callbacks, timers, and the like. As a result, all systems in global_app are fundamentally intertwined and inseparable. Unit testing is nearly non-existent, and bugs are tenacious, difficult to reproduce, and impossible to isolate. Most development happens inside the application, so new systems are intertwined with everything else from day one and soon become difficult to test or separate. This slows development, and no system is truly stable. There is also an air of mystery about the code base: strange interactions happen that are nearly impossible to foresee and very hard to understand.

In this system, the only truly stable code is the low-level code that is not intertwined. But it doesn't have to be like this.

The previous codebase I helped develop was built on completely isolated code modules... and I mean isolated. Modules never called one another; they communicated only by sending asynchronous messages. This asynchronous messaging model has many benefits. A module could be a thread, a simple sub-module, a class within an existing module, or whatever you find most efficient, which makes the code very flexible, extensible, and configurable. A core messaging system delivered messages between modules without breaking threading isolation, so no special care for shared data (i.e. concurrency concerns) was needed. Messages had a size limit, but otherwise you could send nearly any data to another module; you just could not access another module directly.
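The messaging core described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual code: `Message`, `Mailbox`, and the 256-byte `MAX_PAYLOAD_BYTES` limit are hypothetical names and values (the post only says messages had a size constraint). Payloads are immutable `bytes`, so a receiver never holds a reference into the sender's state, and the thread-safe `queue.SimpleQueue` is the only shared structure, which is what lets the modules themselves skip locking.

```python
import queue
import threading
from dataclasses import dataclass

# Hypothetical limit; the original engine's actual size constraint is not stated.
MAX_PAYLOAD_BYTES = 256


@dataclass(frozen=True)
class Message:
    kind: str
    payload: bytes  # immutable copy of the data, never a reference into another module

    def __post_init__(self):
        if len(self.payload) > MAX_PAYLOAD_BYTES:
            raise ValueError(
                f"payload is {len(self.payload)} bytes; limit is {MAX_PAYLOAD_BYTES}"
            )


class Mailbox:
    """Thread-safe inbox: any thread may post; only the owning module drains it."""

    def __init__(self):
        self._queue = queue.SimpleQueue()  # the only shared, synchronized structure

    def post(self, msg: Message) -> None:
        self._queue.put(msg)

    def drain(self) -> list:
        """Return every message posted so far, oldest first, without blocking."""
        msgs = []
        while True:
            try:
                msgs.append(self._queue.get_nowait())
            except queue.Empty:
                return msgs


# Usage: a module running on another thread sends 100 messages; the receiving
# module reads them later during its own update, with no locks of its own.
inbox = Mailbox()


def sender_module():
    for i in range(100):
        inbox.post(Message("tick", i.to_bytes(4, "little")))


worker = threading.Thread(target=sender_module)
worker.start()
worker.join()
received = inbox.drain()
print(len(received))  # 100
```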

All modules had access to the "game" object during their update cycle, but they had no awareness of other modules. In fact, they couldn't know when or how a message they sent was delivered. There were no hooks from one part of the system into any other. All messaging was configured in a setup module that controlled the flow of communication between modules. During its update, each module opened all of its mail, processed it, prepared new messages to send, and returned. Modules could also contain sub-modules, so animation and locomotion, for example, could both be handled in the same module. All of this came at no measurable performance overhead and in most cases led to a faster runtime.
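The update cycle and the setup module can be sketched the same way. This is again a hypothetical, single-threaded Python illustration: `Module`, `Router`, `connect`, and the example `Input` and `Locomotion` modules are invented names. Modules only return outgoing messages; the router, standing in for the setup module, is the one place that knows which module receives which kind of message, and it delivers mail only after every module has updated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str
    data: tuple = ()


class Module:
    """Base class: a module sees only the game object and its own mail."""

    def __init__(self, name):
        self.name = name
        self.inbox = []  # filled by the router between updates

    def update(self, game):
        mail, self.inbox = self.inbox, []  # open all mail received since last update
        outgoing = []
        for msg in mail:
            outgoing.extend(self.handle(game, msg))
        outgoing.extend(self.tick(game))
        return outgoing  # the module never learns who receives these

    def handle(self, game, msg):
        return []

    def tick(self, game):
        return []


class Router:
    """Hypothetical setup module: the only place that knows who talks to whom."""

    def __init__(self):
        self.modules = {}
        self.routes = defaultdict(list)  # message kind -> receiving module names

    def add(self, module):
        self.modules[module.name] = module

    def connect(self, kind, receiver_name):
        self.routes[kind].append(receiver_name)

    def step(self, game):
        outgoing = []
        for module in self.modules.values():
            outgoing.extend(module.update(game))
        # Deliver after all updates; messages with no configured route are dropped.
        for msg in outgoing:
            for name in self.routes[msg.kind]:
                self.modules[name].inbox.append(msg)


class Input(Module):
    def tick(self, game):
        return [Message("move", (1, 0))]


class Locomotion(Module):
    def handle(self, game, msg):
        if msg.kind == "move":
            dx, dy = msg.data
            x, y = game["player_pos"]
            game["player_pos"] = (x + dx, y + dy)
        return []


game = {"player_pos": (0, 0)}
router = Router()
router.add(Input("input"))
router.add(Locomotion("locomotion"))
router.connect("move", "locomotion")
for _ in range(3):
    router.step(game)
print(game["player_pos"])  # (2, 0): each move is read one frame after it is sent
```

Because delivery is deferred to the end of each step, no module can observe another mid-update, and rerouting `move` to a different module, or to a test stub, is a one-line change in the setup code.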

In this system, code modules could be tested in isolation. You could construct a minimal game with only a few other modules and link your new module to it. You could make sweeping changes to your own module with very little chance of breaking someone else's. In most cases, junior programmers could work side by side with everyone else without much concern, because they worked in isolation and could hardly break the game. Not surprisingly, this way of developing led to a very stable codebase.
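Testing a module in isolation then needs almost no scaffolding. In this hypothetical sketch (the `HealthModule` and its `damage` and `died` messages are invented for illustration), the harness fills the module's inbox directly, passes a stub game object, and checks the messages the module returns; no other module exists at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str
    data: tuple = ()


class HealthModule:
    """Hypothetical module: tracks hit points and reports death only by message."""

    def __init__(self, hp):
        self.hp = hp
        self.inbox = []

    def update(self, game):
        mail, self.inbox = self.inbox, []
        outgoing = []
        for msg in mail:
            if msg.kind == "damage" and self.hp > 0:
                self.hp -= msg.data[0]
                if self.hp <= 0:
                    outgoing.append(Message("died"))
        return outgoing


# A very simple test harness: no engine, no other modules, just mail in and mail out.
def test_lethal_damage_reports_death_once():
    module = HealthModule(hp=10)
    module.inbox = [Message("damage", (4,)), Message("damage", (7,)), Message("damage", (3,))]
    outgoing = module.update(game={})  # a stub game object is enough here
    assert outgoing == [Message("died")]
    assert module.hp == -1  # damage arriving after death is ignored


def test_no_mail_means_no_output():
    assert HealthModule(hp=5).update(game={}) == []


if __name__ == "__main__":
    test_lethal_damage_reports_death_once()
    test_no_mail_means_no_output()
    print("all module tests passed")
```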

Quality and flexibility are two of the main benefits of this modularization. For me, testability and provability are also key gains, because you can readily test modules in isolation or in unit tests with very simple test harnesses. This model is very similar to the message-passing model of the Erlang language. Microsoft has also recently brought this development paradigm to .NET with an experimental language called Axum.

I sure hope Ubisoft Vancouver does well; for my part, I say that engine model is brilliant.

2 comments:

Unknown said...

What's really interesting about all of this is that here in Vancouver we are currently debating the validity of modular code bases. Many of our young bloods don't see the need for modularization and just want to hack things together. Even I would claim that some of it is a bit excessive.

What I find problematic about modular code is that it takes time to set up and usually requires many steps during the initial setup. You also tend to introduce dependencies when sharing systems, which makes the code harder to learn and riskier to change. When various systems depend on shared code, changing it risks destabilizing many of them. Non-modular isolation, where basically nothing is shared, makes things easy to change because there are few or no dependencies, so your changes stay localized. Of course, once you know all of the small tools at your disposal, sharing modules makes developing the initial system very fast, since you can reuse what others have built before you.

Where is the line? Modular code can make dependencies more obvious to the next person in line, but it can also hide them. Non-modular code gives the coder more freedom to experiment without affecting others. For example, I find it advantageous to replace a dependency on a shared library with a system-local version for this very reason: creating a local hash table instead of using the common one allowed me to experiment with performance improvements. Of course, this actually speaks more to modular design, since it makes it easy to introduce that isolated case without modifying what others are using.

Obviously I'm in favor of modular code, but debating these points will give us more insight into where the line should be drawn, so that we don't make things overly modular and hurt performance.

Mickey Kawick said...

In the code base where I currently work, there is almost no isolation. Every part of the code 'knows' about almost every other part. This creates incredible spaghetti. Minor changes to an existing system cause terrible knock-on bugs, and simple gameplay changes often take days when they should take minutes.

I never realized the maintenance benefits of code modules until I left a company where they are widely used and started working for a company where no one uses them. The difference in the effort needed to do simple tasks is staggering.

I just finished a dialog box system that allows our server module to launch a dialog box on a client machine, driven both by data and by a script. The feature also had a 'world exploration' component. The P4 changelist I submitted touched almost 60 files and was painful to buddy-review. It also took a week and a day.

Most systems programmers would agree that this should have been at most a two-day project.

Modularization is not a 'silver bullet' but it sure helps.