Monday, April 26, 2010

Explain the conceptual distinctions between scripting languages, C++, C, and assembly

Recently, I was asked the following question:
Explain the conceptual distinctions between scripting languages, C++, C,
and assembly and when it is appropriate and inappropriate to use each.

Everyone has a slightly different view on this topic, but there are a few generalizations that can be made.

Scripting languages include languages like Lua, Python, and even Flash. They usually do not require deep knowledge, but they are great for prototyping ideas, providing 'mod'able designs for external developers or users, creating GUIs, and providing a first pass for testing concepts that will later be integrated into a full product. They are usually resource-intensive and often slow, and they certainly require a lot more horsepower than C/C++ or assembly. The general rule is: if human performance is more important than machine performance (it usually is), then write it in a scripting language. There is one major caveat: your scripting language must be relatively fast, robust enough to handle errors, and support rapid iteration. A product I've worked on has Lua code that must be processed, has no debugging capability, and crashes the game on errors; as a result, developing anything in Lua on that project is a soul-crushing slog.

C++ carries slightly more overhead than C, which generally makes it slightly slower. That said, C++ can be made faster than C (virtual functions instead of large case statements, template code, etc.). In addition, C++ demands much deeper knowledge of language semantics and syntax than C, and its compiler error messages can be cryptic, especially when type information and templates are involved. While code reuse is one of C++'s raisons d'être, it often doesn't achieve that aim. However, modularity is a given in C++, and initialization and cleanup come in the form of constructors and destructors. The language lends itself naturally to extensible design and lower maintenance. Basically, if you are doing large-scale work, need a long-term development cycle, and possibly expect a version two of your product, C++ is most likely the direction you should choose.
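
The "virtual functions vs large case statements" trade-off can be sketched roughly as below. This is a minimal illustration with hypothetical types (`ShapeC`, `Shape`, `Square`, `Circle`); it only shows the two dispatch styles side by side and is not a benchmark. Whether either is faster depends on the compiler and the workload.

```cpp
#include <memory>
#include <vector>

// C style: a type tag plus a switch statement that must be edited
// every time a new kind of shape is added. (Hypothetical types.)
enum ShapeKind { kSquare, kCircle };

struct ShapeC {
    ShapeKind kind;
    double size;  // side length for a square, radius for a circle
};

double area_switch(const ShapeC& s) {
    switch (s.kind) {
        case kSquare: return s.size * s.size;
        case kCircle: return 3.14159265358979 * s.size * s.size;
    }
    return 0.0;  // unreachable for valid kinds
}

// C++ style: each type carries its own behavior; adding a shape
// means adding a class, not editing a central switch.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    double side_;
};

struct Circle : Shape {
    explicit Circle(double r) : r_(r) {}
    double area() const override { return 3.14159265358979 * r_ * r_; }
    double r_;
};

double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();  // virtual call
    return sum;
}
```

The design difference is where new behavior goes: the switch version needs a central edit for every new kind, while the virtual version lets each new class carry its own `area()`.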

C is an old standby, and much of the embedded world is still written in C. This language is roughly 30% slower than assembly and roughly three times faster to develop in than assembly (on a small scale). On larger scales (50K lines or more), C is much more manageable than assembly, but not nearly as manageable as C++ (in general). Many projects have simply taken their old C files and renamed them with a .cpp extension to make many simple tasks easier to code; this provides no performance benefit over straight C. Because C lacks real encapsulation, language-supported initialization and cleanup (new, delete, constructors, destructors), and polymorphism, it is dramatically harder to maintain on large-scale projects. But it is widely supported and loved.
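
To illustrate the missing language-supported initialization, here is a hedged sketch comparing a C-style init/destroy pair with a C++ constructor/destructor. The names (`CBuffer`, `cbuffer_init`, `cbuffer_destroy`, `Buffer`) are hypothetical, and the C-style half is written in the C++ dialect (`std::malloc`, `static_cast`) so that both halves compile together.

```cpp
#include <cstdlib>

// C style: the caller must remember to call init and destroy in pairs.
struct CBuffer {
    char* data;
    std::size_t size;
};

int cbuffer_init(CBuffer* b, std::size_t size) {
    b->data = static_cast<char*>(std::malloc(size));
    b->size = b->data ? size : 0;
    return b->data != nullptr;  // 1 on success, 0 on failure
}

void cbuffer_destroy(CBuffer* b) {
    std::free(b->data);
    b->data = nullptr;
    b->size = 0;
}

// C++ style: the constructor acquires, the destructor releases,
// so cleanup happens automatically when the object goes out of scope.
class Buffer {
public:
    explicit Buffer(std::size_t size) : data_(new char[size]()), size_(size) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;             // forbid accidental double free
    Buffer& operator=(const Buffer&) = delete;
    std::size_t size() const { return size_; }
    char* data() { return data_; }

private:
    char* data_;        // zero-initialized by new char[size]()
    std::size_t size_;
};
```

With `Buffer`, forgetting the cleanup call is impossible; with `CBuffer`, every early return in the caller is a chance to leak.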

Assembly sits at the core of most machines and is only one step above the raw machine code that pushes bits around. In truth, assembly is no longer a necessary skill for the overwhelming majority of programmers (sad to say, since I developed those skills highly at one point). Computers are simply fast enough to absorb any slowdown that C or C++, and even many scripting languages, might introduce. Most games are built with database access, networking, file systems, and various other slow I/O, and writing anything in assembly will not make your product much faster when I/O is your main bottleneck. Still, assembly code can be fast. Always profile your code, and if you find a performance bottleneck that isn't bound by a hardware limitation, then consider writing that section in assembly. Never write assembly just because you think something might be slow. Many simple looping operations are already lightning fast, and you would be hard-pressed to write better code than the compiler generates.
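
The advice to profile first can be made concrete with a crude timer. This sketch uses only the standard library; `sum_array` and `time_us` are hypothetical names, and wall-clock timing like this is a rough stand-in for a real profiler.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// A simple loop of the kind optimizing compilers typically handle well
// (unrolling, vectorization), which is why hand-written assembly
// rarely beats it.
std::int64_t sum_array(const std::vector<std::int32_t>& values) {
    std::int64_t total = 0;
    for (std::int32_t v : values) total += v;
    return total;
}

// Crude wall-clock measurement of a callable, in microseconds.
// A real profiler gives far better information; this only shows
// the habit of measuring before rewriting anything.
template <typename F>
long long time_us(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A call such as `time_us([&] { result = sum_array(values); })` measures one run; only if measurements like this (or, better, a profiler) single out a hot spot is it worth considering hand-written assembly.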


1 comment:

Dan Cobban said...

Recently I have been studying languages, so here is my two cents.

Assembly language is the lowest-level language and is processor-specific, which means that code written in assembly does not port to other processors. At its core, assembly language really only maps the numeric instruction codes (opcodes) to mnemonic names in order to add some readability. When working directly with hardware, assembly may be the only way to get the functionality you need, and that is the only time you should use it: when there is no other option.
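
The idea that assembly mainly gives names to numeric instruction codes can be shown with a toy lookup table. The opcodes below are real single-byte x86 encodings, but the table covers only four of them, and `mnemonic_for` and `disassemble` are hypothetical helpers; most real x86 instructions are multi-byte and far harder to decode.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A few real one-byte x86 opcodes and the mnemonics an assembler
// maps them to. This table is only a toy.
const char* mnemonic_for(std::uint8_t opcode) {
    switch (opcode) {
        case 0x90: return "nop";
        case 0xC3: return "ret";
        case 0xCC: return "int3";
        case 0xF4: return "hlt";
        default:   return "??";  // not in this toy table
    }
}

// "Disassemble" a byte sequence by looking up each byte,
// one mnemonic per line.
std::string disassemble(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    for (std::uint8_t b : bytes) {
        if (!out.empty()) out += '\n';
        out += mnemonic_for(b);
    }
    return out;
}
```

An assembler performs the reverse lookup, turning `nop` back into the byte `0x90`.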

C is a step above assembly language: it abstracts several common patterns into a more readable format and provides a readable way to describe how your memory is used. It also removes the complexities of the hardware from your code by using a common syntax no matter which hardware platform you intend to run the application on. The C compiler converts the C code into your hardware's instructions. For these reasons, C became standardized so that users had a reliable set of language features to depend on.
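
As a small example of C describing memory in readable form, the struct below names each field of a hypothetical packet header instead of relying on raw offsets. It is written in C++ to match the other examples, but the struct and `offsetof` behave the same way in C; `PacketHeader` and its fields are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// A struct names and types each region of memory instead of
// juggling raw offsets by hand as you would in assembly.
// (Hypothetical layout, for illustration only.)
struct PacketHeader {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t length;
};

// The compiler computes field offsets for you; exact values can
// depend on the target's alignment rules, which is part of what
// C hides from the programmer.
constexpr std::size_t kVersionOffset = offsetof(PacketHeader, version);
constexpr std::size_t kLengthOffset  = offsetof(PacketHeader, length);

std::uint16_t header_length(const PacketHeader& h) {
    return h.length;  // no manual address arithmetic needed
}
```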

Building on C, C++ was created to continue abstracting widely used patterns into language features. C++ contains nearly all of the C language and extends it with features for object-oriented development (function overloading, polymorphism, and inheritance) and for generating code from a pattern (templates). Like C, C++ is compiled to hardware instructions for native execution.
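
The C++ features listed above can each be shown in a few lines. This sketch uses hypothetical names (`describe`, `largest`, `Animal`, `Dog`) and demonstrates overloading, templates, and inheritance with polymorphism; it is an illustration, not a tour of the whole language.

```cpp
#include <string>

// Function overloading: same name, chosen by argument type at compile time.
std::string describe(int)         { return "int"; }
std::string describe(double)      { return "double"; }
std::string describe(const char*) { return "string"; }

// Templates: the compiler generates a separate version of this
// function for each type it is used with.
template <typename T>
T largest(T a, T b) {
    return (a < b) ? b : a;
}

// Inheritance and polymorphism: a derived class overrides base behavior.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string sound() const { return "..."; }
};

struct Dog : Animal {
    std::string sound() const override { return "woof"; }
};

// Dispatches at run time to the most-derived sound().
std::string speak(const Animal& a) { return a.sound(); }
```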

Scripting languages break the mold of compiled languages and are instead interpreted. This means that languages like Python, Lua, and Perl don't compile their code into hardware instructions; instead, they translate it into an intermediate instruction set that the interpreter then executes. Scripting languages are generally created to solve a specific problem. For example, Perl is designed to parse data: it makes regular expressions a language feature, which makes it efficient for searching and formatting different data sources (like the web). Generally, scripting languages are strong for the problems they were written to solve and less suited for general use (though there are notable exceptions, Python for example). Basically, if you want flexibility and are willing to sacrifice some performance for it, a scripting language is a strong choice.
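
The "intermediate instruction set" that an interpreter executes can be illustrated with a toy stack machine. The instruction set (`Push`, `Add`, `Mul`, `Halt`) and the `run` function are hypothetical and far simpler than what CPython or Lua actually use, but the fetch, decode, and execute loop is the same basic idea.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set for a toy stack machine.
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

struct Instr {
    Op op;
    int arg;  // used only by Push
};

// The interpreter loop: fetch an instruction, decode it, perform it.
// Each step is ordinary compiled code running on behalf of the script,
// which is where the interpretive overhead comes from.
int run(const std::vector<Instr>& program) {
    std::vector<int> stack;
    for (const Instr& in : program) {
        switch (in.op) {
            case Op::Push:
                stack.push_back(in.arg);
                break;
            case Op::Add:
            case Op::Mul: {
                if (stack.size() < 2) throw std::runtime_error("stack underflow");
                int b = stack.back(); stack.pop_back();
                int a = stack.back(); stack.pop_back();
                stack.push_back(in.op == Op::Add ? a + b : a * b);
                break;
            }
            case Op::Halt:
                return stack.empty() ? 0 : stack.back();
        }
    }
    return stack.empty() ? 0 : stack.back();
}
```

For example, the program `Push 2, Push 3, Add, Push 4, Mul, Halt` computes (2 + 3) * 4 and returns 20. Every script instruction costs several native instructions of dispatch overhead, which is one reason interpreted code is usually slower than compiled code.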

Modern languages are altogether different, though: C# and Java are both just-in-time (JIT) compiled languages. Although both run in a managed runtime environment, like scripting languages, they don't simply execute a fixed set of intermediate instructions. Instead, they translate that intermediate code into native hardware instructions while the program runs, and can optimize it based on what is actually being executed. Scripting languages, C/C++, and even assembly execute a fixed set of instructions that can only be changed by recompiling. The .NET runtime used by C# compiles each method to native hardware instructions the first time it is called, while Java's HotSpot VM starts by interpreting bytecode and then compiles frequently executed code to native instructions, re-optimizing it as it learns more about how the program behaves. This kind of instruction generation can approach C/C++ performance in many cases while providing much of the flexibility of scripting languages.

In the end, it really doesn't matter too much which language you use, provided you have the supporting libraries you need for that language. Game developers use C/C++ mostly because all of the libraries for game consoles are written for those languages. Scripting languages are integrated into such applications to provide higher-level functionality not provided by C++ for a select set of uses where performance is not critical. But which scripting system is used depends on the implementer.

Dan