Tuesday, August 5, 2008

Java's performance problems are temporary; it'll soon be as fast as C++

Right now Java is slow: slow to load, slow to start and slow to execute. Load time can be attributed in part to Java's insistence on storing each class in a separate file and using a separate HTTP request to retrieve it. Changing to an archive that stores a bunch of classes in a single file should help quite a bit. Start time is related to load time (classes are loaded the first time they're invoked, not at startup), but also includes late binding time. In essence, every time we run a Java applet we're paying a startup penalty so the developer doesn't have to do a link step. Since the Java classes don't change all that often, doing the link and placing the linked classes in an archive is generally preferable.

This leaves execution performance as the biggest problem. And a problem it is, with Java code taking an average of twenty times longer than native C++ code. Of course, an average tells you nothing about how a specific program will behave: some Java applets run nearly as fast as C++ (likely because most of their work takes place in native code run time libraries), while others run more like fifty times slower.

The best solution to this difference in performance is to translate Java source or byte code into native code. Most platforms have native code translators either already available or under development. Native translation can make a huge difference in performance for Java applications; applied to applets in the form of just-in-time translators, it provides similar gains at the expense of a small increase in startup overhead. Suddenly Java that takes twenty times as long as C++ to run can get a lot closer.

So how close can Java get? Will it ever reach the point where it can replace other languages for performance-critical tasks? Does it make sense to write compute-intensive codes in Java and design the Java run time to take advantage of multiprocessor architectures?

For a lot of reasons, Java is likely to always be slower than C++ and Fortran for the typical application. (A range of 50% to 300% slower than C++ has been suggested as the practical limit of Java performance improvements.) Some of these reasons are:

Reliance on pointers for objects: Every Java object access means a pointer dereference; C++ programmers have the option of either linking objects with pointers or placing one object inside the other, thereby eliminating a level of indirection.
Reliance on heap storage for objects: While basic types (int, float, etc.) can reside on the stack, objects in Java can only be allocated on the heap. This means more work for the memory manager and the garbage collector. Stack-based objects are much faster to reclaim, giving another advantage to C++.
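
To make the two points above concrete, here is a minimal C++ sketch of both layouts. The names (Point, LinkedSegment, EmbeddedSegment and the two width functions) are made up for illustration and don't come from any real library. The first layout is roughly what Java gives you for every object field; the second is the extra option C++ offers.

```cpp
#include <memory>

struct Point {
    int x;
    int y;
};

// Roughly how a Java object holds other objects: each Point is a separate
// heap allocation reached through a pointer.
struct LinkedSegment {
    std::unique_ptr<Point> start;
    std::unique_ptr<Point> end;
};

// The option C++ adds: both Points live inside the segment itself.
struct EmbeddedSegment {
    Point start;
    Point end;
};

// Two heap allocations, plus a pointer dereference on every Point access.
int linked_width() {
    LinkedSegment s{std::make_unique<Point>(Point{1, 2}),
                    std::make_unique<Point>(Point{4, 6})};
    return s.end->x - s.start->x;
}

// No heap allocation: the whole segment sits on the stack and is reclaimed
// when the function returns.
int embedded_width() {
    EmbeddedSegment s{{1, 2}, {4, 6}};
    return s.end.x - s.start.x;
}
```

Both functions compute the same answer; the difference is only in how much work the memory manager does and how many indirections each access costs.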

Garbage collection: While garbage collectors have their merits (they make programming a lot easier), they are a general solution to the memory reclamation problem, and for many applications there is a specific solution to memory allocation and reclamation that will outperform the general one. Where performance is critical, the programmer can probably do a better job of handling this task than even the most cleverly written garbage collector (which the GC in today's Java most definitely is not).
Run time method selection: C++ gives the programmer the choice of using virtual or nonvirtual methods. Nonvirtual methods are implemented as ordinary functions, while virtual methods require an extra level of indirection through a method table. Java instance methods are virtual by default, which means more overhead on nearly every method invocation. This does make life a lot easier for the programmer, who doesn't have to worry about which methods are resolved when. But there is a small price to pay.
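
A small C++ sketch of the virtual/nonvirtual choice, using hypothetical Shape and Triangle types invented for this example. The virtual call is resolved at run time from the actual object; the nonvirtual call is resolved at compile time from the declared type.

```cpp
struct Shape {
    virtual ~Shape() = default;
    // Virtual: looked up at run time through the object's method table.
    virtual int sides() const { return 0; }
    // Nonvirtual: bound at compile time to Shape::tag, like a plain function.
    int tag() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
    // Hides Shape::tag but does not override it.
    int tag() const { return 1; }
};

// Calls go through a reference to the base type, as they usually would in
// polymorphic code.
int sides_via_base(const Shape& s) { return s.sides(); }  // dynamic dispatch
int tag_via_base(const Shape& s) { return s.tag(); }      // static dispatch
```

Passing a Triangle to sides_via_base reaches Triangle::sides through the method table, while tag_via_base always calls Shape::tag with no lookup. That is the trade: the nonvirtual call is cheaper, but the programmer has to know which kind of method they are dealing with.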

Insistence on object orientation: Although arguably better from a design, development and maintenance standpoint, object oriented programs tend to be written as a large number of small procedures. This means more frequent method invocations, which can mean slower code. C++ can be written in a more procedural fashion. In addition, C++ programmers can use inline declarations to eliminate function calling overhead. Java does not provide such programmer control.
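
As a rough illustration of the inlining point, here is a sketch with a hypothetical Vec2 class. Keep in mind that inline is a request, not a command: the compiler is free to ignore it, and free to inline functions that were never marked.

```cpp
class Vec2 {
public:
    Vec2(double x, double y) : x_(x), y_(y) {}
    // Member functions defined inside the class body are implicitly inline.
    double x() const { return x_; }
    double y() const { return y_; }

private:
    double x_;
    double y_;
};

// Explicit inline request on a free function.
inline double dot(const Vec2& a, const Vec2& b) {
    return a.x() * b.x() + a.y() * b.y();
}

// A hot loop built from small functions. If the compiler honors the inline
// requests, the body reduces to plain arithmetic with no call overhead.
double sum_of_dots(const Vec2& a, const Vec2& b, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += dot(a, b);
    }
    return total;
}
```

The code keeps its object oriented shape (small accessors, small helpers) while giving the compiler permission to flatten it.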

Thread-safety: Java was designed as a multithreaded language. Thread-aware languages need thread-safe libraries, in which each procedure allows for the possibility that it will be invoked simultaneously from multiple threads. To avoid problems should this happen, the procedure must set up access locks around critical sections of code. This extra work makes thread-safe libraries slower than their unsafe equivalents, which is why most thread-safe C and C++ libraries are also supplied in unsafe versions. Since Java's run time libraries are of necessity thread-safe, they are another source of overhead in comparison to C++.
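
The safe/unsafe split can be sketched in C++ with two versions of a counter. The class and function names are hypothetical, and std::mutex stands in for whatever locking a real library would use. The safe version takes a lock on every call whether or not a second thread exists, and that lock is the overhead in question.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Unsafe version: no locking, correct only when used from one thread.
class UnsafeCounter {
public:
    void increment() { ++value_; }
    long value() const { return value_; }

private:
    long value_ = 0;
};

// Thread-safe version: every call locks a mutex around the critical section.
class SafeCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }
    long value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    long value_ = 0;
};

// Several threads hammer one SafeCounter; the lock keeps the total exact.
long count_with_threads(int threads, int increments_per_thread) {
    SafeCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.increment();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.value();
}
```

A single-threaded C++ program can pick UnsafeCounter and skip the locks entirely; with Java's run time libraries that choice isn't offered.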

JIT translation has to be fast: Most compilers have optimization modules that perform highly sophisticated analysis on the code they generate to make it more efficient. One problem with optimization is that it increases compilation time, sometimes by quite a lot. But for performance critical applications that take a long time to run or are run regularly, the extra compilation time is more than compensated by the improved run time performance. Now apply this analysis to Java applets and just-in-time translation: instead of incurring the overhead once at compilation time, we would see it every time the byte code applet loads and is converted to native code. Clearly we don't want to wait any longer than necessary for our applet to start. This places severe limits on the kinds of optimizations that Java JIT compilers can do. Note that this constraint does not apply to Java applications, which can be compiled to native code just like C++ programs and can have a similar degree of optimization applied to them.
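
The trade-off is easy to see with a toy cost model. Everything in this sketch is an assumption made for illustration: the Costs struct, the function names and any numbers you feed it are invented, not measurements, and the quick translator's own translation cost is treated as negligible.

```cpp
// All times are in arbitrary, hypothetical units.
struct Costs {
    double optimize;         // time spent on heavy optimization, per compile
    double run_optimized;    // run time of the optimized code, per launch
    double run_unoptimized;  // run time of quickly translated code, per launch
};

// Compile ahead of time: pay for optimization once, then run many times.
double ahead_of_time_total(const Costs& c, int launches) {
    return c.optimize + launches * c.run_optimized;
}

// A JIT that optimized just as heavily would pay that cost on every launch.
double jit_optimizing_total(const Costs& c, int launches) {
    return launches * (c.optimize + c.run_optimized);
}

// A JIT that translates quickly and skips the expensive optimizations.
double jit_quick_total(const Costs& c, int launches) {
    return launches * c.run_unoptimized;
}
```

With made-up values of 10 for optimization, 2 for an optimized run and 5 for an unoptimized run, 100 launches cost 210 units compiled ahead of time, 1200 units with a heavily optimizing JIT and 500 units with a quick one. The quick JIT beats the optimizing JIT by a wide margin, which is why applet translators keep their optimizations cheap, but neither catches the program that was optimized once up front.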

Java doesn't need to be as fast as C++; just fast enough: Here's where economics comes into play. Once Java is fast enough for most tasks, the emphasis vendors place on performance is likely to move to some other area where they can offer differentiation. Improving performance takes a lot of work for each small gain. So once the perception of Java as a poor performer dissipates, the marketing advantage of improving performance does the same. It's much like the situation with C++ compilers: although C++ has features like variable references that would permit it to run faster than the equivalent C using pointers, compilers don't take advantage of that opportunity to produce faster code. In this regard at least, marketing and customer demand drive technology.

The upshot of all this is that Java will get a lot faster, but that there are likely limits on its performance. Java won't be as fast as C++, and C++ won't be as fast as Fortran. There will still likely be a need for at least a few different languages for different requirements.
