Shared-Memory Multiprocessor Trends and the Implications for Parallel Program Performance
The last decade has produced enormous improvements in processor speed without a corresponding improvement in the speed of buses or interconnection networks. As a result, the relative costs of communication and computation in shared-memory multiprocessors have changed dramatically. An important consequence of this trend is that many parallel applications, which depend on a delicate balance between the costs of communication and computation, do not execute efficiently on today's shared-memory multiprocessors. In this paper we quantify the effect of this architectural trend on parallel program performance. Our experiments on bus-based, cache-coherent machines such as the Sequent Symmetry and on large-scale distributed shared-memory machines such as the BBN Butterfly demonstrate that applications scale much better on previous-generation machines than on current machines. In addition, we show that some scalable machines support fine-grain shared-memory programs better than some bus-based, cache-coherent machines, without significantly greater programming effort. From our experiments we conclude that communication has become a dominant source of inefficiency in shared-memory multiprocessors, with serious consequences for the system software responsible for scheduling and decomposition decisions. In particular, we argue that shared-memory programming models that could be implemented efficiently on the machines of yesterday do not readily port to state-of-the-art machines, and that current software trends in support of fine-grain parallel programming are at odds with hardware trends.
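To see why this balance is fragile, consider a minimal linear cost model (an illustrative sketch, not taken from the paper; the symbols W, C(p), t_comp, and t_comm are our own assumptions). Let W be the total computation, C(p) the communication volume incurred when the work is divided among p processors (taking C(1) = 0), and t_comp and t_comm the per-unit costs of computation and communication:

\[
T(p) \;=\; \frac{W}{p}\,t_{\mathrm{comp}} \;+\; C(p)\,t_{\mathrm{comm}},
\qquad
S(p) \;=\; \frac{T(1)}{T(p)}
\;=\; \frac{p}{\,1 + p\,\dfrac{C(p)}{W}\,\dfrac{t_{\mathrm{comm}}}{t_{\mathrm{comp}}}\,}.
\]

Under this model, a program tuned so that the overhead term p C(p) t_comm / (W t_comp) is small on an older machine loses its speedup when the hardware ratio t_comm / t_comp grows, even though the program itself is unchanged; fine-grain decompositions, which increase C(p), are affected first.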