Global communication analysis and optimization

Chakrabarti, Soumen; Gupta, Manish; Choi, Jong-Deok (1996) Global communication analysis and optimization. ACM SIGPLAN Notices, 31 (5), pp. 68-78. ISSN 0362-1340

Official URL: http://doi.org/10.1145/249069.231391

Related URL: http://dx.doi.org/10.1145/249069.231391

Abstract

Reducing communication cost is crucial to achieving good performance on scalable parallel machines. This paper presents a new compiler algorithm for global analysis and optimization of communication in data-parallel programs. Our algorithm is distinct from existing approaches in that rather than handling loop-nests and array references one by one, it considers all communication in a procedure and their interactions under different placements before making a final decision on the placement of any communication. It exploits the flexibility resulting from this advanced analysis to eliminate redundancy, reduce the number of messages, and reduce contention for cache and communication buffers, all in a unified framework. In contrast, single loop-nest analysis often retains redundant communication, and more aggressive dataflow analysis on array sections can generate too many messages or cache and buffer contention. The algorithm has been implemented in the IBM pHPF compiler for High Performance Fortran. During compilation, the number of messages per processor goes down by as much as a factor of nine for some HPF programs. We present performance results for the IBM SP2 and a network of Sparc workstations (NOW) connected by a Myrinet switch. In many cases, the communication cost is reduced by a factor of two.

Item Type: Article
Source: Copyright of this article belongs to Association for Computing Machinery
ID Code: 130977
Deposited On: 02 Dec 2022 04:46
Last Modified: 27 Jan 2023 10:00