Computational models and program synthesis for parallel out-of-core computation

by Li, Zhiyong, Ph.D., Duke University, 1996, 182 pages; AAT 9704729

Abstract (Summary)

As the performance gap between processors and memory systems continues to increase, memory and I/O subsystems are increasingly becoming the major bottleneck for many I/O-intensive out-of-core applications. To address this problem, new models of parallel computation and new methods of program synthesis for out-of-core computation are needed. This thesis presents our contributions in these two areas.

We first introduce the concept of resource metrics to characterize various models of parallel computation. Based on this concept, we introduce a LogP-HMM model for modeling parallel machines with multi-level memories and a communication network. More specifically, LogP-HMM is a representative of a class of models that are formed by combining a network model with a hierarchical memory model. We introduce a variant of the LogP-HMM model, the LogP-UMH model, which combines the LogP model with the UMH model. We present several near-optimal fast Fourier transform (FFT) and sorting algorithms for both models.

We then introduce an algebraic method for synthesizing efficient parallel out-of-core programs. This method uses tensor products to represent a class of algorithms with recursive computational structures, known as block recursive algorithms. To obtain the improved performance, we synthesize programs of the block recursive algorithms for two variants of the LogP-HMM model, a single-processor multi-disk model, and a multi-processor multi-disk model. These two models share a common property, namely that out-of-core data is distributed in a block-cyclic manner. We introduce a methodology, based on tensor bases, to describe this data distribution on multiple disks. We then derive efficient programs by the following steps. First, we use a greedy or a dynamic programming algorithm to transform the tensor product formulas of block recursive algorithms into an efficient form. Second, we synthesize efficient programs by analyzing the structures of the mathematical representations for both the input computation and the data distribution.

We demonstrate the effectiveness of our approach by synthesizing parallel out-of-core FFT programs for the CM-5 with the parallel file access. The experimental results show that, the synthesized FFT programs with the parallel file access run up to 10 times faster than the FFT programs with the serial file access.

Indexing (document details)


Reif, John H.


Duke University

School Location:

United States -- North Carolina


block recursive


DAI-B 57/09, p. 5753, Mar 1997

Source type:



Computer science

Publication Number:

AAT 9704729