Computational models and program synthesis for parallel outofcore computation
by Li, Zhiyong, Ph.D., Duke University, 1996, 182 pages; AAT 9704729
Abstract (Summary)
As the performance gap between processors and memory systems continues to increase, memory and I/O subsystems are increasingly becoming the major bottleneck for many I/Ointensive outofcore applications. To address this problem, new models of parallel computation and new methods of program synthesis for outofcore computation are needed. This thesis presents our contributions in these two areas.
We first introduce the concept of resource metrics to characterize various models of parallel computation. Based on this concept, we introduce a LogPHMM model for modeling parallel machines with multilevel memories and a communication network. More specifically, LogPHMM is a representative of a class of models that are formed by combining a network model with a hierarchical memory model. We introduce a variant of the LogPHMM model, the LogPUMH model, which combines the LogP model with the UMH model. We present several nearoptimal fast Fourier transform (FFT) and sorting algorithms for both models.
We then introduce an algebraic method for synthesizing efficient parallel outofcore programs. This method uses tensor products to represent a class of algorithms with recursive computational structures, known as block recursive algorithms. To obtain the improved performance, we synthesize programs of the block recursive algorithms for two variants of the LogPHMM model, a singleprocessor multidisk model, and a multiprocessor multidisk model. These two models share a common property, namely that outofcore data is distributed in a blockcyclic manner. We introduce a methodology, based on tensor bases, to describe this data distribution on multiple disks. We then derive efficient programs by the following steps. First, we use a greedy or a dynamic programming algorithm to transform the tensor product formulas of block recursive algorithms into an efficient form. Second, we synthesize efficient programs by analyzing the structures of the mathematical representations for both the input computation and the data distribution.
We demonstrate the effectiveness of our approach by synthesizing parallel outofcore FFT programs for the CM5 with the parallel file access. The experimental results show that, the synthesized FFT programs with the parallel file access run up to 10 times faster than the FFT programs with the serial file access.
Indexing (document details)
Advisor: 

School: 
Duke University 
School Location: 
United States  North Carolina 
Keyword(s): 

Source: 
DAIB 57/09, p. 5753, Mar 1997 
Source type: 
Dissertation 
Subjects: 

Publication Number: 
AAT 9704729 
ISBN: 
97805911136010 



