A model for speedup of parallel programs

by Allen B. Downey

A version of this paper has been published as U.C. Berkeley Technical Report CSD-97-933.

The abstract, introduction, and conclusions are below.


Abstract

We propose a new model for parallel speedup that is based on two parameters: the average parallelism of a program and its variance in parallelism. We show how to estimate these program characteristics from observed speedup curves alone, rather than from the more detailed program knowledge otherwise required. We apply this method to speedup curves from real programs on a variety of architectures and show that the model fits the observed data well. We propose several applications for the model, including the selection of cluster sizes for parallel jobs.

Introduction

Speedup models describe the relationship between cluster size and execution time for a parallel program. These models are useful for:
  • Modeling parallel workloads: Many simulation studies use a speedup model to generate a stochastic workload. Since our model captures the behavior of many real programs, it lends itself to a realistic workload model.
  • Summarizing program behavior: If a program has run before (maybe on a range of cluster sizes), we can record past execution times and use a speedup model to summarize the historical data and estimate future execution times. These estimates are useful for scheduling and allocation.
  • Inference of program characteristics: The parameters of our model correspond to measurable program characteristics. Thus, we hypothesize that we can infer these characteristics by fitting our model to an observed speedup curve and finding the parameters that yield the best fit.
Our speedup model is a non-linear function of two parameters: A, the average parallelism of a job, and σ, which approximates the coefficient of variation of parallelism. The family of curves described by this model spans the theoretical space of speedup curves. In [7], Eager, Zahorjan, and Lazowska derive upper and lower bounds for the speedup of a program on various cluster sizes (subject to simplifying assumptions about the program's behavior). When σ = 0, our model matches the upper bound; as σ approaches infinity, our model approaches the lower bound asymptotically.
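
For concreteness, the sketch below implements a speedup function with these limiting properties: it reduces to the upper bound min(n, A) when σ = 0 and approaches the Eager/Zahorjan/Lazowska lower bound nA/(n + A - 1) as σ grows. The piecewise expressions follow the low-variance and high-variance cases of the full technical report; treat them as illustrative and check them against the report rather than this summary page.

    # Sketch of a two-parameter speedup function S(n; A, sigma) with the
    # limiting behavior described above: S(n) = min(n, A) when sigma = 0,
    # and S(n) -> n*A / (n + A - 1) as sigma -> infinity.  The piecewise
    # form is taken from the full technical report; consult the report
    # for the authoritative expressions.

    def speedup(n, A, sigma):
        """Speedup on n processors of a job with average parallelism A
        and variance-of-parallelism parameter sigma."""
        if sigma <= 1:                                   # low-variance case
            if n <= A:
                return A * n / (A + sigma * (n - 1) / 2.0)
            elif n <= 2 * A - 1:
                return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2.0))
            return A
        else:                                            # high-variance case
            if n <= A + A * sigma - sigma:
                return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
            return A

For example, speedup(64, 32, 0) returns the full average parallelism of 32, while speedup(64, 32, 2.0) is noticeably lower (about 27.7), reflecting the penalty for high variance in parallelism.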

This model might be used differently for different applications. In Downey97 and Downey98 we use it to generate the stochastic workloads with which we evaluate allocation strategies for malleable jobs; for that application, we choose the parameters A and σ from distributions and use them to generate speedup curves. In this paper we work the other way around: we use observed speedup curves to estimate the parameters of real programs. Our purpose here is to show that this model captures the behavior of numerous programs running on diverse parallel architectures. The same technique is also useful for summarizing the speedup curve of a job and interpolating between speedup measurements.
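
As an illustration of this parameter estimation, the sketch below fits the speedup function defined earlier to a set of hypothetical (cluster size, speedup) measurements by nonlinear least squares. The paper does not prescribe a particular fitting procedure; scipy.optimize.curve_fit is simply one convenient choice.

    # Illustrative only: estimate A and sigma from observed speedups by
    # nonlinear least squares, using the speedup() function sketched above.
    import numpy as np
    from scipy.optimize import curve_fit

    def model(n, A, sigma):
        # Vectorized wrapper so curve_fit can evaluate whole arrays of n.
        return np.array([speedup(ni, A, sigma) for ni in np.atleast_1d(n)])

    # Hypothetical measurements: cluster sizes and observed speedups.
    sizes    = np.array([1, 2, 4, 8, 16, 32, 64])
    observed = np.array([1.0, 1.9, 3.7, 6.8, 11.5, 17.0, 21.0])

    (A_hat, sigma_hat), _ = curve_fit(model, sizes, observed,
                                      p0=[16.0, 1.0],   # initial guess
                                      bounds=([1.0, 0.0], [np.inf, np.inf]))
    print("estimated A = %.1f, sigma = %.2f" % (A_hat, sigma_hat))

The fitted values A_hat and sigma_hat summarize the job's speedup curve and can be used to interpolate speedups at cluster sizes that were never measured.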

Conclusions

  • Our proposed speedup model captures the behavior of numerous scientific applications running on a variety of parallel computers, both shared- and distributed-memory. Thus, we feel that this model is a realistic choice for modeling parallel workloads.

  • The parameters of our model correspond to measurable program characteristics. Thus, our observations of these benchmarks give us some insight into the values and distributions of these parameters in a real workload.