Introduction

On space-sharing parallel computers, it is useful to be able to predict how long a submitted job will be queued before processors are allocated to it. Some of the applications of these predictions are:

Load metrics:: They provide a measure of load that is more concrete than abstractions such as load average, allowing users to make decisions about what jobs to run, where to run them or what size problems they can solve in an allotted time.
Internal resource selection:: They allow malleable parallel jobs (jobs that do not require a specific number of processors, but can run on a range of cluster sizes) to choose a cluster size that is appropriate for the current state of the machine. This type of allocation is also called adaptive partitioning.
External resource selection:: They allow distributed jobs to choose among various computing resources on a network, based on the quality of service they expect to receive at each host. As part of the DOCT project [13], we plan to implement the techniques proposed here to support resource selection in a distributed, heterogeneous environment.

Different applications require predictions at different times. For external resource selection, we need to predict the entire queue time, from arrival to the beginning of execution. For internal resource selection, we consider only the time from arrival at the head of the queue until the beginning of execution, which is sometimes called wait time.
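
To make the distinction between the two intervals concrete, the following minimal Python sketch computes both from three timestamps. The class name, field names, and the timestamp values are hypothetical and chosen only for illustration; they are not taken from any scheduler or trace.

```python
from dataclasses import dataclass


@dataclass
class JobTimes:
    """Hypothetical timestamps (in seconds) for a single job."""
    arrival: float        # job is submitted and enters the queue
    head_of_queue: float  # job reaches the head of the queue
    start: float          # processors are allocated and execution begins

    def queue_time(self) -> float:
        # Entire time in queue: arrival to beginning of execution.
        return self.start - self.arrival

    def wait_time(self) -> float:
        # Time from reaching the head of the queue to beginning of execution.
        return self.start - self.head_of_queue


# Illustrative values only (assumed, not measured).
job = JobTimes(arrival=0.0, head_of_queue=120.0, start=300.0)
print(job.queue_time())  # 300.0
print(job.wait_time())   # 180.0
```

In this example the wait time is the final portion of the queue time, so a prediction made at the head of the queue covers a shorter interval than one made at arrival.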

The focus of this paper is internal resource selection, so we make predictions when jobs arrive at the head of the queue. In future work we will extend these techniques to cover the entire time a job spends in the queue, and apply those predictions to external resource selection.

Allen Downey
Fri May 30 15:09:42 PDT 1997