Large parallel computers are usually shared by many users. Two mechanisms that support this sharing are space-sharing, in which the processors are divided into disjoint clusters and each job runs on a dedicated cluster, and time-sharing, in which several jobs share the same processors over time.
The advantages of space-sharing are a dedicated system image (a cluster behaves like a small dedicated system), low overheads (since jobs are not preempted, which is expensive), and high parallel efficiencies (since jobs tend to run on fewer processors). One disadvantage is that run-to-completion (RTC) scheduling sometimes allows short jobs to sit in the queue waiting for long jobs, resulting in poor average turnaround times. The systems we are studying, the Intel Paragon at the San Diego Supercomputer Center and the IBM SP2 at the Cornell Theory Center, are configured as space-sharing systems using RTC scheduling.
Many programs running on these systems are malleable, meaning that they can run on a range of cluster sizes. At present, few if any of these jobs can change cluster size dynamically; once they begin execution, they neither release nor acquire processors. In this context, internal resource selection means choosing a cluster size for each job before it begins execution.
When a job arrives at the head of the queue, it may allocate some or all of the processors that are currently free, or it may wait for additional processors to become available. If it chooses to wait, we assume for now that the other jobs in the queue must also wait. In other words, we assume first-come-first-served (FCFS) queueing. In future work, we will relax this assumption and address more general queueing policies.
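As a concrete illustration, the following sketch shows this queueing discipline in miniature. The dispatch function, the queue contents, and the acquire helper are hypothetical names introduced here for illustration, not part of the systems described above; each job is assumed to have already chosen its cluster size.

    from collections import deque

    def dispatch(queue, free_processors, acquire):
        # queue           -- deque of (job_id, cluster_size) in arrival order
        # free_processors -- number of processors currently idle
        # acquire(job, n) -- hypothetical helper that starts `job` on `n` processors
        while queue:
            job_id, requested = queue[0]
            if requested > free_processors:
                break                      # head of queue waits; everyone behind it waits too
            queue.popleft()
            acquire(job_id, requested)
            free_processors -= requested
        return free_processors

Whenever a running job completes, the scheduler would return its processors to the free pool and call dispatch again.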
In summary, we use the following model of computation: the machine is space-shared and jobs run to completion; jobs are malleable, but their cluster sizes are fixed once they begin execution; and jobs are queued first-come-first-served, so a job that waits for processors delays the jobs behind it.
Our approach to internal resource selection is to allow each job, when it arrives at the head of the queue, to choose the cluster size that minimizes its turnaround time, which is the sum of its queue time and run time. To make this decision, the job must know its run time on a range of cluster sizes, and predict the queue time until each cluster size is available.
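A compact sketch of this selection rule follows; the two prediction functions are hypothetical stand-ins for the run-time model of [4] and the queue-time predictor developed below.

    def choose_cluster_size(candidate_sizes, predict_queue_time, predict_run_time):
        # Turnaround time = predicted queue time + predicted run time;
        # the job picks the cluster size that minimizes it.
        return min(candidate_sizes,
                   key=lambda n: predict_queue_time(n) + predict_run_time(n))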
In [4] we develop a speedup model that predicts jobs' run times; in this paper we present techniques for predicting queue times. We make these predictions in two steps: first, we develop a conditional lifetime model that predicts the remaining lifetime of a running job from its current age; then we aggregate these job-by-job predictions into a prediction of the queue time.
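To make the two steps concrete, the sketch below treats the conditional lifetime model as a parameter and aggregates its job-by-job predictions by accumulating the processors each running job is expected to release. The function names and the sorting of predicted finish times are illustrative assumptions, not the method developed later in the paper.

    def predict_queue_time(running_jobs, free_now, requested, predict_remaining):
        # running_jobs            -- (age, cluster_size) for each job currently running
        # predict_remaining(age)  -- conditional lifetime model: predicted remaining
        #                            lifetime of a job that has been running for `age`
        if free_now >= requested:
            return 0.0
        # Step 1: job-by-job predictions of each running job's finish time.
        finishes = sorted((predict_remaining(age), size) for age, size in running_jobs)
        # Step 2: aggregate -- accumulate released processors until enough are free.
        available = free_now
        for finish, size in finishes:
            available += size
            if available >= requested:
                return finish
        return float("inf")   # the request exceeds the machine size

A predictor of this form, with the current system state bound in, could serve as the predict_queue_time argument in the selection sketch above.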