Using Queue Time Predictions for Processor Allocation

by Allen B. Downey

This paper will appear at the 3rd Workshop on Job Scheduling Strategies for Parallel Processing, which will take place in conjunction with IPPS '97.

A version of this paper has been published as U.C. Berkeley Technical Report CSD-97-929 (gzipped postscript).

This is a more recent version of the paper, the final copy I submitted to the workshop (gzipped postscript); I think it's much better.

The abstract, introduction, and conclusions are below.


Abstract

When a malleable job is submitted to a space-sharing parallel computer, it must often choose whether to begin execution on a small, available cluster or to wait in queue for more processors to become available. To make this decision, it must predict how long it will have to wait for the larger cluster. We propose statistical techniques for predicting these queue times, and develop an allocation strategy that uses these predictions. We present a workload model based on the environment we have observed at the San Diego Supercomputer Center, and use this model to drive simulations of various allocation strategies. We conclude that prediction-based allocation not only improves the average turnaround time of the jobs; it also improves the utilization of the system as a whole.
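
The prediction techniques themselves are developed in the full paper. As a rough illustration of what a queue time predictor does, the Python sketch below estimates the wait for an n-processor cluster from the estimated remaining run times of the jobs currently running; the interface and its simplifying assumptions (for example, ignoring jobs already waiting in queue) are hypothetical, not the method from the paper.

    # Hypothetical sketch of a queue time predictor, not the paper's method.
    # Assumes each running job reports its cluster size and an estimated
    # remaining run time, and ignores jobs already waiting in queue.

    def predict_queue_time(free, running, n):
        """Predict the wait until n processors are free.

        free    -- number of idle processors right now
        running -- list of (cluster_size, est_remaining_time) pairs
        n       -- number of processors the new job wants
        """
        if free >= n:
            return 0.0
        # Processors are released as running jobs finish, soonest first.
        for size, remaining in sorted(running, key=lambda job: job[1]):
            free += size
            if free >= n:
                return remaining
        return float("inf")  # n exceeds the machine's total processors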

Introduction

Like many shared resources, parallel computers are susceptible to a tragedy of the commons -- individuals acting in their own interests tend to overuse and degrade the resource. Specifically, users trying to minimize runtimes for their jobs might allocate more processors than they can use efficiently. This overindulgence lowers the effective use of the system and increases the queue times of all users' jobs.

A partial solution to this problem is a programming model that supports malleable jobs, i.e., jobs that can be configured to run on clusters of various sizes. In existing systems, these jobs cannot change their cluster sizes dynamically; that is, once they begin execution, they cannot add or yield processors.

Malleable jobs improve system utilization by using fewer processors when system load is high, thereby running more efficiently and increasing the number of jobs in the system simultaneously. But malleable jobs are not a sufficient solution to the tragedy of the commons, because users have no direct incentive to restrict the cluster sizes of their jobs. Furthermore, even altruistic users might not have the information they need to make the best decisions.

One solution to this problem is a system-centric scheduler that chooses cluster sizes automatically, trying to optimize (usually heuristically) a system-wide performance metric like utilization or average turnaround time. The problem is that such systems often force users to accept decisions that are good for the system as a whole, but contrary to their immediate interests. For example, if there is a job in queue and one idle processor, a utilization-maximizing system might require the job to run, whereas the job might obtain a shorter turnaround time by waiting for more processors.

If such strategies are implemented, there are two possible outcomes: at best, users will be dissatisfied with the system; at worst, they will take steps to subvert it. Since these systems often rely on application information provided by users, it is not hard for a disgruntled user to manipulate the system for his own benefit. In anecdotal reports from supercomputer centers, this sort of user behavior is common, and not restricted to malevolent users; rather, there is an understanding in these environments that users will take advantage of loopholes in system policies.

Given this behavior, an important property of a scheduling strategy is robustness in the presence of self-interested users. As we will show in Section 5, many commonly proposed allocation strategies do not have this property; their overall performance degrades severely if users try, naively, to improve the performance of individual jobs.

Thus our goal is to find a scheduling strategy that does not make decisions contrary to the interests of the users. We propose an application-centric scheduler that uses application information (run times on various cluster sizes) and system state (predicted queue times) to choose, for each job, the cluster size with the shortest expected turnaround time.
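
In other words, if the application can report its run time t(n) on each feasible cluster size n, and the system can predict the queue time q(n) for n processors, the scheduler simply picks the n that minimizes q(n) + t(n). The sketch below is a minimal Python rendering of that rule; the function names and the toy speedup and prediction models in the example are hypothetical, not taken from the paper.

    # Minimal sketch of the application-centric allocation rule: choose
    # the cluster size with the smallest expected turnaround time. The
    # run-time and queue-time functions are supplied from outside; the
    # toy models in the example are placeholders, not the paper's.

    def choose_cluster_size(sizes, run_time, queue_time):
        """Return the cluster size minimizing predicted turnaround time."""
        return min(sizes, key=lambda n: queue_time(n) + run_time(n))

    # Example: a perfectly scalable 64 processor-hour job, on a machine
    # where (hypothetically) larger clusters take longer to free up.
    best = choose_cluster_size(
        sizes=[1, 2, 4, 8, 16],
        run_time=lambda n: 64.0 / n,    # hypothetical speedup model
        queue_time=lambda n: float(n))  # hypothetical prediction
    # Turnaround is minimized at n = 8 (8 h queue + 8 h run = 16 h,
    # vs. 20 h on either 4 or 16 processors).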

This strategy optimizes the performance of individual jobs, so users have no incentive to subvert its decisions. The question, though, is what effect these local optimizations will have on the performance of the system as a whole. Using simulations based on a stochastic workload model, we show that the performance of one such strategy exceeds that of the best system-centric scheduler, improving both system utilization and average turnaround time.

Conclusions

  • We have proposed an allocation policy that never creates the incentive for users to subvert the system; thus, we expect it to perform well in the presence of self-interested users. We show that this application-centric policy yields global performance as good as the best system-centric policy.

  • Using predicted queue times to choose cluster sizes significantly reduces the turnaround time of individual jobs, even if the prediction is not very accurate. In our simulations, the average time savings per job is 13.5 minutes (the average job duration is 78 minutes). From the point of view of individual applications, our predictive allocation strategy is within 2% of optimal.

  • We show that it is possible to improve the performance of some jobs using only application characteristics and ignoring the state of the system, but we observe that this strategy has a disastrous effect on the overall performance of the system.