Trace-driven simulator

To evaluate the accuracy of these predictors, we ran trace-driven simulations of workloads from the Intel Paragon at the San Diego Supercomputer Center. As each job arrived at the head of the queue, we calculated both predictors based on the observed state of the system, and compared the predictions with the queue times that followed.

We obtained a trace of 24906 jobs submitted to the batch partition of the Paragon between January 1, 1995 and December 31, 1995. Using the arrival times, cluster sizes and durations of these jobs, we simulated their execution on a parallel computer with 368 nodes (the size of the batch partition at SDSC). This data and our simulator are available from http://sdsc.edu/~downey/predict.

Table 2 shows the average and median lifetimes and cluster sizes for these jobs. CV is the coefficient of variation, that is, the ratio of the standard deviation to the mean. The 25th and 75th percentiles are also shown.
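As an illustration of the summary statistics named above, the following minimal sketch computes the mean, median, CV and quartiles for a list of values. The function name `summarize` is hypothetical, and two choices are assumptions rather than details from the paper: the population standard deviation is used for the CV, and the percentiles use the "inclusive" interpolation method.

```python
import statistics

def summarize(values):
    """Return mean, median, CV, and 25th/75th percentiles of `values`."""
    mean = statistics.fmean(values)
    # CV = standard deviation / mean.
    # Assumption: population standard deviation (pstdev), not sample.
    cv = statistics.pstdev(values) / mean
    # quantiles(n=4) returns three cut points: 25th, 50th, 75th percentiles.
    # Assumption: "inclusive" interpolation; the paper does not say which.
    p25, _, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "cv": cv,
        "p25": p25,
        "p75": p75,
    }

# Made-up lifetimes, for illustration only (not taken from the trace).
stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```

A CV greater than 1 indicates a distribution more variable than an exponential distribution with the same mean.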

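[Table 2: average and median lifetimes and cluster sizes, with CV and the 25th and 75th percentiles. Table body not available.]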

The jobs in these traces are not malleable (at least not from the system's point of view). Users choose the cluster size for each job; the system cannot allocate fewer, and does not allocate more.

Although the arrival times of the jobs are taken from the traces, the schedule used by the simulator differs from the actual schedule that was executed on the Paragon:

The Paragon at SDSC assigns a different priority to each queue and schedules higher-priority jobs first. Our simulator uses strict FCFS scheduling.
The real scheduler allocates processors using a modified 2-D buddy system based on power-of-two partition sizes; most, but not all, jobs are allocated a contiguous set of nodes. The simulator allocates jobs without regard to contiguity.
Most nodes have 16MB of memory, but 128 "fat" nodes have 32MB. In reality, some jobs can only run on the fat nodes, but the simulator ignores this distinction.
During the year, the size of the batch partition changed several times; in the simulator we held the number of processors constant.

Thus, in evaluating our predictors, the "actual" queue times are from the schedule generated by the simulator, not from the trace data.

In our simulations, many jobs arrive at the head of the queue and find that there are enough free processors for them to begin execution immediately. Since these jobs spend no time at the head of the queue, we do not make predictions on their behalf.
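The simplified schedule described above can be sketched as follows. This is not the authors' simulator, only a minimal illustration under the stated simplifications: strict FCFS, a fixed pool of 368 nodes, no contiguity or memory constraints. The function name `simulate_fcfs`, the tuple layout of a job, and the example jobs are all assumptions made for this sketch.

```python
import heapq

def simulate_fcfs(jobs, total_nodes=368):
    """Simulate strict FCFS space sharing.

    jobs: list of (arrival, size, duration) tuples sorted by arrival time.
    Returns one (start_time, head_wait) pair per job, where head_wait is
    the time the job spends at the head of the queue before starting.
    """
    free = total_nodes
    running = []      # min-heap of (finish_time, size) for running jobs
    results = []
    prev_start = 0.0  # time at which the previous job left the queue
    for arrival, size, duration in jobs:
        if size > total_nodes:
            raise ValueError("job requests more nodes than the machine has")
        # A job reaches the head of the queue once it has arrived and
        # the job ahead of it has started.
        at_head = max(arrival, prev_start)
        # Reclaim nodes from jobs that have finished by that time.
        while running and running[0][0] <= at_head:
            free += heapq.heappop(running)[1]
        start = at_head
        # Strict FCFS: wait for departures until enough nodes are free.
        while free < size:
            finish, released = heapq.heappop(running)
            free += released
            start = finish
        free -= size
        heapq.heappush(running, (start + duration, size))
        results.append((start, start - at_head))
        prev_start = start
    return results

# Made-up jobs for illustration: (arrival, cluster size, duration).
schedule = simulate_fcfs([(0, 300, 10), (1, 100, 5), (2, 50, 1)])
# Only jobs with a nonzero wait at the head of the queue get predictions.
predicted_for = [i for i, (_, wait) in enumerate(schedule) if wait > 0]
```

In this sketch the "actual" queue time of a job is the `head_wait` value produced by the simulated schedule, and jobs that start immediately are filtered out before any predictor is evaluated.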

In a system with malleable jobs, though, cluster sizes are not determined a priori; they are chosen according to the state of the system (number of free processors, expected queue times, etc.). Each time a job arrives at the head of the queue, we predict the queue time for a range of cluster sizes and choose the one that best satisfies the requirements of the job, perhaps by minimizing its expected turnaround time.
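One way to picture this selection step is the sketch below, which takes turnaround time to be predicted queue time plus run time and picks the candidate cluster size that minimizes it. Everything here is hypothetical: `choose_cluster_size`, the `predict_queue_time` and `run_time` callables, and the toy models in the example stand in for whichever predictor and speedup model a real system would use.

```python
def choose_cluster_size(candidates, predict_queue_time, run_time):
    """Return the candidate size with the smallest expected turnaround.

    predict_queue_time(n): predicted wait until n processors are free.
    run_time(n): estimated run time of the job on n processors.
    Both are hypothetical callables supplied by the caller.
    """
    return min(candidates, key=lambda n: predict_queue_time(n) + run_time(n))

# Toy models for illustration only: 4 processors are free now, each extra
# processor costs 10 time units of waiting, and the job has 64 units of
# perfectly parallel work.
def toy_queue_time(n):
    return 0 if n <= 4 else 10 * (n - 4)

def toy_run_time(n):
    return 64 / n

best = choose_cluster_size([1, 2, 4, 8, 16], toy_queue_time, toy_run_time)
```

With these toy models the job does best by taking the 4 processors that are already free, because waiting for a larger cluster costs more than the extra processors save.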

Allen Downey
Fri May 30 15:09:42 PDT 1997