To evaluate the accuracy of these predictors, we ran trace-driven simulations of workloads from the Intel Paragon at the San Diego Supercomputer Center (SDSC). Each time a job arrived at the head of the queue, we computed the predictions of both predictors from the observed state of the system and compared them with the queue time the job actually experienced.
We obtained a trace of 24906 jobs submitted to the batch partition of the Paragon between January 1, 1995 and December 31, 1995. Using the arrival times, cluster sizes, and durations of these jobs, we simulated their execution on a parallel computer with 368 nodes (the size of the batch partition at SDSC). This data and our simulator are available from http://sdsc.edu/~downey/predict.
Table 2 shows the average and median lifetimes and cluster sizes for these jobs, along with the 25th and 75th percentiles. CV is the coefficient of variation, the ratio of the standard deviation to the mean.
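As an illustration of how the columns of a table like Table 2 can be computed, here is a minimal Python sketch. It assumes the population standard deviation for the CV (the text does not say which estimator was used) and the "inclusive" quantile method for the percentiles; the function name `summarize` and the sample lifetimes are hypothetical and are not taken from the SDSC trace.

```python
import statistics

def summarize(values):
    """Table-2-style summary of a list of job lifetimes or cluster sizes.

    Assumptions (not from the original text): CV uses the population
    standard deviation, and percentiles use the 'inclusive' method.
    """
    mean = statistics.fmean(values)
    # quantiles(n=4) returns the three quartile cut points: 25th, 50th, 75th.
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "cv": statistics.pstdev(values) / mean,  # std. deviation / mean
        "p25": p25,
        "p75": p75,
    }

# Made-up lifetimes in seconds, for illustration only.
lifetimes = [30, 60, 120, 600, 3600, 7200, 14400]
print(summarize(lifetimes))
```

A CV well above 1, as in this toy data, signals a heavy-tailed distribution in which the mean is dominated by a few long jobs, which is why the median and quartiles are reported alongside it.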
Table 2
The jobs in these traces are not malleable (at least not from the system's point of view). Users choose the cluster size for each job; the system cannot allocate fewer processors than requested, and does not allocate more.
Although the arrival times of the jobs are taken from the traces, the schedule used by the simulator differs from the actual schedule that was executed on the Paragon:
Thus, in evaluating our predictors, the "actual" queue times are from the schedule generated by the simulator, not from the trace data.
In our simulations, many jobs arrive at the head of the queue and find that there are enough free processors for them to begin execution immediately. Since these jobs spend no time at the head of the queue, we do not make predictions on their behalf.
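To make the simulation procedure concrete, the following Python sketch replays (arrival, cluster size, duration) records on a space-shared machine and records how long each job waits at the head of the queue, returning None for jobs that start immediately and therefore receive no prediction. It assumes a strict first-come-first-served policy with no backfilling, which the text above does not specify; `simulate_fcfs` and its return format are hypothetical and are not the authors' simulator.

```python
import heapq

def simulate_fcfs(jobs, total_nodes=368):
    """Replay non-malleable jobs on a space-shared machine (sketch).

    jobs: (arrival, size, duration) tuples, already in arrival order.
    Assumes strict FCFS with no backfilling. Returns (starts, head_waits),
    where head_waits[i] is the time job i spent at the head of the queue,
    or None if enough nodes were free when it reached the head.
    """
    running = []        # min-heap of (end_time, size) for running jobs
    free = total_nodes
    prev_start = 0.0    # under FCFS, a job cannot start before its predecessor
    starts, head_waits = [], []
    for arrival, size, duration in jobs:
        if size > total_nodes:
            raise ValueError("job requests more nodes than the machine has")
        # The job reaches the head when it has arrived and its predecessor started.
        t = max(arrival, prev_start)
        while running and running[0][0] <= t:   # reclaim nodes of finished jobs
            free += heapq.heappop(running)[1]
        head_time = t
        while free < size:                       # wait at the head for nodes
            end, n = heapq.heappop(running)
            free += n
            t = max(t, end)
        free -= size                             # allocate exactly `size` nodes
        heapq.heappush(running, (t + duration, size))
        prev_start = t
        starts.append(t)
        head_waits.append(t - head_time if t > head_time else None)
    return starts, head_waits

jobs = [(0, 4, 10), (1, 2, 5), (2, 2, 5), (20, 1, 1)]
starts, waits = simulate_fcfs(jobs, total_nodes=4)
print(starts)  # [0, 10, 10, 20]
print(waits)   # [None, 9, None, None]
```

In this toy run the third job queues behind the second but finds enough free nodes once it reaches the head, so, like the jobs described above, it would not receive a prediction.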
In a system with malleable jobs, though, cluster sizes are not determined a priori; they are chosen according to the state of the system (number of free processors, expected queue times, etc.). Each time a job arrives at the head of the queue, we predict the queue time for a range of cluster sizes and choose the one that best satisfies the requirements of the job, perhaps by minimizing its expected turnaround time.
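The selection rule in the last sentence can be sketched as follows, taking expected turnaround time to be predicted queue time plus run time on n processors. Both the queue-time predictor and the run-time model passed in below are hypothetical stand-ins: the toy predictor returns zero wait when n fits in the free processors and grows linearly otherwise, and the run time follows an Amdahl-style model with an assumed 5% serial fraction. Neither is the predictor or speedup model evaluated in this paper.

```python
from typing import Callable, Iterable

def choose_cluster_size(
    sizes: Iterable[int],
    predict_queue_time: Callable[[int], float],
    run_time: Callable[[int], float],
) -> int:
    """Return the candidate size n minimizing queue time + run time."""
    return min(sizes, key=lambda n: predict_queue_time(n) + run_time(n))

def make_toy_predictor(free: int) -> Callable[[int], float]:
    """Hypothetical predictor: no wait if n <= free, else 600 s per `free` extra nodes."""
    return lambda n: 0.0 if n <= free else 600.0 * (n - free) / free

def amdahl_run_time(n: int, t1: float = 3600.0, serial: float = 0.05) -> float:
    """Hypothetical run time on n processors under Amdahl's law."""
    return t1 * (serial + (1.0 - serial) / n)

candidates = [8, 16, 32, 64, 128]
print(choose_cluster_size(candidates, make_toy_predictor(32), amdahl_run_time))  # 32
```

With 32 free processors, growing past 32 saves less run time than the extra predicted wait costs, so the sketch picks 32; if all 128 processors were free it would pick 128.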