Load Balancing Paper (SIGMETRICS '96)

Exploiting Process Lifetime Distributions for Dynamic Load Balancing

by Mor Harchol-Balter and Allen B. Downey.

This paper was first presented at a poster session at SOSP '95 (Symposium on Operating Systems Principles) in December 1995.

It later appeared in the Proceeedings of ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, May 23-26, 1996, Philadelphia, PA, pages 13-24.

The paper received the ACM Sigmetrics '96 Conference award for Best Integration of Systems and Theory.

Click here for the Postscript version.

An expanded version of the paper appeared in the August 1997 issue of ACM Transactions on Computer Systems (TOCS).

For a related paper Mor and I wrote, click here.

For a related paper by Kris Bubendorfer, click here.

The simulator we used is available here. And here is the documentation for it. And here are the traces we used as input to the simulator. If you use this software or data, please let us know.


Abstract

We measure the distribution of lifetimes for UNIX processes and propose a functional form (a Pareto distribution) that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies, and with a non-preemptive load balancing strategy. We find that, contrary to previous reports, the performance benefits of preemptive migration are significantly greater than those of non-preemptive migration, even when the memory-transfer cost is high. Using a model of migration costs representative of current systems, we find that preemptive migration reduces the mean delay (queueing and migration) by 35 -- 50%, compared to non-preemptive migration.

Conclusions

  • Migrating a long job away from a busy host helps not only the long job, but also the many short jobs that are expected to arrive at the host in the future. A busy host is expected to receive many arrivals because of the serial correlation (``burstiness'') of the arrival process.

  • Preemptive migration outperforms non-preemptive migration even when memory-transfer costs are high, for the following reason: non-preemptive name-based strategies choose processes for migration that are expected to have long lives. If this prediction is wrong, and a process runs longer than expected, it cannot be migrated away, and many subsequent small processes will be delayed. A preemptive strategy is able to make a more accurate prediction about the duration of a process (based on the its age) and, more importantly, if the prediction is wrong, it can recover by migrating the process later.

  • Using the functional form of the distribution of process lifetimes, we have derived a criterion for the minimum time a process must age before being migrated. This criterion is parameterless and robust across a range of loads.

  • Exclusive use of mean slowdown as a metric of system performance understates the benefits of load balancing as perceived by users, and especially understates the benefits of preemptive load balancing.

  • Although preemptive migration is difficult to implement, several systems have chosen to implement it for reasons other than load balancing. Our results suggest these systems would benefit from preemptive load balancing.