Long-tailed distributions in the Internet
Allen B. Downey
This page describes two papers of mine on long-tailed distributions.
I have also given a talk about this work called "Why is Internet traffic self-similar (or is it)?" The slides are here.
Lognormal and Pareto Distributions in the Internet
Abstract
Numerous studies have reported long-tailed distributions for various network metrics, including file sizes, transfer times, and burst lengths. We review techniques for identifying long-tailed distributions based on a sample, propose a new technique, and apply these methods to datasets used in previous reports. We find that the evidence for long tails is inconsistent, and that lognormal and other non-long-tailed models are usually sufficient to characterize network metrics. We discuss the implications of this result for current explanations of self-similarity in network traffic.
Papers and software
This paper appeared in Computer Communications. It is available here in Postscript and gzipped Postscript and PDF.
One of the techniques I evaluate in the paper is a test for long-tailedness based on the curvature of the complementary cumulative distribution function (ccdf). The software I used to implement this test is available here as a gzipped tar file.
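To give a rough feel for what "curvature of the ccdf" means, here is a minimal sketch in Python. It is not the software distributed above, and it is not necessarily the procedure used in the paper. It assumes that curvature can be summarized by the quadratic coefficient of a least-squares fit to the tail of the empirical ccdf on log-log axes. The function names, the `tail_fraction` parameter, and the synthetic samples are all hypothetical, chosen only for illustration. The underlying idea is that a Pareto ccdf is a straight line on log-log axes, while a lognormal ccdf bends downward.

```python
import numpy as np

def empirical_ccdf(sample):
    """Return sorted values and the empirical P(X > x) at each value."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    ccdf = 1.0 - np.arange(1, n + 1) / n
    # Drop the largest value, where the empirical ccdf is 0 and has no log.
    return x[:-1], ccdf[:-1]

def loglog_curvature(sample, tail_fraction=0.5):
    """Quadratic coefficient of a fit to the upper tail of the log-log ccdf.

    Assumption for this sketch: values near zero suggest a straight
    (Pareto-like) tail; clearly negative values suggest a tail that
    drops off faster, as a lognormal tail does.
    """
    x, ccdf = empirical_ccdf(sample)
    start = int(len(x) * (1.0 - tail_fraction))
    log_x = np.log10(x[start:])
    log_ccdf = np.log10(ccdf[start:])
    a, _b, _c = np.polyfit(log_x, log_ccdf, 2)
    return a

# Synthetic samples, used only to illustrate the contrast.
rng = np.random.default_rng(0)
pareto_sample = rng.pareto(1.5, 10000) + 1.0       # Pareto with minimum value 1
lognormal_sample = rng.lognormal(0.0, 1.0, 10000)  # lognormal with mu=0, sigma=1

pareto_curv = loglog_curvature(pareto_sample)
lognormal_curv = loglog_curvature(lognormal_sample)
print("Pareto curvature:   ", pareto_curv)
print("lognormal curvature:", lognormal_curv)
```

A single curvature estimate is not a test on its own. To turn it into one, the estimate for a dataset would have to be compared with the range of values that samples from a known long-tailed model produce at the same sample size, for example by simulation.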
Evidence for long-tailed distributions in the Internet
Abstract
We review evidence that Internet traffic is characterized by long-tailed distributions of interarrival times, transfer times, burst sizes and burst lengths. We propose a new statistical technique for identifying long-tailed distributions, and apply it to a variety of datasets collected on the Internet. We find that there is little evidence that interarrival times and transfer times are long-tailed, but that there is some evidence for long-tailed burst sizes. We speculate on the causes of long-tailed bursts.
Papers and slides
This paper appeared at the ACM SIGCOMM Internet Measurement Workshop in November 2001. The final version is available here in gzipped Postscript and PDF.