Long-tailed distributions in the Internet

Allen B. Downey

This page describes two papers of mine on long-tailed distributions.

I have also given a talk about this work called "Why is Internet traffic self-similar (or is it)?" The slides are here.


Lognormal and Pareto Distributions in the Internet

Abstract

Numerous studies have reported long-tailed distributions for various network metrics, including file sizes, transfer times, and burst lengths. We review techniques for identifying long-tailed distributions based on a sample, propose a new technique, and apply these methods to datasets used in previous reports. We find that the evidence for long tails is inconsistent, and that lognormal and other non-long-tailed models are usually sufficient to characterize network metrics. We discuss the implications of this result for current explanations of self-similarity in network traffic.

Papers and software

This paper appeared in Computer Communications. It is available here in PostScript, gzipped PostScript, and PDF.

One of the techniques I evaluate in the paper is a test for long-tailedness based on the curvature of the complementary cumulative distribution function (ccdf). The software I used to implement this test is available here as a gzipped tar file.
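To illustrate the idea behind a curvature-based test, here is a minimal sketch in Python. It is not the software mentioned above and does not reproduce the test from the paper; it only shows the underlying observation that a Pareto ccdf is a straight line on log-log axes, while a lognormal ccdf bends downward. The function names, the quadratic fit, and the `tail_fraction` parameter are assumptions made for this example.

```python
import numpy as np


def empirical_ccdf(sample):
    """Return sorted values and the empirical P(X > x) at each value.

    The largest observation is dropped because its ccdf is 0,
    which cannot be shown on a log axis.
    """
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    ps = 1.0 - np.arange(1, n + 1) / n
    return xs[:-1], ps[:-1]


def loglog_curvature(sample, tail_fraction=0.5):
    """Fit a quadratic to the upper part of the ccdf on log-log axes.

    Returns the coefficient of the squared term. A value near 0
    suggests a straight (Pareto-like) tail; a clearly negative value
    suggests the tail drops off faster, as a lognormal tail does.
    tail_fraction is a hypothetical tuning knob: the share of the
    largest observations used in the fit.
    """
    xs, ps = empirical_ccdf(sample)
    start = int(len(xs) * (1.0 - tail_fraction))
    log_x = np.log10(xs[start:])
    log_p = np.log10(ps[start:])
    quad, _slope, _intercept = np.polyfit(log_x, log_p, 2)
    return quad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # rng.pareto draws from a Lomax distribution; adding 1 gives a
    # classical Pareto distribution with minimum value 1.
    pareto_sample = rng.pareto(1.0, size=10000) + 1.0
    lognormal_sample = rng.lognormal(mean=0.0, sigma=1.0, size=10000)
    print("Pareto curvature:   ", loglog_curvature(pareto_sample))
    print("Lognormal curvature:", loglog_curvature(lognormal_sample))
```

A point estimate like this is not a hypothesis test on its own. To turn it into one, the observed curvature would have to be compared with the range of curvatures seen in samples of the same size drawn from a fitted reference model.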


Evidence for long-tailed distributions in the Internet

Abstract

We review evidence that Internet traffic is characterized by long-tailed distributions of interarrival times, transfer times, burst sizes and burst lengths. We propose a new statistical technique for identifying long-tailed distributions, and apply it to a variety of datasets collected on the Internet. We find that there is little evidence that interarrival times and transfer times are long-tailed, but that there is some evidence for long-tailed burst sizes. We speculate on the causes of long-tailed bursts.

Papers and slides

This paper appeared at the ACM SIGCOMM Internet Measurement Workshop in November 2001. The final version is available here in gzipped PostScript and PDF.

The slides I presented are here in gzipped PostScript and PDF.

The poster version is here in gzipped PostScript and PDF.