Computational Modeling
Fall 2005

For today, you should have:

1) finished HomeworkThirteen

Today's outline:

1) Zipf's Law
2) Bak, Tang, Wiesenfeld

For next time, you should:

1) do HomeworkFourteen, which I will post this afternoon

Zipf!
-----

An empirical law about the frequencies of words in natural
languages. It turns out to apply to lots of other domains, including
the popularity of web pages.

The handout from last time explains the Zipf distribution.

The most common way to test Zipf's Law is to plot

f, frequency (the number of times something occurs)

vs.

r, rank (how many items are more or equally frequent)

The observation is that

f ~ 1/r

or more generally

f ~ r ^ -b

Intuitively, this means that every time you double the rank, you cut
the frequency in half (for b=1).

For example, the current sales rank of "How to Think Like a Computer
Scientist" on Amazon is 333,939. If we doubled our sales, we would
expect to move up to a rank of about 167,000.

Taking the log of both sides, we get

log(f) ~ -b * log(r)    (up to an additive constant)

So if we plot f versus r on a log-log scale, we see a straight line
with slope -b.

1) Download http://wb/cm/code/Zipf1.py
2) Fill in rank_freq
3) Process your favorite book and plot the output
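
The exercise above can be sketched roughly as follows. This is only a
minimal illustration, not the contents of Zipf1.py: the signature of
rank_freq shown here is an assumption, the word-splitting rule is a
guess at something reasonable, and loglog_slope is a hypothetical
helper added to check the slope numerically instead of reading it off
a plot. Ties are broken arbitrarily (ranks 1, 2, 3, ...), which is a
simplification of the "more or equally frequent" definition of rank.

```python
import math
import re
from collections import Counter


def rank_freq(text):
    """Return a list of (rank, frequency) pairs, most frequent word first.

    Assumed signature; the real rank_freq in Zipf1.py may differ.
    """
    # Assumption: a "word" is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort frequencies from largest to smallest; rank 1 is the most common.
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(f) against log(r).

    Hypothetical helper: if f ~ r ^ -b, the result should be close to -b.
    """
    xs = [math.log(r) for r, f in pairs]
    ys = [math.log(f) for r, f in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

To try it on a book, read the file into a string, pass it to
rank_freq, and plot the pairs on log-log axes with whatever plotting
tool you like. Real text will not fall exactly on a line, so expect
the fitted slope to be only roughly -1.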