Computational Modeling
Fall 2005

For today, you should have:

1) finished HomeworkNine, Ten and Eleven!

Today's outline:

0) HomeworkNine "solutions"

1) review of probability distributions

2) the Zipf distribution (in preparation for SOC, which is up next)

For next time you should:

1) do HomeworkTwelve (prepare for the exam)


What's on the exam?
-------------------

Anything discussed in any review!

Probably a question from the books (choose one of four).

Anything I handed out is fair game.

Discussion of modeling, theory choice, Kuhn, instrumentalism
and scientific realism.

Data structure selection and run time analysis.

Python programming (not language details).

Format:

1) open book, open notes, open Internet.

2) short answer questions (one of medium length)

3) implement and/or analyze an algorithm

4) practical programming component! (this is an experiment)


Processing text in Python
-------------------------

    import sys
    import string

    def process_file(filename):
        # Map hyphens and apostrophes to spaces so they separate words.
        # Both arguments to maketrans must have the same length.
        trans = str.maketrans("-'", "  ")
        with open(filename, 'r') as fp:
            for line in fp:
                line = line.translate(trans)
                for word in line.rstrip().split():
                    process_word(word)

    def process_word(word):
        # Strip leading and trailing punctuation, then print the word.
        word = word.strip(string.punctuation)
        print(word)

    def main(name, filename='', *args):
        # name is the script name (sys.argv[0]); filename is the book to read.
        process_file(filename)

    if __name__ == '__main__':
        main(*sys.argv)

1) download this code: http://wb/cm/code/ReadFile.py

2) download a nice long book from gutenberg.net

3) modify ReadFile.py so that it counts the number of times each
   word appears and prints the words and frequencies in rank order
   (most frequent word first)

4) how can we plot this data to test Zipf's Law?
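
One possible approach to steps 3 and 4 is sketched below. It is a minimal sketch, not the intended solution. The helper names (count_words, rank_frequency, log_log_points) are hypothetical and do not appear in ReadFile.py, and lowercasing the words is an added assumption so that "The" and "the" count as the same word. The sketch uses a short in-memory sample instead of a downloaded book.

```python
import math
import string
from collections import Counter

# Same translation idea as ReadFile.py: hyphens and apostrophes become spaces.
TRANS = str.maketrans("-'", "  ")

def count_words(lines):
    """Count cleaned words in an iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for word in line.translate(TRANS).split():
            # Assumption: fold case so "The" and "the" are one word.
            word = word.strip(string.punctuation).lower()
            if word:  # skip tokens that were pure punctuation
                counts[word] += 1
    return counts

def rank_frequency(counts):
    """Return (rank, word, frequency) tuples, most frequent first.

    Ties are broken alphabetically so the order is reproducible.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

def log_log_points(ranked):
    """Return (log10 rank, log10 frequency) pairs, ready for plotting."""
    return [(math.log10(rank), math.log10(freq)) for rank, _, freq in ranked]

# Tiny sample; for a real test, pass an open file object for a long book.
sample = ["The cat saw the dog.", "The dog didn't see the cat-flap."]
ranked = rank_frequency(count_words(sample))
for rank, word, freq in ranked:
    print(rank, word, freq)
```

If frequency is proportional to 1/rank^s, then log(frequency) = log(c) - s log(rank), so the points from log_log_points should fall roughly on a straight line with slope -s. Plotting those pairs with any plotting tool, or plotting rank against frequency on log-log axes, gives a visual test of Zipf's Law.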