Computational Modeling Fall 2008 For today, you should: 1) read Chapter 4 and Barabasi/Albert paper 2) prepare an evaluation package Today: 1) chapter 4 suggestions 2) quiz 2 solutions 3) work on chapter 4 4) mini-lecture on CDFs? 5) optional "Pimp My Code" challenge For next time you should: 1) work on Chapter 4 2) read a review a swappy book, any unread chapter, last recommended. 3) brace yourself for a quiz on Chapter 4 and Barabasi/Albert Chapter 4 suggestions --------------------- 4.1 The coding part should be straightforward, but feel free to use my code if you want to allocate effort elsewhere. 4.2 Again, coding this up is a good way to test your understanding and get more practice, but again, use my code if you want to make up some time. If you have not had stats, I will do a mini-lecture on CDFs either this time or next. 4.3 Should be easy. If not, ask. 4.4 Think about this one a little. If you understand what I am asking you to do, it is a powerful, simple tool for testing whether a dataset is well modelled by a closed form distribution. 4.5 Optional for your book., but it would make a good quiz question. 4.6 This one is high value/effort, and interesting (I think). 4.7 Synthesizes chapters 3 and 4. Save some time for this one, but don't overkill it. 4.8 Totally optional, relatively low value/effort. Warning: I started to do this using anydbm and found what seemed to be superlinear growth in run time. So it quicky became too slow to work with (in the time I had). So one of the challenges of this exercise is keeping the run times practical. Using a better database (like MySQL) might help. Quiz 2 Solutions ---------------- def is_clique(self, vs): for i, v in enumerate(vs): for j, w in enumerate(vs): if i <= j: continue try: self[v,w] except KeyError: return False return True Saves a factor of 2 by check each edge only once. Saves a little overhead by using try...except and accessing the graphs internal dictionaries directly rather than using get_edge or has_edge. This method is O(n^2) where n is the length of (vs). If n is unrelated to |V| and |E|, then it would be considered O(1) as the size of the graph increases. If n is O(|V|) then this algorithm is O(|V|^2) 2) Quoth the Wikipedia: "The simplest implementation of the Dijkstra's algorithm stores vertices of set Q in an ordinary linked list or array, and operation Extract-Min(Q) is simply a linear search through all vertices in Q. In this case, the running time is O(|V|^2+|E|)=O(|V|^2). For sparse graphs, that is, graphs with fewer than |V|^2 edges, Dijkstra's algorithm can be implemented more efficiently by storing the graph in the form of adjacency lists and using a binary heap, pairing heap, or Fibonacci heap as a priority queue to implement the Extract-Min function efficiently. With a binary heap, the algorithm requires O((|E|+|V|) log |V|) time (which is dominated by O(|E| log |V|) assuming every vertex is connected, that is, |E| ≥ |V| - 1), and the Fibonacci heap improves this to O( | E | + | V | log | V | ) amortized time." In the case of a small world graph, we know that the number of edges is k|V|, which is O(|V|), so either implementation of the priority queue yields O(|V| log |V|). So I accepted that answer. HOWEVER, there is one additional consideration. In the small world graph, the edges are all the same length, so we don't need a priority queue at all, a simple queue will do. That eliminates the log term and yields O(|V|). 3) This implementation achieves the theoretically expected order of growth, O(|V|^3). Pimp My Code Challenge ---------------------- See your email for details. Look for ways to improve the code, including: 1) Better run time (order of growth or just overhead)? 2) Less space used (including run time stack)? 3) Fewer lines of code? 4) Improved readability/maintainability? 5) Other?