Computational Modeling
Fall 2008

For today, you should:

1) read Chapter 4 and Barabasi/Albert paper

2) prepare an evaluation package


Today:

1) chapter 4 suggestions

2) quiz 2 solutions

3) work on chapter 4

4) mini-lecture on CDFs?

5) optional "Pimp My Code" challenge


For next time you should:

1) work on Chapter 4

2) read a review a swappy book, any unread chapter, last recommended.

3) brace yourself for a quiz on Chapter 4 and Barabasi/Albert


Chapter 4 suggestions
---------------------

4.1 The coding part should be straightforward, but feel free
    to use my code if you want to allocate effort elsewhere.

4.2 Again, coding this up is a good way to test your understanding
    and get more practice, but again, use my code if you want
    to make up some time.

    If you have not had stats, I will do a mini-lecture on
    CDFs either this time or next.

4.3 Should be easy.  If not, ask.

4.4 Think about this one a little.  If you understand what I
    am asking you to do, it is a powerful, simple tool for testing
    whether a dataset is well modelled by a closed form distribution.

4.5 Optional for your book., but it would make a good quiz question.

4.6 This one is high value/effort, and interesting (I think).

4.7 Synthesizes chapters 3 and 4.  Save some time for this one,
    but don't overkill it.

4.8 Totally optional, relatively low value/effort.

    Warning: I started to do this using anydbm and found what
    seemed to be superlinear growth in run time.  So it
    quicky became too slow to work with (in the time I had).

    So one of the challenges of this exercise is keeping
    the run times practical.  Using a better database (like
    MySQL) might help.


Quiz 2 Solutions
----------------

def is_clique(self, vs):
    for i, v in enumerate(vs):
        for j, w in enumerate(vs):
	    if i <= j: continue
	    try:
	        self[v,w]
	    except KeyError:
                return False
    return True

Saves a factor of 2 by check each edge only once.

Saves a little overhead by using try...except and accessing
the graphs internal dictionaries directly rather than using
get_edge or has_edge.

This method is O(n^2) where n is the length of (vs).

If n is unrelated to |V| and |E|, then it would be considered
O(1) as the size of the graph increases.

If n is O(|V|) then this algorithm is O(|V|^2)


2) Quoth the Wikipedia:

"The simplest implementation of the Dijkstra's algorithm stores
vertices of set Q in an ordinary linked list or array, and operation
Extract-Min(Q) is simply a linear search through all vertices in Q. In
this case, the running time is O(|V|^2+|E|)=O(|V|^2).

For sparse graphs, that is, graphs with fewer than |V|^2 edges,
Dijkstra's algorithm can be implemented more efficiently by storing
the graph in the form of adjacency lists and using a binary heap,
pairing heap, or Fibonacci heap as a priority queue to implement the
Extract-Min function efficiently. With a binary heap, the algorithm
requires O((|E|+|V|) log |V|) time (which is dominated by O(|E| log
|V|) assuming every vertex is connected, that is, |E| ≥ |V| - 1), and
the Fibonacci heap improves this to O( | E | + | V | log | V | )
amortized time."

In the case of a small world graph, we know that the number of
edges is k|V|, which is O(|V|), so either implementation of the
priority queue yields O(|V| log |V|).  So I accepted that answer.

HOWEVER, there is one additional consideration.  In the small
world graph, the edges are all the same length, so we don't
need a priority queue at all, a simple queue will do.  That
eliminates the log term and yields O(|V|).


3) This implementation achieves the theoretically expected
   order of growth, O(|V|^3).


Pimp My Code Challenge
----------------------

See your email for details.

Look for ways to improve the code, including:

1) Better run time (order of growth or just overhead)?

2) Less space used (including run time stack)?

3) Fewer lines of code?

4) Improved readability/maintainability?

5) Other?