Browsed by
Tag: python

Matplotlib animation in Jupyter

Matplotlib animation in Jupyter

For two of my books, Think Complexity and Modeling and Simulation in Python, many of the examples involve animation. Fortunately, there are several ways to do animation with Matplotlib in Jupyter. Unfortunately, none of them is ideal.

FuncAnimation

Until recently, I was using FuncAnimation, provided by the matplotlib.animation package, as in this example from Think Complexity. The documentation of this function is pretty sparse, but if you want to use it, you can find examples.

For me, there are a few drawbacks:

  • It requires a back end like ffmpeg to display the animation. Based on my email, many readers have trouble installing packages like this, so I avoid using them.
  • It runs the entire computation before showing the result, so it takes longer to debug, and makes for a less engaging interactive experience.
  • For each element you want to animate, you have to use one API to create the element and another to update it.

For example, if you are using imshow to visualize an array, you would run

    im = plt.imshow(a, **options)

to create an AxesImage, and then

    im.set_array(a)

to update it. For beginners, this is a lot to ask. And even for experienced people, it can be hard to find documentation that shows how to update various display elements.

As another example, suppose you have a 2-D array and plot it like this:

    plot(a)

The result is a list of Line2D objects. To update them, you have to traverse the list and invoke set_xdata() on each one.

Updating a display is often more complicated than creating it, and requires substantial navigation of the documentation. Wouldn’t it be nice to just call plot(a) again?

Clear output

Recently I discovered simpler alternative using clear_output() from Ipython.display and sleep() from the time module. If you have Python and Jupyter, you already have these modules, so there’s nothing to install.

Here’s a minimal example using imshow:

%matplotlib inline

import numpy as np
from matplotlib import pyplot as plt
from IPython.display import clear_output
from time import sleep

n = 10
a = np.zeros((n, n))
plt.figure()

for i in range(n):
    plt.imshow(a)
    plt.show()
    a[i, i] = 1
    sleep(0.1)
    clear_output(wait=True)

The drawback of this method is that it is relatively slow, but for the examples I’ve worked on, the performance has been good enough.

In the ModSimPy library, I provide a function that encapsulates this pattern:

def animate(results, draw_func, interval=None):
    plt.figure()
    try:
        for t, state in results.iterrows():
            draw_func(state, t)
            plt.show()
            if interval:
                sleep(interval)
            clear_output(wait=True)
        draw_func(state, t)
        plt.show()
    except KeyboardInterrupt:
        pass

results is a Pandas DataFrame that contains results from a simulation; each row represents the state of a system at a point in time.

draw_func is a function that takes a state and draws it in whatever way is appropriate for the context.

interval is the time between frames in seconds (not counting the time to draw the frame).

Because the loop is wrapped in a try statement that captures KeyboardInterrupt, you can interrupt an animation cleanly.

You can see an example that uses this function in this notebook from Chapter 22 of Modeling and Simulation in Python, and you can run it on Binder.

And here’s an example from Chapter 6 of Think Complexity, which you can also run on Binder.

Local regression in Python

Local regression in Python

I love data visualization make-overs (like this one I wrote a few months ago), but sometimes the tone can be too negative (like this one I wrote a few months ago).

Sarah Leo, a data journalist at The Economist, has found the perfect solution: re-making your own visualizations. Here’s her tweet.

And here’s the link to the article, which you should go read before you come back here.

One of her examples is the noisy line plot on the left, which shows polling results over time.


Here’s Leo’s explanation of what’s wrong and why:

Instead of plotting the individual polls with a smoothed curve to show the trend, we connected the actual values of each individual poll. This happened, primarily, because our in-house charting tool does not plot smoothed lines. Until fairly recently, we were less comfortable with statistical software (like R) that allows more sophisticated visualisations. Today, all of us are able to plot a polling chart like the redesigned one above.

This confession made me realize that I am in the same boat they were in: I know about local regression, but I don’t use it because I haven’t bothered to learn the tools.

Fortunately, filling this gap in my toolkit took less than an hour. The StatsModels library provides lowess, which computes locally weighted scatterplot smoothing.

I grabbed the data from The Economist and read it into a Pandas DataFrame. Then I wrote the following function, which takes a Pandas Series, computes a LOWESS, and returns a Pandas Series with the results:

from statsmodels.nonparametric.smoothers_lowess import lowess

def make_lowess(series):
    endog = series.values
    exog = series.index.values

    smooth = lowess(endog, exog)
    index, data = np.transpose(smooth)

    return pd.Series(data, index=pd.to_datetime(index)) 

And here’s what the results look like:

The smoothed lines I got look a little different from the ones in The Economist article. In general the results depends on the parameters we give LOWESS. You can see all the details in this Jupyter notebook.

Thanks to Sarah Leo for inspiring me to learn to use LOWESS, and for providing the data I used to replicate the results.