python Archives - Page 2 of 2 - Probably Overthinking It

Right, left, apart, together?

July 26, 2019 AllenDowney

Is the United States getting more conservative? With the rise of the alt-right, Republican control of Congress, and the election of Donald Trump, it might seem so.

Or is the country getting more liberal? With the 2015 Supreme Court decision supporting same-sex marriage, the incremental legalization of marijuana, and recent proposals to expand public health care, you might think so.

Or maybe the country is becoming more polarized, with moderates choosing sides and partisans moving to the extremes.

In a series of articles, I’ll use data from the General Social Survey (GSS) to explore these questions. The GSS goes back to 1972; every second year they survey a representative sample of U.S. residents and ask questions about their political beliefs. Many of the questions have been unchanged for almost 50 years, making it possible to observe long-term trends.

In this article, I’ll look at political alignment, that is, whether the respondents consider themselves liberal or conservative. In subsequent articles, I’ll explore their political beliefs on a range of topics.

Political alignment

From 1974 to the most recent cycle in 2018, the GSS asked the following question: “We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal–point 1–to extremely conservative–point 7. Where would you place yourself on this scale?”

The following figure shows the distribution of responses in 1974 and 2018.

In 2018, it looks like there are more 1s (Extremely Liberal) and maybe more 7s (Extremely Conservative). So this figure provides some evidence of polarization.

We can get a better sense of the long-term trend by taking the mean of the 7-point scale and plotting it over time. By treating this scale as a numerical quantity, I’m making assumptions about the spacing between the values. The numbers we get don’t mean much in absolute terms, but they provide a quick look at the trend.
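As a rough sketch of that computation, the mean can be taken within each survey year using a Pandas groupby. The column names `year` and `polviews` and the handful of responses below are assumptions made up for illustration; they are not GSS results.

```python
import pandas as pd

# Toy responses on the 7-point scale (1 = extremely liberal,
# 7 = extremely conservative). Invented values, not GSS data.
gss = pd.DataFrame({
    "year":     [1974, 1974, 1974, 1990, 1990, 1990, 2018, 2018, 2018],
    "polviews": [3,    4,    5,    4,    5,    6,    1,    4,    7],
})

# Treat the scale as a numerical quantity and average it within each year.
mean_by_year = gss.groupby("year")["polviews"].mean()
print(mean_by_year)
```

Calling `mean_by_year.plot()` would then draw the kind of time series described here.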

It looks like the “center of mass” was increasing until about 1990, which means more conservative on this scale, and has been decreasing ever since. On average the country might be a little more conservative now than it was in 1974.

With the same caveat about treating this scale as a numerical quantity, we can also compute the standard deviation, which measures the typical distance from the mean, as a way of quantifying polarization.
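To see why the standard deviation can stand in for polarization, here is a small sketch with two invented samples that have the same mean but different spread. The numbers are made up for illustration and are not GSS data.

```python
import numpy as np

# Two invented samples on the 7-point scale, both with mean 4.
clustered = np.array([3, 4, 4, 4, 4, 5])   # most responses near the middle
polarized = np.array([1, 1, 4, 4, 7, 7])   # more responses at the extremes

for name, sample in [("clustered", clustered), ("polarized", polarized)]:
    # ddof=1 gives the sample standard deviation, which is the Pandas default.
    print(name, sample.mean(), sample.std(ddof=1))
```

The means are identical, so the mean alone cannot tell these two samples apart; the standard deviation is larger for the sample with more responses at the ends of the scale.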

The trend is clearly increasing, indicating increasing polarization, but with the way we computed these numbers, it’s hard to get a sense of how substantial the increase is in practical terms.

In the next article, I’ll look more closely at changes in political alignment over time.

I am planning to turn these articles into a case study for an upcoming Data Science class, so I welcome comments and questions.

The code I used to generate these figures is in this Jupyter notebook.

Matplotlib animation in Jupyter

July 25, 2019 AllenDowney

For two of my books, Think Complexity and Modeling and Simulation in Python, many of the examples involve animation. Fortunately, there are several ways to do animation with Matplotlib in Jupyter. Unfortunately, none of them is ideal.

FuncAnimation

Until recently, I was using FuncAnimation, provided by the matplotlib.animation module, as in this example from Think Complexity. The documentation of this function is pretty sparse, but if you want to use it, you can find examples.

For me, there are a few drawbacks:

It requires a back end like ffmpeg to display the animation. Based on my email, many readers have trouble installing packages like this, so I avoid using them.
It runs the entire computation before showing the result, so it takes longer to debug, and makes for a less engaging interactive experience.
For each element you want to animate, you have to use one API to create the element and another to update it.

For example, if you are using imshow to visualize an array, you would run

    im = plt.imshow(a, **options)

to create an AxesImage, and then

    im.set_array(a)

to update it. For beginners, this is a lot to ask. And even for experienced people, it can be hard to find documentation that shows how to update various display elements.
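Here is a minimal sketch of that create-then-update pattern with FuncAnimation. The array, the `update` function, and the choice of `vmin` and `vmax` are my own assumptions for illustration; how the animation gets displayed depends on your setup and is left out.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation

n = 10
a = np.zeros((n, n))

fig, ax = plt.subplots()
# Create the AxesImage once. Fixing the color range keeps later updates
# from being scaled against the all-zero starting array.
im = ax.imshow(a, vmin=0, vmax=1)

def update(i):
    """Modify the array, then push it into the existing image."""
    a[i, i] = 1
    im.set_array(a)
    return [im]

# FuncAnimation calls update(0), update(1), ... once per frame.
anim = FuncAnimation(fig, update, frames=n, interval=100)
```

Notice that the element is created with one call (imshow) and updated with a different one (set_array), which is the split described above.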

As another example, suppose you have a 2-D array and plot it like this:

    plot(a)

The result is a list of Line2D objects, one for each column. To update them, you have to traverse the list and invoke set_ydata() on each one (or set_xdata(), if the x values change).

Updating a display is often more complicated than creating it, and requires substantial navigation of the documentation. Wouldn’t it be nice to just call plot(a) again?
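For comparison, here is a sketch of what updating those lines by hand might look like. It assumes the new array has the same shape as the old one; since plot(a) uses the row indices as x values, only the y values need to change, so the sketch uses set_ydata(). The arrays are made up for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

a = np.arange(12).reshape(4, 3)   # 4 points for each of 3 columns

fig, ax = plt.subplots()
lines = ax.plot(a)                # one Line2D per column

# New data with the same shape as the original array.
b = a[::-1]

# Update each line with the matching column of the new array.
for line, column in zip(lines, b.T):
    line.set_ydata(column)

# Rescale the axes in case the new data falls outside the old limits.
ax.relim()
ax.autoscale_view()
```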

Clear output

Recently I discovered a simpler alternative using clear_output() from IPython.display and sleep() from the time module. If you have Python and Jupyter, you already have these modules, so there’s nothing to install.

Here’s a minimal example using imshow:

%matplotlib inline

import numpy as np
from matplotlib import pyplot as plt
from IPython.display import clear_output
from time import sleep

n = 10
a = np.zeros((n, n))
plt.figure()

for i in range(n):
    plt.imshow(a)
    plt.show()
    a[i, i] = 1
    sleep(0.1)
    clear_output(wait=True)

The drawback of this method is that it is relatively slow, but for the examples I’ve worked on, the performance has been good enough.

In the ModSimPy library, I provide a function that encapsulates this pattern:

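def animate(results, draw_func, interval=None):
    """Draw each row of results in turn, replacing the previous frame."""
    plt.figure()
    try:
        for t, state in results.iterrows():
            draw_func(state, t)
            plt.show()
            if interval:
                sleep(interval)
            # wait=True defers clearing until the next output is ready
            clear_output(wait=True)
        # draw the final state once more after the last clear_output
        draw_func(state, t)
        plt.show()
    except KeyboardInterrupt:
        # let the user stop the animation without a traceback
        pass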

results is a Pandas DataFrame that contains results from a simulation; each row represents the state of a system at a point in time.

draw_func is a function that takes a state and draws it in whatever way is appropriate for the context.

interval is the time between frames in seconds (not counting the time to draw the frame).

Because the loop is wrapped in a try statement that captures KeyboardInterrupt, you can interrupt an animation cleanly.
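As a usage sketch, here is how a call might look with a toy DataFrame. The function is repeated so the block runs on its own; the `results` values, the `draw_point` function, and the `drawn` list are hypothetical and exist only for illustration. The fallback for clear_output is also my addition, so the sketch can run outside Jupyter.

```python
from time import sleep

import pandas as pd
from matplotlib import pyplot as plt

try:
    from IPython.display import clear_output
except ImportError:
    # Outside Jupyter there is no cell output to clear.
    def clear_output(wait=False):
        pass

def animate(results, draw_func, interval=None):
    # Same function as above, repeated so this block is self-contained.
    plt.figure()
    try:
        for t, state in results.iterrows():
            draw_func(state, t)
            plt.show()
            if interval:
                sleep(interval)
            clear_output(wait=True)
        draw_func(state, t)
        plt.show()
    except KeyboardInterrupt:
        pass

# Toy simulation results: one row per time step (invented values).
results = pd.DataFrame({"x": [0.0, 1.0, 2.0], "y": [0.0, 1.0, 4.0]},
                       index=[0.0, 0.5, 1.0])

drawn = []  # record of the time steps that were drawn

def draw_point(state, t):
    """Hypothetical draw_func: plot the current position as a dot."""
    plt.plot(state.x, state.y, "o")
    plt.xlim(-0.5, 2.5)
    plt.ylim(-0.5, 4.5)
    plt.title(f"t = {t}")
    drawn.append(t)

animate(results, draw_point, interval=0.1)
```

The index of `results` plays the role of time, and each row arrives in draw_func as a Pandas Series, so its columns can be read as attributes.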

You can see an example that uses this function in this notebook from Chapter 22 of Modeling and Simulation in Python, and you can run it on Binder.

And here’s an example from Chapter 6 of Think Complexity, which you can also run on Binder.

Local regression in Python

April 1, 2019 AllenDowney

I love data visualization make-overs (like this one I wrote a few months ago), but sometimes the tone can be too negative (like this one I wrote a few months ago).

Sarah Leo, a data journalist at The Economist, has found the perfect solution: re-making your own visualizations. Here’s her tweet.

And here’s the link to the article, which you should go read before you come back here.

One of her examples is the noisy line plot on the left, which shows polling results over time.

Here’s Leo’s explanation of what’s wrong and why:

Instead of plotting the individual polls with a smoothed curve to show the trend, we connected the actual values of each individual poll. This happened, primarily, because our in-house charting tool does not plot smoothed lines. Until fairly recently, we were less comfortable with statistical software (like R) that allows more sophisticated visualisations. Today, all of us are able to plot a polling chart like the redesigned one above.

This confession made me realize that I am in the same boat they were in: I know about local regression, but I don’t use it because I haven’t bothered to learn the tools.

Fortunately, filling this gap in my toolkit took less than an hour. The StatsModels library provides lowess, which computes locally weighted scatterplot smoothing.

I grabbed the data from The Economist and read it into a Pandas DataFrame. Then I wrote the following function, which takes a Pandas Series, computes a LOWESS, and returns a Pandas Series with the results:

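import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

def make_lowess(series):
    """Compute a LOWESS curve for a Series and return it as a Series."""
    endog = series.values
    exog = series.index.values

    # lowess returns two columns: the sorted exog values and the fitted values
    smooth = lowess(endog, exog)
    index, data = np.transpose(smooth)

    # assumes the original index was datetime-like
    return pd.Series(data, index=pd.to_datetime(index))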

And here’s what the results look like:

The smoothed lines I got look a little different from the ones in The Economist article. In general the results depend on the parameters we give LOWESS. You can see all the details in this Jupyter notebook.
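To give a sense of how much the parameters matter, here is a sketch that runs lowess twice with different values of `frac`, the fraction of the data used for each local fit. The data are randomly generated stand-ins, not the polling data from The Economist, and the two values of `frac` are arbitrary choices for illustration.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Invented noisy data standing in for a polling series.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.5, size=len(x))

# frac is the fraction of the data used for each local fit
# (the statsmodels default is 2/3). Smaller values follow the
# data more closely; larger values give a smoother curve.
wiggly = lowess(y, x, frac=0.1)
smooth = lowess(y, x, frac=0.67)

# Each result has two columns: the sorted x values and the fitted y values.
print(wiggly.shape, smooth.shape)
```

Plotting `wiggly[:, 1]` and `smooth[:, 1]` against `x` shows how the same data can produce noticeably different curves.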

Thanks to Sarah Leo for inspiring me to learn to use LOWESS, and for providing the data I used to replicate the results.
