What does a confidence interval mean?

Here’s another installment in Data Q&A: Answering the real questions with Python. In general, I will try to focus on practical problems, but this one is a little more philosophical.

Standard deviation of a count

This post is part of a new project with the working title Data Q&A: Answering the real questions with Python. In each installment, I’ll take a question from Reddit’s statistics forum and answer it, using Python code to demonstrate. My answer is in a Jupyter notebook — see the link below to run it in Colab.

Data Q&A

Today I’m starting a new project with the working title Data Q&A: Answering the real questions with Python. In each installment, I’ll take a question from Reddit’s statistics forum and answer it, using Python code to demonstrate. The first installment is a question about the harmonic mean, which is a recurring topic of discussion on Reddit. It’s in a Jupyter notebook — see the link below to run it in Colab.

Think Python Goes to Production

Think Python has moved into production, on schedule for the official publication date in July — but maybe earlier if things go well.

To celebrate, I have posted the next batch of chapters on the new site, up through Chapter 12, which is about Markov text analysis and generation, one of my favorite examples in the book. From there, you can follow links to run the notebooks on Colab.

And we have a cover!

The new animal is a ringneck parrot, I’ve been told. I will miss the Carolina parakeet that was on the old cover, which was particularly apt because it is an ex-parrot. Nevertheless, I think the new cover looks great!

Huge thanks to Sam Lau and Luciano Ramalho for their technical reviews. Both made many helpful corrections and suggestions that improved the book. Sam is an expert on learning to program with AI assistants. And Luciano was inspired by the turtles to make an improved module for turtle graphics in Jupyter, called jupyturtle. Here’s an example of what it looks like (from Chapter 5):

If you have a chance to check out the current draft, and you have any corrections or suggestions, please create an issue on GitHub.

And if you would like a copy of the book as soon as possible, you can read the Early Release version and order from O’Reilly here or pre-order the third edition from Amazon.

The Gender Gap in Political Beliefs Is Small

In previous articles (here, here, and here) I’ve looked at evidence of a gender gap in political alignment (liberal or conservative), party affiliation (Democrat or Republican), and policy preferences.

Using data from the GSS, I found that women are more likely to say they are liberal, and more likely to say they are Democrats, by 5-10 percentage points. But in their responses to 15 policy questions that most distinguish conservatives and liberals, men and women give similar answers.

In other words, the political gap is mostly in what people say about themselves, not in what they believe about specific policy questions.

Now let’s see if we get similar results with ANES data. As with the GSS, I looked for questions where liberals and conservatives give different answers. From those, I selected questions about specific policies, plus four questions related to moral foundations, with preference for questions asked over a long period of time. Here are the 16 topics that met these criteria:

For each question, I identified one or more responses that were more likely to be given by conservatives, which is what I’m calling “conservative responses”.
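Counting conservative responses per respondent is straightforward in pandas. Here is a minimal sketch on made-up data. The question names, answer codes, and the sex labels are hypothetical, and unlike the analysis described here, this sketch treats unasked questions as non-conservative instead of filling them in.

```python
import pandas as pd

# Made-up responses: one row per respondent, one column per question.
# NaN means the respondent was not asked that question.
responses = pd.DataFrame({
    "q1": [1, 2, 1, None],
    "q2": [3, 3, 1, 2],
    "q3": [2, None, 2, 2],
})
sex = pd.Series(["male", "female", "male", "female"])

# Hypothetical answer codes that are more likely to be given by conservatives.
conservative_codes = {"q1": [1], "q2": [3], "q3": [2]}


def count_conservative(df, codes):
    """Count conservative responses per respondent.

    Missing answers never match a code, so they count as 0 here;
    the analysis in the post imputes them instead.
    """
    flags = pd.DataFrame({q: df[q].isin(vals) for q, vals in codes.items()})
    return flags.sum(axis=1)


counts = count_conservative(responses, conservative_codes)
print(counts.tolist())                   # [3, 1, 2, 1]

# Average number of conservative responses by group, and the gap
means = counts.groupby(sex).mean()
print(means["male"] - means["female"])   # 1.5
```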

Not every respondent was asked every question, so I used a Bayesian method based on item response theory to fill missing values. You can get the details of the method here.

As in the GSS data, the average number of conservative responses has gone down over time.

Men give more conservative responses than women, on average, but the difference is only half a question, and the gap is not getting bigger.

Among people younger than 30, the gap is closer to 1 question, on average. And it is not growing.

In summary:

  • In the ANES, there is no evidence of a growing gender gap in political alignment, party affiliation, or policy preferences.
  • In both the GSS and the ANES the gap in policy preferences is small and not growing.

The details of this analysis are in this Jupyter notebook.

What about economics?

Many of the questions in the previous section are about social issues. On economic issues some of the patterns are different. Here are 15 questions I selected that are mostly about federal spending.

Unlike responses to the social issues, which trend liberal over time, responses to these questions are almost unchanged.

In the general population, the gender gap is about 0.5 questions and not growing.

Among young adults, the gender gap is smaller, and not growing.

On a total of 30 questions where conservatives and liberals disagree, men and women provide similar responses.

Think Python third edition!

I am happy to announce the third edition of Think Python, which will be published by O’Reilly Media later this year.

You can read the online version of the book here. I’ve posted the Preface and the first four chapters — more on the way soon!

You can read the Early Release and pre-order from O’Reilly, or pre-order the third edition on Amazon.

Here is an excerpt from the Preface that explains…

What’s new in the third edition?

The biggest changes in this edition were driven by two new technologies — Jupyter notebooks and virtual assistants.

Each chapter of this book is a Jupyter notebook, which is a document that contains both ordinary text and code. For me, that makes it easier to write the code, test it, and keep it consistent with the text. For readers, it means you can run the code, modify it, and work on the exercises, all in one place.

The other big change is that I’ve added advice for working with virtual assistants like ChatGPT and using them to accelerate your learning. When the previous edition of this book was published in 2016, the predecessors of these tools were far less useful and most people were unaware of them. Now they are a standard tool for software engineering, and I think they will be a transformational tool for learning to program — and learning a lot of other things, too.

The other changes in the book were motivated by my regrets about the second edition.

The first is that I did not emphasize software testing. That was already a regrettable omission in 2016, but with the advent of virtual assistants, automated testing has become even more important. So this edition presents Python’s most widely used testing tools, doctest and unittest, and includes several exercises where you can practice working with them.
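For readers who haven’t seen doctest, here is a minimal illustration (not an excerpt from the book; the function is hypothetical). The examples in the docstring double as tests: doctest runs each one and reports any whose output doesn’t match.

```python
def repeat_word(word, n):
    """Return word repeated n times, separated by spaces.

    >>> repeat_word('spam', 3)
    'spam spam spam'
    >>> repeat_word('spam', 0)
    ''
    """
    return " ".join([word] * n)


if __name__ == "__main__":
    import doctest

    # Runs every docstring example in this module; prints nothing if all pass.
    doctest.testmod()
```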

My other regret is that the exercises in the second edition were uneven — some were more interesting than others and some were too hard. Moving to Jupyter notebooks helped me develop and test a more engaging and effective sequence of exercises.

In this revision, the sequence of topics is almost the same, but I rearranged a few of the chapters and compressed two short chapters into one. Also, I expanded the coverage of strings to include regular expressions.

A few chapters use turtle graphics. In previous editions, I used Python’s turtle module, but unfortunately it doesn’t work in Jupyter notebooks. So I replaced it with a new turtle module that should be easier to use. Here’s what it looks like in the notebooks.

Finally, I rewrote a substantial fraction of the text, clarifying places that needed it and cutting back in places where I was not as concise as I could be.

I am very proud of this new edition — I hope you like it!

The Political Gender Gap is Not Growing

In a previous article, I used data from the General Social Survey (GSS) to see if there is a growing gender gap among young people in political alignment, party affiliation, or political attitudes. So far, the answer is no.

  • Young women are more likely than men to say they are liberal by 5-10 percentage points. But there is little or no evidence that the gap is growing.
  • Young women are more likely to say they are Democrats. In the 1990s, the gap was almost 20 percentage points. Now it is only 5-10 percentage points. So there’s no evidence this gap is growing — if anything, it is shrinking.
  • In response to 15 questions related to policies and attitudes, young men give slightly more conservative responses than women, on average, but the gap is small and consistent over time; there is no evidence it is growing.

Ryan Burge has done a similar analysis with data from the Cooperative Election Study (CES). Looking at stated political alignment, he finds that young women are more likely to say they are liberal by 5-10 percentage points. But there is no evidence that the gap is growing.

That leaves one other long-running survey to consider, the American National Election Studies (ANES). I have been meaning to explore this dataset for a long time, so this project is a perfect excuse.

This Jupyter notebook shows my analysis of alignment and party affiliation. I’ll get to beliefs and attitudes next week.

Alignment

This figure shows the percent who say they are liberal minus the percent who say they are conservative, for men and women ages 18-29.

It looks like the gender gap in political alignment appeared in the 1980s, but it has been nearly constant since then.

Affiliation

This figure shows the percent who say they are Democrats minus the percent who say they are Republicans, for men and women ages 18-29.

The gender gap in party affiliation has been mostly constant since the 1970s. It might have been a little wider in the 1990s, and might be shrinking now.

So what’s up with Gallup?

The results from GSS, CES, and ANES are consistent: there is no evidence of a growing gender gap in alignment, affiliation, or attitudes. So why does the Gallup data tell a different story?

Here’s the figure from the Financial Times article again, zooming in on just the US data.

First, I think this figure is misleading. As explained in this tweet, the data here have been adjusted by subtracting off the trend in the general population. As a result, the figure gives the impression that young men now are more likely to identify as conservative than in the past, and that’s not true. They are more likely to identify as liberal, but this trend is moving slightly slower than in the general population.
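A toy example shows how this adjustment can make a group that is moving toward liberal look like it is moving toward conservative. The numbers are made up, and I am assuming the adjustment subtracts the general population’s value in each year; the exact procedure (for example, subtracting a fitted trend line instead) may differ.

```python
import pandas as pd

years = [2010, 2015, 2020]

# Made-up values of percent liberal minus percent conservative
overall = pd.Series([-10.0, -6.0, -2.0], index=years)    # general population
young_men = pd.Series([-4.0, -2.0, 0.0], index=years)    # one subgroup

# Adjusted series: subgroup value minus the overall value in the same year
adjusted = young_men - overall

print(young_men.diff().tolist())  # [nan, 2.0, 2.0]: moving toward liberal
print(adjusted.tolist())          # [6.0, 4.0, 2.0]: slopes toward conservative
```

The subgroup becomes more liberal in every period, but because it does so more slowly than the general population, the adjusted line slopes down.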

But misleading or not, this way of showing the data doesn’t change the headline result, which is that the gender gap in this dataset has grown substantially, from about 10 percentage points in 2010 to about 30 percentage points now.

On Twitter, the author of the FT article points out that one difference is that the sample size is bigger for the Gallup data than for the datasets I looked at, and that’s true. Sample size explains why the variability from year to year is smaller in the Gallup data, but it does not explain why we see a big trend in the Gallup data that does not exist at all in the other datasets.
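A quick calculation shows why sample size affects year-to-year noise but not the trend. The standard error of a sample proportion is sqrt(p(1-p)/n), so quadrupling the sample size halves the noise. But sampling error is centered on zero, so a bigger sample makes each yearly estimate more precise without creating or erasing a trend. The sample sizes below are hypothetical, not the actual sizes of any of these surveys.

```python
import math


def se_percentage_points(p, n):
    """Standard error of a sample proportion p with sample size n, in percentage points."""
    return 100 * math.sqrt(p * (1 - p) / n)


# Hypothetical yearly sample sizes for one age-sex group
for n in [250, 1000, 4000]:
    print(n, round(se_percentage_points(0.5, n), 1))
# prints 3.2, 1.6, and 0.8 percentage points for the three sizes
```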

As a next step, I would ideally like to access the Gallup data so I can replicate the analysis in the FT article and explore reasons for the discrepancy. If anyone with access to the Gallup data can and will share it with me, let me know.

Barring that, we are left with two criteria to consider: plausibility and preponderance of evidence.

Plausibility: The size of the changes in the Gallup data is at least surprising, if not implausible. A change of 20 percentage points in 10 years is unlikely, especially in an analysis like this, where we follow an age group over time and the composition of the group changes slowly.

Preponderance of evidence: At this point we see a trend in one analysis of one dataset, and no sign of that result in several analyses of three other similar datasets.

Until we see better evidence to support the surprising claim, it seems most likely that the gender gap among young people is not growing, and is currently no larger than it has been in the past.

Political Alignment, Affiliation, and Attitudes

Is there a growing gender gap in the U.S.?

Alignment

A recent article in the Financial Times suggests that among young people there is a growing gender gap in political alignment on a spectrum from liberal to conservative.

In last week’s post, I tried to replicate this result using data from the General Social Survey. I generated the following figure, which shows the percentage of liberals minus the percentage of conservatives from 1988 to 2021, among people 18 to 29 years old. The analysis is in this Jupyter notebook.

Women are more likely to say they are liberal by 5-10 percentage points. But there is little or no evidence that the gap is growing.

Party Affiliation

This figure shows the percentage of Democrats minus the percentage of Republicans from 1988 to 2021. The analysis is in this Jupyter notebook.

Women are more likely than men to say they are Democrats. In the 1990s, the gap was almost 20 percentage points. Now it is only 5-10 percentage points. So there’s no evidence this gap is growing — if anything, it is shrinking.

Attitudes and beliefs

To quantify political attitudes, I will take advantage of a method I used in Chapter 12 of Probably Overthinking It. In the General Social Survey, I chose 15 questions where there is the biggest difference in the responses of people who identify as liberal or conservative. Then I estimated the number of conservative responses from each respondent.
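The selection step might look something like this: for each question, compare the fraction of conservative responses among self-identified liberals and among conservatives, and keep the questions with the largest gaps. This is a sketch on made-up data, not the code from the book; the inputs `flags` and `ideology` are hypothetical.

```python
import pandas as pd


def rank_questions(flags, ideology, k):
    """Return the k questions with the largest liberal-conservative gap.

    flags: DataFrame of booleans, one column per question, True where the
        respondent gave the response we are treating as conservative.
    ideology: Series of labels ("Liberal", "Moderate", "Conservative")
        aligned with the rows of flags.
    """
    frac_lib = flags[ideology == "Liberal"].mean()
    frac_con = flags[ideology == "Conservative"].mean()
    gap = (frac_con - frac_lib).abs()
    return gap.sort_values(ascending=False).head(k)


# Made-up data: four respondents, three questions
flags = pd.DataFrame({
    "qa": [True, True, False, False],
    "qb": [True, False, True, False],
    "qc": [True, True, True, False],
})
ideology = pd.Series(["Conservative", "Conservative", "Liberal", "Liberal"])

top = rank_questions(flags, ideology, k=2)
print(top)   # qa: 1.0, qc: 0.5
```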

The following figure shows the average number of conservative responses for young men and women since 1974. The analysis is in this Jupyter notebook.

Men give slightly more conservative responses than women, on average, but the gap is small and consistent over time — there is no evidence it is growing.

In summary, GSS data provides no support for the claim that there is a growing gender gap in political alignment, affiliation, or attitudes.

Extremes, outliers, and GOATS

The video from my PyData Global 2023 talk, Extremes, outliers, and GOATS, is available now:

The slides are here.

There are two Jupyter notebooks that contain the analysis I presented:

Here’s the abstract:

The fastest runners are much faster than we expect from a Gaussian distribution, and the best chess players are much better. In almost every field of human endeavor, there are outliers who stand out even among the most talented people in the world. Where do they come from?

In this talk, I present as possible explanations two data-generating processes that yield lognormal distributions, and show that these models describe many real-world scenarios in natural and social sciences, engineering, and business. And I suggest methods — using SciPy tools — for identifying these distributions, estimating their parameters, and generating predictions.
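As a hedged illustration of one well-known way lognormal distributions arise (not necessarily one of the two processes in the talk): if each value is the product of many small, independent positive factors, its logarithm is a sum, which tends toward a normal distribution. The maximum-likelihood estimates of a lognormal’s parameters are the mean and standard deviation of the logs, so NumPy alone is enough for the fit; SciPy’s `scipy.stats.lognorm` also provides a `fit` method. The number of values, number of factors, and factor range are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Multiplicative process: each of 10,000 values is the product of
# 50 independent random factors between 0.9 and 1.2.
factors = rng.uniform(0.9, 1.2, size=(10_000, 50))
values = factors.prod(axis=1)

# Lognormal fit: the maximum-likelihood estimates are the mean and
# (ddof=0) standard deviation of the log values.
logs = np.log(values)
mu, sigma = logs.mean(), logs.std()
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")

# Like a lognormal distribution, the simulated values are right-skewed:
# the mean exceeds the median.
print(values.mean() > np.median(values))
```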

This talk is based on Chapter 4 of Probably Overthinking It. If you liked the talk, you’ll love the book 🙂

Thanks to the organizers of PyData Global and NumFOCUS!

Is the Ideology Gap Growing?

This tweet from John Burn-Murdoch links to an article in the Financial Times (FT), “A new global gender divide is emerging”, which includes this figure:

The article claims:

In the US, Gallup data shows that after decades where the sexes were each spread roughly equally across liberal and conservative world views, women aged 18 to 30 are now 30 percentage points more liberal than their male contemporaries. That gap took just six years to open up.

The figure says it is based on General Social Survey data and the text says it’s based on Gallup data, so I’m not sure which it is. UPDATE: In this tweet Burn-Murdoch explains that the figure shows Gallup data, backfilled with GSS data from before the Gallup series began.

And I don’t know what it means that “All figures are adjusted for time trend in the overall population”. UPDATE: In this tweet, Burn-Murdoch explains that the adjustment mentioned in the figure is to subtract off the overall trend. In the notebook for this article, I apply the same adjustment, but it does not change my conclusions.

Anyway, since I used GSS data in several places in Probably Overthinking It, this analysis did not sound right to me. So I tried to replicate the analysis with GSS data.

I conclude:

  • The GSS data does not look like the figure in the FT.
  • Women are more likely to say that they are liberal, by 5-10 percentage points.
  • The only evidence that the gap is growing depends entirely on a data point from 2022 that is probably an error.
  • If we drop the 2022 data and apply moderate smoothing, we see no evidence that the gap is growing.

Most of the functions in this notebook are the ones I used to write Probably Overthinking It. All of the notebooks for that book are available in this repository.

Click here to run this notebook on Colab

GSS Data

I’m using data from the General Social Survey (GSS), which I previously cleaned in this notebook. The primary variable we’ll use is polviews, which asks:

We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal–point 1–to extremely conservative–point 7. Where would you place yourself on this scale?

The points on the scale are Extremely liberal, Liberal, and Slightly liberal; Moderate; Slightly conservative, Conservative, and Extremely conservative.

I’ll lump the first three points into “Liberal” and the last three into “Conservative”.
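In pandas, one way to do this lumping is with `pd.cut`. This sketch assumes polviews is coded 1 through 7 as described above, with missing answers stored as NaN.

```python
import pandas as pd


def lump_polviews(polviews):
    """Collapse the seven-point polviews scale into three categories.

    Codes 1-3 -> "Liberal", 4 -> "Moderate", 5-7 -> "Conservative";
    missing values stay missing.
    """
    bins = [0, 3, 4, 7]   # intervals (0, 3], (3, 4], (4, 7]
    labels = ["Liberal", "Moderate", "Conservative"]
    return pd.cut(polviews, bins=bins, labels=labels)


polviews = pd.Series([1, 3, 4, 5, 7, None])
print(lump_polviews(polviews).tolist())
# ['Liberal', 'Liberal', 'Moderate', 'Conservative', 'Conservative', nan]
```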

All respondents

The following figure shows the percentage who say they are liberal minus the percentage who say they are conservative, grouped by sex.

In the general population, women are more likely to say they are liberal by 5-10 percentage points. The gap might have increased in the most recent data, depending on how seriously we take the last two points in a noisy series.
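The quantity in these figures has a convenient shortcut: code each respondent as +1 (liberal), 0 (moderate), or -1 (conservative), and the group mean times 100 equals the percentage liberal minus the percentage conservative. Here is a minimal unweighted sketch on made-up data, assuming columns named year, sex, and ideology (already lumped into three categories).

```python
import pandas as pd


def liberal_minus_conservative(df):
    """Percent liberal minus percent conservative, by year and sex (unweighted)."""
    valid = df.dropna(subset=["ideology"])
    # +1 for Liberal, -1 for Conservative, 0 for Moderate
    score = (valid["ideology"] == "Liberal").astype(int) - (
        valid["ideology"] == "Conservative"
    ).astype(int)
    # The mean score is (fraction liberal) - (fraction conservative)
    table = score.groupby([valid["year"], valid["sex"]]).mean() * 100
    return table.unstack("sex")


df = pd.DataFrame({
    "year": [2010] * 4 + [2012] * 4,
    "sex": ["female", "female", "male", "male"] * 2,
    "ideology": ["Liberal", "Moderate", "Conservative", "Liberal",
                 "Liberal", "Liberal", "Moderate", "Conservative"],
})
print(liberal_minus_conservative(df))
```

To restrict the analysis to young people, filter the rows first, for example `df[df["age"] < 30]`, assuming the data has an age column.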

Just young people

Now let’s select people under 30.

The trends here are pretty much the same as in the general population. Women are more likely to say they are liberal by 5-10 percentage points.

It’s possible that the gap has grown in the most recent data, but the evidence is weak and depends on how we draw a smooth curve through noisy data.

Anyway, there is no evidence the trend for men is going down — as in the FT graph — and the gap in the most recent data is nowhere near 30 percentage points.

With Sampling Weights

In the previous figures, I did not take into account the sampling weights, partly to keep the analysis simple and partly because I didn’t expect them to make much difference.

And I was mostly right, except for men in 2022; as we’ll see, there is almost certainly something wrong with that data point.

In this figure, the shaded area is the 90% CI of 101 weighted resamplings, the line is the median of the resamplings, and the points show the unweighted data. We only have weighted data since 1988, because that’s how far back the wtssps variable goes.
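Weighted resampling can be sketched like this: draw rows with replacement, with probability proportional to the sampling weight, compute the statistic for each resample, and take the 5th, 50th, and 95th percentiles to get a 90% interval and the median. The data are made up and the helper names are hypothetical; wtssps is the GSS weight named above, but this resampling scheme is my assumption, not a copy of the notebook’s code.

```python
import numpy as np
import pandas as pd


def weighted_resample(df, weight_col, rng):
    """Resample rows with replacement, with probability proportional to weight."""
    p = (df[weight_col] / df[weight_col].sum()).to_numpy()
    idx = rng.choice(len(df), size=len(df), replace=True, p=p)
    return df.iloc[idx]


def lib_minus_con(df):
    """Percent liberal minus percent conservative."""
    ideology = df["ideology"]
    return 100 * ((ideology == "Liberal").mean() - (ideology == "Conservative").mean())


def resampled_interval(df, stat, weight_col="wtssps", iters=101, seed=1):
    """Median and 90% interval of stat over weighted resamples."""
    rng = np.random.default_rng(seed)
    results = [stat(weighted_resample(df, weight_col, rng)) for _ in range(iters)]
    low, median, high = np.percentile(results, [5, 50, 95])
    return low, median, high


# Made-up data for one year and one group
df = pd.DataFrame({
    "ideology": ["Liberal", "Conservative", "Moderate", "Liberal"] * 50,
    "wtssps": [1.0, 2.0, 1.0, 0.5] * 50,
})
low, median, high = resampled_interval(df, lib_minus_con)
print(f"median {median:.1f}, 90% interval ({low:.1f}, {high:.1f})")
print("unweighted:", lib_minus_con(df))   # unweighted: 25.0
```

With these made-up weights, the unweighted value (25.0) falls well outside the weighted interval, which is the kind of discrepancy that flags a data point for a closer look.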

In most cases, the unweighted data falls in the CI of the weighted data, but for male respondents in 2022, the weighting moves the needle by almost 30 percentage points.

So something is not right there. I think the best option is to drop the 2022 data, but just for completeness, let’s see what happens if we apply some smoothing.

Resampling and smoothing

Here’s a version of the same plot with moderate smoothing, dropping the unweighted data.

You could argue that this figure shows evidence for an increasing gap, but the error bounds are very wide, and as we’ll see in the next figure, the entire effect is due to the likely error in the 2022 data.
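The post doesn’t say which smoother the notebook uses, so as a stand-in, here is a simple Gaussian kernel smoother (a Nadaraya-Watson estimator). Each smoothed value is a weighted average of all the yearly values, with weights that fall off with distance in years; the bandwidth controls how much smoothing you get. The series is made up, with a large jump in its last year, to show how one point at the end of a series can bend the smoothed curve.

```python
import numpy as np
import pandas as pd


def kernel_smooth(series, bandwidth=4.0):
    """Gaussian-kernel weighted average of a series indexed by year.

    bandwidth is the standard deviation of the kernel, in years.
    Missing values are dropped before smoothing.
    """
    s = series.dropna()
    x = s.index.to_numpy(dtype=float)
    y = s.to_numpy(dtype=float)
    # weights[i, j]: weight of the value in year x[j] when estimating year x[i]
    weights = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    smoothed = weights @ y / weights.sum(axis=1)
    return pd.Series(smoothed, index=s.index)


# Made-up yearly values with a suspicious jump in the last year
series = pd.Series([10.0, 12.0, 8.0, 11.0, 9.0, 30.0],
                   index=[2010, 2012, 2014, 2016, 2018, 2022])

with_last = kernel_smooth(series)
without_last = kernel_smooth(series.drop(2022))
print(with_last.round(1))
print(without_last.round(1))
```

Dropping the suspicious last point removes its pull on the right end of the curve.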

Resampling and smoothing without 2022

Finally, here’s the analysis I think is the best choice, dropping the 2022 data for both men and women.

In summary:

  • Since the 1990s, both men and women have become more likely to identify as liberal.
  • Women are more likely to identify as liberal by 5-10 percentage points.
  • There is no evidence that the ideology gap is growing.