Browsed by
Month: June 2023

Go Get the Data

Go Get the Data

My mantra when I was working on Probably Overthinking It was “Go Get the Data.” If I wanted to use a result from prior work, I would get the data whenever possible and make my own visualization. Of course, that’s more work than copying and pasting a figure, but there are a lot of benefits. One is that I can often get newer data. Another is that I can check the results.

I was reminded of these benefits a few months ago when I was reading a very good book that I won’t name because I’m about to point out an error, and I don’t want to be a jerk about it. The book includes the following figure:

Source: A book I’m not identifying because the point of this example is not to name and shame.

The top line got my attention because I have worked on several projects with data from the General Social Survey (GSS), and this particular question is one of the examples in my Political Alignment Case Study. Plotting the fraction of people who think homosexuality is wrong, I generated this figure:

Source: Downey, Political Alignment Case Study.

These results are not the same (even accounting for the inverted y-axis).

So I checked the source, which is Steven Pinker’s The Better Angels of Our Nature, specifically this figure:

Source: Page 452 of Pinker, The Better Angels of Our Nature

It looks like Pinker’s figure is consistent with mine. So what went wrong with the first figure?

Here’s what I think happened: they wanted to invert the y axis to show the percentage who think homosexuality is not wrong. But instead of flipping the y axis, they seem to have flipped the x axis, reversing the arrow of time! If you compare the other lines, it looks like they have all been reversed left-for-right.

Now, this is just one error in an otherwise excellent book. But it’s a reminder that this kind of thing can be avoided, if you Go Get the Data.

Abstracts and keywords

Abstracts and keywords

As Probably Overthinking It approaches the finish line, there are just a few more tasks: I am working on the index and — as I have recently learned — I also have to write a 200-word abstract, a list of keywords for each chapter, and a 250-word abstract for the whole book. I have come to two conclusions:

  • There is no good software for making an index from a PDF document.
  • ChatGPT is pretty good at writing chapter abstracts.

By converting each chapter to a simple text file (no figures and tables) I was able to get most of them under the ~4000 token limit. Then I asked, “write a one-paragraph abstract for this chapter with no more than 180 words. write about the chapter in the third person”.

Here’s the unedited abstract for the last chapter:

The chapter titled “Chasing the Overton Window” explores the phenomenon of people considering themselves more conservative as they age, despite their views actually becoming more liberal over time. The chapter begins by examining the trends of self-identification and political views across different age groups. It then delves into the concept of the Overton window, which represents the range of acceptable public opinion on a given issue. The chapter explains how the center of public opinion has shifted toward liberalism over the past 50 years and how the relative terms “liberal” and “conservative” depend on the perceived center. Furthermore, it demonstrates that both liberals and conservatives have become more liberal over time. The chapter concludes by proposing an explanation for why people think they are becoming more conservative, even though their views are becoming more liberal, which involves the interplay of generational effects, the shifting center of public opinion, and the connotations associated with political labels.

ChatGPT June 10, 2023

It’s not great prose, but I think I can revise it into something acceptable without much effort.

Three of the chapters exceeded the token limit, so I asked for a summary of the first half, then a summary of the second half, then I asked, “Combine the following two abstracts into a single paragraph with no more than 180 words”. Here’s the combined abstract of Chapter 8:

This chapter delves into the distribution of natural and human-caused disasters, investigating their sizes, costs, prevalence, and characteristics within long-tailed distributions. Understanding the probabilities of major disasters is crucial for effective preparedness and response, despite the challenge of comprehending rare and large-scale events. By analyzing a dataset of 125 disasters, including hurricanes, earthquakes, floods, nuclear disasters, and terror attacks, the author demonstrates a pattern where doubling the rank of a disaster corresponds to halving its costs when plotted on a logarithmic scale. While exploring the limitations of the lognormal distribution in predicting the probabilities of large disasters, the author introduces Student’s t-distribution as a more suitable model for estimating the probabilities of extreme events. The chapter also examines lunar craters and their abundance and sizes, revealing the prevalence of long-tailed distributions and their connection to asteroid sizes. Additionally, it explores the occurrence of long-tailed distributions in stock market crashes and introduces the concept of black swans to emphasize their relevance in understanding rare and impactful events. Concluding the chapter, it discusses the challenges associated with predicting and comprehending rare, large events in a long-tailed world, with a specific focus on earthquake magnitudes and a comparison of prediction models.

ChatGPT June 10, 2023

Again, I think that’s editing distance away from acceptable — and a near perfect 198 words.

It does pretty well with keywords, too:

  1. Disasters
  2. Long-tailed distributions
  3. Probabilities
  4. Preparedness
  5. Response
  6. Natural events
  7. Human-made incidents
  8. Lognormal distribution
  9. Student’s t-distribution
  10. Black swans

To generate the abstract for the whole book, I pasted the chapter abstracts and keywords into a single document and asked, “the following are chapter abstracts for a book called ‘Probably Overthinking It’. write a one-paragraph abstract for the whole book, more than 200 words, but not more than 250 words. write about the book in the third person. not too much like an ad”

Here’s the unedited result:

“Probably Overthinking It” is a thought-provoking book that delves into various statistical and cognitive paradoxes, challenging common intuitions and shedding light on the complexities of human behavior and measurement. Through a series of captivating chapters, the author explores topics such as normalcy and weirdness, inspection paradoxes, family size and population growth, lognormal distributions, expected remaining time, Berkson’s paradox, low birthweight paradox, long-tailed distributions, fairness and fallacy in algorithmic decision-making, Simpson’s paradox, the moral circle, the evolution of societal values, and the shifting perceptions of political ideologies. By analyzing real-world data and employing statistical models, the book uncovers the nuances and contradictions that emerge when multiple factors interact, leading to counterintuitive outcomes. Readers will gain a deeper understanding of statistical biases, the limitations of measurement, and the challenges inherent in making accurate predictions and drawing meaningful conclusions. “Probably Overthinking It” encourages readers to question their assumptions, approach complex problems with skepticism, and develop a more nuanced and informed perspective on the intricacies of human behavior and the world around us.

ChatGPT June 10, 2023

Surprisingly, the book abstract is only 169 words, despite my prompt. In general, the word counts are not precise, and more often over than under (which is why I asked for 180, not 200).

So, I have some editing to do, but I’d say ChatGPT saved me at least a few hours — and spared me from exactly the kind of writing I dislike the most.