The Frog Puzzle

Here’s a probability puzzle from a TED-Ed video called Can you solve the frog riddle? by Derek Abbott. It came up recently in this Reddit thread:

You’re stranded in a rainforest after accidentally eating a poisonous mushroom. To survive the poison, you need to lick a certain species of frog. Only female frogs produce the antidote. Male and female frogs occur in equal numbers and look identical, but male frogs have a distinctive croak.

You see one frog alone on a tree stump. In another direction, you hear the croak of a male frog coming from a clearing with two frogs. You can’t tell which one made the sound.

You only have time to go to one place. What are your chances of survival if you go to the clearing and lick both frogs? What if you go to the lone frog?

The second question is relatively easy: if we assume that you are equally likely to see a male or female frog, the probability is 50% that the lone frog is female.

The first question depends on how we interpret the puzzle. In particular, it hinges on the word “distinctive” – does that mean:

  • Only male frogs croak, and the sound is distinguishable from background noises, or
  • Both male and female frogs croak, but the male croak is distinguishable from the female croak.

Based on the answer presented in the video, the first meaning is intended. So we’ll start by solving that version.

But the second meaning makes the problem a little harder, so we’ll solve that one, too.

Only Male Frogs Croak

To solve the intended version of the puzzle, we’ll assume

  • Only male frogs croak, and
  • When two frogs appear together, their sexes are independent.

So we’ll start with a prior where all two-frog combinations are equally likely.

from sympy import Rational

hypo = ['FF', 'FM', 'MF', 'MM']
prior = Rational(1)

Now let’s think about the likelihood of the data under each scenario. In the video, the solution is based on these assumptions:

  • If both frogs are female, the probability of hearing the male croak is 0.
  • If either frog is male, the probability that one of them croaks is 1.

likelihood = [0, 1, 1, 1]

I’ll use a BayesTable to compute the posterior probability for each scenario.

import pandas as pd
import numpy as np

class BayesTable(pd.DataFrame):
    def __init__(self, hypo, prior=1, **options):
        columns = ['prior', 'likelihood', 'unnorm', 'posterior']
        super().__init__(index=hypo, columns=columns, **options)
        self.prior = prior
    
    def update(self, likelihood):
        self.likelihood = likelihood
        self.unnorm = self.prior * self.likelihood
        nc = self.unnorm.sum()
        self.posterior = self.unnorm / nc

table = BayesTable(hypo, prior)
table.update(likelihood)
table
    prior  likelihood  unnorm  posterior
FF  1      0           0       0
FM  1      1           1       1/3
MF  1      1           1       1/3
MM  1      1           1       1/3

From the table, we can extract the posterior probability that both frogs are male.

from sympy import init_printing
init_printing(use_latex=False)
table.posterior['MM']
1/3

With these assumptions, the probability is 1/3 that both frogs are male (and you die), so the probability is 2/3 that at least one is female (and you live).

And that’s the answer in the video.
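As a sanity check, here is the same update written out with Python's built-in fractions module instead of BayesTable. It's a minimal sketch under the same assumptions (equal prior weights and the likelihoods from the video); the variable names are only for illustration.

```python
from fractions import Fraction

# Hypotheses for the two frogs in the clearing, in the same order as the table.
hypos = ['FF', 'FM', 'MF', 'MM']

# Equal (unnormalized) prior weights and the likelihoods from the video's solution.
prior = [Fraction(1)] * 4
likelihood = [Fraction(0), Fraction(1), Fraction(1), Fraction(1)]

# Multiply prior by likelihood, then normalize so the posteriors sum to 1.
unnorm = [p * like for p, like in zip(prior, likelihood)]
total = sum(unnorm)
posterior = {h: u / total for h, u in zip(hypos, unnorm)}

prob_die = posterior['MM']    # both frogs are male
prob_live = 1 - prob_die      # at least one frog is female
print(prob_die, prob_live)    # 1/3 2/3
```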

Poisson (not Poison) Frogs

But is that the right likelihood? Suppose frogs are equally likely to croak at any instant in time, so their croaks follow a Poisson process. If we assume that these croaking processes are independent, two frogs would be more likely to croak, during a given interval, than one.

If the interval is much longer than the average time between croaks, the probability that either frog croaks approaches 1, which is consistent with the previous solution.

But if the interval is short – as it might be if you were deciding whether to approach the first frog – the probability of hearing a croak would be double if there are two male frogs rather than one.
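To make the long-versus-short argument concrete, here is a small sketch. It assumes each male frog croaks as an independent Poisson process with the same rate, so the probability of at least one croak from n frogs during an interval of length t is 1 - exp(-n * rate * t). The rate value and the function name prob_croak are illustrative choices, not something from the video.

```python
import math

def prob_croak(n_males, rate, duration):
    """P(at least one croak) from n_males independent Poisson croakers."""
    return 1 - math.exp(-n_males * rate * duration)

rate = 1.0  # assumed croaks per time unit for one male frog (illustrative)

# Compare one male frog with two, from a long interval down to a short one.
for duration in [10, 1, 0.1, 0.001]:
    one = prob_croak(1, rate, duration)
    two = prob_croak(2, rate, duration)
    print(f"t={duration}: one={one:.4f}  two={two:.4f}  ratio={two / one:.3f}")
```

With a long interval the ratio is close to 1; with a short interval it is close to 2.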

In that case, the likelihood of the data would be:

half = Rational(1, 2)
likelihood = [0, half, half, 1]

And here are the posterior probabilities:

table = BayesTable(hypo, prior)
table.update(likelihood)
table
    prior  likelihood  unnorm  posterior
FF  1      0           0       0
FM  1      1/2         1/2     1/4
MF  1      1/2         1/2     1/4
MM  1      1           1       1/2

With Poisson frogs and a short interval, the probability of two male frogs is 1/2, so it doesn’t matter whether you approach the lone frog or the pair of frogs.

Female Frogs Croak, Too

Now let’s think about the other interpretation of the puzzle: suppose both male and female frogs croak, but we can distinguish one from the other. And suppose male and female frogs croak at different rates, but they are still independent.

Assume that male frogs croak at a rate of 1 per time unit, and female frogs at a rate of r per time unit. In that case, if we start listening at a random time, the probability that we hear a male frog first is 1 / (r+1) if there’s only one male frog, and 1 if there are two male frogs.
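If the 1 / (r+1) result isn't obvious, here is a quick simulation to check it for a mixed pair. It's a sketch that assumes each frog's waiting time until its next croak is exponential (which is what a Poisson process implies) and that the two frogs are independent; male_first_fraction is a hypothetical helper name, and r must be greater than 0 here.

```python
import random

def male_first_fraction(r, trials=100_000, seed=1):
    """Estimate P(male croaks first) for one male (rate 1) and one female (rate r)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_male = rng.expovariate(1.0)    # waiting time until the male's next croak
        t_female = rng.expovariate(r)    # waiting time until the female's next croak
        wins += t_male < t_female
    return wins / trials

# Compare the simulated fraction with the formula for a few croak rates.
for r in [0.5, 1, 3]:
    print(r, male_first_fraction(r), 1 / (r + 1))
```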

So the likelihood in this case is:

from sympy import symbols

r = symbols('r')
likelihood = [0, 1 / (r+1), 1 / (r+1), 1]

And here are the posteriors:

table = BayesTable(hypo, prior)
table.update(likelihood)
table
    prior  likelihood  unnorm     posterior
FF  1      0           0          0
FM  1      1/(r + 1)   1/(r + 1)  1/((1 + 2/(r + 1))*(r + 1))
MF  1      1/(r + 1)   1/(r + 1)  1/((1 + 2/(r + 1))*(r + 1))
MM  1      1           1          1/(1 + 2/(r + 1))

In this scenario, here’s the probability you die.

prob_die = table.posterior['MM']
prob_die.simplify()
r + 1
─────
r + 3

If female frogs don’t croak, we get the same answer as in the first scenario.

prob_die.subs({r: 0})
1/3

If male and female frogs croak at the same rate, the probability that both frogs are male is 1/2.

prob_die.subs({r: 1})
1/2

But if female frogs croak much more often, the fact that a male croaked first is strong evidence that both are male, so the posterior probability is close to 1.

prob_die.subs({r: 1000}).evalf()
0.998005982053839

Assortative Mating

Now suppose that when we see two frogs together, their sexes are not independent; specifically, let’s assume that the probability of a same-sex pair is p, so the probability of a mixed-sex pair is 1-p. In this scenario, the priors (before we hear the croak) are not equal.

p = symbols('p')
prior = [p, 1-p, 1-p, p]

Here are the posterior probabilities, assuming again that both male and female frogs croak, possibly at different rates.

likelihood = [0, 1 / (r+1), 1 / (r+1), 1]
table = BayesTable(hypo, prior)
table.update(likelihood)
table
    prior  likelihood  unnorm           posterior
FF  p      0           0                0
FM  1 - p  1/(r + 1)   (1 - p)/(r + 1)  (1 - p)/((p + 2*(1 - p)/(r + 1))*(r + 1))
MF  1 - p  1/(r + 1)   (1 - p)/(r + 1)  (1 - p)/((p + 2*(1 - p)/(r + 1))*(r + 1))
MM  p      1           p                p/(p + 2*(1 - p)/(r + 1))

table.posterior['MM'].simplify()
 p⋅(r + 1) 
───────────
p⋅r - p + 2

If p=1/2, this simplifies to the previous scenario.

table.posterior['MM'].subs({p: half}).simplify()
r + 1
─────
r + 3

And if r=0 (female frogs don’t croak), we get the answer presented in the video.

table.posterior['MM'].subs({p: half, r: 0}).simplify()
1/3

But depending on the assumptions, the probability can be as low as 0.

table.posterior['MM'].subs({p: 0, r: 1}).simplify()
0

Or as high as 1.

table.posterior['MM'].subs({p: 1, r: 0}).simplify()
1

Or anything in between. As is often the case with problems like these, the answer depends on a precise specification of the data-generating process.
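To explore that range without SymPy, here is a minimal sketch of the same posterior as a plain function, using exact fractions. It follows the prior [p, 1-p, 1-p, p] and likelihood [0, 1/(r+1), 1/(r+1), 1] from this section; the name prob_both_male is hypothetical, and it assumes p is between 0 and 1 and r is not negative.

```python
from fractions import Fraction

def prob_both_male(p, r):
    """Posterior P(MM) after hearing a male croak first.

    p: prior probability of a same-sex pair
    r: female croak rate relative to the male rate
    """
    p, r = Fraction(p), Fraction(r)
    like_mixed = 1 / (r + 1)                  # a lone male beats a female to the first croak
    unnorm_mm = p                             # prior p times likelihood 1
    unnorm_mixed = 2 * (1 - p) * like_mixed   # FM and MF together
    return unnorm_mm / (unnorm_mm + unnorm_mixed)

half = Fraction(1, 2)
print(prob_both_male(half, 0))   # 1/3, the answer in the video
print(prob_both_male(half, 1))   # 1/2
print(prob_both_male(0, 1))      # 0
print(prob_both_male(1, 0))      # 1
```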

Discussion

If all of this seems like more trouble than it’s worth, let me suggest a metacognitive shortcut for solving puzzles like this.

  1. Notice that in all probability puzzles, the answer is either 1/2 or 1/3.
  2. Also, the answer is always counterintuitive; otherwise it wouldn’t be a puzzle.
  3. Therefore, if your intuition says the answer is 1/2, it’s actually 1/3, and vice versa.

That might save you some time.

This notebook uses methods and materials from Think Bayes, second edition. If you like this sort of thing, you can read the whole book, and more examples, at allendowney.github.io/ThinkBayes2/.



Planning for your midlife crisis

Yesterday I presented a talk at ODSC East 2026, called “Counterfactual Analysis with Bayesian Models: What Drives the Life Expectancy Gap?” Here’s the abstract:

Across nearly every country in the world, women live longer than men—but the size of this gap varies from about two years in some countries to more than twelve in others. What explains these differences, and how much of the gap can be closed?

In this talk, I present a practical approach to counterfactual analysis using Bayesian regression models. Using publicly available mortality data, we build a model that relates the life expectancy gap between men and women to differences in cause-specific death rates, including homicide, drug overdoses, traffic fatalities, smoking-related disease, and chronic illness.

The model generates posterior simulations that answer “what-if” questions. For example: How much smaller would the U.S. life expectancy gap be if homicide rates matched those in Western Europe?

The talk presents the workflow—from assembling global datasets to fitting interpretable Bayesian models with PyMC and generating counterfactual simulations. Attendees will learn how Bayesian models can support explainable modeling and analysis under uncertainty.

I think the talk went well, and we got some good questions at the end. There’s no recording, unfortunately, but my slides are here. And if you want to know more, I have a series of blog posts on Substack.

The fifth and final post is on the way. In the meantime, here’s a quick post on a related topic.

Are you middle-aged?

Here’s a question from Reddit’s Stupid Questions forum:

I always thought middle age was in your 40s but since life expectancy is around 75 or so, wouldn’t it be about 35?

If life expectancy is 75, you might think the midpoint is half that, which is 37.5. But if 75 is life expectancy at birth and you survive to age 37.5, your life expectancy at that age is higher than 75. So 37.5 is not halfway!

If we really want to find the midpoint – and it wouldn’t be Probably Overthinking It if we didn’t – we have to find the age where your expected remaining lifetime equals your current age.

Let’s do it.

Data

From the Human Mortality Database I downloaded life tables for the United States, combined and broken down for men and women. The following function reads and cleans a table.

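import pandas as pd

def read_life_table(filename):
    # The file is fixed-width text; skip the lines before the column names
    lt = pd.read_fwf(filename, skiprows=2, infer_nrows=200)
    # Strip the '+' from the open-ended last age group so Age is an integer
    lt['Age'] = lt['Age'].str.replace('+', '', regex=False).astype(int)
    return lt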

Here are the first few rows of the combined table (see notes below for details).

blt = read_life_table('../data/bltper_1x1.txt')
blt.head()
   Year  Age       mx       qx    ax      lx    dx     Lx       Tx     ex
0  1933    0  0.06129  0.05861  0.25  100000  5861  95624  6089609  60.90
1  1933    1  0.00946  0.00941  0.50   94139   886  93696  5993985  63.67
2  1933    2  0.00435  0.00434  0.50   93253   405  93050  5900289  63.27
3  1933    3  0.00310  0.00310  0.50   92848   288  92704  5807239  62.55
4  1933    4  0.00239  0.00238  0.50   92560   221  92450  5714535  61.74

We’ll also read the female and male tables.

flt = read_life_table('../data/fltper_1x1.txt')
mlt = read_life_table('../data/mltper_1x1.txt')

The tables include data from 1933 to 2024, so we’ll select the most recent data.

year = blt['Year'].unique()[-1]
table = blt.query('Year == @year').set_index('Age')

The column we’ll use is ex, which is life expectancy as a function of age.

age = table.index.to_series()
ex = table['ex']

Life expectancy at birth is 79 years, so the naive midpoint is 39.5.

ex[0], ex[0] / 2
(79.08, 39.54)

But at age 40, expected remaining lifetime is 41.1, so 39.5 is not the midpoint.

ex[39], ex[40]
(42.04, 41.12)
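To see why the naive midpoint falls short, we can compare life expectancy at birth with the expected age at death for someone who has already reached 40. This is just arithmetic on the two values printed above; the variable names are illustrative.

```python
# Values reported above for the most recent year in the combined U.S. table.
e0 = 79.08    # life expectancy at birth
e40 = 41.12   # expected remaining lifetime at age 40

# Someone who has already reached 40 expects to die at 40 + e40, on average.
age_at_death_given_40 = round(40 + e40, 2)
print(age_at_death_given_40)         # 81.12

# Surviving to 40 raises the expected age at death above e0,
# so half of e0 is reached before the true midpoint.
print(age_at_death_given_40 > e0)    # True
```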

This plot shows life expectancy at each age, compared to age.

ex.plot(label='Remaining life expectancy')
age.plot(label='Age')
decorate(ylabel='Years',
        title='Remaining life expectancy vs age, United States 2024')
[Figure: Remaining life expectancy vs age, United States 2024]

“Middle age” is where the lines cross, which we can compute by linear interpolation.

from scipy.interpolate import interp1d

inverse = interp1d(ex - age, age)
inverse(0)
array(40.58638743)

So the overall midpoint is 40.6 years. But as you might expect, it’s different for men and women. Let’s put the analysis we did in a function.

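def get_midpoint(filename):
    # Read the life table and keep only the most recent year
    lt = read_life_table(filename)
    year = lt['Year'].unique()[-1]
    table = lt.query('Year == @year').set_index('Age')

    # Index age by Age so it aligns with ex when we subtract
    age = table.index.to_series()
    ex = table['ex']

    # Interpolate to the age where remaining life expectancy equals age
    inverse = interp1d(ex - age, age)
    return inverse(0)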

And run it for men.

get_midpoint('../data/mltper_1x1.txt')
array(39.57142857)

And women.

get_midpoint('../data/fltper_1x1.txt')
array(41.56185567)

Men hit middle age at 39.6, women at 41.6.
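For a closer look at what the interpolation step does, here is a self-contained sketch. It uses NumPy's np.interp rather than SciPy's interp1d, and the four life expectancy values are made up for illustration; they are not rows from the HMD table.

```python
import numpy as np

# Hypothetical remaining-life-expectancy values near the crossing point
# (made-up numbers for illustration, not HMD data).
ages = np.array([39, 40, 41, 42])
ex = np.array([42.0, 41.1, 40.2, 39.3])

# Positive before the midpoint, negative after it.
diff = ex - ages

# np.interp needs increasing x values, and diff decreases with age,
# so reverse both arrays before finding the age where diff == 0.
midpoint = np.interp(0, diff[::-1], ages[::-1])
print(midpoint)
```

The result falls between 40 and 41, the two ages where the difference changes sign.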

The Gender Gap and Age

Finally, let’s see how the gender gap in life expectancy changes as a function of age.

ex_male = mlt.query('Year == @year').set_index('Age')['ex']
ex_female = flt.query('Year == @year').set_index('Age')['ex']
gap = ex_female - ex_male
gap.plot(label='')
decorate(ylabel='Years',
         title='Life expectancy gender gap vs age')
[Figure: Life expectancy gender gap vs age]

At birth the life expectancy gap is close to five years. At age 100, it is close to zero.

But just looking at the gap might be misleading. For a more complete picture let’s also look at the ratio.

ratio = ex_female / ex_male
ratio.plot(label='')

decorate(ylabel='Ratio',
         title='Life expectancy gender ratio (female / male)')
[Figure: Life expectancy gender ratio (female / male)]

The life expectancy ratio tells a more complicated story.

  • At birth, the ratio is 1.06, which means female babies live 6% longer, on average.
  • Around age 80, the ratio peaks at nearly 1.14 – so between female and male octogenarians, we expect the women’s remaining lifetimes to be 14% longer.
  • At advanced ages, the ratio declines steeply and actually crosses over after age 100 – although the crossover is minimal and might not be statistically valid.

To interpret these results, we can think about the causes of death that contribute to age-specific death rates at different stages of life.

  • In young adulthood, the causes of death that contribute most to gender gaps include road traffic, homicide, accidental injury, and drug use disorders.
  • In advanced adulthood, they include cancer, cardiovascular disease, respiratory disease, liver disease, diabetes, and suicide.

The causes that affect younger people have large gender gaps, but relatively low death rates. As people get older, these low-rate causes contribute less to age-specific death rates, and the higher-rate causes contribute more.

I think that’s a plausible explanation for the increasing ratio from age 0 to 80. For the decline that follows, I can only speculate that there is a selection effect: people who get to these advanced ages are likely to have better-than-average lifestyle histories (less smoking and drinking, better diet, more exercise) – and among people with better lifestyles, the gender gap is small.

Notes

Data credit: HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at [www.mortality.org].

Here are the columns of the 1×1 Period Life Tables:

  • Year: Calendar year to which the period life table refers.
  • Age: Exact age x, in years, at the beginning of the interval [x, x+1).
  • mx: Central death rate at age x: mx = dx / Lx.
  • qx: Probability of dying between ages x and x+1: qx = dx / lx.
  • ax: Average fraction of the interval lived by those who die in [x, x+1). Typically around 0.5 for most ages, lower for infants (reflecting higher early mortality within the year).
  • lx: Number of survivors at exact age x, out of a radix (usually 100,000 births).
  • dx: Number of deaths between ages x and x+1: dx = lx · qx.
  • Lx: Person-years lived between ages x and x+1, approximately Lx = l(x+1) + ax · dx, where l(x+1) is the number of survivors at age x+1.
  • Tx: Total person-years remaining above age x: the sum of Lx over age x and all older ages.
  • ex: Life expectancy at age x: ex = Tx / lx.
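To show how the columns fit together, here is a minimal sketch that checks a few standard life-table identities against the first two rows of the 1933 table shown earlier. The identities (dx = lx · qx, ex = Tx / lx, and the carry-forward of survivors and person-years) are the usual textbook relations, stated here as an assumption about how the columns are related; the dictionary layout is just for illustration.

```python
# First two rows of the 1933 combined table shown earlier in this post.
rows = [
    dict(age=0, qx=0.05861, lx=100000, dx=5861, Lx=95624, Tx=6089609, ex=60.90),
    dict(age=1, qx=0.00941, lx=94139, dx=886, Lx=93696, Tx=5993985, ex=63.67),
]

for row in rows:
    dx = row['lx'] * row['qx']    # deaths during the interval
    ex = row['Tx'] / row['lx']    # remaining life expectancy
    print(row['age'], round(dx), round(ex, 2))

# Survivors and person-years carry forward from one age to the next.
print(rows[0]['lx'] - rows[0]['dx'] == rows[1]['lx'])   # True
print(rows[0]['Tx'] - rows[0]['Lx'] == rows[1]['Tx'])   # True
```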

The details are in this Jupyter notebook.

Attention, Chinese Readers

The Chinese edition of Probably Overthinking It is available now (also here)!

If you have the Chinese edition, there are two sections you won’t get to read — so I am including them here.

Here is an excerpt from Chapter 3, including the deleted paragraph:

In the Present

The women surveyed in 1990 rejected the childbearing example of their mothers emphatically. On average, each woman had 2.3 fewer children than her mother. If that pattern had continued for another generation, the average family size in 2018 would have been about 0.8. But it wasn’t.

In fact, the average family size in 2018 was very close to 2, just as in 1990. So how did that happen?

As it turns out, this is close to what we would expect if every woman had one child fewer than her mother. The following figure shows the actual distribution in 2018, compared to the result if we start with the 1990 distribution and simulate the “one child fewer” scenario.

[Figure: Distribution of family size in 2018, actual vs. the simulated “one child fewer” scenario]

The means of the two distributions are almost the same, but the shapes are different. In reality, there were more zero- and two-child families in 2018 than the simulation predicts, and fewer one-child families. But at least on average, it seems like women in the U.S. have been following the “one child fewer” policy for the last 30 years.

The scenario at the beginning of this chapter is meant to be light-hearted, but in reality governments in many places and times have enacted policies meant to control family sizes and population growth. Most famously, China implemented a one-child policy in 1980 that imposed severe penalties on families with more than one child. Of course, this policy is objectionable to anyone who considers reproductive freedom a fundamental human right. But even as a practical matter, the unintended consequences were profound.

Rather than catalog them, I will mention one that is particularly ironic: while this policy was in effect, economic and social forces reduced the average desired family size so much that, when the policy was relaxed in 2015 and again in 2021, average lifetime fertility increased to only 1.3, far below the level needed to keep the population constant, near 2.1. Since then, China has implemented new policies intended to increase family sizes, but it is not clear whether they will have much effect. Demographers predict that by the time you read this, the population of China will probably be shrinking [UPDATE: It is.]. The consequences of the one-child policy are widespread and will affect China and the rest of the world for a long time.

And here is an excerpt from Chapter 5, including the deleted explanation.

Child mortality

Fortunately, child mortality has decreased since 1900. The following figure shows the percentage of children who die before age 5 for four geographical regions, from 1900 to 2019. These data were combined from several sources by Gapminder, a foundation based in Sweden that “promotes sustainable global development […] by increased use and understanding of statistics.”

[Figure: Percentage of children who die before age 5, by region, 1900 to 2019]

In every region, child mortality has decreased consistently and substantially. The only exceptions are indicated by the vertical lines: the 1918 influenza pandemic, which visibly affected Asia, the Americas, and Europe; World War II in Europe (1939-1945); and the Great Leap Forward in China (1958-1962). In every case, these exceptions did not affect the long-term trend.

[COMMENT: I thought I was being diplomatic by referring generally to the Great Leap Forward — rather than the Great Chinese Famine or “Three Years of Great Famine” (三年大饥荒) — but apparently that was not enough.]

Although there is more work to do, especially in Africa, child mortality is substantially lower now, in every region of the world, than in 1900. As a result most people now are better new than used.

To demonstrate this change, I collected recent mortality data from the Global Health Observatory of the World Health Organization (WHO). For people born in 2019, we don’t know what their future lifetimes will be, but we can estimate it if we assume that the mortality rate in each age group will not change over their lifetimes.

Based on that simplification, the following figure shows average remaining lifetime as a function of age for Sweden and Nigeria in 2019, compared to Sweden in 1905.

[Figure: Average remaining lifetime vs age for Sweden and Nigeria in 2019, and Sweden in 1905]

Since 1905, Sweden has continued to make progress; life expectancy at every age is higher in 2019 than in 1905. And Swedes now have the new-better-than-used property. Their life expectancy at birth is about 82 years, and it declines consistently over their lives, just like a light bulb.

Unfortunately, Nigeria has one of the highest rates of child mortality in the world: in 2019, almost 8% of babies died in their first year of life. After that, they are briefly better used than new: life expectancy at birth is about 62 years; however, a baby who survives the first year will live another 65 years, on average.
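The comparison behind “better used than new” is simple enough to write down. Here is a minimal sketch using the rounded Nigerian figures quoted above; the function name is hypothetical, and the check only compares life expectancy at birth with expected remaining lifetime at age 1.

```python
def better_used_than_new(e0, e1):
    """True if expected remaining lifetime at age 1 exceeds life expectancy at birth.

    e0: life expectancy at birth, in years
    e1: expected remaining lifetime at age 1, in years
    """
    return e1 > e0

# Nigeria in 2019, using the rounded figures quoted in the text.
print(better_used_than_new(62, 65))   # True
```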

Going forward, I hope we continue to reduce child mortality in every region; if we do, soon every person born will be better new than used. Or maybe we can do even better than that.