Smoking causes cancer
Here’s a question posted on Reddit’s statistics forum:
The Centers for Disease Control and Prevention states on its website that “in the United States, cigarette smoking causes about 90% of lung cancers.” If S is the event “smokes cigarettes” and L is the event “has lung cancer,” then the probability 0.90 is expressed in probability notation as
- P(S and L).
- P(S | L).
- P(L | S).
Let’s consider a scenario that’s not exactly what the question asks about, but that will help us understand the relationships among these quantities:
Suppose 20% of people smoke, so out of 1000 people, we have 200 smokers and 800 nonsmokers.
Suppose 1% of nonsmokers get lung cancer, so out of 800 nonsmokers, there would be 8 cases of lung cancer, none of them caused by smoking.
And suppose 20% of smokers get lung cancer, 19% caused by smoking and 1% caused by something else (same as the nonsmokers). Out of 200 smokers, there would be 40 cases of lung cancer, 38 caused by smoking.
In this scenario, there are a total of 48 cases of lung cancer, 38 caused by smoking. So smoking caused 38/48 cancers, which is 79%.
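The arithmetic of the scenario can be checked with a short sketch. The variable names are hypothetical, chosen for illustration, and the rates are the assumed ones from the scenario: a 1% baseline risk shared by everyone, plus 19% extra risk that smoking causes among smokers.

```python
# Hypothetical 1000-person population from the scenario above.
population = 1000
smoker_fraction = 0.20
baseline_risk = 0.01   # risk from causes other than smoking (everyone)
smoking_risk = 0.19    # additional risk caused by smoking (smokers only)

smokers = round(population * smoker_fraction)   # 200
nonsmokers = population - smokers               # 800

# Nonsmokers only have the baseline risk, so none of their cases is caused by smoking.
cases_nonsmokers = round(nonsmokers * baseline_risk)               # 8
# Smokers have the baseline risk plus the extra risk from smoking.
cases_smokers = round(smokers * (baseline_risk + smoking_risk))    # 40
caused_by_smoking = round(smokers * smoking_risk)                  # 38

total_cases = cases_nonsmokers + cases_smokers   # 48
fraction_caused = caused_by_smoking / total_cases

print(total_cases, caused_by_smoking, f"{fraction_caused:.0%}")
```

Running it should print `48 38 79%`, matching the counts in the text.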
P(S and L) = 40 / 1000, which is 4%.
P(S | L) = 40 / 48, which is 83%.
P(L | S) = 40 / 200, which is 20%.
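As a sketch, the three probabilities can be computed from the scenario's counts and compared with the fraction of cases caused by smoking. Using `fractions.Fraction` keeps the ratios exact, so the comparison is not muddied by rounding. The counts are the hypothetical ones above, not real data.

```python
from fractions import Fraction

# Counts from the hypothetical 1000-person scenario.
population = 1000
smokers = 200
cases_smokers = 40       # people who smoke AND have lung cancer
total_cases = 48         # everyone with lung cancer
caused_by_smoking = 38   # cases caused by smoking

p_s_and_l = Fraction(cases_smokers, population)     # P(S and L)
p_s_given_l = Fraction(cases_smokers, total_cases)  # P(S | L)
p_l_given_s = Fraction(cases_smokers, smokers)      # P(L | S)
fraction_caused = Fraction(caused_by_smoking, total_cases)

for name, value in [
    ("P(S and L)", p_s_and_l),
    ("P(S | L)", p_s_given_l),
    ("P(L | S)", p_l_given_s),
    ("caused by smoking", fraction_caused),
]:
    print(f"{name:18} {float(value):.0%}")
```

The printed values are 4%, 83%, 20%, and 79%. The last one differs from all three conditional or joint probabilities.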
From this scenario we can conclude:
- The percentage of cases caused by smoking does not correspond to any of the listed probabilities, so the answer to the question is “None of the above”.
- To compute the percentage of cases caused by smoking, we need to know the percentage of smokers and the risk ratio for smokers versus nonsmokers.
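One standard way to express this dependence, not taken from the scenario itself, is Levin's formula for the population attributable fraction: with a fraction p of the population exposed and a risk ratio RR, the fraction of cases caused by the exposure is p(RR − 1) / (1 + p(RR − 1)). The sketch below assumes, as the scenario does, that all of the excess risk among smokers is caused by smoking. The function names are hypothetical.

```python
from fractions import Fraction

def attributable_fraction(p, rr):
    """Fraction of cases caused by the exposure (Levin's formula).

    p:  fraction of the population exposed (smokers)
    rr: risk ratio, risk among exposed / risk among unexposed
    Assumes all of the excess risk among the exposed is caused by the exposure.
    """
    excess = p * (rr - 1)
    return excess / (1 + excess)

def prob_exposed_given_case(p, rr):
    """P(S | L) from the same two inputs, via Bayes's rule."""
    return p * rr / (p * rr + (1 - p))

p = Fraction(20, 100)                      # 20% of people smoke
rr = Fraction(20, 100) / Fraction(1, 100)  # 20% vs 1% risk, so RR = 20

print(f"{float(attributable_fraction(p, rr)):.0%}")    # share of cases caused by smoking
print(f"{float(prob_exposed_given_case(p, rr)):.0%}")  # P(S | L)
```

With the scenario's inputs this gives 79% and 83%, the same values as the counting approach. By contrast, P(L | S) and P(S and L) also depend on the baseline risk, which is 1% in the scenario.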
In reality, the relationships among these quantities are complicated by time: the percentage of smokers changes over time, and there is a long lag between smoking and cancer diagnosis.