cs358 Lecture Notes
Week 4, Thursday

Statistical inference and Bayesian Statistics
---------------------------------------------

Two urns.  One contains 75% red marbles and 25% black.
The other is the other way around.

Choose an urn at random.
Pull out a marble.  It's red.
Pull out another marble.  It's red.

You want to know which urn it is.

Classical approach
------------------

State a hypothesis: this is the red urn.

Assume you're wrong: assume it's the black urn.  This is the null
hypothesis.

Calculate the probability of seeing what you saw, given the null
hypothesis.  What is the probability of pulling two red marbles from
the black urn (treating the two draws as independent)?

    1/4 * 1/4 = 1/16 = 6.25%

That's more than 5%, so the result is not statistically significant,
so you cannot reject the null hypothesis.

In classical stats, that means that you have NO MORE INFORMATION
after the experiment than you did before.

This contradicts our intuition, which says that there is a pretty
good chance we chose the red urn.

Bayes theorem
-------------

    Pr{A and B} = Pr{A} * Pr{B | A} = Pr{B} * Pr{A | B}

    A = drawing two red marbles
    B = we chose the red urn

We know

    Pr{A and B} = probability of choosing the red urn and then
                  drawing two reds
                = 1/2 (3/4 * 3/4) = 9/32

We know

    Pr{A} = probability of choosing the black urn and then
            drawing two reds
          + probability of choosing the red urn and then
            drawing two reds
          = 1/2 (1/4 * 1/4) + 1/2 (3/4 * 3/4) = 10/32

Then

    Pr{B | A} = Pr{A and B} / Pr{A} = (9/32) / (10/32) = 90%

Before the experiment, there was a 50% chance that we had chosen
the red urn.  We used the data to UPDATE our beliefs.  We now believe
that there is a 90% chance we chose the red urn.

This result is consistent with our intuition, and yields a much
stronger statement than the classical analysis.  Also, it does not
depend on an arbitrary threshold of significance.

Unfortunately, it does depend on prior beliefs.

Subjectivity
------------

Let's say I choose an urn and you think I chose the black one
(with 90% certainty) and I think I chose the red one (with 90%
certainty).
I try to prove it to you by drawing marbles, and sure enough,
I draw two red ones.

What is your new, updated opinion (that it is the red urn)?

    (0.1 * 9/16) / (0.1 * 9/16 + 0.9 * 1/16) = 0.9 / 1.8 = 50%

What is my new, updated opinion?

    (0.9 * 9/16) / (0.9 * 9/16 + 0.1 * 1/16) = 8.1 / 8.2 = 98.78%

The good news is that the data moved us both in the same direction.

The bad news is that we still don't agree on which urn it is (or at
least we have different degrees of confidence).

Which is better?
----------------

Subjectivity is the bugaboo of classicists.  But from the Bayesian's
point of view it is inevitable.  It's better to confront subjectivity
explicitly than to rely on illusory objectivity.

Also, prior beliefs are relevant -- you can't ignore them.

Imagine I claim that I can predict coin tosses.  You don't believe
me.  I toss a coin 5 times and predict it right every time.
The p-value is (1/2)^5 = 1/32 = 3.125%.  Do you believe me?

Another example: in Kerill's data, the p-value was about 0.25.
But there are a lot of additional reasons to believe his hypothesis:

1) obsidian is a better material, but it was more expensive and
   it had to be imported

2) during the period in question, the city was becoming more wealthy

3) also, they were travelling more and trading more

Even though Kerill's p-value is quite high and mine is below 5%, we
are inclined to believe Kerill and not believe me.

These beliefs are consistent with Bayesian thinking and in conflict
with classical thinking.  Are they in conflict with scientific
thinking?

Science and persuasion
----------------------

It is sometimes claimed that the thing that distinguishes science
from non-science is the possibility of persuasion.

If you and I disagree on an empirical question, there might be an
experiment we could perform, on the basis of which one of us would
be forced to change our view (or abandon logic).

In the classical view, this is true.
In the Bayesian view, this might be true, if

1) our prior beliefs are neither zero nor one
2) the evidence is sufficient

If you think the probability that I chose the red urn is zero, then
it doesn't matter how many red marbles I pull out, you will never
change your mind.

If you think the probability that I can predict coin tosses is zero,
then it doesn't matter how many I get right, you will never change
your mind.

But in either case, if your prior belief is any non-zero value, then
eventually, with enough evidence, I can persuade you.

THAT IS WHAT SCIENTISTS MEAN WHEN THEY TALK ABOUT KEEPING AN
OPEN MIND.

To say that "anything is possible" does not mean that anything is
likely; it just means that everything should have a non-zero
probability so that we might be persuaded by sufficient evidence.

Think about your own beliefs regarding ESP.  What is your current
level of belief?  What evidence would it take to make you change
your mind?

Statistical power
-----------------

Power is the ability of a test to change people's minds.

In classical statistics it is the probability that the null
hypothesis will be rejected, assuming it is false.  In other words,
the probability that the alternative hypothesis will be accepted,
given that it is true.

The two-marble test for urn color had no power, because no matter
what the outcome was, we would be unable to reject the null
hypothesis.  Even the most extreme outcome, two reds, only gets
down to 6.25%.

Assuming that we chose the red urn and drew three marbles, what is
the probability that we will reject the null (black urn) hypothesis?

    outcome     probability under the null (black urn)
    3r           1/64 =  1.56%
    2r 1b        9/64 = 14.06%
    1r 2b       27/64 = 42.19%
    3b          27/64 = 42.19%

Only the 3-red outcome has a p-value (the probability of a result at
least that extreme) below 5%, so we can only reject the null
hypothesis if we get 3 red.  What is the chance that that will happen
(assuming the alternative hypothesis)?

    (3/4)^3 = 27/64 = 42%

In Bayesland, the power of a test depends on the prior belief of the
person I am trying to persuade.  If the prior belief is zero, then no
test has any possibility of changing their mind.  As the prior belief
increases, the "power" of the test increases.
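
The urn numbers in these notes can be checked with a short script.
This is only a sketch, not part of the original lecture: the function
names (posterior_red, p_value, power) are made up for illustration,
the draws are assumed to be independent (marbles replaced after each
draw), and the 5% threshold is the one used in the classical section.

```python
from fractions import Fraction
from math import comb

# Chance of drawing a red marble from each urn (from the notes).
P_RED = {"red": Fraction(3, 4), "black": Fraction(1, 4)}


def posterior_red(prior_red, n_red, n_black=0):
    """Posterior probability of the red urn after the draws (Bayes' theorem).

    The binomial coefficient is the same for both urns, so it cancels
    and is left out.
    """
    prior_red = Fraction(prior_red)
    like_red = P_RED["red"] ** n_red * (1 - P_RED["red"]) ** n_black
    like_black = P_RED["black"] ** n_red * (1 - P_RED["black"]) ** n_black
    joint_red = prior_red * like_red
    joint_black = (1 - prior_red) * like_black
    return joint_red / (joint_red + joint_black)


def p_value(n_red, n_draws, urn="black"):
    """Probability of at least n_red reds in n_draws from the given urn."""
    p = P_RED[urn]
    return sum(comb(n_draws, k) * p**k * (1 - p) ** (n_draws - k)
               for k in range(n_red, n_draws + 1))


def power(n_draws, alpha=Fraction(5, 100)):
    """Chance of rejecting the black-urn null when the urn is really red."""
    p = P_RED["red"]
    return sum(comb(n_draws, k) * p**k * (1 - p) ** (n_draws - k)
               for k in range(n_draws + 1)
               if p_value(k, n_draws) < alpha)


if __name__ == "__main__":
    # Priors of 50%, 10%, 90% and 0% for the red urn, then two red draws.
    for prior in (Fraction(1, 2), Fraction(1, 10), Fraction(9, 10), Fraction(0)):
        post = posterior_red(prior, 2)
        print(f"prior {float(prior):.2f} -> posterior {float(post):.4f}")
    print("power with 2 draws:", float(power(2)))
    print("power with 3 draws:", float(power(3)))
```

With a prior of exactly zero the posterior stays at zero however many
reds are drawn, which is the open-mind point made above.  Exact
fractions are used so that 9/10, 81/82 and 27/64 come out without
rounding error.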

Falsifiability
--------------

"Falsifiability" is Popper's notion of riskiness -- a hypothesis is
falsifiable to the extent that there are cheap tests that have a high
probability of refuting the hypothesis, assuming that it's false.

The probability of failing to refute the hypothesis, assuming that
it's false, is the Type II error rate (beta).  This is not the same
thing as the p-value, which is computed assuming the hypothesis under
test is true.

Falsifiability is the probability of refuting the hypothesis, assuming
that it's false: (1 - beta), which is the power of the test.

Falsifiability provides a partial ordering on hypotheses.

    H1: easy test   90% chance of refuting   <--- winner
    H2: hard test   90% chance of refuting

    H3: easy test   90% chance of refuting   <--- winner
    H4: easy test   50% chance of refuting

    H5: easy test   50% chance of refuting   ???
    H6: hard test   90% chance of refuting   ???

The weak version of Popper's argument is that hypotheses are
scientific in proportion to their falsifiability.

If you claim that you chose the red urn, and you can only afford to
choose two marbles, then the claim is unfalsifiable, because there is
no outcome of an affordable test that would refute your claim.

If you can afford to choose three marbles, then the claim is
falsifiable.

New technology (a new test) can change the status of a hypothesis.

-----

The strong version says that hypotheses are useful in proportion to
their falsifiability.

The intuition behind the strong claim is that more precise
predictions are more useful, but more likely to be refuted.

I have been trying without success for a while to make a rigorous
connection between power, falsifiability, and the usefulness of
hypotheses.