cs358 Lecture Notes
Week 4, Thursday


Statistical inference and Bayesian Statistics
---------------------------------------------

Two urns.  One contains 75% red marbles and 25% black.
The other has the reverse mix: 25% red and 75% black.

Choose an urn at random.  Pull out a marble.  It's red.
Pull out another marble.  It's red.

You want to know which urn it is.

Classical approach
------------------

State a hypothesis:  this is the red urn.

Assume you're wrong:  assume it's the black urn.  This is
the null hypothesis.

Calculate the probability of seeing what you saw, given
the null hypothesis.

What are the odds of pulling two red marbles from
the black urn?  1 in 4 times 1 in 4 = 1 in 16 = 6.25%

That's more than 5%, so the results are not statistically
significant, so you cannot reject the null hypothesis.
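
The arithmetic above can be checked with a short sketch.  The
names `prob_all_red` and `ALPHA` are illustrative, and the draws
are treated as independent with a fixed fraction of red marbles,
which the calculation above assumes implicitly.

```python
from fractions import Fraction

def prob_all_red(red_fraction, draws):
    """Probability that every one of `draws` independent draws is red."""
    return red_fraction ** draws

# Null hypothesis: we hold the black urn, where 1/4 of the marbles are red.
p_value = prob_all_red(Fraction(1, 4), 2)
ALPHA = Fraction(5, 100)  # conventional significance threshold

print(p_value, float(p_value))   # 1/16 0.0625
print(p_value < ALPHA)           # False: cannot reject the null
```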

In classical stats, that means that you have NO MORE INFORMATION
after the experiment than you did before.

This contradicts our intuition, which says that there is
a pretty good chance we chose the red urn.

Bayes theorem
-------------

	Pr{A and B} = Pr{A} * Pr{B | A}
		    = Pr{B} * Pr{A | B}

A = drawing two red marbles
B = we chose the red urn

We know Pr{A and B} = odds of choosing red and then drawing two red
                    = 1/2 (3/4 * 3/4)

We know Pr{A} = odds of choosing black and then drawing two reds
              + odds of choosing red and then drawing two red

	      = 1/2 (1/4 * 1/4)
              + 1/2 (3/4 * 3/4)

Then Pr {B|A} = Pr{A and B} / Pr{A} = 90%
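
The same calculation as a minimal sketch, using exact fractions.
The function name `posterior_red` is made up for illustration;
the 3/4 and 1/4 likelihoods come from the urn compositions.

```python
from fractions import Fraction

def posterior_red(prior_red, reds_drawn):
    """Pr{red urn | reds_drawn red marbles in a row}, by Bayes's theorem."""
    like_red = Fraction(3, 4) ** reds_drawn     # Pr{A | red urn}
    like_black = Fraction(1, 4) ** reds_drawn   # Pr{A | black urn}
    joint_red = prior_red * like_red            # Pr{A and B}
    evidence = joint_red + (1 - prior_red) * like_black  # Pr{A}
    return joint_red / evidence

print(posterior_red(Fraction(1, 2), 2))  # 9/10
```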

Before the experiment, there was a 50% chance that we had
chosen the red urn.

We used the data to UPDATE our beliefs.  We now believe
that there is a 90% chance we chose the red urn.

This result is consistent with our intuition, and yields
a much stronger statement than the classical analysis.

Also, it does not depend on an arbitrary threshold of
significance.

Unfortunately, it does depend on prior beliefs.

Subjectivity
------------

Let's say I choose an urn and you think I chose the black
one (with 90% certainty) and I think I chose the red one
(with 90% certainty).

I try to prove it to you by drawing marbles, and sure enough,
I draw two red ones.

What is your new, updated opinion that it is the red urn?  50%

What is my new, updated opinion?     98.78%

The good news is that the data moved us both in the same
direction.  The bad news is that we still don't agree on
which urn it is (or at least we have different degrees of
confidence).
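
One way to see where these two numbers come from is the odds form
of Bayes's theorem: posterior odds = prior odds * likelihood ratio.
Each red marble is three times as likely from the red urn as from
the black one, so two reds multiply the odds by 9.  A sketch, with
the hypothetical helper name `update` (it assumes the prior is not
exactly 1, which would divide by zero):

```python
from fractions import Fraction

def update(prior_red, reds_drawn):
    """Posterior Pr{red urn} via odds: posterior odds = prior odds * LR."""
    likelihood_ratio = Fraction(3, 1) ** reds_drawn  # (3/4)/(1/4) per red marble
    prior_odds = prior_red / (1 - prior_red)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

yours = update(Fraction(1, 10), 2)   # you start at 10% red
mine = update(Fraction(9, 10), 2)    # I start at 90% red
print(yours, mine, round(float(mine) * 100, 2))  # 1/2 81/82 98.78
```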


Which is better?
----------------

Subjectivity is the bugaboo of classicists.

But from the Bayesian's point of view it is inevitable.
It's better to confront subjectivity explicitly than
to rely on illusory objectivity.

Also, prior beliefs are relevant -- you can't ignore them.

Imagine I claim that I can predict coin tosses.  You don't
believe me.  I toss a coin 5 times and predict it right every
time.  The p-value is 3.125%.  Do you believe me?

Another example: in Kerill's data, the p-value was about 0.25.
But there are a lot of additional reasons to believe
his hypothesis:

   1) obsidian is a better material, but it was more
      expensive and it had to be imported

   2) during the period in question, the city was becoming
      more wealthy

   3) also, they were travelling more and trading more

Even though Kerill's p-value is quite high and mine is low,
we are inclined to believe Kerill and not believe me.

These beliefs are consistent with Bayesian thinking and
in conflict with classical thinking.

Are they in conflict with scientific thinking?


Science and persuasion
----------------------

It is sometimes claimed that the thing that distinguishes
science from non-science is the possibility of persuasion.

If you and I disagree on an empirical question, there might
be an experiment we could perform, on the basis of which
one of us would be forced to change our view (or abandon
logic).

In the classical view, this is true.

In the Bayesian view, this might be true, if

1) our prior beliefs are neither zero nor one

2) the evidence is sufficient


If you think the probability that I chose the red urn
is zero, then it doesn't matter how many red balls I
pull out, you will never change your mind.

If you think the probability that I can predict coin tosses
is zero, then it doesn't matter how many I get right,
you will never change your mind.

But in either case, if your prior belief is any non-zero
value, then eventually I can persuade you.
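
A rough sketch of this for the urn example: count how many red
marbles in a row it takes to bring a skeptic up to a given level
of belief.  The 95% target, the `max_draws` cutoff, and the name
`reds_needed` are assumptions chosen for illustration.

```python
from fractions import Fraction

def reds_needed(prior_red, target=Fraction(95, 100), max_draws=1000):
    """Count consecutive red draws until Pr{red urn} reaches `target`.

    Returns None if the belief never gets there within `max_draws`.
    """
    belief = prior_red
    for n in range(max_draws + 1):
        if belief >= target:
            return n
        # One more red marble: Bayes update with likelihoods 3/4 vs 1/4.
        belief = belief * Fraction(3, 4) / (
            belief * Fraction(3, 4) + (1 - belief) * Fraction(1, 4))
    return None

print(reds_needed(Fraction(1, 2)))         # 3
print(reds_needed(Fraction(1, 1000000)))   # 16
print(reds_needed(Fraction(0)))            # None
```

A one-in-a-million prior only costs 13 extra marbles; a prior of
zero never moves.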

THAT IS WHAT SCIENTISTS MEAN WHEN THEY TALK ABOUT KEEPING
AN OPEN MIND.

To say that "anything is possible" does not mean that anything
is likely; it just means that everything should have a non-zero
probability so that we might be persuaded by sufficient
evidence.

Think about your own beliefs regarding ESP.  What is your
current level of belief?  What evidence would it take to make
you change your mind?


Statistical power
-----------------

Power is the ability of a test to change people's minds.

In classical statistics it is the probability that the
null hypothesis will be rejected, assuming it is false.

In other words, the prob that the alternate hypothesis
will be accepted, given that it is true.

The two-ball test for urn color had no power, because no
matter what the outcome was, we would be unable to reject
the null hypothesis.

Assuming that we chose the red urn and drew three balls,
what is the probability that we will reject the null
(black urn) hypothesis?

Under the null (black urn):

            Pr{exact outcome}    p-value (this many red or more)

3r          1.6%                 1.6%
2r 1b       14%                  16%
1r 2b       42%                  58%
   3b       42%                  100%

We can only reject the null hypothesis if we get 3 red.

What is the chance that that will happen
(assuming the alternate hypothesis)?

42%
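
The power calculation can be sketched as follows, taking the
p-value of an outcome to be the tail probability under the null
(that many reds or more).  The names `prob_reds`, `p_value`, and
the constants are illustrative, and draws are assumed independent.

```python
from fractions import Fraction
from math import comb

def prob_reds(k, n, red_fraction):
    """Binomial probability of exactly k red marbles in n independent draws."""
    return comb(n, k) * red_fraction**k * (1 - red_fraction)**(n - k)

def p_value(k, n, null_red):
    """Tail probability under the null: k or more reds in n draws."""
    return sum(prob_reds(j, n, null_red) for j in range(k, n + 1))

N = 3
ALPHA = Fraction(5, 100)
NULL_RED, ALT_RED = Fraction(1, 4), Fraction(3, 4)  # black urn, red urn

# Numbers of reds that would let us reject the black-urn null.
rejecting = [k for k in range(N + 1) if p_value(k, N, NULL_RED) < ALPHA]
# Power: chance of landing on a rejecting outcome if the red urn is the truth.
power = sum(prob_reds(k, N, ALT_RED) for k in rejecting)

print(rejecting, power, float(power))  # [3] 27/64 0.421875
```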

In Bayesland, the power of a test depends on the prior
belief of the person I am trying to persuade.  If the
prior belief is zero, then no test has any possibility
of changing his mind.  As the prior belief increases,
the "power" of the test increases.
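
One possible way to put a number on this, offered as an assumption
rather than a standard definition: call the listener "persuaded"
if, after three draws, their posterior for the red urn is above
1/2, and ask how likely that is when the red urn really was
chosen.  The names `bayes_power` and `threshold` are hypothetical.

```python
from fractions import Fraction
from math import comb

def posterior_red(prior_red, reds, blacks):
    """Pr{red urn | reds, blacks} for the 3/4 vs 1/4 urns."""
    like_red = Fraction(3, 4)**reds * Fraction(1, 4)**blacks
    like_black = Fraction(1, 4)**reds * Fraction(3, 4)**blacks
    evidence = prior_red * like_red + (1 - prior_red) * like_black
    return prior_red * like_red / evidence

def bayes_power(prior_red, draws=3, threshold=Fraction(1, 2)):
    """Chance, if the red urn is the truth, that the posterior exceeds threshold."""
    total = Fraction(0)
    for reds in range(draws + 1):
        blacks = draws - reds
        if posterior_red(prior_red, reds, blacks) > threshold:
            # Probability of this outcome when drawing from the red urn.
            total += comb(draws, reds) * Fraction(3, 4)**reds * Fraction(1, 4)**blacks
    return total

for prior in (Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(prior, bayes_power(prior))
```

Under these assumptions the four priors give 0, 27/64, 27/32, and
63/64: zero for the closed mind, and rising with the prior.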


Falsifiability
--------------

"Falsifiability" is Popper's notion of riskiness -- 
a hypothesis is falsifiable to the extent that there exist
cheap tests that have a high probability of refuting it,
assuming that it's false.

The Type II error rate (beta) is the prob of failing to refute
the hypothesis, assuming that it's false.

falsifiability is the probability of refuting the hypothesis,
assuming that it's false: (1 - beta), which is the power of
the test.

Falsifiability provides a partial ordering on hypotheses.

H1:  easy test 90% chance of refuting  <--- winner
H2:  hard test 90% chance of refuting

H3:  easy test 90% chance of refuting  <--- winner
H4:  easy test 50% chance of refuting

H5:  easy test 50% chance of refuting  ???
H6:  hard test 90% chance of refuting  ???
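
The ordering above can be sketched as a dominance comparison: one
hypothesis beats another only if it is no worse on both cost and
refutation probability, and strictly better on at least one.  The
class, the function name, and the 1 = easy / 2 = hard cost scale
are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    name: str
    test_cost: int         # hypothetical scale: 1 = easy test, 2 = hard test
    refute_prob: float     # chance the test refutes the hypothesis if it is false

def more_falsifiable(a, b):
    """True if `a` is at least as good as `b` on both axes and better on one."""
    no_worse = a.test_cost <= b.test_cost and a.refute_prob >= b.refute_prob
    better = a.test_cost < b.test_cost or a.refute_prob > b.refute_prob
    return no_worse and better

h1, h2 = Hypothesis("H1", 1, 0.9), Hypothesis("H2", 2, 0.9)
h5, h6 = Hypothesis("H5", 1, 0.5), Hypothesis("H6", 2, 0.9)
print(more_falsifiable(h1, h2))                            # True
print(more_falsifiable(h5, h6), more_falsifiable(h6, h5))  # False False
```

H5 and H6 are incomparable, which is what makes the ordering
partial rather than total.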

The weak version of Popper's argument is that hypotheses
are scientific in proportion to their falsifiability.

If you claim that you chose the red urn, and you can only
afford to choose two marbles, then the claim is unfalsifiable,
because there is no outcome of an affordable test that would
refute your claim.

If you can afford to choose three marbles, then the claim
is falsifiable.
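
A minimal check of the two-versus-three marble claim, assuming a
5% threshold and treating all-black as the outcome most damaging
to "this is the red urn".  The function name `falsifiable` is
illustrative.

```python
from fractions import Fraction

ALPHA = Fraction(5, 100)

def falsifiable(draws, contrary_fraction=Fraction(1, 4)):
    """True if the outcome most damaging to the claim is rare enough to refute it.

    For the claim "this is the red urn", that outcome is all black marbles,
    each of which has probability `contrary_fraction` under the claim.
    """
    return contrary_fraction ** draws < ALPHA

print([n for n in range(1, 6) if falsifiable(n)])  # [3, 4, 5]
```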

New technology (new test) can change the status of a hypothesis.

-----

The strong version says that hypotheses are useful in
proportion to their falsifiability.

The intuition behind the strong claim is that more precise
predictions are more useful, but more likely to be refuted.

I have been trying without success for a while to make
a rigorous connection between power, falsifiability, and
the usefulness of hypotheses.