cs358 Lecture Notes
Week 4, Tuesday


Statistical inference and Monte-Carlo Simulation
------------------------------------------------

How can you tell whether an observation is meaningful or
random?

This is _the_ central question in statistics.

Examples:

	A coin (or die) is biased -- are there more heads
	  than there are supposed to be?

	Kerill's Chert-Obsidian data -- is the composition
	  of blades changing over time?

	SEC violations -- every time Bob buys a stock, its price
	  goes up

	Sexual dimorphism -- the men I know are taller than
	  the women I know, mostly


Classical statistical technique
-------------------------------

1) assume that you are wrong and that the results are random
   (this assumption is the "null hypothesis")

2) build a probabilistic model of the system

3) use analysis to calculate the probability of the data
   you saw, within your probabilistic model

4) if the calculated probability is small, you reject the
   assumption that you are wrong.
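
The four steps can be sketched for the coin example. This is only a
minimal illustration: the counts (60 heads in 100 tosses) and the 0.05
threshold are made-up values, and the helper name binomial_tail is
hypothetical. It computes the chance of a result at least as extreme
as the one observed, rather than of the exact result.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Steps 1 and 2: assume the coin is fair, so heads ~ Binomial(n, 0.5).
n_tosses, n_heads = 100, 60      # hypothetical observation
threshold = 0.05                 # conventional but arbitrary cutoff

# Step 3: probability of seeing at least this many heads by chance.
p_value = binomial_tail(n_tosses, n_heads)

# Step 4: reject the "it was just chance" assumption if that is small.
reject_null = p_value < threshold
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

With these numbers the tail probability is about 0.028, so the fair-coin
assumption would be rejected at the 0.05 level.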


Difficulties
------------

FORMULATING THE QUESTION...

	    what were the chances of seeing that?

Well, zero or one, depending on what you mean.

Instead, you have to define some notion of weirdness and
ask: what were the chances of seeing something as weird
as that, if there were no underlying cause (other than
chance).
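
To see why the question needs care, compare three probabilities for the
same fair-coin experiment. This is a sketch with hypothetical function
names, and the choice of "distance from the expected number of heads"
as the measure of weirdness is an assumption made for illustration.

```python
from math import comb

def prob_exact_sequence(n):
    """Chance of one particular sequence of n fair tosses."""
    return 0.5 ** n

def prob_heads_count(n, k):
    """Chance of exactly k heads in n fair tosses."""
    return comb(n, k) * 0.5 ** n

def prob_at_least_as_weird(n, k):
    """Chance of a head count at least as far from n/2 as k is."""
    observed = abs(k - n / 2)
    ways = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= observed)
    return ways * 0.5 ** n

n, k = 100, 60   # hypothetical observation
print(prob_exact_sequence(n))        # tiny for every sequence, weird or not
print(prob_heads_count(n, k))        # still small, even for unremarkable k
print(prob_at_least_as_weird(n, k))  # the quantity worth comparing to a threshold
```

Every individual sequence is equally improbable, so only the third
number says anything about whether 60 heads is surprising.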

Additional difficulties...

1) arbitrary threshold on probability

2) probabilistic model may be wrong

3) not all probabilistic models can be analyzed

4) not clear how to phrase your new beliefs based on the data
   (unsatisfying to say "we conclude that the null hypothesis
   is false")

Alternatives
------------

1) Bayesian statistics: all knowledge about the world is
   probabilistic.  Rather than accept or reject hypotheses,
   you modify your distributions according to data
   you observe.

   Example:

   a) start with an assumption about the way human heights
      are distributed

   b) collect some data about male and female heights

   c) update your "belief" about male heights using the male
      data, and likewise your belief about female heights using
      the female data  (the update uses Bayes' theorem, hence
      the name)

   d) now you have a belief about the distribution of heights
      for males and females
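
Steps a) through d) can be sketched with the simplest textbook case.
Everything specific here is an assumption for illustration: the belief
is about the mean height only, the prior is normal, the spread of
individual heights is treated as known, and the data are invented. The
function name update_normal_mean is hypothetical.

```python
def update_normal_mean(prior_mean, prior_var, data, noise_var):
    """Bayes update for an unknown mean with a normal prior.

    Assumes each observation is the true mean plus normal noise of
    known variance noise_var. Returns (posterior_mean, posterior_var).
    """
    n = len(data)
    sample_mean = sum(data) / n
    post_precision = 1 / prior_var + n / noise_var
    post_var = 1 / post_precision
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / noise_var)
    return post_mean, post_var

# a) prior belief about mean human height, in cm (made-up numbers)
prior_mean, prior_var = 170.0, 10.0 ** 2
noise_var = 7.0 ** 2   # assumed person-to-person variance

# b) hypothetical data, invented for illustration only
male_heights = [178, 182, 175, 180, 185]
female_heights = [165, 160, 170, 162, 168]

# c) update the same prior separately with each data set
male_belief = update_normal_mean(prior_mean, prior_var, male_heights, noise_var)
female_belief = update_normal_mean(prior_mean, prior_var, female_heights, noise_var)

# d) two beliefs, each a (mean, variance) pair
print("male:", male_belief)
print("female:", female_belief)
```

Each posterior mean lies between the prior mean and the sample mean,
and the posterior variance is smaller than the prior variance: the data
sharpen the belief instead of accepting or rejecting anything.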


2) Monte-Carlo simulation: rather than trying to analyze
   everything, use computers to run many simulations of
   random systems, and look at the distribution of outcomes.

   Advantage: no need to do math!

   Because you don't have to analyze the system, you don't
   have to make as many simplifying assumptions in your
   probabilistic model.

   Example:   coin-toss -- run actual trials rather than
                           figure out the binomial distribution

	      Kerill's data -- coin-toss model (each blade is
	                       either chert or obsidian)

	      human heights -- no need to assume that heights
	                       are distributed normally; you
			       can use the actual observed
                               distribution

	      SEC violations -- need a way to generate realistic
		                random time series of price.
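
Two of these examples are short enough to sketch. The first estimates
the coin-toss probability by running trials instead of using the
binomial formula. The second handles heights by shuffling the observed
values between the two groups (a permutation test, which is one way to
"use the actual observed distribution"; the notes do not prescribe it).
Function names, trial counts, seeds, and all height data are
assumptions made for illustration.

```python
import random

def coin_p_value(n_tosses, observed_heads, n_trials=20_000, seed=0):
    """Estimate P(heads >= observed_heads) for a fair coin by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= observed_heads:
            hits += 1
    return hits / n_trials

def permutation_p_value(group_a, group_b, n_trials=10_000, seed=0):
    """Estimate how often random relabeling gives a mean difference
    (a minus b) at least as large as the one observed."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(pooled)   # forget who was in which group
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    return hits / n_trials

# Hypothetical heights in cm, invented for illustration only.
men = [178, 182, 175, 180, 185, 171]
women = [165, 160, 170, 162, 168, 174]

print("coin:", coin_p_value(100, 60))
print("heights:", permutation_p_value(men, women))
```

Both estimates are themselves random: more trials give a steadier
answer, and a different seed gives a slightly different one.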


BIG DANGER
----------

    often people get distracted looking at approximations
    in the calculation of probability, or focus on choosing just
    the right threshold, and forget that their probability model
    is just a model!

Examples:

coin-toss: actually, in this case the probability model is
	   pretty good

human heights: sampling errors; racial and geographical factors

SEC violations: what if your model of market fluctuations is wrong?
    basic models tend to underestimate the probability of large
    moves, so you are likely to convict Bob wrongly!

Kerill's data: underlying assumption that the knives that
	 happened to wind up buried at a particular location
	 represent the proportion of materials in circulation

	 what if the dig happens to be the site of a chert
	 knife maker?

	 what if more valuable knives are less likely to wind
	 up buried?
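
The SEC example can be made concrete by asking how likely a "4 sigma"
price move is under two different models of daily fluctuations. This is
a sketch under loud assumptions: the normal model stands in for a
"basic" model, and a Student t distribution with 3 degrees of freedom
(rescaled to the same variance) stands in for a heavier-tailed one.
Neither is claimed to describe a real market, and the function names
are hypothetical.

```python
import math
import random

def normal_tail(k):
    """P(|Z| > k) for a standard normal Z, computed exactly."""
    return math.erfc(k / math.sqrt(2))

def student_t_sample(df, rng):
    """One draw from Student's t with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2, 2.0)   # chi-square(df) = Gamma(df/2, scale 2)
    return z / math.sqrt(chi2 / df)

def heavy_tail(k, df=3, n_trials=200_000, seed=0):
    """Monte-Carlo estimate of P(|X| > k), where X is t(df) rescaled
    to variance 1 (requires df > 2)."""
    rng = random.Random(seed)
    scale = math.sqrt((df - 2) / df)       # t(df) has variance df / (df - 2)
    hits = sum(abs(scale * student_t_sample(df, rng)) > k for _ in range(n_trials))
    return hits / n_trials

k = 4.0   # size of the move, in standard deviations (illustrative)
p_normal = normal_tail(k)
p_heavy = heavy_tail(k)
print(f"normal model: {p_normal:.2e}")
print(f"heavy-tailed model: {p_heavy:.2e}")
```

Under these assumptions the heavy-tailed model makes the large move
roughly a hundred times more likely than the normal model does, so a
run of lucky trades that looks damning under one model can look
unremarkable under the other.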