cs358 Lecture Notes
Week 6, Tuesday


Random number generation
------------------------

These notes based on discussion in Knuth, Seminumerical Algorithms

Uses:

1) simulation

2) statistical sampling

3) cryptography

4) monte carlo (integration, simulation)

   Example: how to find the volume of complex shapes

5) decision-making (lottery)

6) games, unpredictable programs


Physical methods for generating
-------------------------------

1) ping pong balls

2) Geiger counters

3) time between keystrokes (caution)


Problems with physical methods
------------------------------

1) special hardware required

2) slow

3) unrepeatable (problem?)


Solution: numerical algorithms that generate random-seeming numbers
--------

They're not really random because they are produced by 
deterministic machines.

Computers are non-deterministic at the level of electrons,
like everything else in the universe.

Computers are deterministic at the level of bits, which is
possible because even if the behavior of an individual
electron is not predictable, the aggregate behavior of many
electrons is.

1)  Called pseudo-random because they are deterministic and
    repeatable.

2)  But they are "random enough" to be useful for many purposes.

What do we mean when we say that a number is random?


Random sequences
----------------

Pick a number between 1 and 10.  Say 7.

Is that a random number?  What does that mean?

1) Probably doesn't mean anything to say that a single
   number is random.

   Randomness may be a property of a process, or
   of a sequence of numbers.

   We might say that a number is random if we know that it
   came from a random process.

   We might look at a sequence of numbers and decide that
   they are random, even if we don't know where they came
   from.

   I am going to focus on the latter.

   How can we examine a sequence of numbers and access how
   random it is?

   What properties do we want?

   a) equidistribution -- all values equally likely

   b) no history -- no relationship between subsequent values

   More concretely, in a sequence of bits, there should be the
   same number of 0s and 1s, and there should be no way to 
   predict the next bit based on its predecessors.


k-distribution
--------------

To say that a binary sequence is k-distributed is to say that
all subsequences of k digits are equally likely.

equidistribution is the same thing as 1-distribution, but
it is clearly not enough for randomness.

Example:      101010101010101010...

Has the same number of 1s and 0s, and it has lots of 10s
and lots of 01s, but no 11 and no 00.

If all pairs are equally likely, that's 2-distribution.

Example:      0011 0011..

Has all pairs with equal frequency, but no 111 or 000.

k-distribution implies that if I know k-1 values, I cannot
predict the kth value any better than chance.

But if I know k values and the sequence is not k+1 distributed,
then there is a non-50% chance that the next value is 0 or 1.

This kind of unpredictability is clearly one of the things
we want in a random sequence.

Higher values of k seem to be "more random" because they
imply that we need more information before we can start
predicting.

If something is infinity-distributed, that implies that no
matter how much information we have, we can never predict the
next bit better than chance.

Is that the same thing as random?  Is there something else
that mean mean by "random" that an infinity-distributed
sequence doesn't have?

Can a deterministic process produce an inf-dist sequence?
Yes!  BIT 5 (1965) 246-250.


(m, k) distribution
-------------------

Some k-distributed sequences are better than others.

     0001 0001 1101 1101 ...  (period = 16)

3-distributed, but even-indexed elements are 75% zeros
	       and odd-indexed elements are 75% ones

If the sequence is (m, k)-distributed, then for all m
between 1 and k, we can take every kth element and the
resulting sequence will be k-distributed.

Cool fact: something that is inf-dist is (m, k)-dist for
all m and all k.


Modulus
-------

In math land, there is "congruence mod m" and "reduction mod m"

   x == y (mod m)    means that x and y differ by a multiple of m
		     means that x and y map onto the same element
		     of the mod m group

   x = y % m         means "reduce y mod m and set x to that value"
		     reduction means adding or subtracting a multiple
		     of m so that the result is between 0 and m-1

		     means find the remainder when y is divided by m

In programming languages, the modulus operator reduces mod m.


Linear congruential generators (LCG)
-----------------------------

not random bits any more, random #s between 0 and m-1

Xn+1 = (A Xn + c) % m

m, modulus
a, multiplier
c, increment
X0, seed

If a is reasonably large compared to m, then most of the
time A Xn will be much bigger than m, and the result
will be unpredictable (assuming you don't know the
parameters)

Like spinning the big wheel many times on The Price is Right.

Example: m=10, X0 = a = c = 7

	 7 6 9 0 7 6 9 0 ...

1) all LCG's are periodic, since Xn+1 = f(Xn)

   at best, period = m (cycle through all possibilities)

   at worst, period = 1   f(X*) = X*

2) if period=m, LCG is 1-distributed, since all values
   will appear exactly once per period

   no LCG is 2-distributed, since each value has only
   one possible successor (instead of the m possible
   successors it should have)


Are some bits better than others?
---------------------------------

Often LCGs are used to generate numbers between 0 and (2^n)-1,
and then the n bits are broken up and used for various things.

There is a common rule of thumb that says that the leftmost
bits are "more random" than the rightmost bits.

What does that mean?

By selecting the rightmost digit, we are in effect taking
the Xn modulus some d, where d is a power of 2.

So the sequence we report is Yn = Xn % d

Looking at a value of Yn, we cannot tell which Xn it came
from.

Thus there are many possible Yn+1

Depending on d and m, the Yn could be 2-distributed, or
	  even 3-distributed...

But if m % d = 0, it doesn't work out that way.

    Xn+1 == a Xn + c (mod m)
    Xn+1 = a Xn + c - qm

where qm is some multiple of m

    Yn+1 = Xn+1 % d = (a Xn + c) % d     

qm drops out because m % d = 0

   Xn == Yn (mod d)
   Xn = Yn - q'd

The Xn differ from the Yn only by multiples of d
So we can substitute Yn for Xn without affecting
congruence mod d.

   Yn+1 - q'd = (a (Yn - q''d) + c) % d
   Yn+1 == a Yn + c (mod d)

Conclusion: if d divides m, the Yn are a LCG with modulus d

Therefore, not only are they 1-distributed, but their
period is (at most) d, which is << m.

What does that imply about a LCG with m = 2^n?

The rightmost digit is an LCG with m=2.  Ugh!


Actual improvements
-------------------

Shuffling: use one LCG to permute the elements of another

1) generate a table of 100 or so Xn

2) generate a Yn and use it to choose and report one of the Xn

3) generate the next Xn and add it to the table

4) go to step 2

period length = LCM of moduli of Xn and Yn

In the C library...

       The random() function uses a non-linear additive  feedback
       random  number generator employing a default table of size
       31 long integers to return successive  pseudo-random  num-
       bers  in the range from 0 to RAND_MAX.  The period of this
       random  number  generator  is  very  large,  approximately
       16*((2**31)-1).

Java provides the method Math.random() that returns a
double between 0.0 and 1.0.

Annoyingly, it does not indicate what the algorithm is,
or even whether 1.0 is included in the range.


Empirical testing
-----------------

Given a RNG, how can you evaluate whether it is random enough?

The following is an excellent web page that describes some
of the common tests and reports the results of the C library
RNG.

http://www-ece.rice.edu/~bs/rng/test.html

All tests have a common structure, which is to compute some
property of a sequence and compare the distribution of that
property with what would be expected from a truly random
sequence.

Example: frequency test basically forms a histogram and
compares the histogram with the expected uniform distribution

tests for 1-distribution

pairs frequency test does the same thing with pairs

tests for 2-distribution

In both cases, the result is a histogram that is not a
perfect match.

How do you distinguish random variations from non-random
variations?

This is exactly the question of statistical inference we
talked about last week, except that in this case random
is good.


Chi-square tests
----------------

100 buckets, 10000 random numbers, you expect 100 in each bucket

Compute the difference between the expected and actual
values in each bucket.

Square the differences and add them up.

Squaring is a common trick for making sure that all
differences are positive.  Because it is analytic (unlike
absolute value), it lends itself to analysis...

For example, we can calculate the distribution of chi-squared
values that would come from a truly random sequence.

We can form a 95% interval.  In other words, a truly random
sequence will produce a chi-squared value in the interval
95% of the time and outside 5% of the time.

Apply the test to a RNG 1000 times.  It should "pass" 950
times and "fail" 50 times.  That's the language they use
on the web page, but the real test is how close you get to
95%.

"Passing" all the time is just as bad as "failing" all the time.

As usual, the 5% in this test is arbitrary.

Actually, it might make more sense to plot the distribution
of chi-squared values and compare it to the ideal distribution.


Summary of reasons for chi-square
---------------------------------

It takes a vector of numbers and reduces it to a single value.

By analysis we can predict the distribution of that value.


Other tests
-----------

The other tests might seem arbitrary:

run-length test: how long are the monotonically-increasing
	         subsequences

permutation test: divide sequence into chunks.  In each chunk,
	    replace the values with their ranks.  Do a frequency
	    test on the ranks.

interval test: how long is the interval between repeated values
	 (how would a simple LCG do on this?)

coupon collector's test: form categories of values.  See how many
       numbers you have to draw before you get one from each group.


The point of these is that we have analytic models that
predict the distributions of these things for a perfect
random sequence, and we can use them for comparison.


Alternative tests
-----------------

Some tests are application-specific.  They ask, in effect,
"Is this sequence random enough for the application I have
in mind?"

One of my papers is a collaboration with someone who tests
RNGs for purposes of simulated annealing.


Random floating-point
---------------------

So far we have been talking about generating random integers
distributed between 0 and m-1.

It is commonly useful to generate random floating-point values
between 0.0 and 1.0 (usually not including 1.0)

The usual technique is to divide by m.

There are problems with this technique, but I will not get into them.
If you are interested, you can read:

http://www.sdsc.edu/~downey/rand/


Generating other distributions
------------------------------

Percentiles and CDFs

What is a percentile?

Median is the 50th percentile.  CDF is a percentile plot,
sideways.

The number of people in the 99th percentile is the same
as the number of people in all the other percentiles,
by definition.

So one way to generate random numbers from any distribution
is to choose a random percentile and then find out what
value corresponds to that percentile.

If you know your stats, the means that you generate a
random y between 0.0 and 1.0 and transform it:

       x = F-1 (y)

Where F-1 is the inverse of the CDF of the distribution
you are trying to generate.

Example:  exponential distribution  F(x) = 1 - e^(-cx)

	  F-1 = -ln (1-y) / c

If y is a random number between 0.0 and 1.0, then 
1-y is also a random number between 0.0 and 1.0.