cs358 Lecture Notes Week 6, Tuesday Random number generation ------------------------ These notes based on discussion in Knuth, Seminumerical Algorithms Uses: 1) simulation 2) statistical sampling 3) cryptography 4) monte carlo (integration, simulation) Example: how to find the volume of complex shapes 5) decision-making (lottery) 6) games, unpredictable programs Physical methods for generating ------------------------------- 1) ping pong balls 2) Geiger counters 3) time between keystrokes (caution) Problems with physical methods ------------------------------ 1) special hardware required 2) slow 3) unrepeatable (problem?) Solution: numerical algorithms that generate random-seeming numbers -------- They're not really random because they are produced by deterministic machines. Computers are non-deterministic at the level of electrons, like everything else in the universe. Computers are deterministic at the level of bits, which is possible because even if the behavior of an individual electron is not predictable, the aggregate behavior of many electrons is. 1) Called pseudo-random because they are deterministic and repeatable. 2) But they are "random enough" to be useful for many purposes. What do we mean when we say that a number is random? Random sequences ---------------- Pick a number between 1 and 10. Say 7. Is that a random number? What does that mean? 1) Probably doesn't mean anything to say that a single number is random. Randomness may be a property of a process, or of a sequence of numbers. We might say that a number is random if we know that it came from a random process. We might look at a sequence of numbers and decide that they are random, even if we don't know where they came from. I am going to focus on the latter. How can we examine a sequence of numbers and access how random it is? What properties do we want? a) equidistribution -- all values equally likely b) no history -- no relationship between subsequent values More concretely, in a sequence of bits, there should be the same number of 0s and 1s, and there should be no way to predict the next bit based on its predecessors. k-distribution -------------- To say that a binary sequence is k-distributed is to say that all subsequences of k digits are equally likely. equidistribution is the same thing as 1-distribution, but it is clearly not enough for randomness. Example: 101010101010101010... Has the same number of 1s and 0s, and it has lots of 10s and lots of 01s, but no 11 and no 00. If all pairs are equally likely, that's 2-distribution. Example: 0011 0011.. Has all pairs with equal frequency, but no 111 or 000. k-distribution implies that if I know k-1 values, I cannot predict the kth value any better than chance. But if I know k values and the sequence is not k+1 distributed, then there is a non-50% chance that the next value is 0 or 1. This kind of unpredictability is clearly one of the things we want in a random sequence. Higher values of k seem to be "more random" because they imply that we need more information before we can start predicting. If something is infinity-distributed, that implies that no matter how much information we have, we can never predict the next bit better than chance. Is that the same thing as random? Is there something else that mean mean by "random" that an infinity-distributed sequence doesn't have? Can a deterministic process produce an inf-dist sequence? Yes! BIT 5 (1965) 246-250. (m, k) distribution ------------------- Some k-distributed sequences are better than others. 0001 0001 1101 1101 ... (period = 16) 3-distributed, but even-indexed elements are 75% zeros and odd-indexed elements are 75% ones If the sequence is (m, k)-distributed, then for all m between 1 and k, we can take every kth element and the resulting sequence will be k-distributed. Cool fact: something that is inf-dist is (m, k)-dist for all m and all k. Modulus ------- In math land, there is "congruence mod m" and "reduction mod m" x == y (mod m) means that x and y differ by a multiple of m means that x and y map onto the same element of the mod m group x = y % m means "reduce y mod m and set x to that value" reduction means adding or subtracting a multiple of m so that the result is between 0 and m-1 means find the remainder when y is divided by m In programming languages, the modulus operator reduces mod m. Linear congruential generators (LCG) ----------------------------- not random bits any more, random #s between 0 and m-1 Xn+1 = (A Xn + c) % m m, modulus a, multiplier c, increment X0, seed If a is reasonably large compared to m, then most of the time A Xn will be much bigger than m, and the result will be unpredictable (assuming you don't know the parameters) Like spinning the big wheel many times on The Price is Right. Example: m=10, X0 = a = c = 7 7 6 9 0 7 6 9 0 ... 1) all LCG's are periodic, since Xn+1 = f(Xn) at best, period = m (cycle through all possibilities) at worst, period = 1 f(X*) = X* 2) if period=m, LCG is 1-distributed, since all values will appear exactly once per period no LCG is 2-distributed, since each value has only one possible successor (instead of the m possible successors it should have) Are some bits better than others? --------------------------------- Often LCGs are used to generate numbers between 0 and (2^n)-1, and then the n bits are broken up and used for various things. There is a common rule of thumb that says that the leftmost bits are "more random" than the rightmost bits. What does that mean? By selecting the rightmost digit, we are in effect taking the Xn modulus some d, where d is a power of 2. So the sequence we report is Yn = Xn % d Looking at a value of Yn, we cannot tell which Xn it came from. Thus there are many possible Yn+1 Depending on d and m, the Yn could be 2-distributed, or even 3-distributed... But if m % d = 0, it doesn't work out that way. Xn+1 == a Xn + c (mod m) Xn+1 = a Xn + c - qm where qm is some multiple of m Yn+1 = Xn+1 % d = (a Xn + c) % d qm drops out because m % d = 0 Xn == Yn (mod d) Xn = Yn - q'd The Xn differ from the Yn only by multiples of d So we can substitute Yn for Xn without affecting congruence mod d. Yn+1 - q'd = (a (Yn - q''d) + c) % d Yn+1 == a Yn + c (mod d) Conclusion: if d divides m, the Yn are a LCG with modulus d Therefore, not only are they 1-distributed, but their period is (at most) d, which is << m. What does that imply about a LCG with m = 2^n? The rightmost digit is an LCG with m=2. Ugh! Actual improvements ------------------- Shuffling: use one LCG to permute the elements of another 1) generate a table of 100 or so Xn 2) generate a Yn and use it to choose and report one of the Xn 3) generate the next Xn and add it to the table 4) go to step 2 period length = LCM of moduli of Xn and Yn In the C library... The random() function uses a non-linear additive feedback random number generator employing a default table of size 31 long integers to return successive pseudo-random num- bers in the range from 0 to RAND_MAX. The period of this random number generator is very large, approximately 16*((2**31)-1). Java provides the method Math.random() that returns a double between 0.0 and 1.0. Annoyingly, it does not indicate what the algorithm is, or even whether 1.0 is included in the range. Empirical testing ----------------- Given a RNG, how can you evaluate whether it is random enough? The following is an excellent web page that describes some of the common tests and reports the results of the C library RNG. http://www-ece.rice.edu/~bs/rng/test.html All tests have a common structure, which is to compute some property of a sequence and compare the distribution of that property with what would be expected from a truly random sequence. Example: frequency test basically forms a histogram and compares the histogram with the expected uniform distribution tests for 1-distribution pairs frequency test does the same thing with pairs tests for 2-distribution In both cases, the result is a histogram that is not a perfect match. How do you distinguish random variations from non-random variations? This is exactly the question of statistical inference we talked about last week, except that in this case random is good. Chi-square tests ---------------- 100 buckets, 10000 random numbers, you expect 100 in each bucket Compute the difference between the expected and actual values in each bucket. Square the differences and add them up. Squaring is a common trick for making sure that all differences are positive. Because it is analytic (unlike absolute value), it lends itself to analysis... For example, we can calculate the distribution of chi-squared values that would come from a truly random sequence. We can form a 95% interval. In other words, a truly random sequence will produce a chi-squared value in the interval 95% of the time and outside 5% of the time. Apply the test to a RNG 1000 times. It should "pass" 950 times and "fail" 50 times. That's the language they use on the web page, but the real test is how close you get to 95%. "Passing" all the time is just as bad as "failing" all the time. As usual, the 5% in this test is arbitrary. Actually, it might make more sense to plot the distribution of chi-squared values and compare it to the ideal distribution. Summary of reasons for chi-square --------------------------------- It takes a vector of numbers and reduces it to a single value. By analysis we can predict the distribution of that value. Other tests ----------- The other tests might seem arbitrary: run-length test: how long are the monotonically-increasing subsequences permutation test: divide sequence into chunks. In each chunk, replace the values with their ranks. Do a frequency test on the ranks. interval test: how long is the interval between repeated values (how would a simple LCG do on this?) coupon collector's test: form categories of values. See how many numbers you have to draw before you get one from each group. The point of these is that we have analytic models that predict the distributions of these things for a perfect random sequence, and we can use them for comparison. Alternative tests ----------------- Some tests are application-specific. They ask, in effect, "Is this sequence random enough for the application I have in mind?" One of my papers is a collaboration with someone who tests RNGs for purposes of simulated annealing. Random floating-point --------------------- So far we have been talking about generating random integers distributed between 0 and m-1. It is commonly useful to generate random floating-point values between 0.0 and 1.0 (usually not including 1.0) The usual technique is to divide by m. There are problems with this technique, but I will not get into them. If you are interested, you can read: http://www.sdsc.edu/~downey/rand/ Generating other distributions ------------------------------ Percentiles and CDFs What is a percentile? Median is the 50th percentile. CDF is a percentile plot, sideways. The number of people in the 99th percentile is the same as the number of people in all the other percentiles, by definition. So one way to generate random numbers from any distribution is to choose a random percentile and then find out what value corresponds to that percentile. If you know your stats, the means that you generate a random y between 0.0 and 1.0 and transform it: x = F-1 (y) Where F-1 is the inverse of the CDF of the distribution you are trying to generate. Example: exponential distribution F(x) = 1 - e^(-cx) F-1 = -ln (1-y) / c If y is a random number between 0.0 and 1.0, then 1-y is also a random number between 0.0 and 1.0.