Article 18371 of rec.puzzles: Newsgroups: rec.puzzles,news.answers Path: nntp-server.caltech.edu!elroy.jpl.nasa.gov!ames!haven.umd.edu!uunet!questrel!chris From: uunet!questrel!chris (Chris Cole) Subject: rec.puzzles FAQ, part 15 of 15 Message-ID: Followup-To: rec.puzzles Summary: This posting contains a list of Frequently Asked Questions (and their answers). It should be read by anyone who wishes to post to the rec.puzzles newsgroup. Sender: chris@questrel.com (Chris Cole) Reply-To: uunet!questrel!faql-comment Organization: Questrel, Inc. References: Date: Mon, 21 Sep 1992 00:09:54 GMT Approved: news-answers-request@MIT.Edu Expires: Sat, 3 Apr 1993 00:08:21 GMT Lines: 1411 Xref: nntp-server.caltech.edu rec.puzzles:18371 news.answers:3098 Archive-name: puzzles-faq/part15 Last-modified: 1992/09/20 Version: 3 ==> probability/coupon.s <== The problem is well known under the name of "COUPON COLLECTOR PROBLEM". The answer for n equally likely coupons is (1) C(n)=n*H(n) with H(n)=1+1/2+1/3+...+1/n. In the unequal probabilities case, with p_i the probability of coupon i, it becomes (2) C(n)=int_0^infty [1-prod_{i=1}^n (1-exp(-p_i*t))] dt which reduces to (1) when p_i=1/n (An easy exercise). In the equal probabilities case C(n)~n log(n). For a Zipf law, from (2), we have C(n)~n log^2(n). A related problem is the "BIRTHDAY PARADOX" familiar to people interested in hashing algorithms: With a party of 24 persons, you are likely (i.e. with probability >50%) to find two with the same birthday. The non equiprobable case was solved by: M. Klamkin and D. Newman, Extensions of the birthday surprise, J. Comb. Th. 3 (1967), 279-282. ==> probability/darts.p <== Peter throws two darts at a dartboard, aiming for the center. The second dart lands farther from the center than the first. If Peter now throws another dart at the board, aiming for the center, what is the probability that this third throw is also worse (i.e., farther from the center) than his first? Assume Peter's skilfulness is constant. ==> probability/darts.s <== Since the three darts are thrown independently, they each have a 1/3 chance of being the best throw. As long as the third dart is not the best throw, it will be worse than the first dart. Therefore the answer is 2/3. Ranking the three darts' results from A (best) to C (worst), there are, a priori, six equiprobable outcomes. possibility # 1 2 3 4 5 6 1st throw A A B B C C 2nd throw B C A C A B 3rd throw C B C A B A The information from the first two throws shows us that the first throw will not be the worst, nor the second throw the best. Thus possibilities 3, 5 and 6 are eliminated, leaving three equiprobable cases, 1, 2 and 4. Of these, 1 and 2 have the third throw worse than the first; 4 does not. Again the answer is 2/3. ==> probability/flips.p <== Consider a run of coin tosses: HHTHTTHTTTHTTTTHHHTHHHHHTHTTHT Define a success as a run of one H or T (as in THT or HTH). Use two different methods of sampling. The first method would consist of sampling the above data by taking 7 groups of three. This translates into the following sequences HHT, HTT, HTT, THT, TTT, HHH, and THH. In this sample there was one success, THT. The second method of sampling could be gotten by taking the groups in a serial sequence. Another way of explaining the method would be to take the tosses 1, 2, and 3 as the first set then tosses 2, 3, and 4 as the second set and so on to produce seven samples. The samples would be HHT, HTH, THT, HTT TTH, THT, and HTT, thus giving two success, HTH and THT. With these two methods what would be the expected value and the standard deviation for both methods? ==> probability/flips.s <== Case 1: Let us start with a simple case: Define success as T and consider sequences of length 1. In this case, the two sampling techniques are equivalent. Assuming that we are examining a truly random sequence of T and H. Thus if n groups of single sequences are considered or a sequence of length n is considered we will have the following statistics on the number of successes: Mean = n/2, and standard deviation (sd) = square_root(n)/2 Case 2: Define success as HT or TH: Here the two sampling techniques need to be distinguished: Sampling 1: Take "n" groups of two: Here probability of getting success in any group is 1/2 (TH and HT out of 4 possible cases). So the mean and the standard deviation is same as described in case 1. Sampling 2: Generate a sequence of length "n". It is easy to show that (n-1) samples are generated. The number of successes in this sequence is same as the number of T to H and H to T transitions. This problem is solved in John P. Robinson, Transition Count and Syndrome are Uncorrelated, IEEE Transactions on Information Theory, Jan 1988. I will just state the results: mean = (n-1)/2 and standard deviation = square_root(n-1)/2. In fact, if you want "n" samples in this case you need to generate length (n+1) sequence. Then the new mean and standard deviation are the same as described in Sampling 1. (replace n by n+1). The advantage in this sampling (i.e., sampling 2) is that you need only generate a sequence length of (n+1) to obtain n sample points as opposed to 2n (n groups of 2) in Sampling 1. OBSERVATION: The statistics has not changed. Case 3: Define success as THT and HTH. Sampling 1: This is a simple situation. Let us assume "n" samples need to be generated. Therefore, "n" groups of three are generated. The probability of a group being successful is 1/4 (THT and HTH out of 8 cases). Therefore mean and standard deviation are: mean= n/4 and sd= square_root(7n)/4 Sampling 2: This is not a simple situation. Let us generate a random sequence of length "n". There will be (n-2) samples in this case. Use an approach similar to that followed in the Jan 88 IEEE paper. I will just state the results. First we define a function or enumerator P(n,k) as the number of length "n" sequences that generate "k" successes. For example, P(4,1)= 4 (HHTH, HTHH, TTHT, and THTT are 4 possible length 4 sequences). I derived two generating functions g(x) and h(x) in order to enumerate P(n,k), they are compactly represented by the following matrix polynomial. _ _ _ _ _ _ | g(x) | | 1 1 | (n-3) | 4 | | | = | | | | | h(x) | | 1 x | |2+2x | |_ _| |_ _| |_ _| The above is expressed as matrix generating function. It can be shown that P(n,k) is the coefficient of the x^k in the polynomial (g(x)+h(x)). For example, if n=4 we get (g(x)+h(x)) from the matrix generating function as (10+4x+2x^2). Clearly, P(4,1) (coefficient of x) is 4 and P(4,2)=2 ( There are two such sequences THTH, and HTHT). We can show that mean(k) = (n-2)/4 and sd= square_root(5n-12)/4 If we want to compare Sampling 1 with Sampling 2 then in Sampling 2 we need to generate "n" samples. This can be done by using sequences of length (n+2). Then our new statistics would be mean = n/4 (same as that in sampling 1) sd = square_root(5n-2)/4 (not the same as in sampling 1) sd in sampling 2 < sd in sampling 1. This difference can be explained by the fact that in serial sampling there is dependence on the past state. For example, if the past sample was TTT then we know that the next sample can't be a success. This was not the case in Case 1 or Case 2 (transition count). For example, in case 2 success was independent of previous sample. That is if my past sample was TT then my next sample can be TT or TH. The probability of success in Case 1 and Case 2 is not influenced by past history). Similar approach can be followed for higher dimensional cases. Here's a C program I had lying around that is relevant to the current discussion of coin-flipping. The algorithm is N^3 (for N flips) but it could certainly be improved. Compile with cc -o flip flip.c -lm -- Guy _________________ Cut here ___________________ #include #include char *progname; /* Program name */ #define NOT(c) (('H' + 'T') - (c)) /* flip.c -- a program to compute the expected waiting time for a sequence of coin flips, or the probabilty that one sequence of coin flips will occur before another. Guy Jacobson, 11/1/90 */ main (ac, av) int ac; char **av; { char *f1, *f2, *parseflips (); double compute (); progname = av[0]; if (ac == 2) { f1 = parseflips (av[1]); printf ("Expected number of flips until %s = %.1f\n", f1, compute (f1, NULL)); } else if (ac == 3) { f1 = parseflips (av[1]); f2 = parseflips (av[2]); if (strcmp (f1, f2) == 0) { printf ("Can't use the same flip sequence.\n"); exit (1); } printf ("Probability of flipping %s before %s = %.1f%%\n", av[1], av[2], compute (f1, f2) * 100.0); } else usage (); } char *parseflips (s) char *s; { char *f = s; while (*s) if (*s == 'H' || *s == 'h') *s++ = 'H'; else if (*s == 'T' || *s == 't') *s++ = 'T'; else usage (); return f; } usage () { printf ("usage: %s {HT}^n\n", progname); printf ("\tto get the expected waiting time, or\n"); printf ("usage: %s s1 s2\n\t(where s1, s2 in {HT}^n for some fixed n)\n", progname); printf ("\tto get the probability that s1 will occur before s2\n"); exit (1); } /* compute -- if f2 is non-null, compute the probability that flip sequence f1 will occur before f2. With null f2, compute the expected waiting time until f1 is flipped technique: Build a DFA to recognize (H+T)*f1 [or (H+T)*(f1+f2) when f2 is non-null]. Randomly flipping coins is a Markov process on the graph of this DFA. We can solve for the probability that f1 precedes f2 or the expected waiting time for f1 by setting up a linear system of equations relating the values of these unknowns starting from each state of the DFA. Solve this linear system by Gaussian Elimination. */ typedef struct state { char *s; /* pointer to substring string matched */ int len; /* length of substring matched */ int backup; /* number of one of the two next states */ } state; double compute (f1, f2) char *f1, *f2; { double solvex0 (); int i, j, n1, n; state *dfa; int nstates; char *malloc (); n = n1 = strlen (f1); if (f2) n += strlen (f2); /* n + 1 states in the DFA */ dfa = (state *) malloc ((unsigned) ((n + 1) * sizeof (state))); if (!dfa) { printf ("Ouch, out of memory!\n"); exit (1); } /* set up the backbone of the DFA */ for (i = 0; i <= n; i++) { dfa[i].s = (i <= n1) ? f1 : f2; dfa[i].len = (i <= n1) ? i : i - n1; } /* for i not a final state, one next state of i is simply i+1 (this corresponds to another matching character of dfs[i].s The other next state (the backup state) is now computed. It is the state whose substring matches the longest suffix with the last character changed */ for (i = 0; i <= n; i++) { dfa[i].backup = 0; for (j = 1; j <= n; j++) if ((dfa[j].len > dfa[dfa[i].backup].len) && dfa[i].s[dfa[i].len] == NOT (dfa[j].s[dfa[j].len - 1]) && strncmp (dfa[j].s, dfa[i].s + dfa[i].len - dfa[j].len + 1, dfa[j].len - 1) == 0) dfa[i].backup = j; } /* our dfa has n + 1 states, so build a system n + 1 equations in n + 1 unknowns */ eqsystem (n + 1); for (i = 0; i < n; i++) if (i == n1) equation (1.0, n1, 0.0, 0, 0.0, 0, -1.0); else equation (1.0, i, -0.5, i + 1, -0.5, dfa[i].backup, f2 ? 0.0 : -1.0); equation (1.0, n, 0.0, 0, 0.0, 0, 0.0); free (dfa); return solvex0 (); } /* a simple gaussian elimination equation solver */ double *m, **M; int rank; int neq = 0; /* create an n by n system of linear equations. allocate space for the matrix m, filled with zeroes and the dope vector M */ eqsystem (n) int n; { char *calloc (); int i; m = (double *) calloc (n * (n + 1), sizeof (double)); M = (double **) calloc (n, sizeof (double *)); if (!m || !M) { printf ("Ouch, out of memory!\n"); exit (1); } for (i = 0; i < n; i++) M[i] = &m[i * (n + 1)]; rank = n; neq = 0; } /* add a new equation a * x_na + b * x_nb + c * x_nc + d = 0.0 (note that na, nb, and nc are not necessarily all distinct.) */ equation (a, na, b, nb, c, nc, d) double a, b, c, d; int na, nb, nc; { double *eq = M[neq++]; /* each row is an equation */ eq[na + 1] += a; eq[nb + 1] += b; eq[nc + 1] += c; eq[0] = d; /* column zero holds the constant term */ } /* solve for the value of variable x_0. This will go nuts if therer are errors (for example, if m is singular) */ double solvex0 () { register i, j, jmax, k; register double max, val; register double *maxrow, *row; for (i = rank; i > 0; --i) { /* for each variable */ /* find pivot element--largest value in ith column*/ max = 0.0; for (j = 0; j < i; j++) if (fabs (M[j][i]) > fabs (max)) { max = M[j][i]; jmax = j; } /* swap pivot row with last row using dope vectors */ maxrow = M[jmax]; M[jmax] = M[i - 1]; M[i - 1] = maxrow; /* normalize pivot row */ max = 1.0 / max; for (k = 0; k <= i; k++) maxrow[k] *= max; /* now eliminate variable i by subtracting multiples of pivot row */ for (j = 0; j < i - 1; j++) { row = M[j]; if (val = row[i]) /* if variable i is in this eq */ for (k = 0; k <= i; k++) row[k] -= maxrow[k] * val; } } /* the value of x0 is now in constant column of first row we only need x0, so no need to back-substitute */ val = -M[0][0]; free (M); free (m); return val; } _________________________________________________________________ Guy Jacobson (201) 582-6558 AT&T Bell Laboratories uucp: {att,ucbvax}!ulysses!guy 600 Mountain Avenue internet: guy@ulysses.att.com Murray Hill NJ, 07974 ==> probability/flush.p <== Which set contains more flushes than the set of all possible hands? (1) Hands whose first card is an ace (2) Hands whose first card is the ace of spades (3) Hands with at least one ace (4) Hands with the ace of spades ==> probability/flush.s <== An arbitrary hand can have two aces but a flush hand can't. The average number of aces that appear in flush hands is the same as the average number of aces in arbitrary hands, but the aces are spread out more evenly for the flush hands, so set #3 contains a higher fraction of flushs. Aces of spades, on the other hand, are spread out the same way over possible hands as they are over flush hands, since there is only one of them in the deck. Whether or not a hand is flush is based solely on a comparison between different cards in the hand, so looking at just one card is necessarily uninformative. So the other sets contain the same fraction of flushes as the set of all possible hands. ==> probability/hospital.p <== A town has two hospitals, one big and one small. Every day the big hospital delivers 1000 babies and the small hospital delivers 100 babies. There's a 50/50 chance of male or female on each birth. Which hospital has a better chance of having the same number of boys as girls? ==> probability/hospital.s <== If there are 2N babies born, then the probability of an even split is (2N choose N) / (2 ** 2N) , where (2N choose N) = (2N)! / (N! * N!) . This is a DECREASING function. Think about it. If there are two babies born, then the probability of a split is 1/2 (just have the second baby be different from the first). With 2N babies, If there is a N,N-1 split in the first 2N-1, then there is a 1/2 chance of the last baby making it an even split. Otherwise there can be no even split. Therefore the probability is less than 1/2 overall for an even split. As N goes to infinity the probability of an even split approaches zero (although it is still the most likely event). ==> probability/icos.p <== The "house" rolls two 20-sided dice and the "player" rolls one 20-sided die. If the player rolls a number on his die between the two numbers the house rolled, then the player wins. Otherwise, the house wins (including ties). What are the probabilities of the player winning? ==> probability/icos.s <== It is easily seen that if any two of the three dice agree that the house wins. The probability that this does not happen is 19*18/(20*20). If the three numbers are different, the probability of winning is 1/3. So the chance of winning is 19*18/(20*20*3) = 3*19/200 = 57/200. ==> probability/intervals.p <== Given two random points x and y on the interval 0..1, what is the average size of the smallest of the three resulting intervals? ==> probability/intervals.s <== You could make a graph of the size of the smallest interval versus x and y. We know that the height of the graph is 0 along all the edges of the unit square where it is defined and on the line x=y. It achieves its maximum of 1/3 at (1/3,2/3) and (2/3,1/3). Assuming the graph has no curves or bizzare jags, the volume under the graph must be 1/9 and so the average height must also be 1/9. ==> probability/lights.p <== Waldo and Basil are exactly m blocks west and n blocks north from Central Park, and always go with the green light until they run out of options. Assuming that the probability of the light being green is 1/2 in each direction and that if the light is green in one direction it is red in the other, find the expected number of red lights that Waldo and Basil will encounter. ==> probability/lights.s <== Let E(m,n) be this number, and let (x)C(y) = x!/(y! (x-y)!). A model for this problem is the following nxm grid: ^ B---+---+---+ ... +---+---+---+ (m,0) | | | | | | | | | N +---+---+---+ ... +---+---+---+ (m,1) <--W + E--> : : : : : : : : S +---+---+---+ ... +---+---+---+ (m,n-1) | | | | | | | | | v +---+---+---+ ... +---+---+---E (m,n) where each + represents a traffic light. We can consider each traffic light to be a direction pointer, with an equal chance of pointing either east or south. IMHO, the best way to approach this problem is to ask: what is the probability that edge-light (x,y) will be the first red edge-light that the pedestrian encounters? This is easy to answer; since the only way to reach (x,y) is by going south x times and east y times, in any order, we see that there are (x+y)C(x) possible paths from (0,0) to (x,y). Since each of these has probability (1/2)^(x+y+1) of occuring, we see that the the probability we are looking for is (1/2)^(x+y+1)*(x+y)C(x). Multiplying this by the expected number of red lights that will be encountered from that point, (n-k+1)/2, we see that m-1 ----- \ E(m,n) = > ( 1/2 )^(n+k+1) * (n+k)C(n) * (m-k+1)/2 / ----- k=0 n-1 ----- \ + > ( 1/2 )^(m+k+1) * (m+k)C(m) * (n-k+1)/2 . / ----- k=0 Are we done? No! Putting on our Captain Clever Cap, we define n-1 ----- \ f(m,n) = > ( 1/2 )^k * (m+k)C(m) * k / ----- k=0 and n-1 ----- \ g(m,n) = > ( 1/2 )^k * (m+k)C(m) . / ----- k=0 Now, we know that n ----- \ f(m,n)/2 = > ( 1/2 )^k * (m+k-1)C(m) * (k-1) / ----- k=1 and since f(m,n)/2 = f(m,n) - f(m,n)/2, we get that n-1 ----- \ f(m,n)/2 = > ( 1/2 )^k * ( (m+k)C(m) * k - (m+k-1)C(m) * (k-1) ) / ----- k=1 - (1/2)^n * (m+n-1)C(m) * (n-1) n-2 ----- \ = > ( 1/2 )^(k+1) * (m+k)C(m) * (m+1) / ----- k=0 - (1/2)^n * (m+n-1)C(m) * (n-1) = (m+1)/2 * (g(m,n) - (1/2)^(n-1)*(m+n-1)C(m)) - (1/2)^n*(m+n-1)C(m)*(n-1) therefore f(m,n) = (m+1) * g(m,n) - (n+m) * (1/2)^(n-1) * (m+n-1)C(m) . Now, E(m,n) = (n+1) * (1/2)^(m+2) * g(m,n) - (1/2)^(m+2) * f(m,n) + (m+1) * (1/2)^(n+2) * g(n,m) - (1/2)^(n+2) * f(n,m) = (m+n) * (1/2)^(n+m+1) * (m+n)C(m) + (m-n) * (1/2)^(n+2) * g(n,m) + (n-m) * (1/2)^(m+2) * g(m,n) . Setting m=n in this formula, we see that E(n,n) = n * (1/2)^(2n) * (2n)C(n), and applying Stirling's theorem we get the beautiful asymptotic formula E(n,n) ~ sqrt(n/pi). ==> probability/lottery.p <== There n tickets in the lottery, k winners and m allowing you to pick another ticket. The problem is to determine the probability of winning the lottery when you start by picking 1 (one) ticket. A lottery has N balls in all, and you as a player can choose m numbers on each card, and the lottery authorities then choose n balls, define L(N,n,m,k) as the minimum number of cards you must purchase to ensure that at least one of your cards will have at least k numbers in common with the balls chosen in the lottery. ==> probability/lottery.s <== This relates to the problem of rolling a random number from 1 to 17 given a 20 sided die. You simply mark the numbers 18, 19, and 20 as "roll again". The probability of winning is, of course, k/(n-m) as for every k cases in which you get x "draw again"'s before winning, you get n-m-k similar cases where you get x "draw again"'s before losing. The example using dice makes it obvious, however. L(N,k,n,k) >= Ceiling((N-choose-k)/(n-choose-k)* (n-1-choose-k-1)/(N-1-choose-k-1)*L(N-1,k-1,n-1,k-1)) = Ceiling(N/n*L(N-1,k-1,n-1,k-1)) The correct way to see this is as follows: Suppose you have an (N,k,n,k) system of cards. Look at all of the cards that contain the number 1. They cover all ball sets that contain 1, and therefore these cards become an (N-1,k-1,n-1,k-1) covering upon deletion of the number 1. Therefore the number 1 must appear at least L(N-1,k-1,n-1,k-1). The same is true of all of the other numbers. There are N of them but n appear on each card. Thus we obtain the bound. ==> probability/particle.in.box.p <== A particle is bouncing randomly in a two-dimensional box. How far does it travel between bounces, on avergae? Suppose the particle is initially at some random position in the box and is traveling in a straight line in a random direction and rebounds normally at the edges. ==> probability/particle.in.box.s <== Let theta be the angle of the point's initial vector. After traveling a distance r, the point has moved r*cos(theta) horizontally and r*sin(theta) vertically, and thus has struck r*(sin(theta)+cos(theta))+O(1) walls. Hence the average distance between walls will be 1/(sin(theta)+cos(theta)). We now average this over all angles theta: 2/pi * intg from theta=0 to pi/2 (1/(sin(theta)+cos(theta))) dtheta which (in a computation which is left as an exercise) reduces to 2*sqrt(2)*ln(1+sqrt(2))/pi = 0.793515. ==> probability/pi.p <== Are the digits of pi random (i.e., can you make money betting on them)? ==> probability/pi.s <== No, the digits of pi are not truly random, therefore you can win money playing against a supercomputer that can calculate the digits of pi far beyond what we are currently capable of doing. This computer selects a position in the decimal expansion of pi -- say, at 10^100. Your job is to guess what the next digit or digit sequence is. Specifically, you have one dollar to bet. A bet on the next digit, if correct, returns 10 times the amount bet; a bet on the next two digits returns 100 times the amount bet, and so on. (The dollar may be divided in any fashion, so we could bet 1/3 or 1/10000 of a dollar.) You may place bets in any combination. The computer will tell you the position number, let you examine the digits up to that point, and calculate statistics for you. It is easy to set up strategies that might win, if the supercomputer doesn't know your strategy. For example, "Always bet on 7" might win, if you are lucky. Also, it is easy to set up bets that will always return a dollar. For example, if you bet a penny on every two-digit sequence, you are sure to win back your dollar. Also, there are strategies that might be winning, but we can't prove it. For example, it may be that a certain sequence of digits never occurs in pi, but we have no way of proving this. The problem is to find a strategy that you can prove will always win back more than a dollar. The assumption that the position is beyond the reach of calculation means that we must rely on general facts we know about the sequence of digits of pi, which is practically nil. But it is not completely nil, and the challenge is to find a strategy that will always win money. A theorem of Mahler (1953) states that for all integers p, q > 1, -42 |pi - p/q| > q This says that pi cannot have a rational approximation that is extremely tight. Now suppose that the computer picks position N. I know that the next 41 * N digits cannot be all zero. For if they were, then the first N digits, treated as a fraction with denominator 10^N, satisfies: |pi - p / 10^N| < 10^(-42 N) which contradicts Mahler's theorem. So, I split my dollar into 10^(41N) - 1 equal parts, and bet on each of the sequences of 41N digits, except the one with all zeroes. One of the bets is sure to win, so my total profit is about 10(^-41N) of a dollar! This strategy can be improved a number of ways, such as looking for other repeating patterns, or improvements to the bound of 42 -- but the earnings are so pathetic, it hardly seems worth the effort. Are there other winning strategies, not based on Mahler's theorem? I believe there are algorithms that generate 2N binary digits of pi, where the computations are separate for each block of N digits. Maybe from something like this, we can find a simple subsequence of the binary digits of pi which is always zero, or which has some simple pattern. ==> probability/random.walk.p <== Waldo has lost his car keys! He's not using a very efficient search; in fact, he's doing a random walk. He starts at 0, and moves 1 unit to the left or right, with equal probability. On the next step, he moves 2 units to the left or right, again with equal probability. For subsequent turns he follows the pattern 1, 2, 1, etc. His keys, in truth, were right under his nose at point 0. Assuming that he'll spot them the next time he sees them, what is the probability that poor Waldo will eventually return to 0? ==> probability/random.walk.s <== I can show the probability that Waldo returns to 0 is 1. Waldo's wanderings map to an integer grid in the plane as follows. Let (X_t,Y_t) be the cumulative sums of the length 1 and length 2 steps respectively taken by Waldo through time t. By looking only at even t, we get the ordinary random walk in the plane, which returns to the origin (0,0) with probability 1. In fact, landing at (2n, n) for any n will land Waldo on top of his keys too. There's no need to look at odd t. Similar considerations apply for step sizes of arbitrary (fixed) size. ==> probability/reactor.p <== There is a reactor in which a reaction is to take place. This reaction stops if an electron is present in the reactor. The reaction is started with 18 positrons; the idea being that one of these positrons would combine with any incoming electron (thus destroying both). Every second, exactly one particle enters the reactor. The probablity that this particle is an electron is 0.49 and that it is a positron is 0.51. What is the probablity that the reaction would go on for ever?? Note: Once the reaction stops, it cannot restart. ==> probability/reactor.s <== Let P(n) be the probability that, starting with n positrons, the reaction goes on forever. Clearly P'(n+1)=P'(0)*P'(n), where the ' indicates probabilistic complementation; also note that P'(n) = .51*P'(n+1) + .49*P'(n-1). Hence we have that P(1)=(P'(0))^2 and that P'(0) = .51*P'(1) ==> P'(0) equals 1 or 49/51. We thus get that either P'(18) = 1 or (49/51)^19 ==> P(18) = 0 or 1 - (49/51)^19. The answer is indeed the latter. A standard result in random walks (which can be easily derived using Markov chains) yields that if p>1/2 then the probability of reaching the absorbing state at +infinity as opposed to the absorbing state at -1 is 1-r^(-i), where r=p/(1-p) (p is the probability of moving from state n to state n-1, in our case .49) and i equals the starting location + 1. Therefore we have that P(18) = 1-(.49/.51)^19. ==> probability/roulette.p <== You are in a game of Russian roulette, but this time the gun (a 6 shooter revolver) has three bullets _in_a_row_ in three of the chambers. The barrel is spun only once. Each player then points the gun at his (her) head and pulls the trigger. If he (she) is still alive, the gun is passed to the other player who then points it at his (her) own head and pulls the trigger. The game stops when one player dies. Now to the point: would you rather be first or second to shoot? ==> probability/roulette.s <== All you need to consider are the six possible bullet configurations B B B E E E -> player 1 dies E B B B E E -> player 2 dies E E B B B E -> player 1 dies E E E B B B -> player 2 dies B E E E B B -> player 1 dies B B E E E B -> player 1 dies One therefore has a 2/3 probability of winning (and a 1/3 probability of dying) by shooting second. I for one would prefer this option. ==> probability/unfair.p <== Generate even odds from an unfair coin. For example, if you thought a coin was biased toward heads, how could you get the equivalent of a fair coin with several tosses of the unfair coin? ==> probability/unfair.s <== Toss twice. If both tosses give the same result, repeat this process (throw out the two tosses and start again). Otherwise, take the first of the two results. ==> series/series.01.p <== M, N, B, D, P ? ==> series/series.01.s <== T. If you say the sounds these letters make out loud, you will see that the next letter is T. ==> series/series.02.p <== H, H, L, B, B, C, N, O, F ? ==> series/series.02.s <== Answer 1: N, N, M, A The symbols for the elements. Answer 2: N, S, M, A The names of the elements. ==> series/series.03.p <== W, A, J, M, M, A, J? ==> series/series.03.s <== J, V, H, T, P, T, F, P, B, L. Presidents. ==> series/series.03a.p <== G, J, T, J, J, J, A, M, W, J, J, Z, M, F, J, ? ==> series/series.03a.s <== G, J, T, J, J, J, A, M, W, J, J, Z, M, F, J, A. Presidents' first names. ==> series/series.03b.p <== A, J, B, C, G, T, C, V, J, T, D, F, K, B, H, ? ==> series/series.03b.s <== A, J, B, C, G, T, C, V, J, T, D, F, K, B, H, J. Vice Presidents. ==> series/series.03c.p <== M, A, M, D, E, L, R, H, ? ==> series/series.03c.s <== M, A, M, D, E, L, R, H, A. Presidents' wives' first names. ==> series/series.04.p <== A, E, H, I, K, L, ? ==> series/series.04.s <== M, N, O, P, U, W. Letters in the Hawaiian alphabet. ==> series/series.05.p <== A B C D E F G H? ==> series/series.05.s <== M. The names of the cross-streets travelling west on (say) Commonwealth Avenue from Boston Garden: Arlington, Berkeley, Clarendon, Dartmouth, Exeter, Fairfield, Gloucester, Hereford, Massachusetts Ave. ==> series/series.06.p <== Z, O, T, T, F, F, S, S, E, N? ==> series/series.06.s <== T. The name of the integers starting with zero. ==> series/series.06a.p <== F, S, T, F, F, S, ? ==> series/series.06a.s <== The words "first", "second", "third", etc. The same as the previous from this point on. ==> series/series.07.p <== 1, 1 1, 2 1, 1 2 1 1, ... What is the pattern and asymptotics of this series? ==> series/series.07.s <== Each line is derived from the last by the transformation (for example) ... z z z x x y y y ... -> ... 3 z 2 x 3 y ... John Horton Conway analyzed this in "The Weird and Wonderful Chemistry of Audioactive Decay" (T M Cover & B Gopinath (eds) OPEN PROBLEMS IN COMMUNICATION AND COMPUTATION, Springer-Verlag (1987)). You can also find his most complete FRACTRAN paper in this collection. First, he points out that under this sequence, you frequently get adjacent subsequences XY which cannot influence each other in any future derivation of the sequence rule. The smallest such are called "atoms" or "elements". As Conway claims to have proved, there are 92 atoms which show up eventually in every sequence, no matter what the starting value (besides <> and <22>), and always in the same non-zero limiting proportions. Conway named them after some other list of 92 atoms. As a puzzle, see if you can recreate the list from the following, in decreasing atomic number: U Pa Th Ac Ra Fr Rn Ho.AT Po Bi Pm.PB Tl Hg Au Pt Ir Os Re Ge.Ca.W Ta HF.Pa.H.Ca.W Lu Yb Tm ER.Ca.Co HO.Pm Dy Tb Ho.GD EU.Ca.Co Sm PM.Ca.Zn Nd Pr Ce LA.H.Ca.Co Ba Cs Xe I Ho.TE Eu.Ca.SB Pm.SN In Cd Ag Pd Rh Ho.RU Eu.Ca.TC Mo Nb Er.ZR Y.H.Ca.Tc SR.U Rb Kr Br Se As GE.Na Ho.GA Eu.Ca.Ac.H.Ca.ZN Cu Ni Zn.CO Fe Mn CR.Si V Ti Sc Ho.Pa.H.CA.Co K Ar Cl S P Ho.SI Al Mg Pm.NA Ne F O N C B Be Ge.Ca.LI He Hf.Pa.H.Ca.Li Uranium is 3, Protactinium is 13, etc. Rn => Ho.AT means the following: Radon forms a string that consists of two atoms, Holmium on the left, and Astatine on the right. I capitalize the symbol for At to remind you that Astatine, and not Holmium, is one less than Radon in atomic number. As a check, against you or me making a mistake, Hf is 111xx, Nd is 111xxx, In and Ni are 111xxxxx, K is 111x, and H is 22. Next see if you can at least prove that any atom other than Hydrogen, eventually (and always thereafter) forms strings containing all 92 atoms. The grand Conway theorem here is that every string eventually forms (within a universal time limit) strings containing all the 92 atoms in certain specific non-zero limiting proportions, and that digits N greater than 3 are eventually restricted to one of two atomic patterns (ie, abc...N and def...N for some {1,2,3} sequences abc... and def...), which Conway calls isotopes of Np and Pu. (For N=2, these are He and Li), and that these transuranic atoms have a zero limiting proportion. The longest lived exotic element is Methuselum (2233322211N) which takes about 25 applications to reduce to the periodic table. -Matthew P Wiener (weemba@libra.wistar.upenn.edu) Conway gives many results on the ultimate behavior of strings under this transformation: for example, taking the sequence derived from 1 (or any other string except 2 2), the limit of the ratio of length of the (n+1)th term to the length of the nth term as n->infinity is a fixed constant, namely 1.30357726903429639125709911215255189073070250465940... This number is from Ilan Vardi, "Computational Recreations in Mathematica", Addison Wesley 1991, page 13. Another sequence that is related but not nearly as interesting is: 1, 11, 21, 1112, 3112, 211213, 312213, 212223, 114213, 31121314, 41122314, 31221324, 21322314, and 21322314 generates itself, so we have a cycle. ==> series/series.08a.p <== G, L, M, B, C, L, M, C, F, S, ? ==> series/series.08a.s <== Army officer ranks, descending. ==> series/series.08b.p <== A, V, R, R, C, C, L, L, L, E, ? ==> series/series.08b.s <== Navy officer ranks, descending. ==> series/series.09a.p <== S, M, S, S, S, C, P, P, P, ? ==> series/series.09a.s <== Army non-comm ranks, descending. ==> series/series.09b.p <== M, S, C, P, P, P, S, S, S, ? ==> series/series.09b.s <== Navy non-comm ranks, descending. ==> series/series.10.p <== D, P, N, G, C, M, M, S, ? ==> series/series.10.s <== N, V, N, N, R. States in Constitution ratification order. ==> series/series.11.p <== R O Y G B ? ==> series/series.11.s <== V. Colors. ==> series/series.12.p <== A, T, G, C, L, ? ==> series/series.12.s <== V, L, S, S, C, A, P. Zodiacal signs. ==> series/series.13.p <== M, V, E, M, J, S, ? ==> series/series.13.s <== U, N, P. Names of the Planets. ==> series/series.14.p <== A, B, D, O, P, ? ==> series/series.14.s <== Q, R. Only letters with an inside as printed. ==> series/series.14a.p <== A, B, D, E, G, O, P, ? ==> series/series.14a.s <== Q. Letters with cursive insides. ==> series/series.15.p <== A, E, F, H, I, ? ==> series/series.15.s <== L, M, N, O, S, U. Letters whose English names start with vowels. ==> series/series.16.p <== A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, X, Y? ==> series/series.16.s <== Z. Letters whose English names have one syllable. ==> series/series.17.p <== T, P, O, F, O, F, N, T, S, F, T, F, E, N, S, N? ==> series/series.17.s <== T, T, T, E, T, E. Digits of Pi. ==> series/series.18.p <== 10, 11, 12, 13, 14, 15, 16, 17, 20, 22, 24, ___ , 100, 121, 10000 ==> series/series.18.s <== 10, 11, 12, 13, 14, 15, 16, 17, 20, 22, 24, 31 , 100, 121, 10000 Sixteen in base n for n=16, 15, ..., 2. ==> series/series.19.p <== 1 01 01011 0101101011011 0101101011011010110101101101011011 etc. Each string is formed from the previous string by substituting '01' for '1' and '011' for '0' simultaneously at each occurance. Notice that each string is an initial substring of the previous string so that we may consider them all as initial substrings of an infinite string. The puzzle then is, given n, determine if the nth digit is 0 or 1 without having to construct all the previous digits. That is, give a non-recursive formula for the nth digit. ==> series/series.19.s <== Let G equal the limit string generated by the above process and define the string F by F[0] = "0", F[n] = "1" if n = floor(phi*m) for some positive integer m, F[n] = "0" if n = floor(phi^2*m) for some positive integer m, where floor(x) is the greatest integer =< x and phi = (1 + \/5)/2; I claim that F = G. I will try to motivate my solution. Let g[0]="0" and define g[n+1] to be the string that results from replacing "0" in g[n] with "01" and "1" with "011"; furthermore, let s(n) and t(n) be the number of "0"'s and "1"'s in g[n], respectively. Note that we have the following recursive formulas : s(n+1) = s(n) + t(n) and t(n+1) = s(n) + 2t(n). I claim that s(n) = Fib(2n-1) and t(n) = Fib(2n), where Fib(m) is the mth Fibonacci number (defined by Fib(-1) = 1, Fib(0) = 0, Fib(n+1) = Fib(n) + Fib(n-1) for n>=0); this is easily established by induction. Now noting that Fib(2n)/Fib(2n-1) -> phi as n -> infinity, we see that if the density of the "0"'s and "1"'s exists, they must be be 1/phi^2 and 1/phi, respectively. What is the simplest generating sequence which has this property? Answer: the one given above. Proof: We start with Beatty's Theorem: if a and b are positive irrational numbers such that 1/a + 1/b = 1, then every positive integer has a representation of the form floor(am) or floor(bm) (m a positive integer), and this representation is unique. This shows that F is well-defined. I now claim that Lemma: If S(n) and T(n) (yes, two more functions; apparently today's the day that functions have their picnic) represent the number of "0"'s and "1"'s in the initial string of F of length n, then S(n) = ceil(n/phi^2) and T(n) = floor(n/phi) (ceil(x) is the smallest integer >= x). Proof of lemma: using the identity phi^2 = phi + 1 we see that S(n) + T(n) = n, hence for a given n either S(n) = S(n-1) + 1 or T(n) = T(n-1) + 1. Now note that if F[n-1]="1" ==> n-1 = floor(phi*m) for some positive integer m and since phi*m-1 < floor(phi*m) < phi*m ==> m-1/phi < (n-1)/phi < m ==> T(n) = T(n-1) + 1. To finish, note that if F[n-1]="0" ==> n-1 = floor(phi^2*m) for some positive integer m and since phi^2*m-1 < floor(phi^2*m) < phi^2*m ==> m-1/phi^2 < (n-1)/phi^2 < m ==> S(n) = S(n-1) + 1. Q.E.D. I will now show that F is invariant under the operation of replacing "0" with "01" and "1" with "011"; it will then follow that F=G. Note that this is equivalent to showing that F[2S(n) + 3T(n)] = "0", F[2S(n) + 3T(n) + 1] = "1", and that if n = [phi*m] for some positive integer m, then F[2S(n) + 3T(n) + 2] = "1". One could waste hours trying to prove some fiendish identities; watch how I sidestep this trap. For the first part, note that by the above lemma F[2S(n) + 3T(n)] = F[2*ceil(n/phi^2) + 3*floor(n/phi)] = F[2n + floor(n/phi)] = F[2n + floor(n*phi-n)] = F[floor(phi*n+n)] = F[floor(phi^2*n)] ==> F[2S(n) + 3T(n)] = "0". For the second, it is easy to see that since phi^2>2, if F[m]="0" ==> F[m]="1" hence the first part implies the second part. Finally, note that if n = [phi*m] for some positive integer m, then F[2S(n) + 3T(n) + 3] = F[2S(n+1) + 3T(n+1)] = "0", hence by the same reasoning as above F[2S(n) + 3T(n) + 2] = "1". Q.E.D. -- clong@remus.rutgers.edu (Chris Long) ==> series/series.20.p <== 1 2 5 16 64 312 1812 12288 ==> series/series.20.s <== ANSWER: 95616 The sum of factorial(k)*factorial(n-k) for k=0,...,n. ==> series/series.21.p <== 5, 6, 5, 6, 5, 5, 7, 5, ? ==> series/series.21.s <== The number of letters in the ordinal numbers. First 5 Second 6 Third 5 Fourth 6 Fifth 5 Sixth 5 Seventh 7 Eighth 6 Ninth 5 etc. ==> series/series.22.p <== 3 1 1 0 3 7 5 5 2 ? ==> series/series.22.s <== ANSWER: 4 The digits of pi expressed in base eight. ==> series/series.23.p <== 22 22 30 13 13 16 16 28 28 11 ? ==> series/series.23.s <== ANSWER: 15 The birthdays of the Presidents of the United States. ==> series/series.24.p <== What is the next letter in the sequence: W, I, T, N, L, I, T? ==> series/series.24.s <== S. First letters of words in question. ==> series/series.25.p <== 1 3 4 9 10 12 13 27 28 30 31 36 37 39 40 ? ==> series/series.25.s <== 1 3 4 9 10 12 13 27 28 30 31 36 37 39 40 ... i in binary, treated as a base 3 number and converted to decimal. ==> series/series.26.p <== 1 3 2 6 7 5 4 12 13 15 14 10 11 9 8 24 25 27 26 ? ==> series/series.26.s <== 1 3 2 6 7 5 4 12 13 15 14 10 11 9 8 24 25 27 26 ... Take i in binary, for each 1 bit (in i, not changed) flip the next bit. This can also be phrased in reversing sequences of numbers. More simply, just the integers in reflective-Gray-code order. ==> series/series.27.p <== 0 1 1 2 1 2 1 3 2 2 1 3 1 2 2 4 1 3 1 3 2 2 1 4 2 ? ==> series/series.27.s <== 0 1 1 2 1 2 1 3 2 2 1 3 1 2 2 4 1 3 1 3 2 2 1 4 2 ... Number of factors in prime factorization of i. ==> series/series.28.p <== 0 2 3 4 5 5 7 6 6 7 11 7 13 9 8 8 17 8 19 9 10 13 23 9 10 ? ==> series/series.28.s <== 0 2 3 4 5 5 7 6 6 7 11 7 13 9 8 8 17 8 19 9 10 13 23 9 10 ... Sum of factors in prime factorization of i. ==> series/series.29.p <== 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 ? ==> series/series.29.s <== 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 ... The number of 1s in the binary expansion of n. ==> series/series.30.p <== I I T Y W I M W Y B M A D ==> series/series.30.s <== ? (first letters of "If I tell you what it means will you buy me a drink?") ==> series/series.31.p <== 6 2 5 5 4 5 6 3 7 ==> series/series.31.s <== 6. The number of segments on a standard calculator display it takes to represent the digits starting with 0. _ _ _ _ _ _ _ _ | | | _| _| |_| |_ |_ | |_| |_| |_| | |_ _| | _| |_| | |_| _| ==> series/series.32.p <== 0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0 1 ==> series/series.32.s <== 0 -> 1 01 -> 10 0110 -> 1001 01101001 -> 10010110 Recursively append the inverse. This sequence is known as the Morse-Thue sequence. It can be defined non-recursively as the nth term is the mod 2 count of 1s in n written in binary: 0->0 1->1 10->1 11->0 100->1 101->0 110->0 111->1 etc. Reference: Dekking, et. al., "Folds! I,II,III" The Mathematical Intelligencer, v4,#3,#4,#4. ==> series/series.33.p <== 2 12 360 75600 ==> series/series.33.s <== 2 = 2^1 12 = 2^2 * 3^1 360 = 2^3 * 3^2 * 5^1 75600 = 2^4 * 3^3 * 5^2 * 7^1 174636000 = 2^5 * 3^4 * 5^3 * 7^2 * 11^1 ==> series/series.34.p <== 3 5 4 4 3 5 5 4 3 ==> series/series.34.s <== The number of letters in the English words for the counting numbers. ==> series/series.35.p <== 1 2 3 2 1 2 3 4 2 1 2 3 4 2 2 3 ==> series/series.35.s <== The number of letters in the Roman numeral representation of the numbers. ==> trivia/area.codes.p <== When looking at a map of the distribution of telephone area codes for North America, it appears that they are randomly distributed. I am doubtful that this is the case, however. Does anyone know how the area codes were/are chosen? ==> trivia/area.codes.s <== Originally, back in the middle 1950's when direct dialing of long distance calls first became possible, the idea was to assign area codes with the 'shortest' dialing time required to the larger cities. Touch tone dialing was very rare. Most dialed calls were with 'rotary' dials. Area codes like 212, 213, 312 and 313 took very little time to dial (while waiting for the dial to return to normal) as opposed, for example, to 809, 908, 709, etc ... So the 'quickest to dial' area codes were assigned to the places which would probably receive the most direct dialed calls, i.e. New York City got 212, Chicago got 312, Los Angeles got 213, etc ... Washington, DC got 202, which is a little longer to dial than 212, but much shorter than others. In order of size and estimated amount of telephone traffic, the numbers got larger: San Fransisco got 415, which is sort of in the middle, and Miami got 305, etc. At the other end of the spectrum came places like Hawaii (it only got statehood as of about 1958) with 808, Puerto Rico with 809, Newfounland with 709, etc. The original (and still in use until about 1993) plan is that area codes have a certain construction to the numbers: The first digit will be 2 through 9. The second digit will always be 0 or 1. The third digit will be 1 through 9. Three digit numbers with two zeros will be special codes, ie. 700, 800 or 900. Three digit numbers with two ones are for special local codes, i.e. 411 for local directory assistance, 611 for repairs, etc. Three digit codes ending in '10', i.e. 410, 510, 610, 710, 810, 910 were 'area codes' for the AT&T (and later on Western Union) TWX network. This rule has been mostly abolished, however 610 is still Canadian TWX, and 910 is still used by Western Union TWX. Gradually the '10' codes are being converted to regular area codes. We are running out of possible combinations of numbers using the above rules, and it is estimated that beginning in 1993-94, area codes will begin looking like regular telephone prefix codes, with numbers other than 0 or 1 as the second digit. I hope this gives you a basic idea. There were other rules at one time such as not having an area code with zero in the second digit in the same state as a code with one in the second digit, etc .. but after the initial assignment of numbers back almost forty years ago, some of those rules were dropped when it became apparent they were not flexible enough. Patrick Townson TELECOM Digest Moderator -- Patrick Townson patrick@chinet.chi.il.us / ptownson@eecs.nwu.edu / US Mail: 60690-1570 FIDO: 115/743 / AT&T Mail: 529-6378 (!ptownson) / MCI Mail: 222-4956 ==> trivia/eskimo.snow.p <== How many words do the Eskimo have for snow? ==> trivia/eskimo.snow.s <== Couple of weeks ago, someone named D.K. Holm in the Boston Phoenix came up with the list, drawn from the Inupiat Eskimo Dictionary by Webster and Zibell, and from Thibert's English-Eskimo Eskimo-English Dictionary. The words may remind you of generated passwords. Eskimo English Eskimo English ---------------------------------+---------------------------- apun snow | pukak sugar snow apingaut first snowfall | pokaktok salt-like snow aput spread-out snow | miulik sleet kanik frost | massak snow mixed with water kanigruak frost on a | auksalak melting snow living surface | aniuk snow for melting ayak snow on clothes | into water kannik snowflake | akillukkak soft snow nutagak powder snow | milik very soft snow aniu packed snow | mitailak soft snow covering an aniuvak snowbank | opening in an ice floe natigvik snowdrift | sillik hard, crusty snow kimaugruk snowdrift that | kiksrukak glazed snow in a thaw blocks something | mauya snow that can be perksertok drifting snow | broken through akelrorak newly drifting snow | katiksunik light snow mavsa snowdrift overhead | katiksugnik light snow deep enough and about to fall | for walking kaiyuglak rippled surface | apuuak snow patch of snow | sisuuk avalanche =*= ==> trivia/federal.reserve.p <== What is the pattern to this list: Boston, MA New York, NY Philadelphia, PA Cleveland, OH Richmond, VA Atlanta, GA Chicago, IL St. Louis, MO Minneapolis, MN Kansas City, MO Dallas, TX San Francisco, CA ==> trivia/federal.reserve.s <== Each of the cities is a location for a Federal Reserve. The cities are listed in alphabetical order based on the letter that represents each city on a dollar bill. ==> trivia/jokes.self-referential.p <== What are some self-referential jokes? ==> trivia/jokes.self-referential.s <== Q: What is alive, green, lives all over the world, and has seventeen legs? A: Grass. I lied about the legs. The two rules for success are: 1. Never tell them everything you know. There are three kinds of people in the world: those who can count, and those who cannot.