Suppose we have N bins and we randomly throw balls into them until exactly m bins contain at least two balls. What is the expected number of balls required? This is a special case of the generalized Birthday Problem, which asks for the expected number of duplicate birthdays (doubles, triples, etc) among k randomly selected people (with N=365). An exhaustive treatment of this general problem requires us to explicitly evaluate the probability of each individual partition of k, which becomes quite laborious for large k. However, if we seek only to know simple agregate expectations, such as the number of doubles, or the number of triples, then we can use a more efficient approach. Given N bins, let E(m,k) denote the expected number of bins containing exactly m balls after k have been thrown. We have the initial conditions E(0,0)=365 and E(m,0)=0 for all m>0. The first toss will necessarily land in one of the unoccupied bins (because they are all unoccupied), so this decreases the expected number of empty bins by 1, and increases the number of bins containing exactly one ball by 1. In general, on the kth toss, the probability of landing in a bin that already contains m-1 balls is E(m-1,k-1)/N, and this increases by 1 the expected number of bins containing m balls. On the other hand, the kth toss also has a probability of E(m,k-1)/N of landing in a bin that already contains m balls, and this event will DECREASE by 1 the number of bins containing exactly m balls. Therefore, after the kth toss, the expected number of bins containing exactly m balls is / 1 \ / 1 \ E(m,k) = E(m,k-1) + ( --- )E(m-1,k-1) - ( --- )E(m,k-1) (1) \ N / \ N / with the understanding that E(-1,k) is 0 for all k. This expression is homogeneous, so each E has a fixed proportion to E(0,0)=365. Thus we could define the normalized function f(m,k) = E(m,k)/N, so that f(0,0)=1 and in general f(m,k) signifies the expected fraction of the bins that contain exactly m balls after k tosses. The same recurrence relation applies, and we can collect terms to write this in the form f(m,k) = A f(m,k-1) + B f(m-1,k-1) where / 1 \ / 1 \ A = ( 1 - --- ) B = ( --- ) \ N / \ N / The first few rows of this function are shown below k 0 1 2 3 4 5 --- ----- ------ --------- --------- ------ ----- 0 1 0 0 0 0 0 1 A B 0 0 0 0 2 A^2 2BA B^2 0 0 0 3 A^3 3BA^2 3B^2 A B^3 0 0 4 A^4 4BA^3 6B^2 A^2 4B^3 A B^4 0 5 A^5 5BA^4 10B^2 A^3 10B^3 A^2 5B^4 A B^5 It's easy to see that the general term is k-m m f(m,k) = C(k,m) A B where C(k,m) is the binomial coefficient (k choose m). Hence the expected number of bins containing exactly m balls after k tosses is k! / 1 \ k-m / 1 \ m E(m,k) = N --------- ( 1 - --- ) ( --- ) (k-m)! m! \ N / \ N / Incidentally, equation (1) also suggests a continuous model, in which the rate of change of each variable is a linear function of the variables. Specifically, in the continuous limit, equation (1) goes to the system of equations dx0/dk = (0-x0)/N dx1/dk = (x0-x1)/N dx2/dk = (x1-x2)/N dx3/dk = (x2-x3)/N dx4/dk = (x3-x4)/N etc. where xj denotes the fraction of bins containing exactly j balls, and k is a continuous variable representing the number of tosses. Solving this system gives the exact analytical solution for each of the xj functions. Letting t denote the variable k/N, we have 1 j -t xj(t) = --- t e j! For example, the expected fraction of bins containing exactly 2 balls (in the continuous limit) is 1 / k \ 2 -(k/N) x2(k) = --- ( --- ) e 2 \ N / By comparison, the exact discrete solution gives k(k-1) / 1 \ k-2 / 1 \ 2 f(k,2) = ------ ( 1 - --- ) ( --- ) 2 \ N / \ N / Re-arranging terms, this can be written in the form _ _ 1 / k \2 /k-1\ | / 1 \ N |(k-2)/N f(k,2) = ---( --- ) ( --- )| ( 1 - --- ) | 2 \ N / \ k / |_ \ N / _| The quantity in the square brackets approaches 1/e as N increases, and the quantity (k-1)/k approaches 1 as k increases, and the exponent (k-2)/N approaches k/N, so this confirms that x2(k) is the continuous limit of f(k,2). Also notice that the sum of the expectations is -t / t^2 t^3 \ x0(t) + x1(t) + x2(t) + ... = e ( 1 + t + --- + --- + ... ) \ 2! 3! / -t t = e e = 1 as required. To illustrate, suppose we have N=365 bins, and k=80 tosses. What is the expected number of bins containing two or more balls? Using the exact discrete formula, we have E(80,0) = 293.0720... E(80,1) = 64.4114... E(80,2) = 6.9897... E(80,3) = 0.4992... E(80,4) = 0.0264... E(80,5) = 0.0011... Thus the expected number of bins with two or more balls is 6.9897 + 0.4992 + 0.0264 + 0.0011 + ... = 7.51... Notice that, as required, we have the identities E(80,0) + E(80,1) + E(80,2) + ... = 365 0*E(80,0) + 1*E(80,1) + 2*E(80,2) + ... = 80

Return to MathPages Main Menu