If there are n distinct kinds of baseball cards (or coupons, or whatever), each equally likely to be received with any given purchase, what is the expected number of purchases in order to acquire k complete sets, i.e., at least k of each of the n types of cards? With k=1 this is just the standard "collector's problem", which has a simple and well-known answer, namely, that the expected number of purchases is n(1 + 1/2 + 1/3 + ... + 1/n) This is analagous to a problem in reliability theory with n parallel redundant components, each with an exponential failure rate of 1/T, so the mean time to fail of each individual component is T. Beginning from a full-up system, the mean (expected) time until ALL n components have failed is T(1 + 1/2 + 1/3 + ... + 1/n) For failures of n redundant systems the proof of this expected mean time to failure can be seen immediately from the Markov model where L is the individual (exponential) failure rate of each component: ______ _______ ______ ______ | 0 | nL | 1 | (n-1)L | 2 | L | n | |failed|------->|failed |-------->|failed|--- ... -----|failed| |______| |_______| |______| |______| The mean times for these transitions are 1/nL, 1/((n-1)L),..,1/L, which gives the sum (1/L)(1 + 1/2 + 1/3 + ... + 1/n). Intuitively this is related to the collector's problem (with k=1) because we can regard each of the n types of coupons as a "component", and getting a coupon of a given type is the same as "failing the component". Since there is a 1/n chance of getting a coupon of that type on each purchase, the expected number of purchases to get one of that type is approximately n. Thus, the "mean time to fail" for that "component" is roughly T=n, where we have made "time" proportional to "purchases", and so the expected (mean) "time to fail" (i.e., number of purchases) to get ONE of EACH of the n types is just the quantity n(1 + 1/2 + ... + 1/n) as noted above. (Incidentally, if there were infinitely many different types of coupons, we could obviously never get a complete set, so this gives a proof that the harmonic series diverges.) However, the simple analogy between the reliability model and the collector's problem breaks down for k greater than 1. For this more general case we need to evaluate the problem combinatorially. We might naively think, at least as a first approximation, that the expected number of purchases needed to acquire k complete sets might be roughly k times the expected number of purchases needed to acquire just 1 complete set, but this turns out not to be a very good approximation. If we focus first on just the simple case n=2, the direct combinatorial approach gives the expected number of purchases needed to get at least k of both types as inf E(k) = SUM (j+1) C(j,k-1) / 2^j (1) j=2k-1 which gives E(1)=3, E(2)=11/2, E(3)=63/8, and so on. This solves the general problem for n=2, but unfortunately or n greater than 2 this approach requires us to evauate a summation over the set of partitions into n parts, which becomes fairly complicated as n increases. For example, with n=3 and k=2, there are 3^6 possible (and equally likely) sequences of coupons, but only 21 distinct partitions of 6 into three or fewer positive integers. Of those partitions, only [2,2,2] satisfies the requirement k=2. The number of sequences that give this partition is C(6,2)C(4,2)C(2,2)=90, so the contribution to the expectation for 6 purchases is 6*(90/3^6). Then we can proceed to compute the probability of at least 2 of each type after 7 purchases, and subtract the probability after 6 to give the prob that exactly 7 purchases are required, etc. As n increases we will need to consider the partitions into more and more parts, and that is generally non-trivial. Equation (1) gives the values k E_2(k) numerical 1st diff 2nd diff ------- ----------- --------- -------- -------- 1 3/ 2^0 3.0000 2 11/ 2^1 5.5000 2.5000 3 63/ 2^3 7.8750 2.3750 -0.1250 4 163/ 2^4 10.1875 2.3125 -0.0625 5 1595/ 2^7 12.4609 2.2734 -0.0391 6 3765/ 2^8 14.7070 2.2461 -0.0273 7 17339/2^10 16.9326 2.2256 -0.0205 8 39203/2^11 19.1420 2.2094 -0.0162 9 699219/2^15 21.3384 2.1964 -0.0130 10 1541665/2^16 23.5239 2.1855 -0.0109 11 6737137/2^18 25.7001 2.1762 -0.0093 12 14611029/2^19 27.8683 2.1682 -0.0080 It appears that the expected number of purchases to get k complete sets of 2 coupons goes up by about 2 each time k is incremented, so it ends up being just a little greater than 2k, whereas our earlier naive estimate (k times the expectation for one complete set) predicted 3k. It's interesting to note the exponents of 2 in the denominators of these expectations are 0,1,3,4,7,8,10,11,15,16,18,19,... and the differences between consecutive exponents are 1,2,1,3,1,2,1,4,1,2,1,... which of course are the powers of 2 in the sequence of consecutive even numbers 2,4,6,8,10,12,14,16,18,20,22,... Notice that we can write the above values in the form E_2(k) = k(sum) and when we do this we see a familiar pattern: E_2(1) = 1(3) E_2(2) = 2(3 - 1/4) E_2(3) = 3(3 - 1/4 - 1/8) E_2(4) = 4(3 - 1/4 - 1/8 - 5/64) E_2(5) = 5(3 - 1/4 - 1/8 - 5/64 - 7/128) etc. Thus each term is simply the coefficient in the expansion of 2 sqrt(1-x), which is _______ 2 / 1 - x = 2 - x - (1/4)x^2 - (1/8)x^3 - (5/64)x^4 - ... It's interesting to compare this with the well-known case k=1, for which we have E_1(1) = 1(1) E_2(1) = 2(1 + 1/2) E_3(1) = 3(1 + 1/2 + 1/3) E_4(1) = 4(1 + 1/2 + 1/3 + 1/4) E_5(1) = 5(1 + 1/2 + 1/3 + 1/4 + 1/5) where the terms are the coefficients of the expansion -ln(1+x) -ln(1+x) = (1)x + (1/2)x^2 + (1/3)x^3 + (1/4)x^4 + ... To summarize, if there are n different kinds of coupons, let E_n(k) denote the expected number of purchases necessary to acquire k complete sets. The well-known case is with k=1, in which case we have E_n(1) = n(1 + 1/2 + 1/3 + ... + 1/n) for all n. Of course, the case of n=1 trivially gives E_1(k) = k for all k. We've also seen that for n=2 the values of E_2(k) can be given as a sum of k terms whose values (after the first) are the leading coefficients in the power series expansion of 2sqrt(1-x). Thus we have / 3 1 1*3 1*3*5 1*3*..*(2k-3) \ E_2(k) = 2*k( --- - --- - ----- - ------- - ... - ------------- ) \ 2 2*4 2*4*6 2*4*6*8 2*4*..*(2k) / for all k. Anyway, generalizing the combinatorial approach to give the expected number of purchases necessary to acquire k full sets of n coupons, we first determine the probability of each way of accomplishing this in a given number of purchases. Clearly the probability of accomplishing this is zero for any number less than kn. The probability of having k complete sets of n coupons after exactly kn purchases is M(nk;k,k,...,k) q0 = --------------- n^(nk) where M(a;b,c,..,z) signifies the multinomial coefficient, e.g., M(6;3,2,1) = (6!)/((3!)(2!)(1!)). This represents the case of no excess coupons, so it can be regarded as a partition of 0 into n parts, added on top of a set of n consecutive k's: [k k k ... k]. If we let p[j] denote the probability of achieving k complete sets in *exactly* j purchases, then we have p[j]=0 for all j less than kn, and p[kn] = q0. The next case corresponds to the possible results after kn+1 purchases, so there is exactly one excess coupon, and this can be regarded as a partition of 1 into n parts, so it looks like [k+1 k k ... k], where order is suppressed. Of course, there are n distinct ways of arranging this partition, so the total probability of having (at least) k complete sets of n coupons after kn+1 purchases is M(nk+1;k+1,k,k,..,k) q1 = n -------------------- n^(nk+1) It follows that the probability of achieving k sets in *exactly* kn+1 purchases is p[kn+1] = q1 - q0. Now, going on to cover the case of kn+2 purchases, there are two possible ways of partitioning the 2 excess coupons. We have either [k+2 k k ... k] or else [k+1 k+1 k ... k]. In the first case there are n distinct arrangements, whereas in the second case there are n(n-1)/2 distinct arrangements. Thus the probability of having k complete sets after kn+2 purchases is q2, defined by n^(nk+2) q2 = C(n,1) M(nk+2;k+2,k,k,...,k) + C(n,2) M(nk+2;k+1,k+1,k,...,k) and the probability of achieving k sets in *exactly kn+2 purchases is p[kn+2] = q2 - (q1-q0) - q0 = q2 - q1 Continuing on in this way, we can generate p[kn+j] for j=0,1,2,... and then the expected number of purchases is the infinite sum E_n(k) = kn p[kn] + (kn+1) p[kn+1] + (kn+2) p[kn+2] + ... which can also be written in terms of the q values as E_n(k) = kn(q0) + (kn+1)(q1-q0) + (kn+2)(q2-q1) + ... The partial sum up to the jth term is (kn+j) q{j} - (q0+q1+q2+...+q{j-1}) so as j goes to infinity and q{j} goes to 1, this equals E_n(k) = kn + (1-q0) + (1-q1) + (1-q2) + ... To illustrate how this works, consider the simple case n=3 and k=2. We have (3^6) q0 = 1M(6;2,2,2) = 90 (3^7) q1 = 3M(7;3,2,2) = 630 (3^8) q2 = 3M(8;4,2,2) + 3M(8,3,3,2) = 2940 (3^9) q3 = 3M(9;5,2,2) + 6M(9,4,3,2) + 1M(9;3,3,3) = 11508 (3^10)q4 = 3M(10;6,2,2) + 6M(10,5,3,2) + 3M(10;4,4,2) + 3M(10;4,3,3) = 40950 (3^11)q5 = 3M(11;7,2,2) + 6M(11;6,3,2) + 6M(11;5,4,2) + 3M(11;5,3,3) + 3M(11;4,4,3) = 137610 so the expected number of purchases is E_3(2) = 6 + (1 - 90/3^6) + (1 - 630/3^7) + (1 - 2940/3^8) + (1 - 11508/3^9) + (1 - 40950/3^10) + (1 - 137610/3^11) + ... = 6 + 71/81 + 173/243 + 1207/2187 + 2725/6561 + 2011/6561 + 4393/19683 + ... This illustrates the two drawbacks of this approach. First, the sum doesn't converge very rapidly. Second, the jth term requires us to evaluate all the partitions of j into n parts. This isn't too bad for n=3, but for large values of n it would become very laborious, because the number of partitions rises rapidly. Now, for n=3 the direct combinatorial approach gives the values k E_3(k) --- ------------ 1 11/2 5.5000000000 2 347/36 9.6388888888 3 17479/1296 13.4868827160 4 3609499/209952 17.1920200807 5 20.8084225468 6 24.3628949881 7 27.8709986160 8 31.3427087302 9 34.7848707279 10 38.2024221968 11 41.5990628893 12 44.9776497020 This shows that the expected number of purchases goes up by somewhat more than 3 each time k is incremented, and the first differences between these numbers evidently go to 3. It's also interesting to note that the denominators of E_3(k) are (2^1)(3^0) (2^2)(3^2) (2^4)(3^4) (2^5)(3^8) which seems to suggest a systematic rule of formation. (Recall that the denominators for n=2 were all of the form 2^j with j = 0,1,3,4,7,8,... which showed that each terms was being multiplied by successive even numbers.)

Return to MathPages Main Menu