## The Collector's Problem

```Suppose there exist n (maybe 100) baseball players, each of whom
has his face on baseball cards.  A fan can buy cards, one at a
time.  Each purchase has an equal chance of being any of the
players (no matter how many cards are bought).  How many cards
must the fan buy in order to have a complete set with probability
p (say, 0.99)?

This is related to the well-known problem of determining the
*expected* number of purchases necessary to acquire a complete set
of cards, i.e., at least one of each of the n available types.  The
expected number of purchases is

E(n)  =  n(1 + 1/2 + 1/3 + ... + 1/n)

as discussed in the note Collecting k Complete Sets of Coupons.
However, it's somewhat more difficult to determine the number of
purchases necessary to give a certain probability of having a
complete set.  In a sense, this is a variation of the generalized
"birthday problem", with 100 possible baseball cards instead of
365 possible birthdays.  In general terms, if there are L distinct
possible outcomes (each assumed to be equally probable), the problem
is to determine the probability of a particular set of outcomes in
N trials.

If N is small, then the "brute force" method can be used to compute
the exact answer.  To take a simple example, suppose N=5.  The seven
possible partitions of 5 are

5  = [5]
= [4+1]
= [3+2]
= [3+1+1]
= [2+2+1]
= [2+1+1+1]
= [1+1+1+1+1]

These partitions correspond to all possible outcomes.  The first
represents the cases when all 5 trials have the same outcome.  The
second represents the case when four of the five outcomes were the
same and one was different.  The third represents the case when three
of the trials had one outcome and the other two trials had another
outcome, and so on.

Each of these seven outcomes has a definite probability.  For example,
the probability of outcome [5] is simply (1/L)^4, and the probability
of outcome [1+1+1+1+1] is given by (L-1)!/((L-5!)*(L)^4).  Let s(N,k)
denote the number of components in the kth partition.  The kth partition
of N will consist of a certain number of 1s  (i.e., unique outcomes), a
certain number of 2s (i.e., "doubles"), a certain number of 3s (i.e.,
"triples"), and so on.  Let  a_t  denote the number of t-tuples in the
kth partition.  Then define function

N!
M(N,k)   =    ---------------------------
N
PROD  ((t!)^(a_t))*((a_t)!)
t=1

This is called the M_3 multinomial coefficient in "The Handbook of
Mathematical Functions" by Abramowitz and Stegun (chapter 24 on
Combinatorial Analysis).  Then I believe the exact probability of
Q[N,k] (the kth partition of N) is given by

M(N,k) L!
Pr{Q[N,k]} =    -----------------------
(L-s(N,k))! L^N

Once we have computed the probability of each partition, we can then
add up the probabilities for the partitions having any particular
property of interest, such as those for which all possible outcomes
occurred at least once during the N trials (e.g., you have at least
one of the 100 available basaeball cards).

This approach is easy if N is small.  The difficulty is that the number
of possible partitions of N (of L or fewer components) increases very
rapidly as N gets larger.  For example, p(180) = 684,957,390,936.
Therefore, computing the EXACT rational answers for large values of N
may be computationally intractable, although you could probably derive
some good asymptotic formulas.  Feller's "Introduction to Probability"
discusses approximate solutions of this general problem.

Notice that the solution of the "standard" birthday problem (to find
the probability of NO duplicate results) is a special case of the above
formula.  In this case we want the probability of the single partition
consisting of all 1s.  Here we have M(N,k)=1 and s(N,k)=N, so the answer
to the standard birthday problem reduces to the well-known formula
Pr{no duplicates} = L!/[(L-N)!*L^N].

To answer the more specialized question of the probability of at least
one of each possible outcome, a Markov model is convenient.  The states
can be distinguished by a single number, the number of outcomes that
have occurred so far.  The 100x100 transition matrix is

0.01     0      0      0      0   . . .    0      0     0
0.99   0.02     0      0      0   . . .    0      0     0
0    0.98   0.03     0      0   . . .    0      0     0
0      0    0.97   0.04     0   . . .    0      0     0
0      0      0    0.96   0.05  . . .    0      0     0
.      .      .      .      .            .      .     .
.      .      .      .      .            .      .     .
.      .      .      .      .            .      .     .
0      0      0      0      0          0.98     0     0
0      0      0      0      0          0.02   0.99    0
0      0      0      0      0            0    0.01   1.00

The probability of having seen all possible outcomes after n trials
is the [100,1] component of M^n.  For n=2^k I get the following
results
n           Pr

1        0.000000
2        0.000000
4        0.000000
8        0.000000
16        0.000000
32        0.000000
64        0.000000
128        0.000000
256        0.000137
512        0.556018
1024        0.996647
2048        0.999999
4096        1.000000
8192        1.000000

With this method you can determine the precise value of n for which the
probability first exceeds any specific value.
```

Return to MathPages Main Menu