## Balls In Bins

```Suppose we have N bins and we randomly throw balls into them until
exactly m bins contain at least two balls.  What is the expected
number of balls required?  This is a special case of the generalized
Birthday Problem, which asks for the expected number of duplicate
birthdays (doubles, triples, etc) among k randomly selected people
(with N=365).  An exhaustive treatment of this general problem requires
us to explicitly evaluate the probability of each individual partition
of k, which becomes quite laborious for large k.  However, if we seek
only to know simple agregate expectations, such as the number of
doubles, or the number of triples, then we can use a more efficient
approach.

Given N bins, let E(m,k) denote the expected number of bins containing
exactly m balls after k have been thrown.  We have the initial
conditions E(0,0)=365 and E(m,0)=0 for all m>0.  The first toss will
necessarily land in one of the unoccupied bins (because they are all
unoccupied), so this decreases the expected number of empty bins by
1, and increases the number of bins containing exactly one ball by 1.

In general, on the kth toss, the probability of landing in a bin that
already contains m-1 balls is E(m-1,k-1)/N, and this increases by 1
the expected number of bins containing m balls.  On the other hand,
the kth toss also has a probability of E(m,k-1)/N of landing in a bin
that already contains m balls, and this event will DECREASE by 1 the
number of bins containing exactly m balls.  Therefore, after the kth
toss, the expected number of bins containing exactly m balls is

/ 1 \               / 1 \
E(m,k)  =  E(m,k-1) + ( --- )E(m-1,k-1) - ( --- )E(m,k-1)      (1)
\ N /               \ N /

with the understanding that E(-1,k) is 0 for all k.  This expression
is homogeneous, so each E has a fixed proportion to E(0,0)=365.  Thus
we could define the normalized function f(m,k) = E(m,k)/N, so that
f(0,0)=1 and in general f(m,k) signifies the expected fraction of the
bins that contain exactly m balls after k tosses.  The same recurrence
relation applies, and we can collect terms to write this in the form

f(m,k) = A f(m,k-1) + B f(m-1,k-1)

where
/     1 \               / 1 \
A = ( 1 - --- )        B =  ( --- )
\     N /               \ N /

The first few rows of this function are shown below

k      0       1          2          3         4      5
---   -----   ------   ---------  ---------  ------  -----
0      1       0          0         0          0      0
1      A       B          0         0          0      0
2     A^2     2BA        B^2        0          0      0
3     A^3     3BA^2    3B^2 A      B^3         0      0
4     A^4     4BA^3    6B^2 A^2   4B^3 A      B^4     0
5     A^5     5BA^4   10B^2 A^3  10B^3 A^2  5B^4 A   B^5

It's easy to see that the general term is

k-m   m
f(m,k)  =  C(k,m)  A     B

where C(k,m) is the binomial coefficient (k choose m).  Hence the
expected number of bins containing exactly m balls after k tosses
is
k!      /     1 \ k-m  / 1 \ m
E(m,k)  =  N --------- ( 1 - --- )    ( --- )
(k-m)! m!  \     N /      \ N /

Incidentally, equation (1) also suggests a continuous model, in which
the rate of change of each variable is a linear function of the
variables.  Specifically, in the continuous limit, equation (1) goes
to the system of equations

dx0/dk = (0-x0)/N
dx1/dk = (x0-x1)/N
dx2/dk = (x1-x2)/N
dx3/dk = (x2-x3)/N
dx4/dk = (x3-x4)/N
etc.

where xj denotes the fraction of bins containing exactly j balls,
and k is a continuous variable representing the number of tosses.
Solving this system gives the exact analytical solution for each of
the xj functions.  Letting t denote the variable k/N, we have

1   j   -t
xj(t) = --- t   e
j!

For example, the expected fraction of bins containing exactly 2 balls
(in the continuous limit) is

1   / k \ 2   -(k/N)
x2(k) = --- ( --- )   e
2   \ N /

By comparison, the exact discrete solution gives

k(k-1)  /     1 \ k-2  / 1 \ 2
f(k,2)  =   ------ ( 1 - --- )    ( --- )
2     \     N /      \ N /

Re-arranging terms, this can be written in the form
_             _
1  / k \2  /k-1\ |  /     1 \ N  |(k-2)/N
f(k,2)  = ---( --- ) ( --- )| ( 1 - --- )   |
2  \ N /   \ k / |_ \     N /   _|

The quantity in the square brackets approaches 1/e as N increases,
and the quantity (k-1)/k approaches 1 as k increases, and the
exponent (k-2)/N approaches k/N, so this confirms that x2(k) is
the continuous limit of f(k,2).

Also notice that the sum of the expectations is

-t /        t^2   t^3      \
x0(t) + x1(t) + x2(t) + ...  =  e  ( 1 + t + --- + --- + ... )
\         2!    3!      /

-t   t
=  e    e   =   1

as required.

To illustrate, suppose we have N=365 bins, and k=80 tosses.  What is
the expected number of bins containing two or more balls?  Using the
exact discrete formula, we have

E(80,0) = 293.0720...
E(80,1) =  64.4114...
E(80,2) =   6.9897...
E(80,3) =   0.4992...
E(80,4) =   0.0264...
E(80,5) =   0.0011...

Thus the expected number of bins with two or more balls is

6.9897 + 0.4992 + 0.0264 + 0.0011 + ...  =  7.51...

Notice that, as required, we have the identities

E(80,0) + E(80,1) + E(80,2) + ...  =  365

0*E(80,0) + 1*E(80,1) + 2*E(80,2) + ...  =  80
```