## Importance of Being Urnest (B Balls, K Colors, N Urns)

Given N urns and B balls of each of K colors, randomly distribute
the KB balls in the N urns such that no two balls of identical color
are placed in the same urn.  What is the probability that at least
one ball of each color will be in an urn by itself?

We know that the total number of distinct ways we can distribute the
balls in the urns is C(N,B)^K where C(n,m) is the binomial coefficient
n!/(m!(n-m)!).  Each of these arrangements is equally likely, so "all"
we need to do is determine how many of these arrangements satisfy the
special condition that at least one ball of each color is in an urn
by itself.  If we denote this number by Q(N,B,K) then we clearly have

Q(N,B,1) = C(N,B)

Q(N,B,2) = C(N,B)^2 - C(N,B)

for all N,B with N >= B.  However, for more than two colors it gets
complicated.  One approach might be to determine a generating
function, but a more simple-minded method of determining the number
of arrangements of K distinct sets of B balls into N urns such that
each color is in at least one urn by itself is to enumerate the
partitions of K(B-1) into N-K or fewer parts no greater than K,
then append K 1's, and then evaluate the combinations of colors.

To illustrate, let's compute the probability for 9 urns with 4 balls
in each of 3 colors.  The partitions of 9 into 6 or fewer parts no
greater than 3, augmented with three 1's, are shown below along with
their repsective enumerations.  (To save some space, let {mn} denote
the binomial coefficient C(m,n) and let [mn\jk] = ((m!n!)/(j!k!)).)

3 3 3        1 1 1  {96}{63} [3]                           10080
3 3 2 1      1 1 1  {92}{71}{64} {32}[4\2]                136080
3 3 1 1 1    1 1 1  {92}{76} [6\222]                       22680
3 2 2 2      1 1 1  {91}{83}{53} [33]                     181440
3 2 2 1 1    1 1 1  {91}{82}{65} (3[5\3]+6[5\22])         362880
3 2 1 1 1 1  1 1 1  {91}{81}{77} 3[7\322]                  45360
2 2 2 2 1    1 1 1  {94}{54} 3[44\22]                     272160
2 2 2 1 1 1  1 1 1  {93}{66} ([36\222]+18[6\32]+3[6\4])   143640
-------
Q(9,4,3)   =   1174320

Thus the probability of distributing 12 balls, 4 in each of 3 colors,
into 9 urns such that each color is by itself in at least one urn
is 1174320/2000376 = 2330/3969 = 0.587049635...

In principle this method can be used to compute any Q(N,B,K), but it
becomes increasingly laborious as the values of N, B, and K increase.
I'd be interested to know if there is a generating function that
would simplify the computation.

A related, but more tractable, problem is to compute the _expected
number_ of urns containing exactly 1 red ball, and exactly 1 green
ball, etc.  This can be computed fairly easily by a recursive method
because at each point you only need to keep track of the number of
empty urns, the number of urns with more than one ball, the number of
urns with exactly 1 red, the number with exactly 1 green, and so on.
The expected number of urns containing exactly one of a given color
is the same for each color, and is equal to

B (N-B)^(K-1)
-------------
N^(K-1)

Suppose we are interested in just approximate results for parameters
in the range of N = 100 to 1000, K = 8 to 32, and we want to determine
the optimum value of B.  For parameters like these it's possible to
come up with a reasonable estimate without too much trouble.  Suppose
N=100, K=8, and B=10.  Begin by distributing the 10 white balls.  Now
what is the probability that after distributing the other seven colors
we will have "covered" every one of the 10 white balls?  This depends
on what percentage of the urns will get hit by one of the remaining
seven colors.

We know that each color covers 10% of the urns (in this case), and
that the coverages are mutually independent, so (in the limit for
large N) we can just take the union, which is 1 - (1 - 0.1)^7.  This
means that about 52 of the urns will be covered by these seven colors
colors.  (We don't cover 70 urns with these 70 balls because they
overlap.)

The total number of arrangements of the original 10 balls and the 52
"covered" urns is C(100,10)*C(100,52).  On the other hand, the number
of these arrangements for which all 10 white balls are in a covered
urn is just C(52,10)*C(100,52), so the probability that all 10 of the
white balls are covered is just  C(52,10)/C(100,10), which is about
0.000913.

Now, the situation is clearly symmetrical in the eight colors, so
each one has probability of 0.000913 of being covered.  Thus the
probability that one or more of the colors gets completely covered
is just the union of these events.  Of course, they are not strictly
independent, but for parameters in the range of interest they are
essentially independent.  Therefore the probability of failure (i.e.,
one or more colors completely covered) is 1 - (1 - 0.000913)^8,
which equals 0.00728.

Needless to say, there are several approximations buried in this
procedure, and their validity would have to be evaluated for each
set of parameters.  But I'd say for the range of parameters mentioned
above this should give pretty good results.  To formulate it
explicitly, note that the "52" above was really rounded off, and
the fractional part was thrown away.  This was done to give a nice
integer argument for the binomial function C(m,n).  However, to avoid
that roundoff, let's use the gamma function G to give our factorials,
so we can define g(x,y) = G(x+1)/(G(y+1)G(x-y+1)).

What I computed above was the probability of failure, whereas
P(N,B,K) was defined as the probability of success, i.e., every
color has at least one urn to itself.  So the above result is
P(100,10,8) = 0.99272.  With this convention the general (and
definitely approximate) formula for P(N,B,K) is

/      g( N(1 - (1 - B/N)^(K-1) , B ) \ K
P(N,B,K)  =  ( 1  -  ------------------------------  )
\                 g(N,B)              /

Actually, the overall exponent "K" should really be K-1 when K is
very small (like 2), because the degrees of freedom require the two
colors to cover each other.  However, for K greater than about 4 the
independence is adequate to justify the full exponent K.

Anyway, this formula gives the following results for N=100 and K=8:

B    1-P(100,B,8)            B    1-P(100,B,8)
---   -----------            ---   -----------
0      1.0000                13      0.00992
1      0.4303                14      0.01154
2      0.1227                15      0.01372
3      0.0485                16      0.01659
4      0.02490               17      0.02033
5      0.01549               18      0.02515
6      0.01114               19      0.03133
7      0.00898               20      0.03921
8      0.00792               25      0.1198
9      0.00751 <--min        30      0.3153
10      0.00755               35      0.6227
11      0.00797               40      0.8830
12      0.00875               45      0.9834

Notice that the above function is a very nice "bathtub curve".