## And Then There Were None

```Suppose we have n objects, each with a 1/m chance of disappearing in
any given second. What is the mean of the number of seconds until
all of them are gone?

This may seem like a well-posed question, but it's actually ambiguous,
because it doesn't clearly specify whether "1/m per second" is to be
applied continuously as a constant rate, discontinuously on a sequence
of discrete 1-second intervals, or as any of an infinite number of
other distributions yielding a 1/m chance of disappearing in any given
second (not to mention the ambiguity in the conditional, i.e., an
object could have a uniform probability of exactly 1/m during each
of m intervals).  If we're looking for an exact answer, the precise
interpretation of the "rate" is important.

The continuous constant-rate interpretation is easiest to solve,
and can be viewed as a linear Markov chain with n+1 states, where
the initial state has all n objects present, the next state has
n-1 objects present, and so on.  The transition from the state
with k objects to the next state has a constant exponential rate
of k(1/m), and therefore a mean transition time of m/k.  The mean
transition time of the initial state with k=n down to the empty
state with k=0 is just the sum of the individual mean transition
times, which is

m (1 + 1/2 + 1/3 + 1/4 + ... + 1/n)

The purely discrete interpretation leads to a somewhat more complicated
problem (as usual).  Each individual object has a probability of 1.0
of being present at the beginning of the first 1-second interval, and
a probability of (m-1)/m of being present at the end of the first
interval.  In general, each object has a probability of [(m-1)/m]^k
of being present at the end of the kth interval, and so it's
probability of NOT being present at the end of the kth interval
is simply the complement of this value, i.e., 1 - [(m-1)/m]^k.

It follows that the probability of ALL n objects being gone at the end
of the kth interval is the product of their individual probabilities of
being gone, so it is given by {1 - [(m-1)/m]^k}^n.  Of course, this is
the cumulative probability, which approaches 1.0 as k increases, but
we need to know the probability of the system entering this "empty
state" during the kth second.  Thus we need to subtract the probability
of being in the empty state at the end of the (k-1)th state.  This
gives us the probability of entering the "empty state" during the kth
discrete 1-second interval:

P{k} =  {1 - [(m-1)/m]^k}^n  -  {1 - [(m-1)/m]^(k-1)}^n

The mean number of intervals required to reach the empty state is
then given by the weighted sum

inf
k_mean  =  SUM  k P{k}
k=1

For small values of n we can just algebraically expand the expression
for P{k} into a sum of powers of (m-1)/m, and then determine the
closed-form expression for the infinite sum of powers.  However, for
large values of n this becomes tedious, and it may be best to simply
compute the sum directly - or approximate it by the continuous
exponential solution given above.

Here's a little table of k_mean for n objects with 1/m chance of
disappearing during each discrete 1-second interval:

m
---------------------------------------------
n      1       2        3          4           5
---   -----  -------  -------  ----------  ----------
1      1       2        3          4           5
2      1      8/3     21/5       40/7        65/9
3      1     22/7    477/95    1780/259    1595/183
4      1   368/105  6963/1235 50128/6475 221405/22509

The difference between the continuous and discrete interpretations
can be seen by comparing the answers for 4 objects that each have
a probability of 1/5 of disappearing during each second (on the
condition that they haven't already disappeared).  The solution
for the discrete interpretation is

k_mean = 221405/22509 = 9.836287 sec

whereas the solution for the continuous-rate interpretation is

k_mean = 5 [ 1 + 1/2 + 1/3 + 1/4 ] =  125/12  = 10.416666 sec

so they differ by 0.58 seconds, nearly 6%.

Incidentally, we can simplify the formula

inf
k_mean  =  SUM   k [ (1 - r^k)^n  - (1 - r^(k-1))^n ]
k=1

where r = (m-1)/m, by noting that the first few terms of the
summation are

1(1 - r^1)^n  -  1(1 - r^0)^n

+ 2(1 - r^2)^n  -  2(1 - r^1)^n

+ 3(1 - r^3)^n  -  3(1 - r^2)^n

+         etc.

Taking advantage of the cancellation on consecutive terms, we can
express the sum of the first t terms as

t-1
t(1 - r^t)^n  -  SUM  (1 - r^k)^n
k=1

Adding and subtracting t-1, this can be written as

t-1
t(1 - r^t)^n  - (t-1) +  SUM  (1 - (1 - r^k)^n)
k=1

In the limit as t goes to infinity the two leading terms go to
1 (because r^t goes to zero), so we are left with the result

inf
k_mean  =  1  +  SUM  (1 - (1 - r^k)^n)
k=1

Noting that the summand would be 1 with k=0, we can bring the
1 inside the summation and write

inf
k_mean  =    SUM  (1 - (1 - r^k)^n)
k=0

This is confirmed by considering the simple case when n=1, for
which we know the mean survival time is just m.  In this case
the summand in the preceeding formula reduces to r^k, so by
the geometric series we have the sum 1/(1-r) = m.

In the same way we can easily derive simple closed-form expressions
for small values of n.  For example, if n=2 the summand reduces
to 2r^k - (r^2)^k, and so the infinite sum is 2/(1-r) - 1/(1-r^2),
which equals m(3m-2)/(2m-1).  In general, for any fixed n, we have
the closed-form expression

C(n,1)       C(n,2)                 C(n,n)
k_mean[n]  =   --------  -  -------- + ... -(-1)^n --------
1 - r^1      1 - r^2                 1 - r^n

where C(i,j) is the binomial coefficient.  For example, with n=5 we
have
5       10        10         5         1
k_mean[5]  =  ----- - ------- + ------- - ------- + -------
1 - r   1 - r^2   1 - r^3   1 - r^4   1 - r^5

In terms of m this can be written in the form

k_mean[5] =

/    10m      10m^2          5m^3               m^4            \
m( 5 - ---- + --------- - -------------- + --------------------   )
\    2m-1   3m^2-3m+1   4m^3-6m^2+4m-1   5m^4-10m^3+10m^2-5m+1 /

It's interesting that for sufficiently large m (meaning a LOW rate
of disappearance) we can neglect all but the leading term in each
denominator, and so the expression reduces to

/     10     10     5     1  \
m( 5 - ---- + ---- - --- + ---  )
\      2      3     4     5  /

which, not surprisingly, is exactly equal to to solution for the
case of a constant exponential rate

/     1     1     1     1  \
m( 1 + --- + --- + --- + ---  )
\     2     3     4     5  /

Thus we're led to the nice little combinatorial identity

n   1          n         -C(n,k)
SUM  ---   =   SUM  (-1)^k --------
k=1  k         k=1           k

```