Combining Probabilities

Suppose that Mr. Smith, who is correct 75% of the time, claims that a
certain event X will NOT occur.  It would seem on this basis that the
probability of X occurring is 0.25.  On the other hand, Mr. Jones, who 
is correct 60% of the time, claims that X WILL occur.  Given both of
these predictions, with their respective reliabilities, what is the 
probability that X will occur?

The problem is underspecified.  Essentially the overall context has 3 
parameters, S (Smith's prediction), J (Jones's prediction), R (the real
outcome).  Thus, letting "y" and "n" respectively denote X and not X, 
the eight possibilities along with their probabilities are 

                   S  J  R     probability
                   -  -  -     -----------
                   n  n  n        p0
                   n  n  y        p1
                   n  y  n        p2
                   n  y  y        p3
                   y  n  n        p4
                   y  n  y        p5
                   y  y  n        p6
                   y  y  y        p7

where
          p0 + p1 + p2 + p3 + p4 + p5 + p6 + p7 = 1
                      
Also, since S=R 75% of the time, we have

                p0 + p2 + p5 + p7 = 0.75

and since J=R 60% of the time, we have

                p0 + p3 + p4 + p7 = 0.60

From this we want to determine the probability that R=y given that 
S=n and J=y.  Thus, we need to find the value of p3/(p2+p3), which 
is the probability of [n y y] divided by the probability of [n y *], 
where "*" indicates "either y or n".  Clearly the problem is
underspecified.

Setting A=p0+p7, B=p1+p6, C=p2+p5, and D=p3+p4, the conditions can 
be written as
                     A+B+C+D = 1.00
                     A  +C   = 0.75
                     A    +D = 0.60

which is three linear equations in four unknowns (with the extra
constraint that each probability is in the interval 0 to 1), so there
are infinitely many solutions.  For example, we can set A=0.50, B=0.15,
C=0.25, and D=0.10, and satisfy all three equations, but we could
also set A=0.60, B=0.25, C=0.15, and D=0.00.  Furthermore, even if
we arbitrarily select one of these solutions, there are still
infinitely many ways of partitioning C and D to give the values 
of p2 and p3.  

For example, suppose we take the solution with C=0.25 and D=0.10. 
We then have  p2+p5 = 0.25  and  p3+p4 = 0.10.  If we take p4=0.10 
and p5=0.00, we have p3=0.00 and p2=0.25, so the probability of
X is 0.  On the other hand, we can equally well take p4=0.00 and 
p5=0.25, which gives p3=0.10 and p2=0.00, so the probability of X
is 1.  Thus, any answer from 0 to 1 is strictly consistent with the 
stated conditions.
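
The two extreme cases can be checked numerically.  The following Python
sketch is only an illustration: the helper name check and the even split
of A and B are our own assumptions, and the sums used are one admissible
choice (A=1/2, B=3/20, C=1/4, D=1/10).  It builds two assignments of
p0..p7 that satisfy all three constraints yet give conditional
probabilities of 0 and 1.

```python
from fractions import Fraction as Fr

def check(p):
    """Return Pr{R=y | S=n, J=y} after verifying the three stated constraints.

    p is the list [p0, ..., p7], indexed as in the table of (S, J, R) states.
    """
    assert sum(p) == 1
    assert p[0] + p[2] + p[5] + p[7] == Fr(3, 4)   # Smith is right 75% of the time
    assert p[0] + p[3] + p[4] + p[7] == Fr(3, 5)   # Jones is right 60% of the time
    return p[3] / (p[2] + p[3])

# How A and B are split does not affect the result; they are split evenly here.
half_A, half_B = Fr(1, 4), Fr(3, 40)

# All of C placed on p2, all of D placed on p4.
low = [half_A, half_B, Fr(1, 4), Fr(0), Fr(1, 10), Fr(0), half_B, half_A]
# All of C placed on p5, all of D placed on p3.
high = [half_A, half_B, Fr(0), Fr(1, 10), Fr(0), Fr(1, 4), half_B, half_A]

print(check(low), check(high))   # 0 1
```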

Nevertheless, in real life the problems we confront are often (always?)
underspecified, and our customers will likely not be satisfied with 
that as an answer.  Are there any "reasonable" assumptions we could 
make, in the absence of more information, that would enable us to give
a "reasonable" answer?  One approach would be to estimate (guess) how 
much correlation exists between the correctness of J and S.  For 
example, since S seems to be smarter than J, we might assume that S 
is correct whenever J is correct, as well as being correct on some 
experiments when J is incorrect.  This would imply that p3=p4=0 and 
p2>0, so the probability of X is 0.

Another approach would be to assume that the correctness of S's
prediction is statistically independent of the correctness of J's,
in the sense that each is just as likely to be right regardless of
whether the other is right or wrong.  This assumption implies

                        p0+p7            p3+p4
              0.60 = ------------- = --------------
                      p0+p2+p5+p7     p1+p3+p4+p6

and
                        p0+p7          p2+p5
              0.75 = ------------ = -------------
                      p0+p3+p4+p7    p1+p2+p5+p6

Letting u=0.6 and v=0.75 denote the probabilities of correctness for
Jones and Smith respectively, these equations together with the 
previous constraints uniquely determine the four sums

             p0+p7 =     uv     = 9/20
             p1+p6 = (1-u)(1-v) = 2/20
             p2+p5 =   (1-u)v   = 6/20
             p3+p4 =   u(1-v)   = 3/20

but this still doesn't uniquely determine the value of p3/(p2+p3).
We need at least one more assumption.  I would suggest that we assume
symmetry between "y" and "n".  In other words, assume that the probability
of any combination is equal to the probability of the complementary
combination, given by changing each "y" to an "n" and vice versa.
This amounts to the assumption that X (the real answer) has an a_priori
probability of 1/2, AND that the probability of predicting correctly 
is the same regardless of whether R is "y" or "n".  On this basis we 
have p0=p7, p1=p6, p2=p5, and p3=p4, so we have p2=6/40, p3=3/40, and 
p3/(p2+p3) = 1/3.

Therefore, assuming S and J are not correlated, and assuming "y" and 
"n" are symmetrical, the probability of X, given that S[75%] says X
will not occur and J[60%] says it will, is 33.3%.  Any (positive) 
correlation between S and J would tend to lower this probability.
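
The arithmetic behind this result can be reproduced with exact fractions.
This is only a sketch under the two stated assumptions (independence and
y/n symmetry); the names A, B, C, D denote the pair sums p0+p7, p1+p6,
p2+p5, p3+p4, and the rest of the naming is ours.

```python
from fractions import Fraction as Fr

u, v = Fr(3, 5), Fr(3, 4)      # reliabilities of Jones and Smith

# Independence fixes the four pair sums.
A = u * v                      # p0 + p7: both right
B = (1 - u) * (1 - v)          # p1 + p6: both wrong
C = (1 - u) * v                # p2 + p5: Smith right, Jones wrong
D = u * (1 - v)                # p3 + p4: Jones right, Smith wrong

# y/n symmetry splits each pair sum evenly between its two states.
p2, p3 = C / 2, D / 2

print(A, B, C, D)              # 9/20 1/10 3/10 3/20
print(p3 / (p2 + p3))          # 1/3
```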

Notice that our resulting value for the probability of X is not equal
to the a_priori value of 1/2 that we assumed by imposing symmetry
between "y" and "n".  If we have some a_priori reason to believe the
probability of X is something different than 1/2, we could re-do the
calculation using this value.  (Of course, we cannot use the computed
probability of X for the particular conditions at hand, because the
a_priori probability of X applies to all possible conditions, not just
when Smith says it won't occur and Jones says it will.)  To account
for this additional information (if we have it), we can let x denote
the a_priori probability of X, and then write the individual state 
probabilities as

         p0 = uv(1-x)          p7 = uvx
         p1 = (1-u)(1-v)x      p6 = (1-u)(1-v)(1-x)
         p2 = (1-u)v(1-x)      p5 = (1-u)vx
         p3 = u(1-v)x          p4 = u(1-v)(1-x)

On this basis the probability of X is

                    p3               u(1-v)x
        Pr{X}  =  -------  =  ---------------------
                  p2 + p3     (1-u)v(1-x) + u(1-v)x

Naturally if we have no knowledge of the a_priori probability of X,
we just assume x=1/2, and this formula reduces to the one given
previously.
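
As a small illustration of how the prior x enters, here is a Python sketch
of the formula above.  The function name prob_x is our own, and the second
prior (x=2/3) is just an arbitrary value chosen for the example.

```python
from fractions import Fraction as Fr

def prob_x(u, v, x):
    """Pr{X | Smith says no, Jones says yes}.

    u, v are the reliabilities of Jones and Smith; x is the prior of X.
    """
    p3 = u * (1 - v) * x            # state [n y y]
    p2 = (1 - u) * v * (1 - x)      # state [n y n]
    return p3 / (p2 + p3)

u, v = Fr(3, 5), Fr(3, 4)
print(prob_x(u, v, Fr(1, 2)))       # 1/3
print(prob_x(u, v, Fr(2, 3)))       # 1/2
```

With a prior of 2/3 the two conflicting predictions exactly offset the
prior, and the result is 1/2.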

For a slightly more complicated case, suppose Mr. Red's ability to 
correctly identify the outcome of a TRUE/FALSE experiment is 75%, Mr. 
Green's is 60% and Mr. Blue's is 55%.  If Mr. Blue, Mr. Green, and Mr.
Red all agree that the outcome of the experiment is TRUE, is the 
resulting probability of "TRUE" 75% or is it weighted somewhere 
between 75% and 55% ?

Again this is underspecified, but if we impose the assumptions of 
(1) mutual independence and (2) "y"/"n" symmetry, then in the general 
case of N prognosticators these two assumptions are sufficient to
uniquely determine the answer.  In other words, if N people with 
reliabilities r1, r2, ..., rN have each predicted the outcome will 
be "TRUE", and if we assume the correctness of each prediction is
independent of the others, and that there is symmetry between TRUE and
FALSE outcomes, then the probability of a "TRUE" outcome is

                               (r1)(r2)...(rN)
          Pr{TRUE} = ---------------------------------------
                     (r1)(r2)...(rN) + (1-r1)(1-r2)...(1-rN)

Thus, in the particular example described above with r1=3/4, r2=3/5, 
and r3=11/20, the probability of "TRUE" is 11/13 (i.e., about 84.6%).
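
The three-person example can be checked with a few lines of Python.  This
is a sketch under the same independence and symmetry assumptions; the
function name prob_true_all_agree is ours.

```python
from fractions import Fraction as Fr
from math import prod               # available in Python 3.8 and later

def prob_true_all_agree(rs):
    """Pr{TRUE} when every predictor (reliabilities rs) says TRUE."""
    all_right = prod(rs)
    all_wrong = prod(1 - r for r in rs)
    return all_right / (all_right + all_wrong)

print(prob_true_all_agree([Fr(3, 4), Fr(3, 5), Fr(11, 20)]))   # 11/13
```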

Let Q=[q1,q2,...,qN] denote a logical vector (i.e., each component qj
is either "TRUE" or "FALSE") and let Q' denote the complement of Q.
Also, define 
                              /  1-r    if q=FALSE
                   f(r,q) =  (
                              \   r     if q=TRUE

and let F(Q) denote the product of f(ri,qi), i=1 to N.  Then the
probability that the outcome will be TRUE given the predictions Q is
given by
                               F(Q)
                Pr{TRUE} = -------------
                            F(Q) + F(Q')

Incidentally, a more perspicuous (but equivalent) way of expressing 
these relations, letting qj = +1 or -1 according as the jth prognosticator
predicts TRUE or FALSE, is

                 Pr{TRUE}       N  /   rj   \qj
               ------------ = PROD( -------- )
               1 - Pr{TRUE}    j=1 \ 1 - rj /
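
The following Python sketch, with function names of our own choosing,
evaluates F(Q)/(F(Q)+F(Q')) for mixed predictions and confirms that the
product-of-odds form gives the same result in the Smith and Jones example.
Predictions are represented as booleans rather than +1 or -1, which is
merely a convenience.

```python
from fractions import Fraction as Fr
from math import prod               # available in Python 3.8 and later

def F(rs, qs):
    """Product of f(r, q): r if the prediction q is TRUE, 1 - r if FALSE."""
    return prod(r if q else 1 - r for r, q in zip(rs, qs))

def prob_true(rs, qs):
    comp = [not q for q in qs]       # Q', the complement of Q
    return F(rs, qs) / (F(rs, qs) + F(rs, comp))

def odds_true(rs, qs):
    """Odds form: product of (r / (1 - r)) raised to +1 or -1."""
    return prod((r / (1 - r)) ** (1 if q else -1) for r, q in zip(rs, qs))

rs = [Fr(3, 4), Fr(3, 5)]            # Smith, Jones
qs = [False, True]                   # Smith says FALSE, Jones says TRUE
p = prob_true(rs, qs)
print(p)                             # 1/3
print(p / (1 - p), odds_true(rs, qs))   # 1/2 1/2
```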

These results are formally correct, given the stated assumptions, but 
as discussed earlier, the most important thing to realize about these 
problems is that they are underspecified and have no definite answer.
For example, if the a_priori probability of the outcome "TRUE" is
known to be x, then the above formula becomes

                                 x F(Q)
              Pr{TRUE}  =  --------------------
                           x F(Q) + (1-x) F(Q')

Given various sets of assumptions, all of which satisfy the stated
conditions of the problem, the correct probability can have any
value from 0.0 to 1.0.

The formula P = F(Q)/(F(Q)+F(Q')) is valid ONLY for one specific
set of assumptions, and those assumptions are not particularly
realistic.  It assumes that the correctness of Smith's predictions 
is totally uncorrelated with the correctness of Jones's predictions,
which would almost certainly NOT be the case in any realistic
situation.  (It's much more likely that Jones and Smith use at least 
some of the same criteria for making their predictions).  

To really answer the original question we would need to supply more
information, specifically, the probabilities of each of the eight
possible combinations of predictions and outcomes, as discussed
previously.

For another example, suppose the Yankees and the Red Sox are playing,
and the Red Sox have won 70% of their games, and the Yankees have won 
50% of their games.  What is the probability that the Yankees will
win?  Again the context is clearly underspecified, because the 
conditions of the question can be met by many different contexts, 
leading to many different outcome distributions.  However, if we 
need to assign a probability based on this information alone, it's 
clear that our answer must assume the probability of Y beating R is 
some function of y and r (the fractions of games won by Y and R 
respectively).  Thus we need a function F(y,r) such that

                 Pr{Y beats R}  =  F(y,r)

It follows that

                   F(y,r) + F(r,y) = 1

and 0 <= F(x,y) <= 1  for any x,y in [0,1].  One class of functions
that satisfies this requirement is

                           f(y)
               F(y,r) = -----------
                        f(y) + f(r)

where f is any mapping from [0,1] to [0,+inf].  For example, suppose
y=0.5 and r=0.7.  Taking  f(x) = x  this gives Y a 41.7% chance of
winning and R a 58.3% chance of winning.  More generally, if we set
f(x) = x^k  and reduce the exponent k so it approaches 0, the
probabilities approach 50/50, whereas as k increases without bound
the probability of Y winning goes to zero.
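
The effect of the exponent can be seen in a short Python sketch.  The
function name beats and the particular exponents tried are our own
choices for illustration.

```python
def beats(y, r, k=1.0):
    """Pr{Y beats R} under the weighting f(x) = x**k."""
    fy, fr = y ** k, r ** k
    return fy / (fy + fr)

y, r = 0.5, 0.7
for k in (0.01, 1, 5, 50):
    # Small k pulls the result toward 0.5; large k pushes it toward 0.
    print(k, round(beats(y, r, k), 4))
```

For k=1 this prints 0.4167, matching the 41.7% quoted above, and for any
k the two probabilities beats(y, r, k) and beats(r, y, k) sum to 1.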

What is the "best" or optimal choice for f(x)?  We might assume each
team has a "skill level", and this level is distributed binomially.
Then, given the percentage of games won by a certain team we could 
infer the skill level by integration over the whole population, 
assuming that each team plays every other team the same number of 
times, and assuming Pr{i beats j} = si/(si+sj).

Another approach that is sometimes suggested is to use the expression
(Y)(R')/((Y)(R') + (Y')(R)), where Y' and R' are the complements of Y 
and R.  The two possible outcomes are Ywins-Rloses and Yloses-Rwins. 
To find the probability (only from the w/l records) of R winning we
would then have

        (Rwin)(Ylose)                     (.7)(.5)
  -----------------------------   =  -------------------  =  0.7
  (Rwin)(Ylose) + (Ywin)(Rlose)      (.7)(.5) + (.5)(.3)


This formula has a certain aesthetic appeal, but it also has some
possibly counter-intuitive consequences.  For example, suppose the two
best teams in the league, X and Y, win x=99% and y=97% of their games,
respectively.  We might expect these two teams to be fairly evenly
matched, which would be consistent with the formula 

                                   x
                Pr{X beats Y} = -------  = 0.5051
                                 x + y

In contrast, the alternative formula gives

                                 x(1-y)              
           Pr{X beats Y}  =  ---------------  =  0.7538
                             x(1-y) + (1-x)y

It isn't obvious to me that a 99% team should be this heavily favored
over a 97% team.  If this really was the applicable formula, then 
the presence of a 99% team in the league would almost preclude the
existence of a 97% team, depending on how many teams are in the 
league and how often these teams play each other.
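
The two figures quoted above are easy to reproduce.  In this Python
sketch the function names simple and cross are merely labels of our own
for the two formulas.

```python
def simple(x, y):
    """Pr{X beats Y} = x / (x + y)."""
    return x / (x + y)

def cross(x, y):
    """Pr{X beats Y} = x(1-y) / (x(1-y) + (1-x)y)."""
    return x * (1 - y) / (x * (1 - y) + (1 - x) * y)

print(round(simple(0.99, 0.97), 4))   # 0.5051
print(round(cross(0.99, 0.97), 4))    # 0.7538
```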

One possible objection to the simple weighting function 

                          f(y)
                      -----------
                      f(y) + f(r)

with f(x)=x is that it seems the "system" will tend towards equilibrium.
For systems of more than two teams, the teams will always come to 
equilibrium regardless of the initial conditions.  In other words, 
each team would converge to the same win/loss record.  On the other 
hand, a team with a winning percentage of .800 will have ample 
opportunity to sustain its winning ways if we use the latter 
expression 
                         y(1-x)
                    ---------------
                    y(1-x) + (1-y)x

as a model.  It's a good idea to impose the overall equilibrium 
requirement on the whole population when deriving a model.  Of course, 
the second model is really a special case of the "simple weighted"
model.  In other words, we have

                                  y(1-x)
            Pr{Y beats X}  =  ---------------
                              y(1-x) + (1-y)x

and dividing the numerator and denominator by (1-x)(1-y) gives the
equivalent form
                           y/(1-y)               f(y)
     Pr{Y beats X}  =  ----------------   =  -----------
                       y/(1-y) + x/(1-x)      f(y) + f(x)

where f(z) = z/(1-z).  This particular function f(z) is not unique
in giving a self-consistent population.
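
The equivalence of the two forms can be confirmed with exact arithmetic.
This Python sketch uses function names of our own (cross, odds, weighted)
and a few arbitrary sample records.

```python
from fractions import Fraction as Fr

def cross(y, x):
    """Pr{Y beats X} = y(1-x) / (y(1-x) + (1-y)x)."""
    return y * (1 - x) / (y * (1 - x) + (1 - y) * x)

def odds(z):
    """The weighting f(z) = z / (1 - z)."""
    return z / (1 - z)

def weighted(y, x, f):
    """Pr{Y beats X} = f(y) / (f(y) + f(x))."""
    return f(y) / (f(y) + f(x))

samples = [(Fr(1, 2), Fr(7, 10)), (Fr(97, 100), Fr(99, 100)), (Fr(4, 5), Fr(1, 5))]
for y, x in samples:
    # The two expressions agree exactly for every sample pair.
    print(cross(y, x), weighted(y, x, odds))
```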

A more fundamental approach would be to model the underlying process.
For example, suppose there are 256 ranked players in the world, with skill
levels ranging from 1 to 9 distributed binomially as follows

                    skill  number of
                    level  players
                      1      1
                      2      8
                      3     28
                      4     56
                      5     70
                      6     56
                      7     28
                      8      8
                      9      1

Of course, "skill" might be a matrix rather than a scalar, and you
could get into all sorts of interesting interactions (scissors cuts
paper, paper wraps stone, stone breaks scissors, etc), but let's
just assume that "skill" in this game can be modelled by a simple
scalar.  

Now we must also specify to what extent skill determines the outcome
of a contest.  If the game's outcome is largely determined by chance,
then the world's most skillful player may only beat the least
skillful player 60% of the time.  One way of modelling this is 
to say that the probability of player P_m beating player P_n is

                                     (s_m)^k
         Pr{ m beats n }  =   --------------------
                               (s_m)^k  +  (s_n)^k

where s_j is the skill of player P_j and the constant k determines 
the importance of skill in this game.  As k goes to 0 all the
probabilities go to 0.5, meaning that the outcome of a game is 
only weakly determined by skill.  If k is very large, then the more
skillful player will almost always win.
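
To get a feel for the scale of k, the following Python sketch solves for
the exponent at which the best player (skill 9) beats the worst (skill 1)
exactly 60% of the time.  The function name p_beats and the variable
target are our own, and the 60% figure is simply the illustrative value
mentioned above.

```python
from math import log

def p_beats(s_m, s_n, k):
    """Pr{m beats n} for skills s_m, s_n and skill exponent k."""
    return s_m ** k / (s_m ** k + s_n ** k)

# Solve 9**k / (9**k + 1) = target, i.e. 9**k = target / (1 - target).
target = 0.6
k = log(target / (1 - target)) / log(9)

print(round(k, 4), round(p_beats(9, 1, k), 4))   # 0.1845 0.6
```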

Now we have a simple but complete model for which we can compute the
long-term win/loss records of each skill level.  In general, for a
league of 2^N players with binomially distributed skill levels and
assuming a skill factor of k (and every player plays every other
player equally often), the "winning percentage" of a player with 
skill level q is 
                    _                           _
                   |         N       C(N,j)      |    1
                   |  q^k  SUM   -------------   | - ---
                   |_       j=0  q^k + (j+1)^k  _|    2
         Win(q) =  -------------------------------------
                                  2^N - 1

where C(N,j) is the binomial coefficient N!/((N-j)! j!).  Taking N=8
and k=2, the winning percentages for each of the 9 skill levels are as
shown below:
                    skill  number of  winning
                    level   players   percentage
                      1        1       4.9393
                      2        8      16.4711
                      3       28      29.5133
                      4       56      41.5554
                      5       70      51.7592
                      6       56      60.0785
                      7       28      66.7554
                      8        8      72.0936
                      9        1      76.3722

Of course, the weighted average of all these winning percentages 
is 50%.  Also, since Win(q) is invertible, it follows that for any
system of this general type the formula for predicting winners can be
expressed in the form f(x)/(f(x)+f(y)).
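
The table of winning percentages can be regenerated from the Win(q)
formula with a short Python sketch.  The function name win and its default
arguments are our own choices; the model itself (binomial skill levels,
round-robin play, exponent k) is as described above.

```python
from math import comb               # available in Python 3.8 and later

def win(q, N=8, k=2):
    """Long-run winning fraction of a player with skill level q."""
    total = sum(comb(N, j) * q ** k / (q ** k + (j + 1) ** k)
                for j in range(N + 1))
    # Subtract the 1/2 contributed by the player "meeting" himself,
    # then divide by the number of opponents.
    return (total - 0.5) / (2 ** N - 1)

for q in range(1, 10):
    print(q, comb(8, q - 1), round(100 * win(q), 4))

# Population-weighted average of the winning fractions.
average = sum(comb(8, q - 1) * win(q) for q in range(1, 10)) / 2 ** 8
print(round(average, 6))            # 0.5
```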