Suppose that Mr. Smith, who is correct 75% of the time, claims that a certain event X will NOT occur. It would seem on this basis that the probability of X occurring is 0.25. On the other hand, Mr. Jones, who is correct 60% of the time, claims that X WILL occur. Given both of these predictions, with their respective reliabilities, what is the probability that X will occur?

The problem is underspecified. Essentially the overall context has three parameters: S (Smith's prediction), J (Jones's prediction), and R (the real outcome). Thus, letting "y" and "n" respectively denote X and not X, the eight possibilities along with their probabilities are

      S  J  R   probability
      -  -  -   -----------
      n  n  n       p0
      n  n  y       p1
      n  y  n       p2
      n  y  y       p3
      y  n  n       p4
      y  n  y       p5
      y  y  n       p6
      y  y  y       p7

where

      p0 + p1 + p2 + p3 + p4 + p5 + p6 + p7 = 1

Also, since S=R 75% of the time, we have

      p0 + p2 + p5 + p7 = 0.75

and since J=R 60% of the time, we have

      p0 + p3 + p4 + p7 = 0.60

From this we want to determine the probability that R=y given that S=n and J=y. Thus, we need to find the value of p3/(p2+p3), which is the probability of [n y y] divided by the probability of [n y *], where "*" indicates "either y or n".

Clearly the problem is underspecified. Setting A=p0+p7, B=p1+p6, C=p2+p5, and D=p3+p4, the conditions can be written as

      A + B + C + D = 1.00
      A     + C     = 0.75
      A         + D = 0.60

which is three linear equations in four unknowns (with the extra constraint that each probability is in the interval 0 to 1), so there are infinitely many solutions. For example, we can set A=0.5, B=0.15, C=0.25, and D=0.10, and satisfy all three equations, but we could also set A=0.6, B=0.25, C=0.15, and D=0.00. Furthermore, even if we arbitrarily select one of these solutions, there are still infinitely many ways of partitioning C and D to give the values of p2 and p3. For example, suppose we take the solution with C=0.25 and D=0.10. We then have p2+p5 = 0.25 and p3+p4 = 0.10. If we take p4=0.10 and p5=0.00, we have p3=0.00 and p2=0.25, so the probability of X is 0. On the other hand, we can equally well take p4=0.00 and p5=0.25, which gives p3=0.10 and p2=0.00, so the probability of X is 1. Thus, any answer from 0 to 1 is strictly consistent with the stated conditions.

Nevertheless, in real life the problems we confront are often (always?) underspecified, and our customers will likely not be satisfied with that as an answer. Are there any "reasonable" assumptions we could make, in the absence of more information, that would enable us to give a "reasonable" answer?

One approach would be to estimate (guess) how much correlation exists between the correctness of J and S. For example, since S seems to be smarter than J, we might assume that S is correct whenever J is correct, as well as being correct on some experiments when J is incorrect. This would imply that p3=p4=0 and p2>0, so the probability of X is 0.

Another approach would be to assume that the correctness of S's and J's predictions are statistically independent, in the sense that they are each just as likely to be right regardless of whether the other is right or wrong. This assumption implies

                 p0+p7            p3+p4
      0.60 = -------------  =  -------------
              p0+p2+p5+p7       p1+p3+p4+p6

and

                 p0+p7            p2+p5
      0.75 = -------------  =  -------------
              p0+p3+p4+p7       p1+p2+p5+p6

Letting u=0.6 and v=0.75 denote the probabilities of correctness for Jones and Smith respectively, these equations together with the previous constraints uniquely determine the four sums

      p0+p7 = uv         = 9/20
      p1+p6 = (1-u)(1-v) = 2/20
      p2+p5 = (1-u)v     = 6/20
      p3+p4 = u(1-v)     = 3/20

but this still doesn't uniquely determine the value of p3/(p2+p3). We need at least one more assumption. I would suggest that we assume symmetry between "y" and "n". In other words, assume that the probability of any combination is equal to the probability of the complementary combination, given by changing each "y" to an "n" and vice versa.
This amounts to the assumption that X (the real answer) has an a priori probability of 1/2, AND that the probability of predicting correctly is the same regardless of whether R is "y" or "n". On this basis we have p0=p7, p1=p6, p2=p5, and p3=p4, so we have p2=6/40, p3=3/40, and p3/(p2+p3) = 1/3.

Therefore, assuming S and J are not correlated, and assuming "y" and "n" are symmetrical, the probability of X, given that S[75%] says X will not occur and J[60%] says it will, is 33.3%. Any (positive) correlation between S and J would tend to lower this probability.

Notice that our resulting value for the probability of X is not equal to the a priori value of 1/2 that we assumed by imposing symmetry between "y" and "n". If we have some a priori reason to believe the probability of X is something different than 1/2, we could re-do the calculation using this value. (Of course, we cannot use the computed probability of X for the particular conditions at hand, because the a priori probability of X applies to all possible conditions, not just when Smith says it won't occur and Jones says it will.) To account for this additional information (if we have it), we can let x denote the a priori probability of X, and then write the individual state probabilities as

      p0 = uv(1-x)            p7 = uvx
      p1 = (1-u)(1-v)x        p6 = (1-u)(1-v)(1-x)
      p2 = (1-u)v(1-x)        p5 = (1-u)vx
      p3 = u(1-v)x            p4 = u(1-v)(1-x)

On this basis the probability of X is

                  p3                u(1-v)x
      Pr{X} = ---------  =  -----------------------
               p2 + p3       (1-u)v(1-x) + u(1-v)x

Naturally if we have no knowledge of the a priori probability of X, we just assume x=1/2, and this formula reduces to the one given previously.

For a slightly more complicated case, suppose Mr. Red's ability to correctly identify the outcome of a TRUE/FALSE experiment is 75%, Mr. Green's is 60% and Mr. Blue's is 55%. If Mr. Blue, Mr. Green, and Mr.
Red all agree that the outcome of the experiment is TRUE, is the resulting probability of "TRUE" 75%, or is it weighted somewhere between 75% and 55%?

Again this is underspecified, but if we impose the assumptions of (1) pairwise independence and (2) "y"/"n" symmetry, then in the general case of N prognosticators these two assumptions are sufficient to uniquely determine the answer. In other words, if N people with reliabilities r1, r2, ..., rN have each predicted the outcome will be "TRUE", and if we assume the correctness of their predictions have no correlation, and that there is symmetry between TRUE and FALSE outcomes, then the probability of a "TRUE" outcome is

                              (r1)(r2)...(rN)
      Pr{TRUE} = -----------------------------------------
                  (r1)(r2)...(rN) + (1-r1)(1-r2)...(1-rN)

Thus, in the particular example described above with r1=3/4, r2=3/5, and r3=11/20, the probability of "TRUE" is 11/13 (i.e., about 84.6%).

Let Q=[q1,q2,...,qN] denote a logical vector (i.e., each component qj is either "TRUE" or "FALSE") and let Q' denote the complement of Q. Also, define

                /  1-r   if q=FALSE
      f(r,q) = (
                \   r    if q=TRUE

and let F(Q) denote the product of f(ri,qi), i=1 to N. Then the probability that the outcome will be TRUE given the predictions Q is given by

                      F(Q)
      Pr{TRUE} = --------------
                  F(Q) + F(Q')

Incidentally, a more perspicuous (but equivalent) way of expressing these relations, letting qj = +1 or -1 accordingly as the jth prognosticator predicts TRUE or FALSE, is

         Pr{TRUE}        N   /    rj    \ qj
      --------------  = PROD ( -------- )
       1 - Pr{TRUE}     j=1  \  1 - rj  /

These results are formally correct, given the stated assumptions, but as discussed earlier, the most important thing to realize about these problems is that they are underspecified and have no definite answer.
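
As a rough check on the arithmetic, the formula F(Q)/(F(Q)+F(Q')) can be sketched in a few lines of Python. The function name prob_true and the optional argument prior are my own illustrative choices, not part of the problem statement; prior plays the role of the a priori probability x from the two-person formula, and with the default of 1/2 it drops out. The sketch assumes the same independence and symmetry conditions as above, and it does not guard against the degenerate case where the denominator is zero.

```python
from math import prod

def prob_true(reliabilities, predictions, prior=0.5):
    """Pr{TRUE} under the independence assumption (illustrative sketch).

    reliabilities: r1..rN, each predictor's chance of being correct
    predictions:   q1..qN as booleans (True means "predicts TRUE")
    prior:         assumed a priori probability of TRUE (hypothetical parameter)
    """
    # F(Q): product of f(r, q), with f = r if q is TRUE and 1 - r if FALSE
    f_q = prod(r if q else 1.0 - r for r, q in zip(reliabilities, predictions))
    # F(Q'): the same product for the complementary prediction vector
    f_qc = prod(1.0 - r if q else r for r, q in zip(reliabilities, predictions))
    return prior * f_q / (prior * f_q + (1.0 - prior) * f_qc)

# Red, Green and Blue all predict TRUE: should agree with 11/13
print(prob_true([0.75, 0.60, 0.55], [True, True, True]))

# Smith (75%) says X will not occur, Jones (60%) says it will: should agree with 1/3
print(prob_true([0.75, 0.60], [False, True]))
```

The two calls reproduce the Red/Green/Blue case and the original Smith/Jones case, so the N-person formula contains the two-person result as a special case.
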
For example, if the a priori probability of the outcome "TRUE" is known to be x, then the above formula becomes

                         x F(Q)
      Pr{TRUE} = ------------------------
                  x F(Q) + (1-x) F(Q')

Given various sets of assumptions, all of which satisfy the stated conditions of the problem, the correct probability can have any value from 0.0 to 1.0. The formula P = F(Q)/(F(Q)+F(Q')) is valid ONLY for one specific set of assumptions, and those assumptions are not particularly realistic. It assumes that the correctness of Smith's predictions is totally uncorrelated with the correctness of Jones's predictions, which would almost certainly NOT be the case in any realistic situation. (It's much more likely that Jones and Smith use at least some of the same criteria for making their predictions.) To really answer the original question we would need to supply more information, specifically, the probabilities of each of the eight possible combinations of predictions and outcomes, as discussed previously.

For another example, suppose the Yankees and the Red Sox are playing, and the Red Sox have won 70% of their games, and the Yankees have won 50% of their games. What is the probability that the Yankees will win? Again the context is clearly underspecified, because the conditions of the question can be met by many different contexts, leading to many different outcome distributions. However, if we need to assign a probability based on this information alone, it's clear that our answer must assume the probability of Y beating R is some function of y and r (the fraction of games won by Y and R respectively). Thus we need a function F(y,r) such that

      Pr{Y beats R} = F(y,r)

It follows that F(y,r) + F(r,y) = 1 and 0 <= F(x,y) <= 1 for any x,y in [0,1]. One class of functions that satisfies this requirement is

                    f(y)
      F(y,r) = -------------
                f(y) + f(r)

where f is any mapping from [0,1] to [0,+inf]. For example, suppose y=0.5 and r=0.7.
Taking f(x) = x this gives Y a 41.7% chance of winning and R a 58.3% chance of winning. More generally, if we set f(x) = x^k and reduce the exponent k so it approaches 0, the probabilities approach 50/50, whereas as k increases without bound the probability of Y winning goes to zero.

What is the "best" or optimal choice for f(x)? We might assume each team has a "skill level", and this level is distributed binomially. Then, given the percentage of games won by a certain team we could infer the skill level by integration over the whole population, assuming that each team plays every other team the same number of times, and assuming Pr{i beats j} = si/(si+sj).

Another approach that is sometimes suggested is to use the expression (Y)(R')/((Y)(R') + (Y')(R)), where Y' and R' are the complements of Y and R. The two possible outcomes are Ywins-Rloses and Yloses-Rwins. To find the probability (only from the win/loss records) of Y winning we would then have

                (Ywin)(Rlose)                    (.5)(.3)
      -------------------------------  =  ---------------------  =  0.3
       (Ywin)(Rlose) + (Rwin)(Ylose)       (.5)(.3) + (.5)(.7)

This formula has a certain aesthetic appeal, but it also has some possibly counter-intuitive consequences. For example, suppose the two best teams in the league, X and Y, win x=99% and y=97% of their games, respectively. We might expect these two teams to be fairly evenly matched, which would be consistent with the formula

                          x
      Pr{X beats Y} = -------  =  0.5051
                       x + y

In contrast, the alternative formula gives

                             x(1-y)
      Pr{X beats Y} = -----------------  =  0.7538
                       x(1-y) + (1-x)y

It isn't obvious to me that a 99% team should be this heavily favored over a 97% team. If this really were the applicable formula, then the presence of a 99% team in the league would almost preclude the existence of a 97% team, depending on how many teams are in the league and how often these teams play each other.
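
To make the comparison concrete, here is a small Python sketch that evaluates both candidate formulas on the two examples discussed so far. The function names simple_ratio and odds_ratio are just labels I have chosen for illustration; neither formula is claimed to be the "right" one.

```python
def simple_ratio(x, y):
    """Pr{X beats Y} using f(z) = z, i.e. x / (x + y)."""
    return x / (x + y)

def odds_ratio(x, y):
    """Pr{X beats Y} using x(1-y) / (x(1-y) + (1-x)y)."""
    return x * (1 - y) / (x * (1 - y) + (1 - x) * y)

# (x, y) are the winning fractions of the two teams being compared
for x, y in [(0.99, 0.97), (0.50, 0.70)]:
    print(f"x={x:.2f} y={y:.2f}  simple={simple_ratio(x, y):.4f}  "
          f"odds={odds_ratio(x, y):.4f}")
```

For the 99% versus 97% teams the two models disagree sharply (roughly 0.51 versus 0.75), while for the 50% team against the 70% team they give roughly 0.42 and 0.30 respectively.
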

One possible objection to the simple weighting function

          f(y)
      -------------
       f(y) + f(r)

with f(x)=x is that it seems the "system" will tend towards equilibrium. For systems of more than two teams, the teams will always come to equilibrium regardless of the initial conditions. In other words, each team would converge to the same win/loss record. On the other hand, a team with a winning percentage of .800 has ample opportunity to sustain its winning ways if we use the latter expression

            y(1-x)
      -----------------
       y(1-x) + (1-y)x

as a model. It's a good idea to impose the overall equilibrium requirement on the whole population when deriving a model.

Of course, the second model is really a special case of the "simple weighted" model. In other words, we have

                             y(1-x)
      Pr{Y beats X} = -----------------
                       y(1-x) + (1-y)x

and dividing the numerator and denominator by (1-x)(1-y) gives the equivalent form

                            y/(1-y)                 f(y)
      Pr{Y beats X} = -------------------  =  -------------
                       y/(1-y) + x/(1-x)       f(y) + f(x)

where f(z) = z/(1-z). This particular function f(z) is not unique in giving a self-consistent population.

A more fundamental approach would be to model the underlying process. For example, suppose there are 256 ranked players in the world, with skill levels ranging from 1 to 9 distributed binomially as follows

      skill    number of
      level     players
      -----    ---------
        1          1
        2          8
        3         28
        4         56
        5         70
        6         56
        7         28
        8          8
        9          1

Of course, "skill" might be a matrix rather than a scalar, and you could get into all sorts of interesting interactions (scissors cuts paper, paper wraps stone, stone breaks scissors, etc.), but let's just assume that "skill" in this game can be modelled by a simple scalar.

Now we must also specify to what extent skill determines the outcome of a contest. If the game's outcome is largely determined by chance, then the world's most skillful player may only beat the least skillful player 60% of the time.
One way of modelling this is to say that the probability of player P_m beating player P_n is

                              (s_m)^k
      Pr{ m beats n } = -------------------
                         (s_m)^k + (s_n)^k

where s_j is the skill of player P_j and the constant k determines the importance of skill in this game. As k goes to 0 all the probabilities go to 0.5, meaning that the outcome of a game is only weakly determined by skill. If k is very large, then the more skillful player will almost always win.

Now we have a simple but complete model for which we can compute the long-term win/loss records of each skill level. In general, for a league of 2^N players with binomially distributed skill levels and assuming a skill factor of k (and every player plays every other player equally often), the "winning percentage" of a player with skill level q is

                        N       C(N,j)          1
                 q^k   SUM  ---------------  -  -
                       j=0   q^k + (j+1)^k      2
      Win(q) = -------------------------------------
                            2^N - 1

where C(N,j) is the binomial coefficient N!/((N-j)! j!). Taking N=8 and k=2, the winning percentages for each of the 9 skill levels are as shown below:

      skill    number of    winning
      level     players    percentage
      -----    ---------   ----------
        1          1         4.9393
        2          8        16.4711
        3         28        29.5133
        4         56        41.5554
        5         70        51.7592
        6         56        60.0785
        7         28        66.7554
        8          8        72.0936
        9          1        76.3722

Of course, the weighted average of all these winning percentages is 50%. Also, since Win(q) is invertible, it follows that for any system of this general type the formula for predicting winners can be expressed in the form f(x)/(f(x)+f(y)).
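
The table can be regenerated directly from the expression for Win(q). The following Python sketch is one way to do it; the function name win_fraction and its default arguments (N=8, k=2) are my own choices for illustration, and it assumes, as above, C(N,j) players at skill level j+1, with every player meeting every other player equally often.

```python
from math import comb

def win_fraction(q, n=8, k=2.0):
    """Long-run winning fraction of one player at skill level q (sketch).

    Assumes 2^n players, with comb(n, j) of them at skill level j + 1,
    and Pr{m beats n} = s_m^k / (s_m^k + s_n^k).
    """
    # Expected wins against the whole population, including a phantom
    # game against oneself, which contributes exactly 1/2
    total = sum(comb(n, j) * q**k / (q**k + (j + 1)**k) for j in range(n + 1))
    # Remove the self-game and average over the 2^n - 1 real opponents
    return (total - 0.5) / (2**n - 1)

for q in range(1, 10):
    print(q, comb(8, q - 1), round(100 * win_fraction(q), 4))

# Population-weighted average of the winning fractions
average = sum(comb(8, j) * win_fraction(j + 1) for j in range(9)) / 2**8
print(average)
```

Changing k in the call shows how the spread of winning percentages narrows toward 50% as k goes to 0 and widens as k grows.
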
