Probabilities with Variable Failure Rates


Success is just failure that hasn't happened yet.

                                             Latrell Sprewell


The probability density function δ(t) for the occurrence of a specific event at the time t is defined as δ(t) = dP(t)/dt, where P(t) is the cumulative probability, i.e., the probability that the event has occurred by the time t. Letting $t_E$ denote the time when event E occurs, the probability that the event will occur during the time interval from $t_1$ to $t_2$ is given by the integral

$$P\{t_1 < t_E \le t_2\} = \int_{t_1}^{t_2} \delta(t)\,dt = P(t_2) - P(t_1)$$

The density function for a specific event generally decreases to zero as time passes, because the event will almost certainly have already occurred after a sufficient amount of time. However, the density function for the event to occur given that it has not already occurred may be fairly constant. For example, if we repeatedly toss a six-sided die, the probability that the first ‘4’ occurs on the first toss is 1/6, but the probability that the first ‘4’ occurs on a later toss is smaller, because a ‘4’ has probably already appeared. Thus it is a priori unlikely (about 0.5 percent) that the first ‘4’ will occur on (say) the 20th toss. However, given that no ‘4’ has occurred on the preceding tosses, the probability of the first ‘4’ occurring on the 20th toss (or any toss) is always 1/6.
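As a quick check of the die example, the following Python sketch (the function names are illustrative, not from the text) computes exact fractions for the unconditional probability that the first ‘4’ appears on toss n, which is $(5/6)^{n-1}(1/6)$. It also computes the same probability conditioned on no earlier ‘4’.

```python
from fractions import Fraction

P_FOUR = Fraction(1, 6)  # chance of a '4' on any single toss of a fair die


def first_four_on(n):
    """Unconditional probability that the first '4' appears on toss n."""
    return (1 - P_FOUR) ** (n - 1) * P_FOUR


def no_four_before(n):
    """Probability that no '4' appeared on tosses 1 .. n-1."""
    return (1 - P_FOUR) ** (n - 1)


def rate_on(n):
    """Probability of the first '4' on toss n, given no '4' on tosses 1 .. n-1."""
    return first_four_on(n) / no_four_before(n)


for n in (1, 2, 5, 20):
    print(n, float(first_four_on(n)), rate_on(n))
```

The unconditional values fall off geometrically (about 0.0052 for n = 20). The conditional value stays at exactly 1/6 for every n, which is the discrete analogue of a constant rate.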


This leads us to define the rate of occurrence, denoted by λ(t) for time t, as the probability density in the next increment of time after time t, given that the event has not already occurred by the time t. Thus letting dt denote the increment of time beginning at time t, and stipulating that P(0) = 0, we can express the definitions of the density function δ(t) and the rate function λ(t) as follows
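
$$\delta(t)\,dt = P\{t < t_E \le t + dt\}, \qquad \lambda(t)\,dt = P\{t < t_E \le t + dt \mid t_E > t\}$$
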
where the vertical line signifies “given”. Here the expression $t_E > t$ signifies that the event E has not occurred by the time t. Recall that a basic property of probability is that, for any two events X and Y, we have

$$P\{X \text{ and } Y\} = P\{X \mid Y\}\,P\{Y\}$$

where P{X|Y} signifies the conditional probability of X given Y. Making use of this identity, the expression for λ(t) can be written as
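
$$\lambda(t)\,dt = \frac{P\{t < t_E \le t + dt \text{ and } t_E > t\}}{P\{t_E > t\}} = \frac{P\{t < t_E \le t + dt\}}{P\{t_E > t\}} = \frac{\delta(t)\,dt}{P\{t_E > t\}}$$
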
The probability that event E has not occurred by the time t is simply 1 – P(t), so this shows that the rate of occurrence is related to the probability density function by

$$\lambda(t) = \frac{\delta(t)}{1 - P(t)}$$

In other words, the rate is simply the probability density normalized by the probability that the event has not yet occurred. This is often a useful parameter because, as mentioned above, many naturally-occurring events have a constant (or nearly constant) rate. Remembering that δ(t) = dP/dt, the relation can be written as a linear first-order differential equation

$$\frac{dP}{dt} + \lambda(t)\,P(t) = \lambda(t)$$

If λ(t) is constant, this has the homogeneous solution $P(t) = Ae^{-\lambda t}$ for some constant A, and the particular solution P(t) = 1. The general solution is the sum of these two solutions, $P(t) = 1 + Ae^{-\lambda t}$. Setting A = −[1 − P(0)] to match the initial value, we have

$$P(t) = 1 - [1 - P(0)]\,e^{-\lambda t}$$

If P(0) = 0 this reduces to the familiar formula $P(t) = 1 - e^{-\lambda t}$ for the probability based on the exponential density function $\delta(t) = \lambda e^{-\lambda t}$. The above equation can also be written in the form

$$1 - P(t) = [1 - P(0)]\,e^{-\lambda t}$$

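Here is a minimal numerical check, assuming a constant rate and illustrative values of λ and P(0) chosen only for demonstration. Forward-Euler integration of dP/dt = λ(1 − P) should approach the closed-form solution $P(t) = 1 - [1 - P(0)]e^{-\lambda t}$ as the step size shrinks.

```python
import math


def closed_form(t, lam, p0):
    """P(t) = 1 - [1 - P(0)] * exp(-lam * t) for a constant rate lam."""
    return 1.0 - (1.0 - p0) * math.exp(-lam * t)


def euler(t_end, lam, p0, steps=100_000):
    """Forward-Euler integration of dP/dt + lam * P = lam, starting from P(0) = p0."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += lam * (1.0 - p) * dt
    return p


lam, p0, t_end = 0.3, 0.1, 5.0  # illustrative values only
print(closed_form(t_end, lam, p0), euler(t_end, lam, p0))
```

With P(0) = 0, the same closed form also confirms that δ(t)/[1 − P(t)] with $\delta(t) = \lambda e^{-\lambda t}$ returns the constant λ.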
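If the rate function λ(t) is not constant, consider a sequence of incremental time intervals, each of duration Δt, during each of which the rate has the virtually constant values $\lambda_0, \lambda_1, \lambda_2$, and so on. In that case we have the sequence of solutions

$$\begin{aligned}
1 - P(\Delta t) &= [1 - P(0)]\,e^{-\lambda_0 \Delta t} \\
1 - P(2\Delta t) &= [1 - P(\Delta t)]\,e^{-\lambda_1 \Delta t} \\
&\;\;\vdots \\
1 - P(N\Delta t) &= [1 - P((N-1)\Delta t)]\,e^{-\lambda_{N-1} \Delta t}
\end{aligned}$$
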
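Multiplying these together and canceling the intermediate factors 1 − P(Δt), ..., 1 − P((N−1)Δt) that appear on both sides, the probability at the end of these N increments is given by

$$1 - P(N\Delta t) = [1 - P(0)]\,e^{-\lambda_0 \Delta t}\,e^{-\lambda_1 \Delta t} \cdots e^{-\lambda_{N-1} \Delta t} = [1 - P(0)]\exp\left(-\sum_{j=0}^{N-1} \lambda_j\,\Delta t\right)$$
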
Thus in the limit as Δt goes to zero and NΔt goes to T, we have

$$1 - P(T) = [1 - P(0)]\exp\left(-\int_0^T \lambda(t)\,dt\right)$$

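The limiting argument can be sketched numerically. The rate function below, $\lambda(t) = 10^{-3}[1 + 0.5\sin t]$ per hour, is purely hypothetical. The sketch holds the rate constant over each increment and chains the constant-rate solutions as in the product above. It then compares the result with the exponential of the exact integral $\int_0^T \lambda(t)\,dt = 10^{-3}[T + 0.5(1 - \cos T)]$.

```python
import math


def rate(t):
    """Hypothetical time-varying rate lambda(t) per hour, for illustration only."""
    return 1e-3 * (1.0 + 0.5 * math.sin(t))


def p_stepwise(p0, T, n):
    """Hold the rate at lambda_j = rate(j*dt) over increment j and chain the
    constant-rate solutions 1 - P(t + dt) = [1 - P(t)] * exp(-lambda_j * dt)."""
    dt = T / n
    survive = 1.0 - p0  # 1 - P(0)
    for j in range(n):
        survive *= math.exp(-rate(j * dt) * dt)
    return 1.0 - survive


def p_exact(p0, T):
    """Closed form using the exact integral of rate(t) from 0 to T."""
    integral = 1e-3 * (T + 0.5 * (1.0 - math.cos(T)))
    return 1.0 - (1.0 - p0) * math.exp(-integral)


for n in (10, 100, 1000, 10_000):
    print(n, p_stepwise(0.01, 10.0, n), p_exact(0.01, 10.0))
```

As the number of increments grows, the stepwise value converges to the integral form. Because the rate is sampled at the start of each increment, the error is roughly proportional to Δt.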
Consequently we have the result

$$P(T) = 1 - [1 - P(0)]\exp\left(-\int_0^T \lambda(t)\,dt\right) \tag{3}$$

If the probability P(0) at the beginning of the flight is zero (as is typically the case), and if we use the approximation $e^x \approx 1 + x$ for small |x|, this equation simply says that P(T) is essentially just the integral of λ(t) from t = 0 to T. This is as expected, because when P(t) is much less than 1, the rate λ(t) = δ(t)/[1 − P(t)] is nearly identical to the probability density function δ(t). In other words, the above equation is just a convoluted way of expressing the tautological fact that the probability for the flight is the integral of the probability density for the flight.
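This sketch shows how good the small-probability approximation is. It uses a hypothetical rate that grows linearly during a flight, λ(t) = a + bt, and the values of a, b, and T are illustrative only. It uses Python's `math.expm1` to evaluate $1 - e^{-x}$ without cancellation error.

```python
import math


def rate_integral(a, b, T):
    """Integral from 0 to T of a hypothetical linear rate lambda(t) = a + b*t."""
    return a * T + 0.5 * b * T * T


def prob_exact(a, b, T):
    """P(T) with P(0) = 0; -expm1(-x) computes 1 - exp(-x) accurately."""
    return -math.expm1(-rate_integral(a, b, T))


a, b, T = 1e-7, 2e-8, 3.0  # illustrative per-hour values and a 3-hour flight
integral = rate_integral(a, b, T)
exact = prob_exact(a, b, T)
print(integral, exact, (integral - exact) / integral)
```

For these values the integral is 3.9 × 10⁻⁷. The relative difference between the exact P(T) and the integral is about half the integral, roughly 2 × 10⁻⁷, which is negligible at probabilities of this size.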


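Regulatory guidance for safety calculations sometimes splits up the total time interval T into several smaller intervals (also called phases) of duration $T_1, T_2, \ldots, T_n$, where $T_j$ extends from $t_{j-1}$ to $t_j$ (with $t_0 = 0$ and $t_n = T$), and considers separate rate functions $\lambda_1(t), \lambda_2(t), \ldots, \lambda_n(t)$ during these intervals respectively. Then by the above derivation we immediately have the two equivalent expressions

$$P(T) = 1 - [1 - P(0)]\prod_{j=1}^{n}\exp\left(-\int_{t_{j-1}}^{t_j} \lambda_j(t)\,dt\right)$$

$$P(T) = 1 - [1 - P(0)]\exp\left(-\sum_{j=1}^{n}\int_{t_{j-1}}^{t_j} \lambda_j(t)\,dt\right)$$
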
However, these equations make little sense, because the rate functions are each defined and evaluated only over disjoint time intervals. In other words, the rate function λ1(t) covers only t = t0 to t1, and the rate function λ2(t) covers only t = t1 to t2, and so on. Thus these functions really comprise just a single overall rate function λ(t) covering the entire duration from t = 0 to T, and the probability is given by (3). There is no benefit in splitting the function in these terms. (If we wished to define individual rate functions beginning from time zero for each phase, then we could sum the integrals of λi for t = 0 to Ti.)
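It is easy to check that phase-specific rate functions are just pieces of one overall λ(t). In this sketch the phase boundaries and per-phase constant rates are hypothetical. The sketch compares the product of per-phase exponentials with a single numerical integral of the assembled rate function over the whole flight, as in (3).

```python
import math

# Hypothetical phase boundaries t_0 = 0 < t_1 < t_2 < t_3 = T (hours) and a
# constant rate per phase (per hour); illustrative values only.
BOUNDARIES = [0.0, 0.25, 2.75, 3.0]
PHASE_RATES = [4e-7, 1e-8, 3e-7]


def overall_rate(t):
    """The single rate function lambda(t) on [0, T] assembled from the phases."""
    for j, rate in enumerate(PHASE_RATES):
        if BOUNDARIES[j] <= t < BOUNDARIES[j + 1]:
            return rate
    return PHASE_RATES[-1]  # only reached at t == T


def p_phase_product(p0):
    """Product of one exponential per phase, as in the phase-by-phase form."""
    product = 1.0
    for j, rate in enumerate(PHASE_RATES):
        product *= math.exp(-rate * (BOUNDARIES[j + 1] - BOUNDARIES[j]))
    return 1.0 - (1.0 - p0) * product


def p_single_integral(p0, steps=30_000):
    """Equation (3): one integral of lambda(t) over [0, T], by the midpoint rule."""
    T = BOUNDARIES[-1]
    dt = T / steps
    integral = sum(overall_rate((k + 0.5) * dt) for k in range(steps)) * dt
    return 1.0 - (1.0 - p0) * math.exp(-integral)


for p0 in (0.0, 1e-3):
    print(p0, p_phase_product(p0), p_single_integral(p0))
```

The two functions agree to rounding error, because $e^{-a}e^{-b} = e^{-(a+b)}$ and the phase integrals simply add up to the integral of the one overall rate function.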


Another odd feature of the treatment of these calculations in the regulatory guidance is that it presents “two cases”, writing the above equation (with the product of exponentials) twice, once with P(0) = 0 and once with P(0) ≠ 0. Furthermore, in the second “case”, it rewrites the above equation as



Admittedly this is algebraically equivalent to (3), and may have been intended to emphasize that it includes the initial probability, but it’s rather convoluted and unnecessarily elaborate. By the way, the guidance contains a misstatement, saying that the above equation with P(0) ≠ 0 gives “the probability that the element fails during one certain mission”, whereas it actually represents the probability of the fault being present during that mission, including the probability that it failed prior to the start of this mission. Since the failure condition under consideration is generally a catastrophic (or at least hazardous) condition, the relevance of the case P(0) ≠ 0 is dubious. It would be applicable to individual component failures that potentially contribute to the top event and that could be already failed at the beginning of the mission, but the above equation is stated to be for the probability of the total failure condition, not for component failures. (The probabilities of component failures would contribute to the applicable λ(t) functions in the above equations.)


The published regulatory guidance states that the probability per flight should always be calculated for the average flight length. It also states that the probability for an average flight may vary, because one or more failed elements in the system can persist for multiple flights (latent, dormant, or hidden failures). The analysis must consider the relevant exposure times, e.g., the time intervals between maintenance and operational checks/inspections. In such cases the probability of the Failure Condition per flight increases with the number of flights during the latency period. To account for this, the guidance directs that the probabilities per flight (each flight assumed to be of average duration) be summed over every flight during either the entire life of the aircraft or the least common multiple of the latency-period exposure times, and then divided by the number of missions. If the system is verified to be fully healthy at the beginning of each flight, the latency period is just one flight. Every flight of average duration then has the same probability, and this step in the calculation is superfluous. In that case the average value of “probability per average flight” is simply $P(T_{ave})$, i.e., the probability of failure in a flight of average duration. The last step in the guidance is to divide this by the average flight duration, giving $P(T_{ave})/T_{ave}$. This is the value that is compared with the numerical threshold (e.g., $10^{-9}$ per flight hour for catastrophic failure conditions) to determine compliance.
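The averaging step can be sketched under one loudly stated assumption. The per-flight probability `p_flight` is a crude hypothetical dual-failure model, not taken from the guidance. In it, a latent element checked every N flights must have failed by the end of flight k, and an active element must fail during flight k. Every flight is assumed to last $T_{ave}$ hours. The function `average_rate` sums the per-flight probabilities over the latency period, divides by the number of flights, and then divides by $T_{ave}$.

```python
import math


def p_flight(k, lam_latent, lam_active, t_ave):
    """Crude hypothetical dual-failure model for flight k (k = 1 .. N) after a check:
    the latent element has failed by the end of flight k AND the active element
    fails during flight k. Ordering of the two failures within a flight is ignored."""
    p_latent = -math.expm1(-lam_latent * k * t_ave)  # 1 - exp(-lam_latent * k * T)
    p_active = -math.expm1(-lam_active * t_ave)      # 1 - exp(-lam_active * T)
    return p_latent * p_active


def average_rate(n_flights, lam_latent, lam_active, t_ave):
    """Sum per-flight probabilities over the latency period, divide by the number
    of flights, then divide by the average flight duration (per flight hour)."""
    total = sum(p_flight(k, lam_latent, lam_active, t_ave)
                for k in range(1, n_flights + 1))
    return total / n_flights / t_ave


# Illustrative rates (per hour) and a 2-hour average flight; not real data.
for n_flights in (1, 10, 100):
    print(n_flights, average_rate(n_flights, 1e-6, 1e-4, 2.0))
```

With N = 1 (system verified healthy before every flight), the result reduces to $P(T_{ave})/T_{ave}$. For larger N, the per-flight probabilities in this model grow roughly in proportion to k, so the average is roughly (N + 1)/2 times the first-flight value.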


The guidance does not mention (at least not explicitly) that, even if the system is verified to be fully healthy at the start of each flight, the overall probability of the failure condition occurring during the life of the airplane can also be affected by variations in flight length from the average. The only (tacit) acknowledgement of this is the stated caveat that if $P(T_{ave})/T_{ave}$ “is likely to be significantly different from the predicted average rate of occurrence of that failure condition during the entire operational life of all airplanes of that type, then a risk model that better reflects the failure condition should be used”. The only way that $P(T_{ave})/T_{ave}$ can differ from the predicted long-term rate of occurrence is through variations in flight length, when the probability per flight depends strongly (nonlinearly) on flight length. Recall that some failure modes contribute the same risk per mission, regardless of the duration of the mission, whereas the risk contributed by other failure modes varies in proportion to the mission duration. If we were limited to just these two kinds of failures, the probability for the jth mission, of duration $T_j$, could be written as $P_j = P_c + kT_j$, where $P_c$ is the constant contribution and k is the proportionality constant of the duration-dependent contribution. If we add up the probabilities for N missions and then divide by N (as directed in the guidance), the result is $P_{ave} = P_c + kT_{ave}$, which is exactly equal to the probability for a mission of average duration. In view of this, one might think that variations in flight length have no effect on the long-term average. However, if the probability per flight varies as the square or cube or, in general, the nth power of the flight length, then it does have an effect on the long-term probability of occurrence in the life of the airplane. In such cases, the actual long-term average probability of occurrence is increased by the factor $(T^n)_{ave}/(T_{ave})^n$, as explained in the note on Regulating Risk. In most realistic cases this factor is close to 1, so it is often neglected, but it can be significant in extreme cases.
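Here is a short sketch of the flight-length effect, using a hypothetical list of flight durations. When P is proportional to $T^n$, the true long-term average per flight exceeds the average-flight value by the factor $(T^n)_{ave}/(T_{ave})^n$. By contrast, the linear model $P_j = P_c + kT_j$ averages exactly to $P_c + kT_{ave}$.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def long_term_factor(flight_hours, n):
    """(T^n)_ave / (T_ave)^n: ratio of the true average per-flight probability to
    the probability for a flight of average length, assuming P is proportional to T^n."""
    return mean([t ** n for t in flight_hours]) / mean(flight_hours) ** n


def average_of_linear(flight_hours, p_c, k):
    """Average over flights of P_j = P_c + k * T_j (the two-kinds-of-failure model)."""
    return mean([p_c + k * t for t in flight_hours])


hours = [1.0, 1.5, 2.0, 6.0]  # hypothetical flight durations in hours
for n in (1, 2, 3):
    print(n, long_term_factor(hours, n))
print(average_of_linear(hours, 1e-9, 1e-10), 1e-9 + 1e-10 * mean(hours))
```

For these four illustrative flight lengths, the factor is exactly 1 for n = 1, about 1.57 for n = 2, and about 3.16 for n = 3. The large spread comes mostly from the single 6-hour flight.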

