Open-Loop and Closed-Loop Markov Models

 

The essential differences between open-loop and closed-loop Markov modeling methods can be illustrated by a simple 3-state model {S0, S1, S2} where S0 is the full-up condition, S1 is the partially failed condition, and S2 is a complete failure (a "shutdown"). Failure transitions S0→S1 and S1→S2 are exponentially distributed with rates "a" and "b" respectively. Periodically (every T hours) all partially failed systems are repaired. If a system experiences a shutdown it is repaired immediately and returned to service. Our objective is to determine the mean time between shutdowns as a function of the inspection/repair interval T.

 

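A closed-loop Markov model for this system is shown below.

[Figure: closed-loop Markov model with transitions S0→S1 (rate a), S1→S2 (rate b), S1→S0 (rate c), and S2→S0 (rate d)]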

The repair transition S1→S0 is treated as exponential with rate c = 1/MTTR, where MTTR is the mean time to repair of partial faults. If T is sufficiently small the value of MTTR is approximately T/2, but to cover the whole range of possible T values we need to use

 

 

Strictly speaking the transition S2→S0 is also treated as exponential (with rate "d"), but since the rate of entering S2 is independent of the rate of leaving S2, this rate always drops out of the final result.

 

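The steady-state solution of the closed-loop model is found by solving the equilibrium equations

$$
\begin{aligned}
-aP_0 + cP_1 + dP_2 &= 0 \\
aP_0 - (b+c)P_1 &= 0 \\
P_0 + P_1 + P_2 &= 1
\end{aligned}
$$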

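which give

$$
P_0 = \frac{d(b+c)}{ab + d(a+b+c)}, \qquad
P_1 = \frac{ad}{ab + d(a+b+c)}, \qquad
P_2 = \frac{ab}{ab + d(a+b+c)}
$$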

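From this we can compute the rate of entry into S2, per unit of probability that the system is in service (i.e., not in S2), as follows

$$
\lambda = \frac{1}{P_0 + P_1}\left(\frac{dP_2}{dt}\right)_{+} = \frac{bP_1}{P_0 + P_1} = \frac{ab}{a+b+c}
$$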

(The symbol (dP2/dt)+ signifies the positive component of dP2/dt, because we want the rate of entry into S2.) Setting c = 1/MTTR we have the result

$$
\lambda_{\text{closed-loop}} = \frac{ab}{a + b + 1/\mathrm{MTTR}}
$$

Notice that as T increases to infinity the rate approaches ab/(a+b), which we recognize as the correct rate assuming partial failures are never repaired.
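As a quick numerical check of the closed-loop result, the following Python sketch solves the balance equations for the transition rates described above and evaluates the shutdown rate per unit of in-service probability, b·P1/(P0 + P1). The function names and the sample value of c are illustrative assumptions, not part of the original analysis; the point is only that the value of d has no effect on the rate.

```python
def steady_state(a, b, c, d):
    """Steady-state probabilities (P0, P1, P2) of the closed-loop model."""
    # Balance of S1: a*P0 = (b + c)*P1.  Balance of S2: b*P1 = d*P2.
    w0 = 1.0
    w1 = a / (b + c)
    w2 = b * w1 / d
    total = w0 + w1 + w2          # normalize so that P0 + P1 + P2 = 1
    return w0 / total, w1 / total, w2 / total


def shutdown_rate(a, b, c, d):
    """Rate of entry into S2 divided by the probability of not being in S2."""
    p0, p1, _ = steady_state(a, b, c, d)
    return b * p1 / (p0 + p1)


a, b = 0.002, 0.001
c = 1.0 / 500.0                   # hypothetical repair rate, i.e. MTTR = 500 hours
for d in (0.01, 1.0, 100.0):      # very different shutdown-repair rates
    print(d, shutdown_rate(a, b, c, d), a * b / (a + b + c))
```

Each printed line should show the same rate regardless of d, matching ab/(a+b+c) up to rounding.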

 

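The open-loop model for this system is shown below.

[Figure: open-loop model with failure transitions S0→S1 (rate a) and S1→S2 (rate b), and a repair transition drawn as a dotted line]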

 

The idea of this approach is to simulate the actual periodic repair process (rather than treating the transition S1→S0 as exponential) by running the model without repairs for an interval of length T, and then inferring the effective shutdown rate from the probability of the shutdown state at time T.

 

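Assuming P1(0) = 0 (i.e., the partial failure state is empty at t = 0) the dynamic solution of the open-loop model (with d = 0) is

$$
\begin{aligned}
P_0(t) &= P_0(0)\,e^{-at} \\
P_1(t) &= P_0(0)\,\frac{a}{b-a}\left(e^{-at} - e^{-bt}\right) \\
P_2(t) &= 1 - P_0(0)\,\frac{b\,e^{-at} - a\,e^{-bt}}{b-a}
\end{aligned}
\qquad (a \ne b)
$$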

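A simple method of approximating the MTBF of this system would be to set P0(0) = 1 and compute the value of P2(T). Then if we assumed this probability accumulated in S2 at a constant rate λ, we could solve the equation P2(T) = 1 – e^(–λT) for λ, which gives

$$
\lambda_{\text{open-loop}} = -\frac{1}{T}\ln\bigl(1 - P_2(T)\bigr) = -\frac{1}{T}\ln\!\left(\frac{b\,e^{-aT} - a\,e^{-bT}}{b-a}\right)
$$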

This value of λ_open-loop is plotted along with λ_closed-loop in the figure below. (For illustration purposes we have taken a = 0.002 and b = 0.001.)

 

[Figure: λ_open-loop and λ_closed-loop plotted against T, for a = 0.002 and b = 0.001]
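The comparison in the figure can be roughly reproduced with a short Python sketch. It evaluates the simple open-loop rate from the no-repair survival probability 1 – P2(T), and the closed-loop rate ab/(a + b + 1/MTTR). Note the assumption: the closed-loop curve here uses only the small-T approximation MTTR ≈ T/2 mentioned earlier, not the full MTTR expression, so it is indicative rather than exact. The function names are hypothetical.

```python
import math


def survival_open_loop(a, b, t):
    """1 - P2(t) for the open-loop model with P0(0) = 1 and no repairs (a != b)."""
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def rate_open_loop_simple(a, b, T):
    """Constant rate lam that would give the same P2(T) = 1 - exp(-lam*T)."""
    return -math.log(survival_open_loop(a, b, T)) / T


def rate_closed_loop(a, b, mttr):
    """Closed-loop shutdown rate a*b/(a + b + c) with c = 1/MTTR."""
    return a * b / (a + b + 1.0 / mttr)


a, b = 0.002, 0.001
for T in (10.0, 100.0, 1000.0, 10000.0):
    lam_open = rate_open_loop_simple(a, b, T)
    lam_closed = rate_closed_loop(a, b, T / 2.0)  # ASSUMPTION: MTTR ~ T/2
    print(f"T={T:8.0f}  open-loop={lam_open:.3e}  closed-loop={lam_closed:.3e}")
```

Working with the survival probability directly, rather than computing P2(T) and subtracting it from 1, avoids loss of precision when P2(T) is very close to 1 at large T.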

 

As can be seen, the two values agree for small T, but they differ as T increases. We know the closed-loop model gives the correct asymptotic rate as T increases (which can be verified using numerical simulation), so there is evidently something wrong with the simple open-loop approach just described. The main problem is the assumption that the probability accumulates in S2 at a constant rate, which is not generally the case. The initial shutdown rate is quite low (because S1 starts out empty) and then increases. In general, two different probability density functions that give equal values of P2(T) can have different MTBFs, because of how the failures are distributed during the interval T. This means we cannot actually infer the true MTBF from the open-loop value of P2(T) alone. For non-constant failure rates the only way to determine the actual mean time for a system to fail is by integrating t times the failure density function δ(t) using the formula

$$
\mathrm{MTBF} = \int_0^{\infty} t\,\delta(t)\,dt
$$

Then the "effective failure rate" can be defined as 1/MTBF. Unfortunately the density function for a system with periodic repairs is somewhat complicated. The probability of State 2 as a function of time during a single inspection interval is

$$
P_2(t) = 1 - P_0(0)\,\frac{b\,e^{-at} - a\,e^{-bt}}{b-a}
$$

where t = 0 at the start of this interval. Letting Pj[k] denote the value of Pj(0) at the start of the kth period, we have P2[k] = 1 – P0[k] (because the partial failure state is empty at the start of each interval). Also, we have P0[0] = 1, and the above equation implies that

$$
P_0[k] = \prod_{j=0}^{k-1} \frac{b\,e^{-aT_j} - a\,e^{-bT_j}}{b-a}
$$

for all k > 0, where Tj is the duration of the jth period. Fortunately, all the Tj except possibly T0 are of equal duration, which we will call simply T. Therefore, if we define

$$
q(T) = \frac{b\,e^{-aT} - a\,e^{-bT}}{b-a}
$$

we have

$$
P_0[k] = q(T_0)\,q(T)^{k-1}
$$

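for all k > 0. For convenience let q0 denote q(T0) and q denote q(T). Then the probability density function for the initial period T0 is

$$
\delta_0(t) = \frac{ab}{b-a}\left(e^{-at} - e^{-bt}\right), \qquad 0 \le t \le T_0
$$

and for all subsequent periods the density function is

$$
\delta_k(t) = q_0\,q^{k-1}\,\frac{ab}{b-a}\left(e^{-at} - e^{-bt}\right), \qquad 0 \le t \le T, \quad k = 1, 2, \ldots
$$

where t is again measured from the start of the period. The MTBF of the system is then given by

$$
\mathrm{MTBF} = \int_0^{T_0} t\,\delta_0(t)\,dt + \sum_{k=1}^{\infty} \int_0^{T} \bigl[T_0 + (k-1)T + t\bigr]\,\delta_k(t)\,dt
$$

Evaluating the integrals and summing the resulting geometric series we finally arrive at the result

$$
\mathrm{MTBF} = I(T_0) + \frac{q_0}{1-q}\,I(T), \qquad
I(x) = \frac{1}{b-a}\left[\frac{b}{a}\left(1 - e^{-ax}\right) - \frac{a}{b}\left(1 - e^{-bx}\right)\right]
$$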

Taking a strict open-loop approach (neglecting the repair transition shown as a dotted line in the open-loop schematic), the system always starts at t = 0 at the beginning of the initial inspection/repair interval, and so we have T0 = T. The resulting effective shutdown rate is shown in the figure below along with the closed-loop rate.

 

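[Figure: effective shutdown rate from the open-loop model with T0 = T, plotted with the closed-loop rate against T]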
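The open-loop MTBF with periodic repairs can also be evaluated numerically. The sketch below does not sum t times the density directly; instead it uses the equivalent fact that the mean of a non-negative random time equals the integral of its survival probability, which within each inspection period is a multiple of (b·e^(–at) – a·e^(–bt))/(b – a). The function names are hypothetical, and a ≠ b is assumed.

```python
import math


def survival(a, b, t):
    """Probability of no shutdown within time t of a repair-free period (a != b)."""
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def survival_integral(a, b, x):
    """Integral of survival(a, b, t) for t from 0 to x."""
    # -expm1(-u) equals 1 - exp(-u), computed accurately for small u
    return ((b / a) * -math.expm1(-a * x) - (a / b) * -math.expm1(-b * x)) / (b - a)


def mtbf_periodic(a, b, T, T0):
    """Mean time to shutdown with a first period of length T0, then periods of length T."""
    q0 = survival(a, b, T0)   # probability of reaching the first inspection
    q = survival(a, b, T)     # probability of surviving each later period
    # The k-th later period is reached with probability q0 * q**(k-1);
    # summing that geometric series gives the factor q0 / (1 - q).
    return survival_integral(a, b, T0) + q0 * survival_integral(a, b, T) / (1.0 - q)


a, b = 0.002, 0.001
for T in (100.0, 1000.0, 10000.0):
    rate_full = 1.0 / mtbf_periodic(a, b, T, T)        # strict open-loop, T0 = T
    rate_half = 1.0 / mtbf_periodic(a, b, T, T / 2.0)  # mean initial interval, T0 = T/2
    print(f"T={T:7.0f}  rate(T0=T)={rate_full:.3e}  rate(T0=T/2)={rate_half:.3e}")
```

For very large T both rates approach ab/(a+b), since the first period then almost always ends in a shutdown before any inspection occurs.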

 

The match is a little better than with the previous open-loop method; at least now both curves have the correct asymptotic rate as T goes to infinity. However, the open-loop rate approaches the asymptote much more quickly than the closed-loop rate. Which method is correct?

 

The answer is that both methods are correct, but they represent different things. The open-loop method (without the dotted repair transition) always begins the first inspection/repair interval at t = 0. The MTBF of the system in those circumstances is as given by the open-loop model with T0 = T. However, in practice we don't re-synchronize our inspection/repair cycle each time we have a shutdown. For example, if a system fails half-way through our inspection interval we fix it and return it to service, so its initial inspection interval (on the way to its next failure) is only half of the normal interval. Since a shutdown can occur anywhere during the interval, the actual value of T0 can be anything from 0 to T, with a mean value of T/2.

 

The figure below shows the open-loop rate, taking T0 = T/2, along with the closed-loop rate. As can be seen, the results of the open-loop method with T0 = T/2 are almost identical to the closed-loop prediction. This result is confirmed by numerical simulation of the system with periodic inspections every T hours, assuming the inspection cycle is not re-synchronized with each shutdown.

 

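[Figure: effective shutdown rate from the open-loop model with T0 = T/2, plotted with the closed-loop rate against T]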
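A numerical simulation of the kind mentioned above might look like the following Python sketch. It is a minimal Monte Carlo model under stated assumptions: inspections occur at fixed multiples of T on a global clock and are never re-synchronized, repairs (both at inspections and after shutdowns) take zero time, and the function name, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random


def simulate_shutdown_rate(a, b, T, n_shutdowns, seed=0):
    """Long-run shutdowns per hour with inspections fixed at t = T, 2T, 3T, ..."""
    rng = random.Random(seed)
    t = 0.0        # global clock; the system starts full-up (S0)
    count = 0
    while count < n_shutdowns:
        t += rng.expovariate(a)                  # S0 -> S1 (partial failure)
        next_inspection = (math.floor(t / T) + 1) * T
        t_shutdown = t + rng.expovariate(b)      # candidate S1 -> S2 time
        if t_shutdown < next_inspection:
            t = t_shutdown                       # shutdown, repaired immediately
            count += 1
        else:
            t = next_inspection                  # partial fault repaired at inspection
    return count / t


a, b, T = 0.002, 0.001, 1000.0
rate = simulate_shutdown_rate(a, b, T, n_shutdowns=20000, seed=1)
print(f"simulated shutdown rate = {rate:.3e} per hour")
print(f"no-repair asymptote ab/(a+b) = {a * b / (a + b):.3e} per hour")
```

Because the transitions are exponential, the system is statistically as good as new after every repair, so it is enough to track the clock and the position of the next inspection.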

 

Thus we can achieve fairly consistent results using the open-loop model with strictly periodic (as opposed to exponential) repairs. This shows that, for this system, there is not much difference between representing the repairs as periodic and representing them as exponential transitions. Of course, if we're willing to represent the repair transition from S1 to S0 as exponential with constant rate c = 1/MTTR, as we did with the closed-loop model, then we can get exactly the same answer as we got using the closed-loop model. On this basis the open-loop model is as shown below.

 

[Figure: open-loop model with an exponential repair transition S1→S0 at rate c]

 

Since P2(t) = 1 – P0(t) – P1(t) the two governing equations are simply

$$
\frac{dP_0}{dt} = -aP_0 + cP_1, \qquad \frac{dP_1}{dt} = aP_0 - (b+c)P_1
$$

The characteristic roots of this system are

$$
r_{1,2} = \frac{-(a+b+c) \pm \sqrt{(a+b+c)^2 - 4ab}}{2}
$$

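With the initial condition P0(0) = 1 the solution, written in terms of the two characteristic roots r1 and r2, is

$$
P_0(t) = \frac{(r_1 + b + c)\,e^{r_1 t} - (r_2 + b + c)\,e^{r_2 t}}{r_1 - r_2}, \qquad
P_1(t) = \frac{a\left(e^{r_1 t} - e^{r_2 t}\right)}{r_1 - r_2}
$$

Consequently the cumulative probability function for state 2 is

$$
P_2(t) = 1 + \frac{r_2\,e^{r_1 t} - r_1\,e^{r_2 t}}{r_1 - r_2}
$$

and the probability density function for entering state 2 is the derivative of this, i.e.,

$$
\delta(t) = \frac{r_1 r_2\left(e^{r_1 t} - e^{r_2 t}\right)}{r_1 - r_2} = \frac{ab\left(e^{r_1 t} - e^{r_2 t}\right)}{r_1 - r_2}
$$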

The mean time to reach state 2 from state 0 is given by the integral

$$
\mathrm{MTBF} = \int_0^{\infty} t\,\delta(t)\,dt
$$

Evaluating the integral gives

$$
\mathrm{MTBF} = \frac{a+b+c}{ab}
$$

which is exactly equal to the reciprocal of the closed-loop failure rate.
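This equality is easy to confirm numerically. The Python sketch below computes the two characteristic roots, integrates t times the density ab(e^(r1·t) – e^(r2·t))/(r1 – r2) in closed form (using the fact that the integral of t·e^(rt) from 0 to infinity is 1/r² for negative r), and compares the result with (a+b+c)/(ab). The function names and sample rates are assumptions chosen for illustration.

```python
import math


def characteristic_roots(a, b, c):
    """Roots of r^2 + (a + b + c) r + a b = 0; both are real and negative."""
    s = a + b + c
    disc = math.sqrt(s * s - 4.0 * a * b)
    return (-s + disc) / 2.0, (-s - disc) / 2.0


def mean_time_to_shutdown(a, b, c):
    """Integral of t * delta(t) over t >= 0 for the open-loop model with repair rate c."""
    r1, r2 = characteristic_roots(a, b, c)
    # delta(t) = a*b*(exp(r1*t) - exp(r2*t)) / (r1 - r2)
    # and the integral of t*exp(r*t) over [0, inf) is 1/r**2 when r < 0.
    return a * b * (1.0 / r1**2 - 1.0 / r2**2) / (r1 - r2)


a, b = 0.002, 0.001
for c in (0.0005, 0.002, 0.02):   # hypothetical repair rates
    print(c, mean_time_to_shutdown(a, b, c), (a + b + c) / (a * b))
```

The two printed values on each line should agree to within rounding error.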

 

In summary, it is usually possible to represent periodic repairs as exponential transitions, and it is possible to achieve consistent results using either the closed-loop or the open-loop approach. However, the open-loop approach generally requires much more effort, and for complicated systems it quickly becomes impractical.

 
