Downtimes and Repairs - Operational Elements

7 MODEL BUILDING

7.4 Operational Elements

7.4.7 Downtimes and Repairs

It is not uncommon for resources and even locations to unexpectedly go down or become unavailable for one reason or another, such as a mechanical failure or a personal interruption. Downtimes usually occur periodically as a function of total elapsed time, time in use, or number of times used.

Downtimes Based on Total Elapsed Time

An example of a periodic downtime based on elapsed clock time might be a worker who takes a break every two hours. Scheduled maintenance is also a type of downtime performed at periodic intervals based on elapsed clock time.

Figure 7.10 illustrates how a downtime based on elapsed time would be simulated. Notice that the calculation of the interval between downtimes takes into account not only busy time, but also idle time and downtime. In other words, it is the total elapsed time from the start of one downtime to the start of the next.

Downtimes based on total elapsed time are often scheduled downtimes during which operational statistics on the location or resource are suspended.

ProModel allows you to designate whether a particular downtime is to be counted as scheduled downtime or unscheduled downtime.

FIGURE 7.10 Resource downtime occurring every 20 minutes based on total elapsed time. [Timeline, in minutes from Start: idle 6, busy 10, idle 4, down 6, busy 14, with interrupts at 20 and 40 minutes.]

FIGURE 7.11 Resource downtime occurring every 20 minutes, based on operating time. [Timeline from Start: idle, busy 12 minutes, idle, busy 8 minutes, then the interrupt.]

Sometimes it may be desirable to use elapsed time to define random equipment failures. This is especially true if this is how historical data were gathered on the downtime. When using historical data, it is important to determine if the time between failures was based on (1) the total elapsed time from one failure to the next, (2) the time between the repair of one failure to the time of the next failure (operational time between failures), or (3) the time that the machine was actually in operation (operating time between failures). ProModel accepts downtime definitions based on cases 1 and 3, but requires that case 2 be converted to case 1. This is done by adding the time until the next failure to the repair time of the last failure. For example, if the operational time between failures is exponentially distributed with a mean of 10 minutes and the repair time is exponentially distributed with a mean of 2 minutes, the time between failures should be defined as x_last + E(10), where x_last is the last repair time generated using E(2) minutes.
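The conversion from case 2 to case 1 can be sketched in Python. This is a minimal illustration rather than ProModel syntax: the function name and the use of Python's random module are assumptions made for the example, and E(10) and E(2) are read as exponential distributions with means of 10 and 2 minutes.

```python
import random

def elapsed_times_between_failures(n, mean_uptime=10.0, mean_repair=2.0, seed=None):
    """Return n case-1 intervals (failure start to next failure start), in minutes.

    Each interval is x_last + E(mean_uptime), where x_last is the repair
    time just generated from E(mean_repair).
    """
    rng = random.Random(seed)
    intervals = []
    for _ in range(n):
        x_last = rng.expovariate(1.0 / mean_repair)        # repair time of the last failure
        time_to_next = rng.expovariate(1.0 / mean_uptime)  # end of repair to next failure
        intervals.append(x_last + time_to_next)
    return intervals

samples = elapsed_times_between_failures(100_000, seed=1)
# The sample mean should land near 10 + 2 = 12 minutes.
print(sum(samples) / len(samples))
```

The resulting interval is no longer exponentially distributed, which is why the repair time must be added sample by sample instead of simply using E(12).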

Downtimes Based on Time in Use

Most equipment and machine failures occur only when the resource is in use. A mechanical or tool failure, for example, generally happens only when a machine is running, not while a machine is idle. In this situation, the interval between downtimes would be defined relative to actual machine operation time.

A machine that goes down every 20 minutes of operating time for a three-minute repair is illustrated in Figure 7.11. Note that any idle times and downtimes are not included in determining when the next downtime occurs.

The only time counted is the actual operating time.
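The difference between the two clocks can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not ProModel logic: the function name, the segment format, and the idle durations of 4 and 6 minutes are hypothetical, while the busy segments of 12 and 8 minutes and the 20-minute interval follow the operating-time example.

```python
def first_downtime_start(segments, interval, basis):
    """Return the clock time at which the first downtime begins.

    segments: list of (state, duration) pairs in time order, where state is
              "busy" or "idle" (durations in minutes).
    interval: minutes between downtimes.
    basis:    "elapsed" counts every minute; "usage" counts busy minutes only.
    """
    clock = 0.0
    counted = 0.0
    for state, duration in segments:
        counts = basis == "elapsed" or state == "busy"
        if counts and counted + duration >= interval:
            return clock + (interval - counted)
        if counts:
            counted += duration
        clock += duration
    return None  # the interval was not reached within the given segments

# Idle durations (4 and 6) are assumed; busy durations total 20 minutes.
segments = [("idle", 4), ("busy", 12), ("idle", 6), ("busy", 8)]
print(first_downtime_start(segments, 20, "elapsed"))  # 20.0
print(first_downtime_start(segments, 20, "usage"))    # 30.0
```

With the same workload, the usage-based downtime starts 10 minutes later because the 10 idle minutes do not count toward the interval.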

Because downtimes usually occur randomly, the time to failure is most accurately defined as a probability distribution. Studies have shown, for example, that the operating time to failure is often exponentially distributed.

Downtimes Based on the Number of Times Used

The last type of downtime occurs based on the number of times a location was used. For example, a tool on a machine may need to be replaced every 50 cycles due to tool wear, or a copy machine may need paper added after a mean of 200 copies with a standard deviation of 25 copies. ProModel permits downtimes to be defined in this manner by selecting ENTRY as the type of downtime and then specifying the number of entity entries between downtimes.
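A count-based downtime of this kind can be sketched as a simple entry counter. The class below is a hypothetical Python illustration, not the ProModel ENTRY mechanism itself; treating the copier's paper supply as normally distributed (mean 200, standard deviation 25) is an assumption, since the text gives only the mean and standard deviation.

```python
import random

class EntryBasedDowntime:
    """Trigger a downtime after a (possibly random) number of entries."""

    def __init__(self, entries_until_down):
        # entries_until_down: zero-argument callable returning the next count
        self._draw = entries_until_down
        self._remaining = self._draw()

    def record_entry(self):
        """Count one entry; return True if a downtime is now due."""
        self._remaining -= 1
        if self._remaining <= 0:
            self._remaining = self._draw()
            return True
        return False

# Tool replaced every 50 cycles (constant count).
tool = EntryBasedDowntime(lambda: 50)
due_at = [cycle for cycle in range(1, 151) if tool.record_entry()]
print(due_at)  # [50, 100, 150]

# Copier needs paper after roughly N(200, 25) copies, rounded and kept at least 1.
rng = random.Random(42)
copier = EntryBasedDowntime(lambda: max(1, round(rng.gauss(200, 25))))
```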

Downtime Resolution

Unfortunately, data are rarely available on equipment downtime. When they are available, they are often recorded as overall downtime and seldom broken down into number of times down and time between failures. Depending on the nature of the downtime information and degree of resolution required for the simulation, downtimes can be treated in the following ways:

• Ignore the downtime.

• Simply increase processing times to adjust for downtime.

• Use average values for mean time between failures (MTBF) and mean time to repair (MTTR).

• Use statistical distributions for time between failures and time to repair.

Ignoring Downtime. There are several situations where it might make sense to ignore downtimes in building a simulation model. Obviously, one situation is where absolutely no data are available on downtimes. If there is no knowledge of resource downtimes, it is appropriate to model the resource with no downtimes and document it as such in the final write-up. When there are downtimes, but they are extremely infrequent and not likely to affect model performance for the period of the study, it is safe to ignore them. For example, if a machine fails only two or three times a year and you are trying to predict processing capacity for the next workweek, it doesn’t make sense to include the downtime. It is also safe to ignore occasional downtimes that are very small compared to activity times. If, for example, a downtime takes only seconds to correct (like clearing a part in a machine or an occasional paper jam in a copy machine), it could be ignored.

Increasing Processing Times. A common way of treating downtime, due in part to the lack of good downtime data, is to simply reduce the production capacity of the machine by the downtime percentage. In other words, if a machine has a capacity of 100 parts per hour and experiences a 10 percent downtime, the effective capacity is reduced to 90 parts per hour. This spreads the downtime across each machine cycle so that both the mean time between failures and the mean time to repair are very small and both are constant. Thus no consideration is given to the variability in both time between failures and time to repair that typifies most production systems. Law (1986) has shown that this deterministic adjustment for downtime can produce results that differ greatly from the results based on actual machine downtimes.
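The arithmetic behind this adjustment is shown below as a small Python sketch. The function name is hypothetical, and expressing the result as a per-part cycle time in seconds is an added convenience, not something the text specifies.

```python
def adjusted_cycle_time(nominal_rate_per_hour, downtime_fraction):
    """Return (effective rate per hour, adjusted cycle time in seconds)."""
    effective_rate = nominal_rate_per_hour * (1.0 - downtime_fraction)
    cycle_seconds = 3600.0 / effective_rate
    return effective_rate, cycle_seconds

# 100 parts per hour with 10 percent downtime: about 90 parts per hour,
# so each part takes about 40 seconds instead of 36.
rate, cycle = adjusted_cycle_time(100, 0.10)
print(rate, cycle)
```

Every cycle is lengthened by the same fixed amount, which is exactly why this approach hides the variability that real failures introduce.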

MTBF/MTTR. Two parts to any downtime should be defined when modeling downtime. One, time between failures, defines the interval between failures. The other, time to repair, defines the time required to bring a resource back online whenever it goes down. Often downtimes are defined in terms of mean time between failures (MTBF) and mean time to repair (MTTR). Using average times for these intervals presents the same problems as using average times for any activity in a simulation: it fails to account for variability, which can have a significant impact on system performance.

Using Statistical Distributions. Whenever possible, time between failures and time to repair should be represented by statistical distributions that reflect the variation that is characteristic of these elements. Studies have shown that the time until failure, particularly due to items (like bearings or tooling) that wear, tends to follow a Weibull distribution. Repair times often follow a lognormal distribution.
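As a rough illustration of sampling from these distributions, the following Python sketch draws Weibull times between failures and lognormal repair times using the standard random module. The function name and all parameter values are assumptions chosen for the example; in practice they would be fitted to downtime data.

```python
import random

def simulate_availability(n_cycles, tbf_scale, tbf_shape, ttr_mu, ttr_sigma, seed=None):
    """Estimate long-run availability from sampled failure/repair cycles.

    Time between failures ~ Weibull(scale, shape); time to repair ~ lognormal
    with underlying normal parameters (mu, sigma). All values are illustrative.
    """
    rng = random.Random(seed)
    up = sum(rng.weibullvariate(tbf_scale, tbf_shape) for _ in range(n_cycles))
    down = sum(rng.lognormvariate(ttr_mu, ttr_sigma) for _ in range(n_cycles))
    return up / (up + down)

# Hypothetical parameters, in minutes.
availability = simulate_availability(
    10_000, tbf_scale=100.0, tbf_shape=2.0, ttr_mu=2.0, ttr_sigma=0.5, seed=3
)
print(availability)
```

Unlike the fixed MTBF/MTTR approach, each cycle here has its own uptime and repair time, so occasional long repairs and clusters of failures can appear in the simulated system.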

Elapsed Time or Usage Time?

When determining the distribution for time to failure, a distinction should be made between downtime events that can occur anytime whether the resource is operating or idle and downtime events that occur only when a resource is in use.

As explained earlier, downtimes that can occur anytime should be defined as a function of clock time. If the resource goes down only while in operation, it should be defined as a function of time in use.

Erroneously basing downtime on elapsed time when it should be based on operating time artificially inflates time between failures by the inclusion of idle time. It also implies that during periods of high equipment utilization, the same amount of downtime occurs as during low utilization periods. Equipment failures should generally be based on operating time and not on elapsed time because elapsed time includes operating time, idle time, and downtime. It should be left to the simulation to determine how idle time and downtime affect the overall elapsed time between failures.

To illustrate the difference this can make, let’s assume that the following times were logged for a given operation:

Status        Time (Hours)
In use        20
Down           5
Idle          15
Total time    40

If it is assumed that downtime is a function of total time, then the percentage of downtime would be calculated to be 5 hours/40 hours, or 12.5 percent. If, however, it is assumed that the downtime is a function of usage time, then the downtime percentage would be 5 hours/20 hours, or 25 percent. Now let’s suppose that the system is modeled with increased input to the operation so that it is never starved (the idle time = 0). If downtime is assumed to be 12.5 percent, the total time down will be 0.125 × 40 = 5 hours. If, on the other hand, we use the assumption that it is 25 percent, then the time down will end up being 0.25 × 40 hours = 10 hours. This is a difference of five hours, which means that if downtime is falsely assumed to be a function of total time, the simulation would realize five extra hours of production in a 40-hour period that shouldn’t have happened.
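The figures in this example can be reproduced with a few lines of Python. The variable names are illustrative; the last calculation is an added refinement, not part of the original example, and shows the result when the 25 percent rate is applied to operating hours only.

```python
in_use, down, idle = 20.0, 5.0, 15.0
total = in_use + down + idle                 # 40 hours

pct_of_total = down / total                  # 0.125
pct_of_usage = down / in_use                 # 0.25

# Never-starved scenario: apply each percentage to the 40-hour period,
# as the text does.
down_if_elapsed = pct_of_total * total       # 5.0 hours
down_if_usage = pct_of_usage * total         # 10.0 hours
print(down_if_usage - down_if_elapsed)       # 5.0

# Refinement (assumption): if downtime is 25 percent of operating hours and
# operating time plus downtime fill the 40 hours, downtime is 8 hours.
down_strict = total * pct_of_usage / (1.0 + pct_of_usage)
print(down_strict)                           # 8.0
```

Under either accounting, basing downtime on total time understates the downtime once the idle time disappears.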

Handling Interrupted Entities

When a resource goes down, there might be entities that were in the middle of being processed that are now left dangling (that is, they have been preempted).

For example, a machine might break down while running the seventh part of a batch of 20 parts. The modeler must decide what to do with these entities.

Several alternatives may be chosen, and the modeler must select which alternative is the most appropriate:

• Resume processing the entity after the downtime is over.

• Find another available resource to continue the process.

• Scrap the entity.

• Delay start of the downtime until the entity is processed.

The last option is the easiest way to handle downtimes and, in fact, may be adequate in situations where either the processing times or the downtimes are relatively short. In such circumstances, the delay in entity flow is still going to closely approximate what would happen in the actual system.

If the entity resumes processing later using either the same or another resource, a decision must be made as to whether only the remaining process time is used or if additional time must be added. By default, ProModel suspends processing until the location or resource returns to operation. Alternatively, other logic may be defined.
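The choice between using only the remaining process time and adding extra time can be sketched as follows. This is a hypothetical Python helper, not ProModel logic; in particular, the restart_penalty parameter is an assumption introduced to represent any additional time (such as a re-setup) added after the repair.

```python
def completion_time(start, process_time, down_start, down_duration, restart_penalty=0.0):
    """Clock time at which a preempted entity finishes on the same resource.

    Processing is suspended during the downtime and only the remaining
    process time (plus an optional penalty) is used after the repair.
    """
    planned_end = start + process_time
    if not (start <= down_start < planned_end):
        return planned_end                  # the downtime does not interrupt this entity
    remaining = planned_end - down_start
    return down_start + down_duration + remaining + restart_penalty

# A 10-minute process interrupted at minute 6 by a 3-minute repair.
print(completion_time(0.0, 10.0, 6.0, 3.0))        # 13.0
print(completion_time(0.0, 10.0, 6.0, 3.0, 1.5))   # 14.5
```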

In document Simulation Using Promodel (Page 186-190)