

Step 9: Reject the distribution if the chi-square statistic exceeds the critical value. If the calculated test statistic is larger than the critical value

6.12 Data Documentation and Approval

When all relevant data have been gathered, analyzed, and converted into a usable form, it is advisable to document the data using tables, relational diagrams, and assumption lists. Sources of data should also be noted. This document should then be reviewed by those in a position to evaluate the validity of the data and approve the assumptions made. It will be helpful later if modifications need to be made to the model or if there is a need to analyze why the actual system ends up working differently than the simulation.

In addition to identifying data used to build the model, the document should also specify factors that were deliberately excluded from the model because they were deemed insignificant or irrelevant. For example, if break times are not included in the system description because of their perceived insignificance, the document should state this. Justification for omissions should also be included if necessary. Stating why certain factors are being excluded from the system description will help resolve later questions that may arise regarding the model premises.

Reviewing and approving input data can be a time-consuming and difficult task, especially when many assumptions are made. In practice, data validation ends up being more of a consensus-building process in which agreement is reached that the information is good enough for the purposes of the simulation. The data document is not a static document but rather a dynamic one that often changes as model building and experimentation get under way. Much if not all of the data documentation can be scripted right into the model, since most simulation software provides the capability to write comments in the model. Where more formal documentation is required, a separate data document will need to be created.

6.12.1 Data Documentation Example

To illustrate how system data might be documented, imagine you have just collected information for an assembly operation for three different monitor sizes.

Here is an example of how the data collected for this system might be diagrammed and tabulated. The diagram and data should be sufficiently clear and complete for those familiar with the system to validate the data and for a model to be constructed. Review the data and see if they are sufficiently clear to formulate a mental image of the system being modeled.

Objective

The objective of the study is to determine station utilization and throughput of the system.

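Entity Flow Diagram

[Figure: entity flow diagram. 19", 21", and 25" monitors enter at Station 1 and flow through Station 2 to Inspection. Rejected monitors return from Inspection to a station, and reworked monitors return to Inspection. 19" and 21" monitors exit after Inspection; the 25" monitor continues to Station 3.]

Entities

19" monitor, 21" monitor, 25" monitor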

Workstation Information

Workstation   Buffer Capacity   Defective Rate
Station 1     5                 5%
Station 2     8                 8%
Station 3     5                 0%
Inspection    5                 0%

Processing Sequence

Entity        Station      Operating Time in Minutes (min, mode, max)
19" monitor   Station 1    0.8, 1, 1.5
              Station 2    0.9, 1.2, 1.8
              Inspection   1.8, 2.2, 3
21" monitor   Station 1    0.8, 1, 1.5
              Station 2    1.1, 1.3, 1.9
              Inspection   1.8, 2.2, 3
25" monitor   Station 1    0.9, 1.1, 1.6
              Station 2    1.2, 1.4, 2
              Inspection   1.8, 2.3, 3.2
              Station 3    0.5, 0.7, 1
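The (min, mode, max) triples in the processing sequence suggest a triangular distribution for each operating time, although the data document does not name the distribution explicitly. The following is a minimal Python sketch under that assumption; the names OPERATING_TIMES, sample_operating_time, and triangular_mean are illustrative and not part of the original document or of any simulation package.

```python
import random

# (min, mode, max) operating times in minutes, copied from the
# Processing Sequence table. The dictionary layout is an assumption
# made for this illustration.
OPERATING_TIMES = {
    '19" monitor': {
        "Station 1": (0.8, 1.0, 1.5),
        "Station 2": (0.9, 1.2, 1.8),
        "Inspection": (1.8, 2.2, 3.0),
    },
    '21" monitor': {
        "Station 1": (0.8, 1.0, 1.5),
        "Station 2": (1.1, 1.3, 1.9),
        "Inspection": (1.8, 2.2, 3.0),
    },
    '25" monitor': {
        "Station 1": (0.9, 1.1, 1.6),
        "Station 2": (1.2, 1.4, 2.0),
        "Inspection": (1.8, 2.3, 3.2),
        "Station 3": (0.5, 0.7, 1.0),
    },
}


def sample_operating_time(entity, station, rng=random):
    """Draw one operating time (minutes), assuming a triangular distribution."""
    low, mode, high = OPERATING_TIMES[entity][station]
    # Note the argument order of random.triangular: (low, high, mode).
    return rng.triangular(low, high, mode)


def triangular_mean(low, mode, high):
    """Mean of a triangular distribution: (min + mode + max) / 3."""
    return (low + mode + high) / 3.0


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_operating_time('25" monitor', "Station 3", rng))
    print(triangular_mean(*OPERATING_TIMES['19" monitor']["Station 1"]))
```

Keeping the table as data rather than hard-coding each time makes it easier for reviewers to compare the model inputs against the approved data document.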

Handling Defective Monitors

• Defective monitors are detected at inspection and routed to whichever station created the problem.

• Monitors waiting at a station for rework have a higher priority than first-time monitors.

• Corrected monitors are routed back to inspection.

• A reworked monitor has only a 2 percent chance of failing again, in which case it is removed from the system.
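As a rough check on these rules, the defective rates from the workstation table can be combined into an expected first-pass yield and scrap fraction. The sketch below assumes that defects at different stations occur independently and that a defective monitor is reworked only once before the 2 percent re-failure rule is applied; neither assumption is stated in the data document, and the function names are illustrative.

```python
# Defective rates from the Workstation Information table.
DEFECT_RATE = {"Station 1": 0.05, "Station 2": 0.08, "Station 3": 0.00}

# Chance that a reworked monitor fails again and is removed.
REWORK_FAIL_PROB = 0.02


def first_pass_yield(stations):
    """Probability that a monitor passes inspection without any rework.

    Assumes each station introduces a defect independently of the others.
    """
    passing = 1.0
    for station in stations:
        passing *= 1.0 - DEFECT_RATE[station]
    return passing


def scrap_fraction(stations):
    """Approximate fraction of monitors removed from the system.

    Assumes a defective monitor is reworked once, then fails again
    with probability REWORK_FAIL_PROB.
    """
    return (1.0 - first_pass_yield(stations)) * REWORK_FAIL_PROB


if __name__ == "__main__":
    route = ["Station 1", "Station 2"]  # stations visited before inspection
    print(f"first-pass yield: {first_pass_yield(route):.3f}")
    print(f"scrap fraction:   {scrap_fraction(route):.5f}")
```

A hand calculation like this gives reviewers a figure to compare against the simulation output when validating the rework logic.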

Arrivals

Cartloads of four monitor assemblies arrive with interarrival times that are normally distributed with a mean of four hours and a standard deviation of 0.2 hour. The probability of an arriving monitor being of a particular size is

Monitor Size   Probability
19"            0.6
21"            0.3
25"            0.1
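The arrival description can be sketched as a small generator of cart arrivals. This is an illustration only: it assumes the first cart arrives one interarrival time after time zero and that the size of each monitor in a cart is drawn independently from the probabilities above. The function name generate_cart_arrivals is hypothetical.

```python
import random

MEAN_INTERARRIVAL_HR = 4.0   # mean time between carts, in hours
SD_INTERARRIVAL_HR = 0.2     # standard deviation, in hours
MONITORS_PER_CART = 4
SIZE_PROBABILITY = {'19"': 0.6, '21"': 0.3, '25"': 0.1}


def generate_cart_arrivals(horizon_hr, rng):
    """Return a list of (arrival_time_hr, [monitor sizes]) up to horizon_hr."""
    sizes = list(SIZE_PROBABILITY)
    weights = list(SIZE_PROBABILITY.values())
    arrivals = []
    clock = 0.0
    while True:
        # A normal distribution is unbounded below, so guard against a
        # negative gap even though it is practically impossible here.
        clock += max(0.0, rng.gauss(MEAN_INTERARRIVAL_HR, SD_INTERARRIVAL_HR))
        if clock > horizon_hr:
            return arrivals
        cart = rng.choices(sizes, weights=weights, k=MONITORS_PER_CART)
        arrivals.append((clock, cart))


if __name__ == "__main__":
    for time_hr, cart in generate_cart_arrivals(40.0, random.Random(42)):
        print(f"{time_hr:6.2f} h  {cart}")
```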

Move Times

All movement is on an accumulation conveyor with the following times:

From         To           Time (seconds)
Station 1    Station 2    12
Station 2    Inspection   15
Inspection   Station 3    12
Inspection   Station 1    20
Inspection   Station 2    14
Station 1    Inspection   18
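Because operating times are given in minutes and move times in seconds, a model builder has to bring them to a common unit. One possible way to hold the move times and convert them is sketched below; the dictionary and function names are assumptions made for illustration.

```python
# Conveyor move times in seconds, keyed by (from, to), copied from the table.
MOVE_TIME_SEC = {
    ("Station 1", "Station 2"): 12,
    ("Station 2", "Inspection"): 15,
    ("Inspection", "Station 3"): 12,
    ("Inspection", "Station 1"): 20,
    ("Inspection", "Station 2"): 14,
    ("Station 1", "Inspection"): 18,
}


def move_time_min(origin, destination):
    """Return the move time in minutes, matching the operating-time unit.

    Raises KeyError for a route that is not in the data document, which
    helps catch routing mistakes early.
    """
    return MOVE_TIME_SEC[(origin, destination)] / 60.0


if __name__ == "__main__":
    print(move_time_min("Station 2", "Inspection"))
```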

Move Triggers

Entities move from one location to the next based on available capacity of the input buffer at the next location.

Work Schedule

Stations are scheduled to operate eight hours a day.

Assumption List

• No downtimes (downtimes occur too infrequently).

• Dedicated operators at each workstation are always available during the scheduled work time.

• Rework times are half of the normal operation times.

Simulation Time and Replications

The simulation is run for 40 hours (10 hours of warm-up). There are five replications.

6.13 Summary

Data for building a model should be collected systematically with a view of how the data are going to be used in the model. Data are of three types: structural, operational, and numerical. Structural data consist of the physical objects that make up the system. Operational data define how the elements behave. Numerical data quantify attributes and behavioral parameters.

When gathering data, primary sources should be used first, such as historical records or specifications. Developing a questionnaire is a good way to request information when conducting personal interviews. Data gathering should start with structural data, then operational data, and finally numerical data. The first piece of the puzzle to be put together is the routing sequence because everything else hinges on the entity flow.

Numerical data for random variables should be analyzed to test for independence and homogeneity. A theoretical distribution should then be fit to the data and used wherever the fit is acceptable. Data that no theoretical distribution fits acceptably are best represented using an empirical distribution.

Data should be documented, reviewed, and approved by concerned individuals. This data document becomes the basis for building the simulation model and provides a baseline for later modification or for future studies.

6.14 Review Questions

1. Give two examples of structural data, operational data, and numerical data to be gathered when building a model.

2. Why is it best to begin gathering data by defining entity routings?

3. Of the distributions shown in the chapter, which theoretical distribution most likely would be representative of time to failure for a machine?

4. Why would a normal distribution likely be a poor representation of an activity time?

5. Assume a new system is being simulated and the only estimate available for a manual operation is a most likely value. How would you handle this situation?

6. Under what circumstances would you use an empirical distribution instead of a standard theoretical distribution for an activity time?

7. Why is the distribution for interarrival times often nonstationary?

8. Assuming you had historical data on truck arrivals for the past year, how would you arrive at an appropriate arrival distribution to model the system for the next six months?

9. A new machine is being considered for which the company has no reliability history. How would you obtain the best possible estimate of reliability for the machine?

10. Suppose you are interested in looking at the impact of having workers inspect their own work instead of having a dedicated inspection station. If this is a new system requiring lots of assumptions to be made, how can simulation be useful in making the comparison?

11. State whether the following are examples of a discrete probability distribution or a continuous probability distribution.

a. Activity times.

b. Batch sizes.

c. Time between arrivals.

d. Probability of routing to one of six possible destinations.

12. Conceptually, how would you model a random variable X that represents an activity time that is normally distributed with a mean of 10 minutes and a standard deviation of 3 minutes but is never less than 8 minutes?

13. Using Stat::Fit, generate a list of 50 random values between 10 and 100. Choose the Scatter Plot option and plot the data. Now put the data in ascending order using the Input/Operate commands and plot the data. Explain the correlation, if any, that you see in each scatter plot.

14. How can you check to see if a distribution in ProModel is giving the right values?

15. Since many theoretical distributions are unbounded on the bottom, what happens in ProModel if a negative value is sampled for an activity time?

16. Go to a small convenience store or the university bookstore and collect data on the interarrival and service times of customers. Make histograms of the number of arrivals per time period and the number of service completions per period. Note if these distributions vary by the time of the day and by the day of the week. Record the number of service channels available at all times. Make sure you secure permission to perform the study.

17. The following are throughput time values for 30 simulation runs. Calculate an estimate of the mean, variance, standard deviation, and coefficient of variation for the throughput time. Construct a histogram that has six cells of equal width.

10.7, 5.4, 7.8, 12.2, 6.4, 9.5, 6.2, 11.9, 13.1, 5.9, 9.6, 8.1, 6.3, 10.3, 11.5,
12.7, 15.4, 7.1, 10.2, 7.4, 6.5, 11.2, 12.9, 10.1, 9.9, 8.6, 7.9, 10.3, 8.3, 11.1

18. Customers calling into a service center are categorized according to the nature of their problems. Five types of problem categories (A through E) have been identified. One hundred observations were made of customers calling in during a day, with a summary of the data shown here. By inspection, you conclude that the data are most likely uniformly distributed. Perform a chi-square goodness-of-fit test of this hypothesis.

Type           A    B    C    D    E
Observations   10   14   12   9    5

19. While doing your homework one afternoon, you notice that you are frequently interrupted by friends. You decide to record the times between interruptions to see if they might be exponentially distributed. Here are 30 observed times (in minutes) that you have recorded; conduct a goodness-of-fit test to see if the data are exponentially distributed. (Hint: Use the data average as an estimate of the mean. For the range, assume a range between 0 and infinity. Divide the cells based on equal probabilities (p_i) for each cell rather than equal cell intervals.)

 2.08   6.86   4.86   2.55   5.94
 2.96   0.91   2.13   2.20   1.40
16.17   2.11   2.38   0.83   2.81
14.57   0.29   2.73   0.73   1.76
 2.79  11.69  18.29   5.25   7.42
 2.15   0.96   6.28   0.94  13.76
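The hint in question 19 asks for cells of equal probability rather than equal width. For an exponential distribution with mean m, the cumulative distribution F(x) = 1 - exp(-x/m) can be inverted to give the cell boundaries x_i = -m ln(1 - i/k) for k cells. The Python sketch below illustrates the mechanics on a short made-up sample rather than the exercise data; all function names are hypothetical, and the critical value must still be looked up in a chi-square table.

```python
import math


def equal_probability_boundaries(mean, num_cells):
    """Interior cell boundaries giving each cell probability 1/num_cells.

    Obtained by inverting the exponential CDF F(x) = 1 - exp(-x / mean).
    The first cell starts at 0 and the last cell extends to infinity.
    """
    return [-mean * math.log(1.0 - i / num_cells) for i in range(1, num_cells)]


def count_per_cell(data, boundaries):
    """Count the observations falling in each cell."""
    counts = [0] * (len(boundaries) + 1)
    for x in data:
        # The cell index is the number of boundaries at or below x.
        index = sum(1 for b in boundaries if x >= b)
        counts[index] += 1
    return counts


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Made-up sample for illustration only; not the data from question 19.
    sample = [0.4, 1.1, 2.5, 0.9, 3.7, 0.2, 1.8, 5.0]
    mean = sum(sample) / len(sample)
    k = 4
    bounds = equal_probability_boundaries(mean, k)
    observed = count_per_cell(sample, bounds)
    expected = [len(sample) / k] * k  # equal probabilities give equal counts
    print(bounds, observed, chi_square_statistic(observed, expected))
```

With equal-probability cells every expected count is simply n/k, which makes it easy to choose k so that each cell has a large enough expected frequency.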

In document Simulation Using Promodel (Page 158-163)