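Step 9: Reject the distribution if the chi-square statistic exceeds the critical value. If the calculated test statistic is larger than the critical value, the observed frequencies differ from the expected frequencies by more than chance alone would explain, so the hypothesized distribution is not a good fit for the data.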
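The Step 9 decision rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: it assumes the observed and expected frequencies for each cell have already been computed in the earlier steps, and that the critical value is looked up separately in a chi-square table (9.488 is the tabulated value for 4 degrees of freedom at a 0.05 significance level). The function names and the sample counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Chi-square test statistic: sum of (o - e)^2 / e over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_distribution(observed, expected, critical_value):
    """Step 9: reject the hypothesized distribution if the statistic
    exceeds the critical value taken from a chi-square table."""
    return chi_square_statistic(observed, expected) > critical_value


# Hypothetical data: 100 observations over five cells, hypothesized to be
# uniform, so each cell expects 100 / 5 = 20 observations. With no
# parameters estimated from the data, df = 5 - 1 = 4.
observed = [18, 22, 25, 15, 20]
expected = [20.0] * 5
stat = chi_square_statistic(observed, expected)
print(f"chi-square = {stat:.2f}")                                  # chi-square = 2.90
print("reject:", reject_distribution(observed, expected, 9.488))  # reject: False
```

If a parameter of the hypothesized distribution is estimated from the data (for example, the mean of an exponential distribution), one further degree of freedom is usually subtracted for each estimated parameter before the critical value is looked up.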
6.12 Data Documentation and Approval
When it is felt that all relevant data have been gathered, analyzed, and converted into a usable form, it is advisable to document the data using tables, relational diagrams, and assumption lists. Sources of data should also be noted. This document should then be reviewed by those in a position to evaluate the validity of the data and approve the assumptions made. The document will be helpful later if the model needs to be modified or if it becomes necessary to analyze why the actual system works differently from the simulation.
In addition to identifying data used to build the model, the document should also specify factors that were deliberately excluded from the model because they were deemed insignificant or irrelevant. For example, if break times are not included in the system description because of their perceived insignificance, the document should state this. Justification for omissions should also be included if necessary. Stating why certain factors are being excluded from the system description will help resolve later questions that may arise regarding the model premises.
Reviewing and approving input data can be a time-consuming and difficult task, especially when many assumptions are made. In practice, data validation ends up being more of a consensus-building process in which agreement is reached that the information is good enough for the purposes of the simulation. The data document is not a static document but rather a dynamic one that often changes as model building and experimentation get under way. Much, if not all, of the data documentation can be written directly into the model, since most simulation software provides the capability to add comments to the model. Where more formal documentation is required, a separate data document will need to be created.
6.12.1 Data Documentation Example
To illustrate how system data might be documented, imagine you have just collected information for an assembly operation for three different monitor sizes.
Here is an example of how the data collected for this system might be diagrammed and tabulated. The diagram and data should be sufficiently clear and complete for those familiar with the system to validate the data and for a model to be constructed. Review the data and see whether they are clear enough for you to form a mental image of the system being modeled.
Objective
The objective of the study is to determine station utilization and throughput of the system.
Entity Flow Diagram
[Figure: entity flow diagram linking Station 1, Station 2, Inspection, and Station 3 for 19", 21", and 25" monitors, with reworked monitors returning from Inspection and rejected monitors leaving the system]
Entities: 19" monitor, 21" monitor, 25" monitor
Workstation Information
Workstation Buffer Capacity Defective Rate
Station 1 5 5%
Station 2 8 8%
Station 3 5 0%
Inspection 5 0%
Processing Sequence
Entity        Station       Operating Time in Minutes (min, mode, max)
19" monitor   Station 1     0.8, 1, 1.5
19" monitor   Station 2     0.9, 1.2, 1.8
19" monitor   Inspection    1.8, 2.2, 3
21" monitor   Station 1     0.8, 1, 1.5
21" monitor   Station 2     1.1, 1.3, 1.9
21" monitor   Inspection    1.8, 2.2, 3
25" monitor   Station 1     0.9, 1.1, 1.6
25" monitor   Station 2     1.2, 1.4, 2
25" monitor   Inspection    1.8, 2.3, 3.2
25" monitor   Station 3     0.5, 0.7, 1
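If the (min, mode, max) triples are read as triangular-distribution parameters, which the column heading suggests but the data document does not state outright, the processing sequence can be encoded and sampled as in this hypothetical Python sketch. The dictionary layout and the function name are illustrative; the only library call is the standard `random.triangular`, whose argument order is (low, high, mode), not (min, mode, max).

```python
import random

# Hypothetical encoding of the processing-sequence table: for each monitor
# size, an ordered list of (station, (min, mode, max)) times in minutes.
PROCESSING_SEQUENCE = {
    '19"': [("Station 1", (0.8, 1.0, 1.5)),
            ("Station 2", (0.9, 1.2, 1.8)),
            ("Inspection", (1.8, 2.2, 3.0))],
    '21"': [("Station 1", (0.8, 1.0, 1.5)),
            ("Station 2", (1.1, 1.3, 1.9)),
            ("Inspection", (1.8, 2.2, 3.0))],
    '25"': [("Station 1", (0.9, 1.1, 1.6)),
            ("Station 2", (1.2, 1.4, 2.0)),
            ("Inspection", (1.8, 2.3, 3.2)),
            ("Station 3", (0.5, 0.7, 1.0))],
}


def sample_operation_time(params, rng=random):
    """Draw one operating time, assuming a triangular (min, mode, max) distribution."""
    low, mode, high = params
    # random.triangular expects (low, high, mode), so reorder the triple.
    return rng.triangular(low, high, mode)


rng = random.Random(42)
for station, params in PROCESSING_SEQUENCE['25"']:
    print(f"{station}: {sample_operation_time(params, rng):.2f} min")
```

Keeping the routing and the time parameters in one ordered structure mirrors the table, so a change to the data document maps to a single edit in the sketch.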
Handling Defective Monitors
• Defective monitors are detected at inspection and routed to whichever station created the problem.
• Monitors waiting at a station for rework have a higher priority than first-time monitors.
• Corrected monitors are routed back to inspection.
• A reworked monitor has only a 2 percent chance of failing again, in which case it is removed from the system.
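The rework rules above, together with the defective rates in the workstation table and the assumption-list rule that rework takes half the normal operation time, might be sketched in Python as follows. The sketch makes assumptions the data document leaves open: each first-time monitor is checked independently against the Station 1 and Station 2 defective rates in processing order and sent back to the first station whose check fails, and a reworked monitor is checked only against the 2 percent re-failure chance. All names are hypothetical; a heap-ordered queue gives rework monitors priority over first-time monitors while keeping first-in, first-out order within each class.

```python
import heapq
import itertools
import random

# Defective rates from the workstation table (Station 3 and Inspection are 0%).
DEFECT_RATE = {"Station 1": 0.05, "Station 2": 0.08}
REWORK_FAILURE_RATE = 0.02  # chance that a reworked monitor fails again


def inspection_outcome(reworked, rng):
    """Return "pass", "reject", or the name of the station that must rework it."""
    if reworked:
        # A second failure removes the monitor from the system.
        return "reject" if rng.random() < REWORK_FAILURE_RATE else "pass"
    # Assumption: independent checks in processing order; the first station
    # whose check fails is the one the monitor is routed back to.
    for station, rate in DEFECT_RATE.items():
        if rng.random() < rate:
            return station
    return "pass"


def rework_time(normal_time):
    """Assumption list: rework takes half the normal operation time."""
    return normal_time / 2


class StationQueue:
    """Waiting line in which rework monitors are served before first-time ones."""

    def __init__(self):
        self._heap = []
        self._arrival_order = itertools.count()  # FIFO within each priority class

    def push(self, monitor_id, reworked):
        priority = 0 if reworked else 1  # lower number is served first
        heapq.heappush(self._heap, (priority, next(self._arrival_order), monitor_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = StationQueue()
queue.push("M1", reworked=False)
queue.push("M2", reworked=True)
queue.push("M3", reworked=False)
print([queue.pop() for _ in range(len(queue))])  # ['M2', 'M1', 'M3']
print(inspection_outcome(reworked=False, rng=random.Random(7)))
```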
Arrivals
A cartload of four monitor assemblies arrives every four hours; the time between arrivals is normally distributed with a mean of four hours and a standard deviation of 0.2 hour. The probability of an arriving monitor being of a particular size is:
Monitor Size Probability
19" .6
21" .3
25" .1
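A minimal arrival generator for this description is sketched below in Python, assuming that the first cart arrives one interarrival time after the start of the simulation and that the size of each monitor on a cart is drawn independently from the probability table. The function and constant names are hypothetical; `normalvariate` and `choices` are standard methods of `random.Random`.

```python
import random

SIZE_PROBABILITIES = {'19"': 0.6, '21"': 0.3, '25"': 0.1}
MONITORS_PER_CART = 4
MEAN_INTERARRIVAL_HOURS = 4.0
SD_INTERARRIVAL_HOURS = 0.2


def generate_arrivals(n_carts, rng):
    """Yield (arrival time in hours, list of monitor sizes) for each cartload."""
    sizes = list(SIZE_PROBABILITIES)
    weights = list(SIZE_PROBABILITIES.values())
    clock = 0.0
    for _ in range(n_carts):
        # Interarrival time ~ Normal(4 h, 0.2 h). A negative draw is 20 standard
        # deviations away, but clamp at zero so the clock never runs backward.
        clock += max(0.0, rng.normalvariate(MEAN_INTERARRIVAL_HOURS,
                                            SD_INTERARRIVAL_HOURS))
        yield clock, rng.choices(sizes, weights=weights, k=MONITORS_PER_CART)


for t, cart in generate_arrivals(3, random.Random(2024)):
    print(f"t = {t:.2f} h: {cart}")
```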
Move Times
All movement is on an accumulation conveyor with the following times:
From To Time (seconds)
Station 1 Station 2 12
Station 2 Inspection 15
Inspection Station 3 12
Inspection Station 1 20
Inspection Station 2 14
Station 1 Inspection 18
Move Triggers
Entities move from one location to the next based on available capacity of the input buffer at the next location.
Work Schedule
Stations are scheduled to operate eight hours a day.
Assumption List
• No downtimes (downtimes occur too infrequently).
• Dedicated operators at each workstation are always available during the scheduled work time.
• Rework times are half of the normal operation times.
Simulation Time and Replications
The simulation is run for 40 hours (10 hours of warm-up). There are five replications.
6.13 Summary
Data for building a model should be collected systematically, with a view to how the data are going to be used in the model. Data are of three types: structural, operational, and numerical. Structural data consist of the physical objects that make up the system. Operational data define how the elements behave. Numerical data quantify attributes and behavioral parameters.
When gathering data, primary sources should be used first, such as historical records or specifications. Developing a questionnaire is a good way to request information when conducting personal interviews. Data gathering should start with structural data, then operational data, and finally numerical data. The first piece of the puzzle to be put together is the routing sequence because everything else hinges on the entity flow.
Numerical data for random variables should be analyzed to test for independence and homogeneity. A theoretical distribution should then be fit to the data if one provides an acceptable fit. Theoretical distributions should be used wherever possible, although some data are best represented using an empirical distribution.
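As one numerical complement to graphical independence checks such as scatter plots of successive observations, a lag-1 sample autocorrelation can be computed as sketched below. This is a swapped-in illustration rather than a test prescribed in this chapter: values near zero are consistent with independence, while values far from zero in either direction suggest that successive observations are correlated. The function name is hypothetical.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation:
    sum((x[i] - m) * (x[i+1] - m)) / sum((x[i] - m) ** 2), where m is the mean."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    m = sum(values) / n
    numerator = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    denominator = sum((x - m) ** 2 for x in values)
    if denominator == 0:
        raise ValueError("all observations are identical")
    return numerator / denominator


print(lag1_autocorrelation([1, 2, 3, 4, 5]))  # 0.4   (steady upward trend)
print(lag1_autocorrelation([1, -1, 1, -1]))   # -0.75 (alternating pattern)
```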
Data should be documented, reviewed, and approved by concerned individuals. This data document becomes the basis for building the simulation model and provides a baseline for later modification or for future studies.
6.14 Review Questions
1. Give two examples of structural data, operational data, and numerical data to be gathered when building a model.
2. Why is it best to begin gathering data by defining entity routings?
3. Of the distributions shown in the chapter, which theoretical distribution most likely would be representative of time to failure for a machine?
4. Why would a normal distribution likely be a poor representation of an activity time?
5. Assume a new system is being simulated and the only estimate available for a manual operation is a most likely value. How would you handle this situation?
6. Under what circumstances would you use an empirical distribution instead of a standard theoretical distribution for an activity time?
7. Why is the distribution for interarrival times often nonstationary?
8. Assuming you had historical data on truck arrivals for the past year, how would you arrive at an appropriate arrival distribution to model the system for the next six months?
9. A new machine is being considered for which the company has no reliability history. How would you obtain the best possible estimate of reliability for the machine?
10. Suppose you are interested in looking at the impact of having workers inspect their own work instead of having a dedicated inspection station. If this is a new system requiring lots of assumptions to be made, how can simulation be useful in making the comparison?
11. State whether the following are examples of a discrete probability distribution or a continuous probability distribution.
a. Activity times.
b. Batch sizes.
c. Time between arrivals.
d. Probability of routing to one of six possible destinations.
12. Conceptually, how would you model a random variable X that represents an activity time that is normally distributed with a mean of 10 minutes and a standard deviation of 3 minutes but is never less than 8 minutes?
13. Using Stat::Fit, generate a list of 50 random values between 10 and 100. Choose the Scatter Plot option and plot the data. Now put the data in ascending order using the Input/Operate commands and plot the data. Explain the correlation, if any, that you see in each scatter plot.
14. How can you check to see if a distribution in ProModel is giving the right values?
15. Since many theoretical distributions are unbounded on the bottom, what happens in ProModel if a negative value is sampled for an activity time?
16. Go to a small convenience store or the university bookstore and collect data on the interarrival and service times of customers. Make histograms of the number of arrivals per time period and the number of service completions per period. Note whether these distributions vary by the time of day and by the day of the week. Record the number of service channels available at all times. Make sure you secure permission to perform the study.
17. The following are throughput time values for 30 simulation runs. Calculate an estimate of the mean, variance, standard deviation, and coefficient of variation for the throughput time. Construct a histogram that has six cells of equal width.
10.7, 5.4, 7.8, 12.2, 6.4, 9.5, 6.2, 11.9, 13.1, 5.9, 9.6, 8.1, 6.3, 10.3, 11.5,
12.7, 15.4, 7.1, 10.2, 7.4, 6.5, 11.2, 12.9, 10.1, 9.9, 8.6, 7.9, 10.3, 8.3, 11.1
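The calculations asked for in question 17 can be organized as in the following Python sketch. It uses the sample variance with an n - 1 divisor (the usual unbiased estimator), defines the coefficient of variation as the standard deviation divided by the mean, and spans the equal-width cells from the smallest to the largest observation, placing the maximum in the last cell; these conventions are assumptions, and the function names are hypothetical. The usage lines run on a small made-up data set rather than on the values above.

```python
import statistics


def describe(values):
    """Sample mean, variance (n - 1 divisor), standard deviation, and CV."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)
    std_dev = variance ** 0.5
    return {"mean": mean, "variance": variance,
            "std_dev": std_dev, "cv": std_dev / mean}


def equal_width_histogram(values, cells=6):
    """Return (cell edges, counts) for `cells` equal-width cells over [min, max]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; cell width would be zero")
    width = (high - low) / cells
    counts = [0] * cells
    for x in values:
        # int() truncates toward zero; the maximum lands exactly on the top
        # edge, so it is assigned to the last cell.
        counts[min(int((x - low) / width), cells - 1)] += 1
    edges = [low + i * width for i in range(cells + 1)]
    return edges, counts


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
print(describe(sample))
print(equal_width_histogram(sample, cells=6))
```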
18. Customers calling into a service center are categorized according to the nature of their problems. Five types of problem categories (A through E) have been identified. Fifty observations were made of customers calling in during a day, with a summary of the data shown here. By inspection, you conclude that the data are most likely uniformly distributed. Perform a chi-square goodness-of-fit test of this hypothesis.
Type A B C D E
Observations 10 14 12 9 5
19. While doing your homework one afternoon, you notice that you are frequently interrupted by friends. You decide to record the times between interruptions to see if they might be exponentially distributed. Here are 30 observed times (in minutes) that you have recorded. Conduct a goodness-of-fit test to see if the data are exponentially distributed. (Hint: Use the data average as an estimate of the mean. For the range, assume a range between 0 and infinity. Divide the cells based on equal probabilities (p_i) for each cell rather than equal cell intervals.)
2.08 6.86 4.86 2.55 5.94
2.96 0.91 2.13 2.20 1.40
16.17 2.11 2.38 0.83 2.81
14.57 0.29 2.73 0.73 1.76
2.79 11.69 18.29 5.25 7.42
2.15 0.96 6.28 0.94 13.76
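The hint in question 19 calls for cells of equal probability rather than equal width. For an exponential distribution with mean β, the cumulative distribution function is F(x) = 1 - e^(-x/β), so the boundary below which a fraction i/k of the probability lies is x_i = β ln(k/(k - i)). The Python sketch below computes these boundaries and counts observations per cell; it is an illustration with hypothetical function names, demonstrated on made-up values with β = 1 rather than on the data above. Each of the k cells then has an expected count of n/k, and because β is estimated from the data, the usual rule removes one additional degree of freedom, giving k - 2.

```python
import bisect
import math


def exponential_equal_probability_edges(mean, k):
    """Cell boundaries 0 = x_0 < x_1 < ... < x_k = inf such that each cell has
    probability 1/k under an exponential distribution with the given mean."""
    edges = [mean * math.log(k / (k - i)) for i in range(k)]
    edges.append(math.inf)
    return edges


def count_in_cells(values, edges):
    """Count observations falling in [edges[j], edges[j + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        if x < 0:
            raise ValueError("exponential data cannot be negative")
        counts[bisect.bisect_right(edges, x) - 1] += 1
    return counts


edges = exponential_equal_probability_edges(mean=1.0, k=4)
print([round(e, 4) for e in edges])  # [0.0, 0.2877, 0.6931, 1.3863, inf]
values = [0.1, 0.5, 1.0, 2.0, 0.05, 3.0, 0.4, 2.5]  # made-up values
observed = count_in_cells(values, edges)
expected = len(values) / len(observed)
chi_square = sum((o - expected) ** 2 / expected for o in observed)
print(observed, chi_square)
```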