Simulation

In document IBM SPSS Statistics Base 22 (Page 167-198)

Predictive models, such as linear regression, require a set of known inputs to predict an outcome or target value. In many real-world applications, however, values of inputs are uncertain. Simulation allows you to account for uncertainty in the inputs to predictive models and evaluate the likelihood of various outcomes of the model in the presence of that uncertainty. For example, suppose you have a profit model that includes the cost of materials as an input, but there is uncertainty in that cost due to market volatility. You can use simulation to model that uncertainty and determine the effect it has on profit.

Simulation in IBM SPSS Statistics uses the Monte Carlo method. Uncertain inputs are modeled with probability distributions (such as the triangular distribution), and simulated values for those inputs are generated by drawing from those distributions. Inputs whose values are known are held fixed at the known values. The predictive model is evaluated using a simulated value for each uncertain input and fixed values for the known inputs to calculate the target (or targets) of the model. The process is repeated many times (typically tens of thousands or hundreds of thousands of times), resulting in a distribution of target values that can be used to answer questions of a probabilistic nature. In the context of IBM SPSS Statistics, each repetition of the process generates a separate case (record) of data that consists of the set of simulated values for the uncertain inputs, the values of the fixed inputs, and the predicted target (or targets) of the model.
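The Monte Carlo process described above can be illustrated outside the product with a short Python sketch. It is not how IBM SPSS Statistics implements simulation; the profit equation, the fixed values, and the triangular parameters (minimum 8, most likely 10, maximum 15) are all hypothetical numbers chosen for illustration.

```python
import random

def simulate_profit(n_cases, seed=0):
    """Generate n_cases simulated records for a toy profit model."""
    rng = random.Random(seed)
    # Fixed inputs: held at known values (hypothetical numbers).
    price = 25.0
    volume = 1000
    fixed_costs = 5000.0
    cases = []
    for _ in range(n_cases):
        # Uncertain input: unit material cost drawn from a triangular
        # distribution. Arguments are (low, high, mode).
        unit_cost = rng.triangular(8.0, 15.0, 10.0)
        # Evaluate the model for this draw to get the target value.
        profit = price * volume - (fixed_costs + volume * unit_cost)
        # Each repetition yields one case: inputs plus predicted target.
        cases.append({"unit_cost": unit_cost, "price": price,
                      "volume": volume, "profit": profit})
    return cases

cases = simulate_profit(10_000)
# A probabilistic question: how likely is profit to exceed 9,000?
prob = sum(c["profit"] > 9000 for c in cases) / len(cases)
print(f"Estimated P(profit > 9000): {prob:.3f}")
```

Each dictionary in `cases` plays the role of one generated case, and the collection of `profit` values is the distribution of the target.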

You can also simulate data in the absence of a predictive model by specifying probability distributions for variables that are to be simulated. Each generated case of data consists of the set of simulated values for the specified variables.
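As a rough illustration of simulating data without a predictive model, the following Python sketch draws each field from its own distribution and assembles the draws into cases. The field names, distributions, and parameters are hypothetical and are not taken from the product.

```python
import random

def simulate_fields(n_cases, seed=0):
    """Generate cases where every field is drawn from a specified distribution."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_cases):
        records.append({
            # Continuous field, normal distribution (hypothetical parameters).
            "age": rng.gauss(40.0, 12.0),
            # Continuous field, uniform distribution between 0 and 30.
            "wait_minutes": rng.uniform(0.0, 30.0),
            # Categorical field with specified category probabilities.
            "region": rng.choices(["north", "south", "west"],
                                  weights=[0.5, 0.3, 0.2])[0],
        })
    return records

records = simulate_fields(1000)
print(records[0])
```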

To run a simulation, you need to specify details such as the predictive model, the probability distributions for the uncertain inputs, correlations between those inputs and values for any fixed inputs. Once you've specified all of the details for a simulation, you can run it and optionally save the specifications to a simulation plan file. You can share the simulation plan with other users, who can then run the simulation without needing to understand the details of how it was created.
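To make the idea of a shareable plan concrete, the following Python sketch collects the kinds of details listed above into one structure and round-trips it through text. The layout and key names are invented for illustration; this is not the format of an IBM SPSS Statistics simulation plan file.

```python
import json

# Hypothetical plan contents. The structure is an assumption made for
# this sketch and does NOT reflect the actual simulation plan file format.
plan = {
    "equations": ["profit = revenue - expenses"],
    "simulated_inputs": {
        "unit_cost_materials": {"distribution": "triangular",
                                "min": 8.0, "mode": 10.0, "max": 15.0},
    },
    "correlations": [],
    "fixed_inputs": {"price": 25.0, "volume": 1000},
}

def plan_to_text(plan):
    """Serialize the specifications so they can be saved or shared."""
    return json.dumps(plan, indent=2, sort_keys=True)

def plan_from_text(text):
    """Restore the specifications; the recipient can run them as is."""
    return json.loads(text)

shared = plan_to_text(plan)
restored = plan_from_text(shared)
```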

Two interfaces are available for working with simulations. The Simulation Builder is an advanced interface for users who are designing and running simulations. It provides the full set of capabilities for designing a simulation, saving the specifications to a simulation plan file, specifying output and running the simulation. You can build a simulation based on an IBM SPSS model file, or on a set of custom equations that you define in the Simulation Builder. You can also load an existing simulation plan into the Simulation Builder, modify any of the settings and run the simulation, optionally saving the updated plan. For users who have a simulation plan and primarily want to run the simulation, a simpler interface is available. It allows you to modify settings that enable you to run the simulation under different conditions, but does not provide the full capabilities of the Simulation Builder for designing simulations.

To design a simulation based on a model file

1. From the menus choose:

Analyze>Simulation...

2. Click Select SPSS Model File and click Continue.

3. Open the model file.

The model file is an XML file that contains model PMML created from IBM SPSS Statistics or IBM SPSS Modeler. See the topic “Model tab” on page 164 for more information.

4. On the Simulation tab (in the Simulation Builder), specify probability distributions for simulated inputs and values for fixed inputs. If the active dataset contains historical data for simulated inputs, click Fit All to automatically determine the distribution that most closely fits the data for each such input as well as determining correlations between them. For each simulated input that is not being fit to historical data, you must explicitly specify a distribution by selecting a distribution type and entering the required parameters.

5. Click Run to run the simulation. By default, the simulation plan, specifying the details of the simulation, is saved to the location specified on the Save settings.

The following options are available:

• Modify the location for the saved simulation plan.

• Specify known correlations between simulated inputs.

• Automatically compute a contingency table of associations between categorical inputs and use those associations when data are generated for those inputs.

• Specify sensitivity analysis to investigate the effect of varying the value of a fixed input or varying a distribution parameter for a simulated input.

• Specify advanced options such as setting the maximum number of cases to generate or requesting tail sampling.

• Customize output.

• Save the simulated data to a data file.
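The sensitivity analysis option in the list above can be pictured as rerunning the same simulation once for each value of a fixed input and comparing the resulting target distributions. The Python sketch below does this for a toy profit model; the model, the triangular parameters, and the three price values are hypothetical, and the sketch is not the product's implementation.

```python
import random
import statistics

def mean_profit(price, n_cases=5000, seed=0):
    """Run one simulation with the fixed input price held at the given value."""
    rng = random.Random(seed)
    volume, fixed_costs = 1000, 5000.0  # other fixed inputs (hypothetical)
    profits = []
    for _ in range(n_cases):
        # Simulated input: unit cost from a triangular(low, high, mode) draw.
        unit_cost = rng.triangular(8.0, 15.0, 10.0)
        profits.append(price * volume - (fixed_costs + volume * unit_cost))
    return statistics.fmean(profits)

# Vary the fixed input across several values, one simulation per value.
results = {price: mean_profit(price) for price in (20.0, 25.0, 30.0)}
for price, avg in results.items():
    print(f"price={price:5.1f}  mean profit={avg:10.2f}")
```

Because every run uses the same seed, the simulated unit costs are identical across runs, so differences in mean profit reflect only the change in price.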

To design a simulation based on custom equations

1. From the menus choose:

Analyze>Simulation...

2. Click Type in the Equations and click Continue.

3. Click New Equation on the Model tab (in the Simulation Builder) to define each equation in your predictive model.

4. Click the Simulation tab and specify probability distributions for simulated inputs and values for fixed inputs. If the active dataset contains historical data for simulated inputs, click Fit All to automatically determine the distribution that most closely fits the data for each such input as well as determining correlations between them. For each simulated input that is not being fit to historical data, you must explicitly specify a distribution by selecting a distribution type and entering the required parameters.

5. Click Run to run the simulation. By default, the simulation plan, specifying the details of the simulation, is saved to the location specified on the Save settings.

The following options are available:

• Modify the location for the saved simulation plan.

• Specify known correlations between simulated inputs.

• Automatically compute a contingency table of associations between categorical inputs and use those associations when data are generated for those inputs.

• Specify sensitivity analysis to investigate the effect of varying the value of a fixed input or varying a distribution parameter for a simulated input.

• Specify advanced options such as setting the maximum number of cases to generate or requesting tail sampling.

• Customize output.

• Save the simulated data to a data file.
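To show what a known correlation between two simulated inputs means for the generated data, the following Python sketch draws pairs of standard normal values with a target correlation. It uses a textbook construction for two normal variables and is not a description of how the product generates correlated inputs; the target correlation of 0.7 is a hypothetical value.

```python
import math
import random

def correlated_normal_pairs(n_cases, rho, seed=0):
    """Draw pairs of standard normal values with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_cases):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mix the independent draws so the second value tracks the first.
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs

def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

pairs = correlated_normal_pairs(20_000, rho=0.7)
print(f"Sample correlation: {pearson(pairs):.3f}")
```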

To design a simulation without a predictive model

1. From the menus, choose:

Analyze>Simulation...

3. On the Model tab (in the Simulation Builder), select the fields that you want to simulate. You can select fields from the active dataset or you can define new fields by clicking New.

4. Click the Simulation tab and specify probability distributions for the fields that are to be simulated. If the active dataset contains historical data for any of those fields, click Fit All to automatically determine the distribution that most closely fits the data and to determine correlations between the fields. For fields that are not fit to historical data, you must explicitly specify a distribution by selecting a distribution type and entering the required parameters.

5. Click Run to run the simulation. By default, the simulated data are saved to the new dataset specified on the Save settings. In addition, the simulation plan, which specifies the details of the simulation, is saved to the location specified on the Save settings.

The following options are available:

• Modify the location for the simulated data or the saved simulation plan.

• Specify known correlations between simulated fields.

• Automatically compute a contingency table of associations between categorical fields and use those associations when data are generated for those fields.

• Specify sensitivity analysis to investigate the effect of varying a distribution parameter for a simulated field.

• Specify advanced options such as setting the number of cases to generate.
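The contingency table option in the list above can be pictured as counting how often each combination of categories occurs in the historical data and then generating combinations in proportion to those counts. The Python sketch below is a simplified illustration under that assumption; the field values are hypothetical and the product's actual procedure may differ.

```python
import random
from collections import Counter

# Hypothetical historical data: (region, channel) for each past case.
history = [
    ("north", "retail"), ("north", "retail"), ("north", "online"),
    ("south", "online"), ("south", "online"), ("south", "online"),
]

def contingency_table(pairs):
    """Count the occurrences of each combination of categories."""
    return Counter(pairs)

def sample_associated(table, n_cases, seed=0):
    """Draw combinations with probability proportional to their counts."""
    rng = random.Random(seed)
    cells = list(table)
    weights = [table[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n_cases)

table = contingency_table(history)
simulated = sample_associated(table, 1000)
print(Counter(simulated))
```

Sampling the combinations jointly, rather than each field separately, keeps the association intact: a combination that never occurs in the history, such as ("south", "retail") here, is never generated.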

To run a simulation from a simulation plan

Two options are available for running a simulation from a simulation plan. You can use the Run Simulation dialog, which is primarily designed for running from a simulation plan, or you can use the Simulation Builder.

To use the Run Simulation dialog:

1. From the menus choose:

Analyze>Simulation...

2. Click Open an Existing Simulation Plan.

3. Make sure the Open in Simulation Builder check box is not checked and click Continue.

4. Open the simulation plan.

5. Click Run in the Run Simulation dialog.

To run the simulation from the Simulation Builder:

1. From the menus choose:

Analyze>Simulation...

2. Click Open an Existing Simulation Plan.

3. Select the Open in Simulation Builder check box and click Continue.

4. Open the simulation plan.

5. Modify any settings you want to modify on the Simulation tab.

6. Click Run to run the simulation.

Optionally, you can do the following:

• Set up or modify sensitivity analysis to investigate the effect of varying the value of a fixed input or varying a distribution parameter for a simulated input.

• Refit distributions and correlations for simulated inputs to new data.

• Change the distribution for a simulated input.

• Customize output.

Simulation Builder

The Simulation Builder provides the full set of capabilities for designing and running simulations. It allows you to perform the following general tasks:

• Design and run a simulation for an IBM SPSS model defined in a PMML model file.

• Design and run a simulation for a predictive model defined by a set of custom equations that you specify.

• Design and run a simulation that generates data in the absence of a predictive model.

• Run a simulation based on an existing simulation plan, optionally modifying any plan settings.

Model tab

For simulations based on a predictive model, the Model tab specifies the source of the model. For simulations that do not include a predictive model, the Model tab specifies the fields that are to be simulated.

Select an SPSS model file. This option specifies that the predictive model is defined in an IBM SPSS model file. An IBM SPSS model file is an XML file that contains model PMML created from IBM SPSS Statistics or IBM SPSS Modeler. Predictive models are created by procedures, such as Linear Regression and Decision Trees within IBM SPSS Statistics, and can be exported to a model file. You can use a different model file by clicking Browse and navigating to the file you want.

PMML models supported by Simulation

• Linear Regression

• Generalized Linear Model

• General Linear Model

• Binary Logistic Regression

• Multinomial Logistic Regression

• Ordinal Multinomial Regression

• Cox Regression

• Tree

• Boosted Tree (C5)

• Discriminant

• Two-step Cluster

• K-Means Cluster

• Neural Net

• Ruleset (Decision List)

Note:

• PMML models that have multiple target fields (variables) or splits are not supported for use in Simulation.

• Values of string inputs to binary logistic regression models are limited to 8 bytes in the model. If you are fitting such string inputs to the active dataset, make sure that the values in the data do not exceed 8 bytes in length. Data values that exceed 8 bytes are excluded from the associated categorical distribution for the input, and are displayed as unmatched in the Unmatched Categories output table.

Type in the equations for the model. This option specifies that the predictive model consists of one or more custom equations to be created by you. Create equations by clicking New Equation. This opens the Equation Editor. You can modify existing equations, copy them to use as templates for new equations, reorder them and delete them.

• The Simulation Builder does not support systems of simultaneous equations or equations that are non-linear in the target variable.

• Custom equations are evaluated in the order in which they are specified. If the equation for a given target depends on another target, then the other target must be defined by a preceding equation. For example, given the set of three equations below, the equation for profit depends on the values of revenue and expenses, so the equations for revenue and expenses must precede the equation for profit.

revenue = price*volume

expenses = fixed + volume*(unit_cost_materials + unit_cost_labor)

profit = revenue - expenses
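The ordering rule can be illustrated with a small Python sketch that evaluates the three equations in sequence, adding each target to the set of available values as soon as it is computed. The input values are hypothetical, and the sketch only mimics the rule; it is not the Simulation Builder's evaluator.

```python
# Each entry pairs a target name with a function of the values known so far.
EQUATIONS = [
    ("revenue", lambda v: v["price"] * v["volume"]),
    ("expenses", lambda v: v["fixed"] + v["volume"]
        * (v["unit_cost_materials"] + v["unit_cost_labor"])),
    ("profit", lambda v: v["revenue"] - v["expenses"]),
]

def evaluate(equations, inputs):
    """Evaluate equations in the order given; later ones may use earlier targets."""
    values = dict(inputs)
    for target, formula in equations:
        # Raises KeyError if the formula needs a target not yet defined.
        values[target] = formula(values)
    return values

# Hypothetical input values for one case.
inputs = {"price": 25.0, "volume": 1000, "fixed": 5000.0,
          "unit_cost_materials": 10.0, "unit_cost_labor": 4.0}

result = evaluate(EQUATIONS, inputs)
print(result["profit"])  # 6000.0
```

If the equation for profit were listed first, the lookup of `revenue` would fail because that target has not been computed yet, which is why the defining equations must come first.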

Create simulated data without a model. Select this option to simulate data without a predictive model. Specify the fields that are to be simulated by selecting fields from the active dataset or by clicking New to define new fields.

Equation Editor

The Equation Editor allows you to create or modify a custom equation for your predictive model.

• The expression for the equation can contain fields from the active dataset or new input fields that you define in the Equation Editor.

• You can specify properties of the target such as its measurement level, value labels and whether output is generated for the target.

• You can use targets from previously defined equations as inputs to the current equation, allowing you to create coupled equations.

• You can attach a descriptive comment to the equation. Comments are displayed along with the equation on the Model tab.

1. Enter the name of the target. Optionally, click Edit under the Target text box to open the Defined Inputs dialog, allowing you to change the default properties of the target.

2. To build an expression, either paste components into the Numeric Expression field or type directly in the Numeric Expression field.

• You can build your expression using fields from the active dataset or you can define new inputs by clicking the New button. This opens the Defined Inputs dialog.

• You can paste functions by selecting a group from the Function group list and double-clicking the function in the Functions list (or select the function and click the arrow adjacent to the Function group list). Enter any parameters indicated by question marks. The function group labeled All provides a listing of all available functions. A brief description of the currently selected function is displayed in a reserved area in the dialog box.

• String constants must be enclosed in quotation marks.

• If values contain decimals, a period (.) must be used as the decimal indicator.

Note: Simulation does not support custom equations with string targets.

Defined Inputs: The Defined Inputs dialog allows you to define new inputs and set properties for targets.

• If an input to be used in an equation does not exist in the active dataset, you must define it before it can be used in the equation.

• If you are simulating data without a predictive model, you must define all simulated inputs that do not exist in the active dataset.

Target. You can specify the measurement level of a target. The default measurement level is continuous. You can also specify whether output will be created for this target. For example, for a set of coupled equations you may only be interested in output from the target for the final equation, so you would suppress output from the other targets.

Input to be simulated. This specifies that the values of the input will be simulated according to a specified probability distribution (the probability distribution is specified on the Simulation tab). The measurement level determines the default set of distributions that are considered when finding the distribution that most closely fits the data for the input (by clicking Fit or Fit All on the Simulation tab). For example, if the measurement level is continuous, then the normal distribution (appropriate for continuous data) would be considered but the binomial distribution would not.

Note: Select a measurement level of String for string inputs. String inputs that are to be simulated are restricted to the Categorical distribution.

Fixed value input. This specifies that the value of the input is known and will be fixed at the known value. Fixed inputs can be numeric or string. Specify a value for the fixed input. String values should not be enclosed in quotation marks.

Value labels. You can specify value labels for targets, simulated inputs and fixed inputs. Value labels are used in output charts and tables.

Simulation tab

The Simulation tab specifies all properties of the simulation other than the predictive model. You can perform the following general tasks on the Simulation tab:

• Specify probability distributions for simulated inputs and values for fixed inputs.

• Specify correlations between simulated inputs. For categorical inputs, you can specify that associations that exist between those inputs in the active dataset are used when data are generated for those inputs.

• Specify advanced options such as tail sampling and criteria for fitting distributions to historical data.

• Customize output.

• Specify where to save the simulation plan and optionally save the simulated data.

Simulated Fields

To run a simulation, each input field must be specified as fixed or simulated. Simulated inputs are those whose values are uncertain and will be generated by drawing from a specified probability distribution. When historical data are available for the inputs to be simulated, the distributions that most closely fit the data can be automatically determined, along with any correlations between those inputs. You can also manually specify distributions or correlations if historical data are not available or you require specific distributions or correlations.
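As a rough picture of what fitting to historical data involves, the following Python sketch estimates the parameters of a normal distribution for each of two inputs and computes the correlation between them. It is a deliberate simplification: it considers only one candidate distribution and does not compare candidates or test goodness of fit. The data values and field names are hypothetical.

```python
import math
import statistics

# Hypothetical historical data for two inputs to be simulated.
materials = [9.8, 10.4, 11.1, 10.0, 12.3, 9.5]
labor = [4.1, 4.3, 4.6, 4.0, 5.0, 3.9]

def fit_normal(values):
    """Estimate normal distribution parameters from historical values."""
    return {"distribution": "normal",
            "mean": statistics.fmean(values),
            "stddev": statistics.stdev(values)}

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

fits = {"materials": fit_normal(materials), "labor": fit_normal(labor)}
correlation = pearson(materials, labor)
print(fits)
print(f"Correlation between inputs: {correlation:.3f}")
```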

Fixed inputs are those whose values are known and remain constant for each case generated in the simulation. For example, you have a linear regression model for sales as a function of a number of inputs including price, and you want to hold the price fixed at the current market price. You would then specify price as a fixed input.
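The sales example can be sketched in Python as follows. The regression coefficients, the second input (advertising), and its distribution are hypothetical values invented for the illustration; only the idea of holding price constant while other inputs are simulated comes from the text above.

```python
import random

# Hypothetical coefficients of a fitted linear regression for sales.
INTERCEPT, B_PRICE, B_ADVERTISING = 500.0, -12.0, 0.8

def simulate_sales(price, n_cases, seed=0):
    """Simulate sales with price fixed and advertising simulated."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        # Simulated input: drawn anew for every case.
        advertising = rng.gauss(200.0, 30.0)
        # Fixed input: the same price is used in every case.
        sales = INTERCEPT + B_PRICE * price + B_ADVERTISING * advertising
        cases.append({"price": price, "advertising": advertising,
                      "sales": sales})
    return cases

cases = simulate_sales(price=20.0, n_cases=1000)
print(cases[0])
```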

For simulations based on a predictive model, each predictor in the model is an input field for the simulation. For simulations that do not include a predictive model, the fields that are specified on the Model tab are the inputs for the simulation.

Automatically fitting distributions and calculating correlations for simulated inputs. If the active dataset contains historical data for the inputs that you want to simulate, then you can automatically find the distributions that most closely fit the data for those inputs as well as determine any correlations between them. The steps are as follows:

1. Verify that each of the inputs that you want to simulate is matched up with the correct field in the active dataset. Inputs are listed in the Input column and the Fit to column displays the matched field in the active dataset. You can match an input to a different field in the active dataset by selecting a
