
Bootstrapping with OMS

In document Spss Example (Page 162-166)

Bootstrapping is a method for estimating population parameters by repeatedly resampling the same sample—computing some test statistic on each sample and then looking at the distribution of the test statistic over all the samples. Cases are selected randomly, with replacement, from the original sample to create each new sample. Typically, each new sample has the same number of cases as the original sample; however, some cases may be randomly selected multiple times and others not at all. In this example, we

• use a macro to draw repeated random samples with replacement;

• run the REGRESSION command on each sample;

• use the OMS command to save the regression coefficients tables to a data file;

• produce histograms of the coefficient distributions and a table of confidence intervals, using the data file created from the coefficient tables.

The command syntax file used in this example is oms_bootstrapping.sps.
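The resampling idea itself can be sketched outside SPSS. The following Python fragment (standard library only) is an illustration under stated assumptions, not a translation of oms_bootstrapping.sps: the function name bootstrap_statistics and the data values are hypothetical, and the mean stands in for the regression coefficients to keep the sketch short.

```python
import random
import statistics

def bootstrap_statistics(sample, statistic, n_samples, seed=None):
    """Compute `statistic` on n_samples resamples drawn from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    results = []
    for _ in range(n_samples):
        # Draw n cases with replacement: some cases repeat, others are left out.
        resample = rng.choices(sample, k=n)
        results.append(statistic(resample))
    return results

# Hypothetical data, for illustration only.
salaries = [21000, 24000, 27500, 30000, 31500, 35000, 40200, 57000]
means = bootstrap_statistics(salaries, statistics.mean, n_samples=100, seed=1)
print(len(means), min(means), max(means))
```

Each resample has the same number of cases as the original, so the spread of the 100 means reflects sampling variability rather than a change in sample size.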

OMS Commands to Create a Data File of Coefficients

Although the command syntax file oms_bootstrapping.sps may seem long or complicated, the OMS commands that create the data file of sample regression coefficients are short and simple:

• The PRESERVE command saves your current SET command specifications, and SET TVARS NAMES specifies that variable names, not labels, should be displayed in tables. Because variable names in data files created by OMS are based on table column labels, using variable names instead of labels in tables tends to result in shorter, less cumbersome variable names.

• DATASET DECLARE defines a dataset name that will then be used as the output destination in the second OMS command.

• The first OMS command prevents subsequent output from being displayed in the Viewer until an OMSEND is encountered. This is not technically necessary, but if you are drawing hundreds or thousands of samples, you probably don’t want to see the output of the corresponding hundreds or thousands of REGRESSION commands.

• The second OMS command will select coefficients tables from subsequent REGRESSION commands.

• All of the selected tables will be saved in a dataset named bootstrap_example. This dataset will be available for the rest of the current session but will be deleted automatically at the end of the session unless explicitly saved. The contents of this dataset will be displayed in a separate Data Editor window.

• The COLUMNS subcommand specifies that both the ‘Variables’ and ‘Statistics’ dimension elements of each table should appear in the columns. Since a regression coefficients table is a simple two-dimensional table with ‘Variables’ in the rows and ‘Statistics’ in the columns, if both dimensions appear in the columns, then there will be only one row (case) in the generated data file for each table. This is equivalent to pivoting the table in the Viewer so that both ‘Variables’ and ‘Statistics’ are displayed in the column dimension.

Figure 9-11

Variables dimension element pivoted into column dimension
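The effect of moving both dimensions into the columns can be sketched as flattening a two-dimensional table into a single row. The Python fragment below is only an illustration: the function name flatten_table and the coefficient values are hypothetical, and the variable_statistic naming is assumed from names such as salbegin_B that appear later in this example. A real coefficients table contains more rows and columns than are shown here.

```python
def flatten_table(table):
    """Turn {variable: {statistic: value}} into one flat row (a dict)."""
    row = {}
    for variable, stats in table.items():
        for statistic, value in stats.items():
            # One column per variable/statistic pair, e.g. salbegin_B.
            row[f"{variable}_{statistic}"] = value
    return row

# Hypothetical coefficient values, for illustration only.
coefficients = {
    "salbegin": {"B": 1.9, "Beta": 0.88},
    "jobtime": {"B": 170.0, "Beta": 0.10},
}
row = flatten_table(coefficients)
print(sorted(row))
```

Because every table collapses to one row, 100 regression runs would yield a data file with 100 cases, one per sample.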

Sampling with Replacement and Regression Macro

The most complicated part of the OMS bootstrapping example has nothing to do with the OMS command. A macro routine is used to generate the samples and run the REGRESSION commands.

Only the basic functionality of the macro is discussed here.

DEFINE regression_bootstrap (samples=!TOKENS(1)

samples=100 depvar=salary
indvars=salbegin jobtime.

• A macro named regression_bootstrap is defined. It is designed to work with arguments similar to IBM® SPSS® Statistics subcommands and keywords.

• Based on the user-specified number of samples, dependent variable, and independent variables, the macro will draw repeated random samples with replacement and run the REGRESSION command on each sample.

• The samples are generated by randomly selecting cases with replacement and assigning weight values based on how many times each case is selected. If a case has a value of 1 for sampleWeight, it will be treated like one case. If it has a value of 2, it will be treated like two cases, and so on. If a case has a value of 0 for sampleWeight, it will not be included in the analysis.

• The REGRESSION command is then run on each weighted sample.

• The macro is invoked by using the macro name like a command. In this example, we generate 100 samples from the employee data.sav file. You can substitute any file, number of samples, and/or analysis variables.
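The weighting scheme can be illustrated outside SPSS. The following Python sketch (standard library only) is not the macro from oms_bootstrapping.sps. It assumes one simple way of obtaining the counts (draw n case indices with replacement and tally them) and uses a one-predictor least-squares slope in place of the REGRESSION command. The function names and data values are hypothetical.

```python
import random
from collections import Counter

def draw_sample_weights(n_cases, rng):
    """Count how often each case is picked in n_cases draws with replacement."""
    picks = Counter(rng.randrange(n_cases) for _ in range(n_cases))
    return [picks[i] for i in range(n_cases)]  # unpicked cases get weight 0

def weighted_slope(x, y, w):
    """Least-squares slope of y on x, treating w as case replication counts."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

# Hypothetical data, for illustration only.
salbegin = [12.0, 13.5, 15.0, 18.0, 21.0, 27.0, 31.5, 45.0]
salary = [21.0, 24.0, 27.5, 30.0, 40.2, 57.0, 60.0, 92.0]

rng = random.Random(1)
sample_weight = draw_sample_weights(len(salary), rng)

# Expanding each case sample_weight times gives the same slope as weighting.
x_expanded = [x for x, w in zip(salbegin, sample_weight) for _ in range(w)]
y_expanded = [y for y, w in zip(salary, sample_weight) for _ in range(w)]
slope_weighted = weighted_slope(salbegin, salary, sample_weight)
slope_expanded = weighted_slope(x_expanded, y_expanded, [1] * len(x_expanded))
print(sample_weight, slope_weighted, slope_expanded)
```

The weights always sum to the original number of cases, which is why a weighted analysis of the original file behaves like an analysis of a full-size resample.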

Ending the OMS Requests

Before you can use the generated dataset, you need to end the OMS request that created it, because the dataset remains open for writing until you end the OMS request. At that point, the basic job of creating the dataset of sample coefficients is complete, but we’ve added some histograms and a table that displays the 2.5th and 97.5th percentile values of the bootstrapped coefficients, which mark the bounds of the 95% confidence intervals of the coefficients.

OMSEND.
DATASET ACTIVATE bootstrap_example.
FREQUENCIES
  VARIABLES=salbegin_B salbegin_Beta jobtime_B jobtime_Beta
  /FORMAT NOTABLE
  /PERCENTILES= 2.5 97.5
  /HISTOGRAM NORMAL.
RESTORE.

• OMSEND without any additional specifications ends all active OMS requests. In this example, there were two: one to suppress all Viewer output and one to save regression coefficients in a data file. If you don’t end both OMS requests, either you won’t be able to open the data file or you won’t see any results of your subsequent analysis.

• The job ends with a RESTORE command that restores your previous SET specifications.

Figure 9-12

95% confidence interval (2.5th and 97.5th percentiles) and coefficient histograms
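The percentile interval reported in the table can be sketched as follows. This Python fragment uses the standard library function statistics.quantiles with its default interpolation, which is assumed here for illustration and may not match the percentile method that FREQUENCIES applies. The function name percentile_interval and the input values are hypothetical.

```python
import statistics

def percentile_interval(estimates):
    """Return the 2.5th and 97.5th percentiles of bootstrapped estimates."""
    # n=40 yields 39 cut points spaced 2.5 percentage points apart,
    # so the first is the 2.5th percentile and the last is the 97.5th.
    cuts = statistics.quantiles(estimates, n=40)
    return cuts[0], cuts[-1]

# Hypothetical bootstrapped coefficient values, for illustration only.
estimates = [float(v) for v in range(1, 101)]
lower, upper = percentile_interval(estimates)
print(lower, upper)
```

The interval is read directly from the empirical distribution of the estimates, so it does not assume that the coefficients are normally distributed.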
