This section is a guide to the reporting and other fine-tuning global controls you may want to set before you grow your trees. These parameters are contained in the Options dialog, accessed by selecting Options… from the Edit menu (or by clicking the corresponding toolbar icon).
If you are in the Model Setup dialog box, you must first click on the [Continue] button to access Options from the Edit menu.
General Text Report Preferences
CART is part of an integrated data mining system offering several analytical methods. The CART 6.0 Standard Edition product currently offers only the CART subsystem, but other modules will become available in the future. The General tab of the Options dialog controls report and display preferences that are common across several data mining technologies (including TreeNet and RandomForests). The screen shot below shows one set of user preferences:
The report preferences allow you to turn on and off the following parts in the CART classic output (with command-line equivalents included):
♦ Summary stats for all model variables—mean, standard deviation, min, max, etc. In classification models the stats are reported for the overall train and test samples and then separately for each level of the target.
LOPTIONS MEANS=YES | NO
♦ Prediction success tables - confusion matrix with misclassification counts and percentages by class level.
LOPTIONS PREDICTIONS=YES | NO
♦ Report analysis time - CPU time required for each stage of the analysis.
LOPTIONS TIMING=YES | NO
♦ Report Gains tables.
LOPTIONS GAINS=YES | NO
♦ Report ROC tables.
LOPTIONS ROC=YES | NO
♦ Decimal places - precision to which the numerical output is printed.
FORMAT = <N>
♦ Exponential notation for near-zero values - exponential notation used for values close to zero.
FORMAT = <N> / UNDERFLOW
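The mapping from these report preferences to their command-line equivalents can be sketched as a small Python helper. This is only an illustration: the function name `report_commands` and its parameters are hypothetical and are not part of CART. It emits only the LOPTIONS and FORMAT commands listed above, one per line, with TIMING as the keyword for analysis-time reporting.

```python
def report_commands(means, predictions, timing, gains, roc,
                    decimals, underflow=False):
    """Build the command lines matching a set of report preferences.

    Hypothetical helper for illustration; it only formats text and
    does not talk to CART.
    """
    def yes_no(flag):
        return "YES" if flag else "NO"

    lines = [
        f"LOPTIONS MEANS={yes_no(means)}",              # summary stats
        f"LOPTIONS PREDICTIONS={yes_no(predictions)}",  # prediction success tables
        f"LOPTIONS TIMING={yes_no(timing)}",            # analysis time
        f"LOPTIONS GAINS={yes_no(gains)}",              # gains tables
        f"LOPTIONS ROC={yes_no(roc)}",                  # ROC tables
    ]
    fmt = f"FORMAT = {decimals}"                        # decimal places
    if underflow:
        fmt += " / UNDERFLOW"                           # exponential notation near zero
    lines.append(fmt)
    return lines


if __name__ == "__main__":
    for line in report_commands(True, True, False, False, True,
                                decimals=3, underflow=True):
        print(line)
```

The resulting lines could be pasted into a command file; whether you prefer that to setting the same preferences in the dialog is a matter of working style.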
ROC Graph Labels
ROC graphs are traditionally labeled differently in different industries. You can select from the two labeling schemes displayed below:
Press the [Save as Defaults] button to save your preferences permanently. If you have made some temporary changes and wish to restore your previously-saved defaults, press the [Recall Defaults] button.
Use Short Command Notation
Sets the minimum number of predictors that triggers short command notation in the command log. When the number of predictors is small, each predictor is listed individually in the command log (for example, in KEEP or CATEGORY commands). However, when the number of predictors exceeds the limit, CART uses the “dash” convention to indicate ranges of predictors (for example, X1-X5).
This setting only affects the GUI logging mechanism. The command parser supports both short and standard command notations.
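To make the dash convention concrete, here is a minimal Python sketch of how a list of predictor names might be collapsed into ranges once it exceeds a limit. It is not CART's actual logging code: the function name `collapse_predictors` is hypothetical, and the rule used here (a run is a sequence of names sharing a prefix with consecutive numeric suffixes) is an assumption made for illustration.

```python
import re


def collapse_predictors(names, limit):
    """Collapse runs such as X1, X2, ..., X5 into "X1-X5".

    Names are returned unchanged when their count does not exceed
    `limit`. The run-detection rule is an illustrative assumption.
    """
    if len(names) <= limit:
        return list(names)

    out = []
    run = []  # current run as (prefix, number, original name)

    def flush():
        # Emit the pending run as a range, or as a single name.
        if len(run) >= 2:
            out.append(f"{run[0][2]}-{run[-1][2]}")
        else:
            out.extend(item[2] for item in run)
        run.clear()

    for name in names:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match is None:
            # No numeric suffix: cannot be part of a range.
            flush()
            out.append(name)
            continue
        prefix, number = match.group(1), int(match.group(2))
        if run and run[-1][0] == prefix and run[-1][1] + 1 == number:
            run.append((prefix, number, name))
        else:
            flush()
            run.append((prefix, number, name))
    flush()
    return out


if __name__ == "__main__":
    print(collapse_predictors(["X1", "X2", "X3", "X4", "X5", "AGE"], limit=3))
```

With a limit of 3, the six names in the example are written as `X1-X5` followed by `AGE`; with a limit of 6 or more they would be listed individually.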
Window to Display When File Is Opened
When you open a data file CART gives you three choices for what to do next:
Classic Output
This is the classic text mainframe style output suitable for diehard UNIX and Linux gurus. You will be greeted with a plain text screen looking something like:
Data Description/Activity Window
This new window can function as a brief description of your data file and a control panel for other data exploration and analysis activities.
From this screen you can conveniently request summary statistics, a spreadsheet view of the data, or the model set-up dialog, and you can also move directly to scoring the data using a previously-saved model.
Once you close this window it can be reopened by clicking on the toolbar icon (hammer and wrench icon).
Model Setup
This is the window that opened automatically in CART 4.0 and CART 5.0; CART 6.0 can be set to behave the same way.
Default Variable Sorting Order
Many GUI displays include a list of variables and you can always change the sort order between Alphabetical and File Order (the order in which the variables appear in your data file). This setting allows you to determine the ordering that will always show first when a dialog is opened.
Controlling CART Report Details
The parameters controlling the contents of the CART Output window can be set in the Options—CART tab. This is the middle tab on the Options dialog. The default Reporting settings are shown below:
Full Node Detail or Summaries Only
Previous versions of CART printed full node detail for CART trees. These reports can be voluminous as they contain about one text page for every node in an optimal tree.
If you elect to produce these details you can easily end up with more than the equivalent of 1000 pages of plain text reports.
We have now set the default to printing only summary tables, as most users do not refer to the classic text node detail.
You can always recover the full node detail text report from any saved grove file via the TRANSLATE facility. Thus, there is no longer any real need to produce this text during the normal tree-growing process.
Summary Plots
These are classic mainframe line printer style plots for a few classic CART graphs.
You can see these plots in the GUI so they are turned off by default.
Number of Surrogates to Report
Sets the maximum number of surrogates that can appear in the text report and the navigator displays.
This setting only affects the displays in the text report and the Navigator windows. It does not affect the number of surrogates calculated.
The maximum number of surrogates calculated is set in the Best Tree tab of the Model Setup dialog.
You can elect to try to calculate 10 surrogate splitters for each node but then display only the top five. No matter how many surrogates you request you will get only as many as CART can find. In some nodes there are no surrogates found and the displays will be empty.
The command-line equivalent of the number of surrogates to report is:
BOPTIONS PRINT=<N>
Number of Competitors to Report
Sets the maximum number of competitors that appear in reports.
Every variable specified in your KEEP list or checked off as an allowed predictor in the Model Setup dialog is a competitor splitter. Normally we do not want or need to see how every one of them performed. The default setting displays the top five, but there is certainly no harm in setting this number to a much larger value.
CART tests every allowed variable in its search for the best splitter. This means that CART always measures the splitting power of every predictor in every node. You only need to choose how much of this information you would like to be able to see in a navigator. Choosing a large number can increase the size of saved navigators/groves.
Command-line equivalent:
BOPTIONS COMPETITORS=<N>
Number of Trees to List in the Tree Sequence Summary
Each CART run prints a summary of the nested sequence of trees generated during growing and pruning. The number of trees listed in the tree-sequence summary can be increased or decreased from the default setting of 10 by entering a new value in the text box.
This option only affects CART’s classic output.
Command-line equivalent:
BOPTIONS TREELIST=<N>
Cross-validation Details: Classic Text Report
If you use the cross validation testing method, you can request a text report for each of the maximal trees generated in each cross validation run by clicking on the corresponding radio button for this option.
For example, if testing is set to the default 10-fold cross validation, a report for each of the ten cross-validated trees will follow the report on the final pruned tree in the text output. For this option to have full effect, be sure to uncheck the “Only summary tables of node information” option. The GUI offers a more convenient way to review these CV details.
Command-line equivalent:
BOPTIONS BRIEF
BOPTIONS COPIOUS
Controlling Random-Number Seed Values
As illustrated below, the Options—CART tab also allows you to set the random-number seed and to specify whether the seed is to remain in effect after a tree is built or data are dropped down a tree. Normally the seed is reset to 13579, 12345, and 131 on start-up and after each tree is constructed or after data are dropped down a tree. The seed will retain its latest value after the tree is built if you click on the Retain most recent values for succeeding run radio button.
Command-line equivalent:
SEED <N1>, <N2>, <N3>, NORETAIN
SEED <N1>, <N2>, <N3>, RETAIN
Setting Directory Preferences
The Directories tab of the Options dialog allows you to set default directory preferences for input (data, model and command), output (model, scoring results, translation code and text report), and temporary files. By default, all input and output directories are initially set to the CART installation directory; the temporary directory is your machine’s temporary Windows directory. Below we have set directory preferences for our input and output files.
To change any of the default directories, click on the button next to the appropriate directory and specify a new directory in the Select Default Directory dialog box. CART will retain default directory settings in subsequent analysis sessions.
When the Most Recently Used File list checkbox is marked, CART adds the list of recently-used files to the File->Open menu.
Input Files
Data: input data sets (train and test) for modeling
Model information: previously-saved model files (navigators and groves)
Command: command files
Output Files
Model information: model files (groves) will be saved here
Prediction results: output data sets from scoring and translation code
Run report: classic output
Temporary Files
Temporary: where CART will write temporary work files as needed, and where it will write the command log audit trail
We suggest dedicating a separate temporary folder to CART.
Make it a habit to routinely check the Temporary Files Directory for unwanted scratch files. These should only appear if for some reason your system crashed or was powered down in a way that did not permit CART to clean up.
Depending on your preferences, you may choose one of two working styles:
(1) using the same location for both input and output files
(2) using separate locations for input and output files
The files with names like CART06125699_.txt are valuable records of your work sessions and provide an audit trail of your modeling activity. Think of them as emergency copies of your command log. You can delete these files if you are confident that your other records are adequate.
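If you want to review these audit-trail files before deleting any of them, a short script can list them. The Python sketch below is an illustration only: the function name `find_audit_logs` is hypothetical, and the file-name pattern (the letters CART, a run of digits, an underscore, then `.txt`) is an assumption inferred from the single example name above, so adjust it to match what you actually see in your temporary folder.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the example name CART06125699_.txt.
AUDIT_NAME = re.compile(r"CART\d+_\.txt", re.IGNORECASE)


def find_audit_logs(temp_dir):
    """Return audit-trail-like files in temp_dir, oldest first."""
    files = [
        path for path in Path(temp_dir).iterdir()
        if path.is_file() and AUDIT_NAME.fullmatch(path.name)
    ]
    return sorted(files, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    import tempfile

    # Point this at your own CART temporary folder instead.
    for log in find_audit_logs(tempfile.gettempdir()):
        print(log)
```

The script only lists files; deleting them is left as a deliberate manual step so that no audit trail is removed by accident.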
Make sure that the drive where the temporary folder is located will have enough space (at least the size of the largest data set you are planning to use).
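The free-space guideline above can be checked before a long run with a few lines of Python. This is a minimal sketch using only the standard library; the function name `has_enough_temp_space` is hypothetical, and the rule it applies (free space at least equal to the size of the largest data file) is simply the minimum stated above, not a guarantee that a run will fit.

```python
import os
import shutil


def has_enough_temp_space(temp_dir, data_files):
    """Check that the drive holding temp_dir has at least as much free
    space as the largest file in data_files."""
    largest = max((os.path.getsize(f) for f in data_files), default=0)
    free = shutil.disk_usage(temp_dir).free
    return free >= largest


if __name__ == "__main__":
    import tempfile

    # Replace the empty list with the paths of your own data sets.
    print(has_enough_temp_space(tempfile.gettempdir(), []))
```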
Additional Control Functions
–Control icon that automatically copies your Data file info to all other locations in the dialog (except the Temporary File location).
–Control icon that lets the user browse among directories.
–Control that allows the user to select from a list of previously-specified directories.
–Control that allows the user to specify how many recently-used files to remember in the File->Open menu. The maximum allowed is 20 files.