The overall performance of the current tree is summarized in the four Summary Reports dialog tabs. To access the reports, click the [Summary Reports…] button at the bottom of the Navigator window (or select Tree Summary Reports… from the Tree menu).
Profit
The Profit tab provides a useful model summary in terms of the profit associated with each node. It is assumed that each record in a dataset is associated with a certain continuous amount of profit. This information is either represented by the continuous target itself (in which case the profit value is the actual target of modeling), or by any other continuous variable present in the dataset (cross-evaluation of model).
First, choose the Profit Variable carrying information about the profit associated with each record in the dataset. By default, this variable is set to the target variable in regression runs; however, it could be changed to any of the continuous auxiliary variables that were specified in the Model tab of the Model Setup dialog.
Second, specify the Default Sort Order. This setting controls how the terminal nodes of the currently selected tree are ordered in the table and on the graph.
Currently, sorting either by Profit Learn (node sum of profit values in the Learn data) or Average Profit Learn (Profit Learn divided by node size) is available.
Third, choose one of the four possible measures to be displayed on the vertical axis of the graph by pressing the following group of buttons:
Profit—within-node accumulated profit.
Ave. Profit—Profit divided by the node case count.
Cum. Profit—same as Profit but accumulated over all nodes in the sorted sequence up until the current node.
Cum. Ave. Profit—Cum. Profit divided by the total number of cases in all nodes in the sorted sequence up until the current node.
All four measures, as well as node case counts, are reported on the table.
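The four measures can be reproduced outside the GUI from per-node profit values. The following is a minimal sketch rather than CART's own implementation: the `Node` class, the `profit_table` function, and the descending sort direction are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical container: one terminal node and its Learn profit values."""
    node_id: int
    profits: List[float]


def profit_table(nodes: List[Node], sort_by: str = "average") -> List[Dict[str, float]]:
    """Return one row per node with the four measures described above.

    sort_by="average" orders by Average Profit; anything else orders by Profit.
    Descending order is an assumption, not something the manual states.
    """
    def total(n: Node) -> float:
        return sum(n.profits)

    def average(n: Node) -> float:
        return total(n) / len(n.profits)

    key = average if sort_by == "average" else total
    ordered = sorted(nodes, key=key, reverse=True)

    rows = []
    cum_profit = 0.0
    cum_cases = 0
    for n in ordered:
        profit = total(n)
        cases = len(n.profits)
        cum_profit += profit   # running sum over the sorted sequence
        cum_cases += cases     # running case count over the sorted sequence
        rows.append({
            "node": n.node_id,
            "cases": cases,
            "profit": profit,
            "ave_profit": profit / cases,
            "cum_profit": cum_profit,
            "cum_ave_profit": cum_profit / cum_cases,
        })
    return rows


example_nodes = [Node(1, [10, 20]), Node(2, [50]), Node(3, [5, 5, 5, 5])]
example_rows = profit_table(example_nodes)
```

Note that Cum. Ave. Profit divides by the running case count, so it is a case-weighted average and not the mean of the per-node averages.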
When an explicit Test sample is present, the user can also choose among Learn, Test, and Pooled measures using the corresponding buttons.
The Zoom and Chart Type controls change the visual appearance of the graph.
Terminal Nodes
The Terminal Nodes tab displays box plots for the node distributions of the target sorted by the mean. Hover over any of the boxes to see detailed information about the node.
When separate learn and test parts of the data are used, the [Learn] and [Test] buttons allow switching between learn and test distributions. No matter which button is pressed, the nodes are always sorted by the learn means to quickly assess node stability.
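The stability check described above can be sketched as follows. This is only an illustration: the function name, the dictionary layout (node id mapped to target values), and the ascending sort direction are assumptions, not part of CART.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def node_means_sorted_by_learn(
    learn: Dict[int, List[float]],
    test: Dict[int, List[float]],
) -> List[Tuple[int, float, Optional[float]]]:
    """Return (node_id, learn_mean, test_mean) rows ordered by the learn mean.

    A node with no test cases gets None as its test mean.
    """
    rows = []
    for node_id, learn_values in learn.items():
        test_values = test.get(node_id, [])
        test_mean = mean(test_values) if test_values else None
        rows.append((node_id, mean(learn_values), test_mean))
    # The order is fixed by the learn means, whichever sample is displayed.
    rows.sort(key=lambda row: row[1])
    return rows


learn_nodes = {1: [10, 20], 2: [30, 50], 3: [5, 5]}
test_nodes = {1: [12, 18], 2: [20, 30], 3: [40]}
stability_rows = node_means_sorted_by_learn(learn_nodes, test_nodes)
```

If the test means do not follow the same ordering as the learn means, as for node 3 in this made-up data, the node may not be stable.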
Root Splits
The Root Splits tab lists ALL root node competitors sorted in descending order by split improvement. The report also shows split details in terms of case counts.
While the competitor information is also available for all internal nodes by clicking on the node itself, it is usually limited to only the top five entries.
Variable Importance
The Variable Importance tab works the same way as in classification, except that the importance scores are now based on regression improvements. (See Chapter 3: CART BASICS for discussion of Variable Importance.)
Detailed Node Reports
To see what else we can learn about our regression tree, return to the Navigator by closing the Summary Reports window. To request a detailed node information display, simply click on the node of interest, for example, left-click on the left child of the root node (internal node 2).
The Competitors and Surrogates tab
As illustrated below, the first of the four tabs in the non-terminal node report provides node-specific information on both the competitor and surrogate splits for the selected node (in this case, the root node). This results tab is discussed in detail in Chapter 3: CART BASICS.
The Box Plots tab
The Box Plots tab shows the current node box plot on the left-hand side and two children box plots on the right-hand side. This helps to interpret the nature of the split.
The blue box depicts the inter-quartile range, with the top of the box (or upper hinge) marking the 75th percentile and the bottom (lower hinge) marking the 25th percentile of the target variable MV. The horizontal green line denotes the node-specific median, while the whiskers (or upper and lower fences) extend up to 1.5 times the inter-quartile range beyond the hinges. Red plus signs represent values outside the fences, usually referred to as “outliers.”
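The box plot quantities can be computed by hand for any node. The sketch below uses Python's `statistics.quantiles` with the "inclusive" method. This is an assumption: the manual does not say which quantile convention CART uses, nor whether the whiskers are drawn to the fences themselves or to the most extreme values inside them. The function name `box_plot_stats` is hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Union


def box_plot_stats(values: List[float]) -> Dict[str, Union[float, List[float]]]:
    """Compute hinges, median, fences, and outliers for one node's target values."""
    # Three cut points: 25th percentile, median, 75th percentile.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "lower_hinge": q1,
        "median": median,
        "upper_hinge": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


node_stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this made-up sample the value 100 lies above the upper fence and would be drawn as a red plus sign.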
The Rules tab
The third tab in the node report, the Rules tab, is displayed as follows. For reference, we display the Rules tab for Node 2. Non-terminal and terminal node reports (with the exception of the root node) contain a Rules tab. This tab is discussed in detail in Chapter 3: CART BASICS.
The Splitter tab
When the main splitter is continuous, the left- and right-child summary statistics of the target are displayed in table form.
When the main splitter is categorical, the partition of the splitter’s levels between the left and right sides is displayed. This results tab is discussed in more detail in Chapter 3: CART BASICS.
Terminal Node Report
To view node-specific information for a terminal (red) node, click on the terminal node (or right-click and select Node Report). For our example, left-click on terminal node 18 (far right terminal node).
The Node Statistics tab
The Node Statistics tab shows the current node target box plot in comparison with the target box plot for the root node (the entire learn sample). This helps us to see whether the high-end or the low-end segment of the population is contained in the current node. Node-specific summary statistics are also reported. Both the color-coding and the relative position of this node compared to the root node suggest that the highly-priced segment is contained in this node.
The Rules tab has been described above.
For further discussion of regression tree modeling, splitting rules, and interpreting regression node statistics, see the CART Reference Manual.
Viewing Rules
There are several flexible ways to look at the rules associated with an entire tree or some specific parts of the tree.
In the Navigator window, you can tag terminal nodes for further use by hovering the mouse over a node, right-clicking, and selecting the Tag Node menu item. In the following example, we tagged all nodes color-coded in red and pink (high-end neighborhoods).
Next we request an overall Rules display either via View->Rules… menu or by right-mouse clicking on the root node and choosing the Rules item.
The resulting window contains rules for the entire tree when [All] is pressed or only for the tagged terminal nodes when [Tagged] is pressed.
Both Classic and SQL rule notations are supported.
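To illustrate the idea behind an SQL-style rule, the sketch below renders the split conditions on the path to one terminal node as a `WHERE` clause. It does not reproduce CART's actual output format: the function names, the table name `learn_data`, and the variables `X1` and `X2` are hypothetical, and the generated text is meant for reading, not for running against untrusted input.

```python
from typing import List, Tuple, Union

# One split condition on the path from the root to a terminal node:
# (variable name, comparison operator, threshold or category level).
Condition = Tuple[str, str, Union[int, float, str]]


def sql_literal(value: Union[int, float, str]) -> str:
    """Quote strings (doubling embedded single quotes); leave numbers bare."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return repr(value)


def rule_to_sql(conditions: List[Condition], table: str = "learn_data") -> str:
    """Join the path conditions with AND into a single SELECT statement."""
    where = " AND ".join(
        f"{variable} {operator} {sql_literal(value)}"
        for variable, operator, value in conditions
    )
    return f"SELECT * FROM {table} WHERE {where}"


example_rule = rule_to_sql([("X1", "<=", 6.5), ("X2", ">", 10)])
```

Like the Main Tree Rules display, this sketch covers node-based conditions only and says nothing about missing values.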
You can also limit the rules display to a specific branch in a tree by right-mouse clicking on the branch root and choosing the Rules item. The resulting window will only list rules for the terminal nodes covered by the selected branch as well as rules leading to the given branch.
The Main Tree Rules display only gives node-based rules, ignoring missing value handling mechanisms entirely.
To request a full display of the tree logic, including missing value handling, see the chapter called Translating Model in this manual.