5.2 Analysis of Random Trees
5.2.2 Deriving the Number of Trees and Height of Trees using the Theory of the Coupon Collector Problem
In the previous section, we observed that the performance of random trees decreases as the number of features increases. In this section, we derive a relationship between the number of trees and the height of the trees required, in terms of the number of features present in a dataset, by casting random trees as a coupon collector problem.

Fig. 5.5: Variation of AUC with the number of trees used in the syn-1 and syn-2 datasets. (a) Performance with syn-1. (b) Performance with syn-2.
Number of Nodes Needed with Noisy Features
Not all features are informative: only a subset $I$ of $F$ is meaningful, whereas the features in $F \setminus I$ are spurious. Let $m = |F|$ be the total number of features and $k = |I|$ the number of informative features, so $m - k = |F| - |I|$ is the number of noisy features that do not contribute to anomaly detection. In constructing a random forest, a feature is selected at random from the $m$ features at each node. The probability that an informative feature is selected at each tree node is:

$$\Pr(\text{SelectInform}) = k/m.$$

Fig. 5.6: Variation of AUC with the height of the trees used in the syn-1 and syn-2 datasets. (a) Performance with syn-1. (b) Performance with syn-2.
If $h$ is the height of each tree, then the number of nodes in tree $\mathit{trees}_i$ is $\mathit{node}(\mathit{trees}_i) = 2^h - 1$, since $\mathit{trees}_i$ is a complete binary tree. Let $F(\mathit{trees}_i)$ be the number of informative features covered by tree $\mathit{trees}_i$; then the expected number of informative features selected is

$$E[F(\mathit{trees}_i)] = (2^h - 1)\,\frac{k}{m},$$

which is small if $k/m$ is small. If we wish to construct an ensemble of trees in which each informative feature appears in at least one tree, we need more nodes to cover more informative features. If the heights of the trees are fixed, we need more trees in the forest to cover as many informative features as possible.
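To make the effect of noisy features concrete, the following minimal Python sketch evaluates the expected count $(2^h - 1)k/m$ for a fixed tree height while the total number of features grows. The function names and the values of `h`, `k`, and `m` are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def nodes_in_tree(h: int) -> int:
    """Number of nodes in a complete binary tree of height h, i.e. 2^h - 1."""
    return 2 ** h - 1


def expected_informative_nodes(h: int, k: int, m: int) -> float:
    """Expected number of nodes that pick an informative feature,
    assuming each node picks uniformly among m features, k of them informative."""
    return nodes_in_tree(h) * k / m


if __name__ == "__main__":
    # Hypothetical setting: height 5, five informative features, growing m.
    for m in (10, 100, 1000):
        print(m, expected_informative_nodes(h=5, k=5, m=m))
```

With the height held fixed, the expected count shrinks in proportion to $k/m$, which is why more nodes (taller trees or more trees) are needed as noisy features are added.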
We now discuss how to find the expected number of trees and the tree height when the number of features is fixed.
The coupon collector’s problem - Analysis of tree height
In the coupon collector’s problem [47], there are $d$ types of coupons, and one coupon is drawn at random at each trial. Let $r$ be the number of trials needed to collect at least one copy of each of the $d$ types of coupons. The goal of the coupon collector’s problem is to find the relationship between $r$ and $d$.
The similarity between random trees and the coupon collector’s problem is as follows. If we treat each feature as a type of coupon, each detection path in the tree can be treated as an experiment with $n$ trials, where $n$ is the number of nodes in a random tree of height $h$. If an anomaly is jointly described by $d$ features, each tree should capture at least one copy of each of the $d$ features. To study the relationship between the number of nodes $n$ and the number of features present, we adopt the theoretical results for the coupon collector’s problem.
We show that when the tree height is

$$h = \log_2(\beta d \ln d + 1),$$

the probability that at least one of the features is not captured is bounded by $d^{-(\beta - 1)}$, where $\beta > 1$.
Let $X_d$ be the number of nodes needed to capture at least one copy of each of the $d$ features. The expected number of nodes is

$$E[X_d] = d \sum_{i=1}^{d} \frac{1}{i} = d H_d,$$

where $H_d$ is the harmonic sum [47].
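As a sanity check on the expectation $d H_d$, the sketch below computes it exactly and compares it with a small Monte Carlo simulation in which features are drawn uniformly at random until all $d$ have appeared. The function names, the seed, and the number of runs are assumptions made for this illustration.

```python
import random
from fractions import Fraction


def expected_nodes(d: int) -> float:
    """E[X_d] = d * H_d, with H_d = 1 + 1/2 + ... + 1/d computed exactly."""
    harmonic = sum(Fraction(1, i) for i in range(1, d + 1))
    return float(d * harmonic)


def simulate_nodes(d: int, rng: random.Random) -> int:
    """Draw features uniformly at random until all d have been seen."""
    seen = set()
    draws = 0
    while len(seen) < d:
        seen.add(rng.randrange(d))
        draws += 1
    return draws


def simulated_mean(d: int, runs: int, rng: random.Random) -> float:
    """Average number of draws over independent simulated runs."""
    return sum(simulate_nodes(d, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
    for d in (5, 10, 20):
        print(d, round(expected_nodes(d), 2), round(simulated_mean(d, 5000, rng), 2))
```

The simulated averages should sit close to $d H_d$, which grows roughly like $d \ln d$; this is the same growth rate that appears in the node count $n = \beta d \ln d$ used for the tree height.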
Let $\sigma_i^n$ be the event that feature $i$ is not selected in $n$ nodes. The probability of this event is

$$\Pr[\sigma_i^n] = \left(1 - \frac{1}{d}\right)^{n} \le e^{-n/d}.$$

For $n = \beta d \ln d$, this bound is $d^{-\beta}$, where $\beta > 1$ is a constant.
Thus, the probability that at least one of the features is not captured in the $n$ nodes is

$$\Pr\left[\bigcup_{i=1}^{d} \sigma_i^n\right] \le \sum_{i=1}^{d} \Pr[\sigma_i^n] \le \sum_{i=1}^{d} d^{-\beta} = d^{-(\beta - 1)}$$

for a random tree with $n = \beta d \ln d$ nodes. Consequently, the tree height is $h = \log_2(n + 1) = \log_2(\beta d \ln d + 1)$.
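The bound can be checked numerically. The sketch below computes the node count and tree height for given $d$ and $\beta$, then compares the union bound $d(1 - 1/d)^n$ with $d^{-(\beta - 1)}$. Rounding $n$ and $h$ up to integers is an assumption made here, since the derivation treats them as real-valued; the function names are likewise hypothetical.

```python
import math


def nodes_needed(d: int, beta: float) -> int:
    """Smallest integer n with n >= beta * d * ln(d)."""
    return math.ceil(beta * d * math.log(d))


def tree_height(d: int, beta: float) -> int:
    """Smallest integer h with 2^h - 1 >= beta * d * ln(d)."""
    return math.ceil(math.log2(beta * d * math.log(d) + 1))


def union_bound(d: int, n: int) -> float:
    """Union bound d * (1 - 1/d)^n on the chance that some feature is missed."""
    return d * (1.0 - 1.0 / d) ** n


if __name__ == "__main__":
    beta = 2.0  # illustrative constant, any beta > 1 works
    for d in (5, 10, 50, 100):
        n = nodes_needed(d, beta)
        print(d, n, tree_height(d, beta), union_bound(d, n), d ** -(beta - 1))
```

For every pair tried, the union bound stays below $d^{-(\beta - 1)}$, and the required height grows only logarithmically in $d$.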
Number of trees T for a given tree height h and number of features d
Given tree height $h$, each tree has $n = 2^h - 1$ nodes. For $T$ such trees, the total number of nodes is $nT$. The number of trees $T$ is chosen such that the probability that each feature occurs in at least one of the $T$ trees is at least $1 - \nu$. From the results for the coupon collector problem, we have:
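$$\Pr(X_d = k) = \sum_{j=0}^{d-1} (-1)^j \binom{d-1}{j} \left(1 - \frac{1 + j}{d}\right)^{k-1}.$$

It is desired that:

$$1 - \left(1 - \Pr[d \le X_d \le nT]\right) \ge 1 - \nu \tag{5.2}$$

$$\Pr[d \le X_d \le nT] \ge 1 - \nu \tag{5.3}$$

$$\sum_{i=d}^{nT} \Pr(X_d = i) \ge 1 - \nu \tag{5.4}$$

$$\sum_{i=d}^{nT} \sum_{j=0}^{d-1} (-1)^j \binom{d-1}{j} \left(1 - \frac{1 + j}{d}\right)^{i-1} \ge 1 - \nu \tag{5.5}$$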
This is a combinatorial problem, and numerical solutions are shown in Figure 5.7.
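One possible way to solve inequality (5.5) numerically is sketched below; it is not necessarily the procedure used to produce the figure. The sketch evaluates the probability mass function with exact rational arithmetic, uses the equivalent closed form $\Pr(X_d \le N) = \sum_{j=0}^{d} (-1)^j \binom{d}{j} (1 - j/d)^N$ for the cumulative probability, and increases $T$ until the threshold $1 - \nu$ is reached. All function names and the example values are assumptions for illustration.

```python
from fractions import Fraction
from math import comb


def pmf(d: int, k: int) -> Fraction:
    """Pr(X_d = k): all d features are first covered at exactly the k-th node."""
    return sum(
        (-1) ** j * comb(d - 1, j) * Fraction(d - 1 - j, d) ** (k - 1)
        for j in range(d)
    )


def prob_all_covered(d: int, total_nodes: int) -> Fraction:
    """Pr(X_d <= total_nodes), by inclusion-exclusion over missed features."""
    return sum(
        (-1) ** j * comb(d, j) * Fraction(d - j, d) ** total_nodes
        for j in range(d + 1)
    )


def min_trees(d: int, h: int, nu: float) -> int:
    """Smallest T such that Pr(X_d <= n*T) >= 1 - nu, with n = 2^h - 1."""
    n = 2 ** h - 1
    t = 1
    while prob_all_covered(d, n * t) < 1 - nu:
        t += 1
    return t


if __name__ == "__main__":
    # Hypothetical setting: trees of height 3 (7 nodes each), nu = 0.05.
    for d in (5, 10, 20):
        print(d, min_trees(d, h=3, nu=0.05))
```

Exact fractions are used because the alternating sums suffer from cancellation in floating point when $d$ is large; for the small values of $d$ shown here either representation would work.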