3.5 Knowledge Gained from Experiments
Some observations made from the results of performing these preliminary experiments include:
- If a set of class labels exists already and can be specified for
all training instances, then supervised learning is preferred.
- For any supervised classification algorithm to perform at its best,
it is important, first and foremost, to ensure that the labelled training dataset is larger than the test dataset (it is assumed in this paper, based on the experiments performed, that this should be around a ratio of 1:25).
- When the number of test instances to be classified is small,
increasing the number of folds increases the accuracy of random forest with nominal data (though only by a non-significant margin if the training dataset is large).
- Increasing the number of folds from 3 to 10 increases the accuracy
of random forest with numeric data (only slightly, because a larger training dataset was used).
- Increasing the number of folds from 3 to 10 increases the accuracy of
random forest with mixed data (only slightly, because a larger training dataset was used).
- For random forest, when the total number of instances is very
small (e.g. 24 or 30), it is best to use 3 folds. Increasing the number of folds only reduces its performance in such cases.
- Naïve Bayes cannot be used for numeric datasets, and it is very
important that the autoML system designed is trained with this limitation by default.
- Increasing the number of folds from 3 to 10 for NB improves the accuracy
for a large training dataset (only slightly, but the model is built much faster than with RF).
- For Naive Bayes, it is best to use 3 folds if the dataset for
training is really small.
- Unsupervised learning is preferable if no pre-existing class
label exists.
- Unsupervised learning is preferable if the training set is much
smaller than the sample set to be tested.
- When the class attribute type is ‘numeric’, use the RF algorithm.
- When the class attribute is ‘nominal’, and all other attributes
are nominal, and the total number of attributes is less than 10, and the number of instances is less than 50 with missing values <1% in total, then use J48.
- When the class attribute type is ‘nominal’, but the other
attributes contain ‘String’ type attributes, then use the ZeroR or Stacking algorithm.
- When the class attribute type is ‘nominal’, but we have at least
half as many numeric attributes as there are nominal attributes (i.e. the ratio of numeric to nominal attributes is close to 1:2), then use the RF algorithm.
- When the class attribute is ‘nominal’, and the total number of
attributes is less than 10 with all other attributes being ‘numeric’,
and there are no missing values, and the total number of instances is greater than 500, then using the SGD algorithm is favourable.
- When the class attribute type is ‘nominal’, and the total number
of instances is less than 500, and most or all of the other
attributes are ‘numeric’, then use RF.
- When the class attribute type is ‘nominal’, and the ratio of
numeric attributes to nominal attributes is not close to 1:2, then use the NB algorithm.
- When the class attribute type is ‘nominal’, and the total number
of instances is greater than 500, and the total number of attributes is greater than 10, and we have more numeric attributes than nominal, then use RF.
- When the class attribute type is ‘nominal’, and the total number
of attributes is greater than 10, and all other attribute types are nominal, and missing values make up less than 1% of the data, then we can use NB.
- When the class attribute type is ‘nominal’, and the total number
of attributes is greater than 100, and the total number of instances is greater than 1000, and the proportion of missing values is > 50%, then we can choose to use SGD.
- Last but not least, when the class attribute type is ‘nominal’,
and the total number of attributes is greater than 10, and all of them are nominal, with missing values > 1% present in the dataset, then we use RF.
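To illustrate how the algorithm-selection observations above could be encoded, the following is a minimal Python sketch of a first-match rule table. It is not the meta-learning algorithm of this thesis. The class `DatasetMeta`, its field names, the function `suggest_algorithm` and the order in which the rules are tried are all assumptions made for illustration. The two observations that depend on the numeric-to-nominal ratio being "close to 1:2" are left out, because the text does not state a tolerance for "close".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DatasetMeta:
    """Hypothetical dataset summary; field names are illustrative only."""
    class_type: str      # "nominal" or "numeric"
    n_instances: int     # total number of instances
    n_attributes: int    # total number of attributes
    n_numeric: int       # numeric attributes, excluding the class
    n_nominal: int       # nominal attributes, excluding the class
    n_string: int        # string attributes, excluding the class
    missing_pct: float   # percentage of missing values (0-100)


def all_nominal(m: DatasetMeta) -> bool:
    return m.n_nominal > 0 and m.n_numeric == 0 and m.n_string == 0


def all_numeric(m: DatasetMeta) -> bool:
    return m.n_numeric > 0 and m.n_nominal == 0 and m.n_string == 0


# Rules for a nominal class attribute, tried top to bottom.
# The ordering is an assumption; the observations do not rank the rules.
NOMINAL_RULES: List[Tuple[Callable[[DatasetMeta], bool], str]] = [
    (lambda m: m.n_string > 0, "ZeroR or Stacking"),
    (lambda m: all_nominal(m) and m.n_attributes < 10
        and m.n_instances < 50 and m.missing_pct < 1, "J48"),
    (lambda m: all_numeric(m) and m.n_attributes < 10
        and m.n_instances > 500 and m.missing_pct == 0, "SGD"),
    (lambda m: m.n_attributes > 100 and m.n_instances > 1000
        and m.missing_pct > 50, "SGD"),
    (lambda m: m.n_instances < 500 and m.n_numeric > m.n_nominal, "RF"),
    (lambda m: m.n_instances > 500 and m.n_attributes > 10
        and m.n_numeric > m.n_nominal, "RF"),
    (lambda m: all_nominal(m) and m.n_attributes > 10
        and m.missing_pct < 1, "NB"),
    (lambda m: all_nominal(m) and m.n_attributes > 10
        and m.missing_pct > 1, "RF"),
]


def suggest_algorithm(m: DatasetMeta) -> Optional[str]:
    """Return the first matching suggestion, or None if no rule applies."""
    if m.class_type == "numeric":
        return "RF"
    if m.class_type != "nominal":
        return None
    for predicate, algorithm in NOMINAL_RULES:
        if predicate(m):
            return algorithm
    return None


if __name__ == "__main__":
    small_nominal = DatasetMeta("nominal", 40, 6, 0, 5, 0, 0.5)
    print(suggest_algorithm(small_nominal))
```

Because several observations overlap, a first-match table like this only yields a single answer once an ordering is chosen. Returning `None` when nothing matches makes the gaps in the rule set visible rather than hiding them behind a default.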
The conclusions derived from these experiments allow us to describe the decision learning (learning to learn) process of the proposed autoML system as a set of rules. In the following subsection, we discuss the meta-learning algorithm designed to this effect, and provide more details about the automated machine learning (autoML) system modelled in this research from the observations listed above.
Summary
This chapter describes and discusses the combination of research methodologies (e.g. experimental, theoretical and systems design) used in this thesis. Combining them allows us to eliminate, as far as possible, the limitations of the individual methods. For example, the experimental research methodology is limited because the experiments are performed mainly in a controlled environment and might not properly reflect some practices performed ‘in the wild’. Combining it with some survey and prototype (system) design reduces such limitations. The knowledge gained from carrying out the preliminary experimentation is used in the next chapter to design and model the