3.5 Knowledge Gained from Experiments

The preliminary experiments led to the following observations:

- If a set of class labels already exists and can be specified for all training instances, then supervised learning is preferred.

- For any supervised classification algorithm to perform at its best, it is important first and foremost to ensure that the labelled training dataset is larger than the test dataset (based on the experiments performed, this paper assumes a ratio of around 1:25).

- When the number of test instances to be classified is small, increasing the number of folds increases the accuracy of random forest with nominal data (though the difference is not significant if the training dataset is large).


- Increasing the number of folds from 3 to 10 increases the accuracy of random forest with numeric data (only slightly, owing to the larger training dataset used).

- Increasing the number of folds from 3 to 10 increases the accuracy of random forest with mixed data (only slightly, owing to the larger training dataset used).

- For random forest, when the total number of instances is very small (e.g. 24 or 30), it is best to use 3 folds. Increasing the number of folds only reduces performance in such cases.

- Naïve Bayes cannot be used for numeric datasets, and it is very important to train the designed autoML system with these limitations by default.

- Increasing the number of folds from 3 to 10 for NB improves accuracy for a large training dataset (only slightly, but the model is built much faster than with RF).

- For Naïve Bayes, it is best to use 3 folds if the training dataset is very small.

- Unsupervised learning is preferable if no pre-existing class label exists.

- Unsupervised learning is preferable if the training set is much smaller than the sample set to be tested.

- When the class attribute type is ‘numeric’, use the RF algorithm.

- When the class attribute type is ‘nominal’, all other attributes are nominal, the total number of attributes is less than 10, the number of instances is less than 50, and missing values total less than 1%, then use the J48 algorithm.

- When the class attribute type is ‘nominal’, but the other attributes include ‘String’ type attributes, then use the ZeroR or Stacking algorithm.

- When the class attribute type is ‘nominal’, but there are at least half as many numeric attributes as nominal ones (i.e. the ratio of numeric to nominal is close to 1:2), then use the RF algorithm.

- When the class attribute type is ‘nominal’, the total number of attributes is less than 10 with all other attributes ‘numeric’, there are no missing values, and the total number of instances is greater than 500, then the SGD algorithm is favourable.

- When the class attribute type is ‘nominal’, the total number of instances is less than 500, and most or all of the other attributes are ‘numeric’, then use the RF algorithm.

- When the class attribute type is ‘nominal’, and the ratio of numeric to nominal attributes is not close to 1:2, then use the NB algorithm.

- When the class attribute type is ‘nominal’, the total number of instances is greater than 500, the total number of attributes is greater than 10, and there are more numeric attributes than nominal ones, then use the RF algorithm.

- When the class attribute type is ‘nominal’, the total number of attributes is greater than 10, all other attribute types are nominal, and missing values amount to less than 1%, then NB can be used.

- When the class attribute type is ‘nominal’, the total number of attributes is greater than 100, the total number of instances is greater than 1000, and more than 50% of the values are missing, then SGD can be used.

- Last but not least, when the class attribute type is ‘nominal’, the total number of attributes is greater than 10 and all are nominal, and more than 1% of the values in the dataset are missing, then use the RF algorithm.
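
The algorithm-selection observations above can be read as a small rule table. The following is only a minimal sketch of that reading in Python, not the system built in this research. The `DatasetProfile` fields, the tolerance used for "close to 1:2", the convention that the total attribute count includes the class attribute, and above all the order in which overlapping rules are checked are assumptions made for illustration; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetProfile:
    """Hypothetical dataset summary; the field names are illustrative."""
    class_type: str            # 'numeric' or 'nominal'
    n_instances: int
    n_numeric: int             # non-class numeric attributes
    n_nominal: int             # non-class nominal attributes
    n_string: int = 0          # non-class String attributes
    missing_pct: float = 0.0   # missing values, as a percentage

    @property
    def n_attributes(self) -> int:
        # Assumption: the "total number of attributes" includes the class.
        return self.n_numeric + self.n_nominal + self.n_string + 1


def ratio_close_to_1_to_2(p: DatasetProfile, tol: float = 0.15) -> bool:
    """True if numeric:nominal is within `tol` of 0.5 (tolerance is assumed)."""
    if p.n_nominal == 0:
        return False
    return abs(p.n_numeric / p.n_nominal - 0.5) <= tol


def select_algorithm(p: DatasetProfile) -> Optional[str]:
    """Return the algorithm label suggested by the listed observations.

    The rules overlap, so the checking order below (most specific first,
    NB as the final fallback) is an assumption, not taken from the text.
    """
    if p.class_type == "numeric":
        return "RF"
    if p.class_type != "nominal":
        return None  # the observations only cover numeric and nominal classes

    all_nominal = p.n_nominal > 0 and p.n_numeric == 0 and p.n_string == 0
    all_numeric = p.n_numeric > 0 and p.n_nominal == 0 and p.n_string == 0
    mostly_numeric = p.n_numeric > p.n_nominal

    if p.n_string > 0:
        return "ZeroR or Stacking"
    if p.n_attributes > 100 and p.n_instances > 1000 and p.missing_pct > 50:
        return "SGD"
    if (all_nominal and p.n_attributes < 10
            and p.n_instances < 50 and p.missing_pct < 1):
        return "J48"
    if all_nominal and p.n_attributes > 10 and p.missing_pct < 1:
        return "NB"
    if all_nominal and p.n_attributes > 10 and p.missing_pct > 1:
        return "RF"
    if (all_numeric and p.n_attributes < 10
            and p.missing_pct == 0 and p.n_instances > 500):
        return "SGD"
    if mostly_numeric and p.n_instances < 500:
        return "RF"
    if mostly_numeric and p.n_instances > 500 and p.n_attributes > 10:
        return "RF"
    if ratio_close_to_1_to_2(p):
        return "RF"
    return "NB"  # numeric:nominal ratio not close to 1:2
```

For example, `select_algorithm(DatasetProfile("nominal", 30, 0, 5))` falls under the J48 rule, while a profile with 6 numeric attributes, no nominal ones, 1000 instances and no missing values falls under the first SGD rule. Cases that the observations leave open (for instance, exactly 1% missing values) simply fall through to later rules in this sketch.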

The conclusions derived from these experiments allow us to describe the decision learning (learning to learn) process of the proposed autoML system as a set of rules. The following subsection discusses the meta-learning algorithm designed to this effect and gives more details about the automated machine learning (autoML) system modelled in this research on the basis of the observations listed above.
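
The more general observations, on the choice between supervised and unsupervised learning and on the number of cross-validation folds, can be written as rules too. The sketch below is illustrative only. The function names, the cut-off of 30 instances for a "very small" dataset (the text gives 24 and 30 only as examples), and the factor used to decide that a training set is "much smaller" than the test set are assumptions, not values established by the experiments.

```python
# Assumption: the text cites 24 and 30 instances as examples of "very small".
SMALL_DATASET_MAX = 30


def choose_learning_mode(has_class_labels: bool, n_train: int, n_test: int,
                         much_smaller_factor: int = 10) -> str:
    """Pick 'supervised' or 'unsupervised' from the listed observations.

    `much_smaller_factor` is a hypothetical threshold: the training set is
    treated as "much smaller" when the test set is at least that many times
    its size.
    """
    if not has_class_labels:
        return "unsupervised"
    if n_train * much_smaller_factor <= n_test:
        return "unsupervised"
    return "supervised"


def choose_folds(n_instances: int, small_max: int = SMALL_DATASET_MAX) -> int:
    """Use 3 folds for very small datasets and 10 folds otherwise.

    This follows the RF and NB observations: 10 folds gave a slight accuracy
    gain on larger datasets, while 3 folds worked best on very small ones.
    """
    return 3 if n_instances <= small_max else 10
```

With these assumed thresholds, `choose_folds(24)` gives 3 folds and `choose_folds(5000)` gives 10, and a labelled training set of 10 instances against 1000 test instances is routed to unsupervised learning.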

Summary

This chapter describes and discusses the combination of research methodologies (experimental, theoretical and systems design) used in this thesis. Combining them allows us to eliminate, as far as possible, the limitations of the individual methods themselves. For example, the experimental research methodology is limited because the experiments are performed mainly in a controlled environment and might not properly reflect practices performed ‘in the wild’. Combining it with some survey and prototype (system) design reduces such limitations. The knowledge gained from carrying out preliminary experimentation is used in the next chapter to design and model the


Chapter 4
