Chapter 2 Literature Review
2.3 Data Analysis
2.3.2 Machine Learning
Some methods sit in the overlap between knowledge acquisition and data mining: while knowledge acquisition methods focus on extracting knowledge from a human expert, other methods attempt to acquire knowledge through automated means (Witten & Frank, 2005). Machine learning, as a term, describes any method whose goal is for a computer system to obtain new knowledge about a subject, in a reproducible format, from a set of data. While this incorporates the goals of knowledge acquisition and has an obvious overlap with knowledge discovery, machine learning methods involve the computer taking a much more active role in the learning process: the focus is on how the computer system can identify relevant information about the subject with minimal human input. These methods take the approach that reducing human input is the key to avoiding the knowledge acquisition bottleneck, and that it also helps to remove unintentional bias, allowing purely statistical and logical methods to find points of interest in the data (Grefenstette, Ramsey, & Schultz, 1990; Hong, Wang, Wang, & Chien, 2000).
2.3.2.1 History
Machine learning as a field came into existence largely because of perceived shortcomings with knowledge acquisition (Grefenstette, et al., 1990). While knowledge acquisition methods showed success in some applications, research and development in the expert systems area found that the most significant problem, affecting both the effectiveness and the cost of creating an expert system, was the knowledge acquisition phase. As has been mentioned, this "knowledge acquisition bottleneck" caused a change of attitude in the area, shifting the focus from trying to model human expertise directly towards automated processes of deriving expertise (B. G. Buchanan & Shortliffe, 1984; Grefenstette, et al., 1990; Hong, et al., 2000; Sester, 2000). Machine learning is the result of that shift. It has been described as the use of statistical analysis of data to derive knowledge about how a domain functions (Witten & Frank, 2005).
The major benefit of this approach is the ability to create an expert system, or to derive domain knowledge, by analysing collected data with limited expertise required. This removes the need for a human expert in the domain to expend considerable time and effort developing and engineering knowledge in the system (Quinlan, 1986; Witten & Frank, 2005), which is of particular benefit in subject domains where an expert’s time is especially valuable. Machine learning methods also make it possible to discover knowledge in a different manner from the way in which the expert would describe it; this may be an advantage or a disadvantage, depending on the domain and the ability of the experts to communicate domain knowledge. It may be an advantage because the method may discover relationships that would otherwise go unexplored, since the current expertise in the field does not suggest that any such relationship could exist. It may be a disadvantage because relationships may be discovered which are present in the dataset but not in the wider domain. It may also be disadvantageous because the method of discovering the relationships can be less efficient, effective or comprehensible than those used by an expert (Piatetsky-Shapiro, 1990).
2.3.2.2 Machine Learning Drawbacks
Machine learning methods are generally most effective in applications where the data used for acquiring or discovering knowledge is sufficiently detailed that conclusions can be drawn from it alone, without further domain knowledge being applied. Typically, data that has been classified as being of a certain type, or that can easily be categorised according to type, allows statistical methods to find new relationships from the existing relationships and other data (Witten & Frank, 2005). The existing classifications represent a level of domain expertise that has already been applied to the data, either by an expert who has examined each case and provided the classifications as extra information, or by an expert who knows which attributes of the set are important.
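To illustrate how pre-classified data lets a simple statistical method derive a relationship without further domain knowledge, the following is a minimal sketch of a one-rule ("1R"-style) learner. It is an illustrative example only, not a method taken from the sources cited above. The function name `one_rule`, the attribute names and the six records are all hypothetical, and the "outcome" column stands in for the classifications an expert would have supplied.

```python
from collections import Counter, defaultdict


def one_rule(rows, attributes, target):
    """Pick the single attribute whose values best predict the target class.

    Returns (best_attribute, rule, errors), where rule maps each attribute
    value to the majority class observed for that value.
    """
    best = None
    for attr in attributes:
        # Count the class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Predict the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Number of training records this rule misclassifies.
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical, hand-made records; "outcome" is the expert-supplied class.
rows = [
    {"fever": "yes", "cough": "yes", "outcome": "ill"},
    {"fever": "yes", "cough": "no", "outcome": "ill"},
    {"fever": "no", "cough": "yes", "outcome": "well"},
    {"fever": "no", "cough": "no", "outcome": "well"},
    {"fever": "yes", "cough": "yes", "outcome": "ill"},
    {"fever": "no", "cough": "yes", "outcome": "ill"},
]

attribute, rule, errors = one_rule(rows, ["fever", "cough"], "outcome")
print(attribute, rule, errors)
```

On these records the learner selects "fever" as the most predictive attribute, with one misclassified case. The rule is only as good as the supplied classifications and the choice of attributes, which is where the expert's prior input enters.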
Machine learning methods are also only particularly effective in domains where the target knowledge (i.e. the knowledge the method is trying to discover) is relatively simple: complex relationships which have a practical use are difficult to derive without also deriving large numbers of other relationships which are meaningless, coincidental, or overly specific to the dataset (Witten & Frank, 2005). When the goal of the machine learning is knowledge discovery, not just data mining for the purposes of training an expert system, an expert in the domain must examine the relationships discovered and determine which are useful and which are not (Abe & Yamaguchi, 2005). If the relationships are too numerous or too complex, this will be a difficult and time-consuming process, negating the advantages of the approach.
Another drawback is that machine learning can only discover knowledge that is present within the dataset being used: if the dataset is of insufficient size, or happens to contain statistical relationships which are not representative of the domain, then the method will either miss relationships or find misleading ones. By contrast, an expert can use their extended knowledge of the domain to judge what is likely to be coincidence and what is likely to be supported by further data (Hall & Smith, 1998).
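The effect of dataset size on misleading relationships can be demonstrated with a small simulation. The sketch below is illustrative only and assumes synthetic data: every attribute is generated independently of the target, so any correlation found is coincidental by construction. The function names, the sample sizes (8 and 2000 cases), the number of attributes (50) and the random seed are arbitrary choices made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_cases, n_attributes, rng):
    """Largest |correlation| between a random target and unrelated attributes."""
    target = [rng.gauss(0, 1) for _ in range(n_cases)]
    strongest = 0.0
    for _ in range(n_attributes):
        # Each attribute is drawn independently of the target, so there is
        # no real relationship for a learner to find.
        attribute = [rng.gauss(0, 1) for _ in range(n_cases)]
        strongest = max(strongest, abs(pearson(attribute, target)))
    return strongest


rng = random.Random(0)  # fixed seed so the run is repeatable
small = strongest_chance_correlation(n_cases=8, n_attributes=50, rng=rng)
large = strongest_chance_correlation(n_cases=2000, n_attributes=50, rng=rng)
print(f"strongest chance correlation, 8 cases:    {small:.2f}")
print(f"strongest chance correlation, 2000 cases: {large:.2f}")
```

With only a handful of cases, at least one unrelated attribute will usually appear strongly correlated with the target, whereas with a large dataset the strongest coincidental correlation stays close to zero. A purely statistical method has no means of telling such a coincidence from a genuine relationship without more data or domain knowledge.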