Alzheimer’s disease costs its victims their lives and places a tremendous burden on health organisations and economies around the world. Statistics show that around 1 in 10 people over the age of 65 will be affected by Alzheimer’s. Unfortunately, there is no effective cure for this disease, and no one is immune [16]. Getting old and weak is something most people find hard to accept. It is a struggle faced by people over the age of 50, and the concern doubles with the fear of losing their memory to dementia. Elderly people affected by dementia live through the experience of watching themselves die slowly, fading away from their world, living in constant confusion, and no longer being able to understand their surroundings. It is a horrible experience to endure for Alzheimer’s disease victims, their carers, and their families. Having Alzheimer’s disease means losing loving memories, including childhood memories, the ability to recognise family members, and even the ability to follow simple routines, e.g. making the usual morning cup of coffee, remembering how to use the toilet, and maintaining personal hygiene [2][3][4].
Taking action to fight Alzheimer’s disease will enable people to live longer and more independently. Unfortunately, exactly what causes Alzheimer’s disease to develop is still unknown; however, research shows that several known risk factors contribute to its development. These risk factors fall into several categories: medical history, lifestyle, family history of dementia, characteristics, and demography. Moreover, these risk factors are classed as either behavioural markers or biological markers of Alzheimer’s disease. Most research initiatives fighting Alzheimer’s disease focus either on new drug development or on investigating the disease by studying its biological markers. With the expansion of computer science, several research approaches have emerged that use the power of data science and machine learning to study Alzheimer’s disease. Unfortunately, because of practical challenges, most notably data limitations, researchers have tended to carry out their studies on the biological markers of Alzheimer’s disease and have almost neglected its behavioural markers. Our research comprehensively studies Alzheimer’s disease risk factors using both behavioural and biological markers to seek possible early prediction, or a diagnosis at onset, of the disease.
Literature Review
Introduction
The world is almost fully dependent on the aid of computers. We use computers in almost every domain, including agriculture, medical care, trade, travel, manufacturing, and communication. Computers are heavily used to aid decision making and to complete, very quickly, tasks that the human mind could not achieve unaided.
Data analysis and machine learning are interdisciplinary fields: the former uses different scientific methods to collect, store, and extract data, and the latter provides systems with the ability to learn and improve from experience, using data, without being explicitly programmed. These two fields of study are closely related and have tremendously accelerated the development of technology.
Methods and techniques from the fields of data analysis and machine learning were applied throughout this research; therefore, this chapter presents a literature review of machine learning and data analysis, as well as an overview of the work related to the research problem in this thesis.
An Overview on Data Science
Data is commonly used to extract knowledge and insights that help us with decision making. However, to acquire useful information, we need to collect a data set relevant to our problem and then analyse it effectively, using scientific methods and algorithms, to obtain the information we need.
Manual extraction and analysis of data is extremely difficult and, in some cases, impossible without the help of machine learning tools and methods, especially when the datasets are large. For example, companies such as Amazon and eBay use data science and machine learning to analyse a mixture of structured, semi-structured and unstructured data in search of valuable business information and insights [68] [69] [70].
Data science is useful because it helps to uncover hidden patterns and unknown correlations in data sets, which helps companies and corporations to understand market trends, customer preferences and other useful business information [69]. When working with large data sets, data scientists are responsible for the analysis, capture, curation, search, sharing, storage, transfer, visualisation and privacy of the data, and for the extraction of useful, meaningful knowledge [68] [71].
The aim of the research in this thesis is to present a framework for predicting Alzheimer’s disease at a very early stage. The experiments that demonstrate the framework use risk factor data related to behavioural markers, drawn from datasets acquired from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Scientific methods and computational models will be employed to ensure that the data used is as accurate as possible and free from errors. Working with data on patients with Alzheimer’s disease, such as the ADNI data [72], requires solid knowledge of data analysis and cleansing.
The ADNI database contains incomplete datasets [73], which means that the data must be cleaned before it can be used and interpreted. The data cleaning process ensures that the data is valid, accurate, complete, consistent and uniform. Valid, high-quality data is especially crucial when dealing with data on Alzheimer’s disease patients, which comprises large datasets with hundreds of variations. A good example of data cleansing, and of the challenges of working with ADNI data, is Qu’s work, “A Predictive Model for Identifying Possible MCI to AD Conversions in the ADNI Database” [73]. Qu found that some tests in the ADNI database had been taken by only a limited number of patients, with the corresponding values marked as “–1000” for the rest of the subjects. The database therefore had to be cleaned and all the incomplete fields removed. It is crucial to have a clean dataset before applying data analysis tools such as principal component analysis (PCA), Pattern Recognition Tools (PR Tools) or any other machine learning tools.
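The kind of cleaning described above can be illustrated with a minimal sketch. It is not Qu’s procedure or the pipeline used in this thesis: the column names (AGE, TEST_A, TEST_B), the values, and the 50% missingness cut-off are all hypothetical, and only the placeholder value of -1000 is taken from the description above.

```python
from typing import Dict, List, Optional

# Placeholder described in the text for tests a subject did not take.
SENTINEL = -1000

Record = Dict[str, Optional[float]]


def mask_sentinel(records: List[Record], sentinel: float = SENTINEL) -> List[Record]:
    """Replace the placeholder with None so that missing values are explicit."""
    return [{k: (None if v == sentinel else v) for k, v in r.items()} for r in records]


def drop_sparse_columns(records: List[Record], max_missing: float = 0.5) -> List[Record]:
    """Drop every column whose fraction of missing values exceeds max_missing."""
    if not records:
        return []
    n = len(records)
    keep = [
        c for c in records[0]
        if sum(r[c] is None for r in records) / n <= max_missing
    ]
    return [{c: r[c] for c in keep} for r in records]


def drop_incomplete_rows(records: List[Record]) -> List[Record]:
    """Keep only the subjects that have a value in every remaining column."""
    return [r for r in records if all(v is not None for v in r.values())]


# Hypothetical toy records; these are NOT real ADNI fields or values.
raw = [
    {"AGE": 71, "TEST_A": 28, "TEST_B": -1000},
    {"AGE": 68, "TEST_A": 25, "TEST_B": -1000},
    {"AGE": 80, "TEST_A": -1000, "TEST_B": 3.5},
    {"AGE": 75, "TEST_A": 21, "TEST_B": -1000},
]

cleaned = drop_incomplete_rows(drop_sparse_columns(mask_sentinel(raw)))
for row in cleaned:
    print(row)
```

Dropping sparse columns before dropping incomplete rows is a design choice in this sketch: a test taken by only a few subjects would otherwise cause almost every subject to be discarded.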
Concept of Machine Learning
This section discusses Machine Learning (ML), a subfield of Artificial Intelligence. Artificial Intelligence, often referred to as AI, is the study of intelligent behaviour, and the term is also used to describe smart software or machines that have the capability to think and learn independently. John McCarthy, who coined the term in 1955, defined it as "the science and engineering of making intelligent machines". Today AI is widely used by large corporations and businesses such as Tesla, Google and Apple, and by militaries around the world.
AI is built on algorithms, including artificial neural networks, which are inspired by the biological neural networks in the human central nervous system. The overall goal of AI is to develop systems that can learn and mimic human responses and behaviour in different circumstances. AI is a highly complex and specialised field of study, which focuses on reasoning, knowledge, planning, learning (machine learning), communication, perception and the ability to move and manipulate objects.
Machine Learning (ML) is a term used to describe the practice of providing computers with the ability to learn from experience, in the form of data, and to search for patterns without being explicitly programmed. It is widely used across almost all disciplines, for purposes ranging from commercial use by businesses and healthcare providers to research studies in academia. It is used by companies like Facebook to show personalised advertisements, and for image recognition that allows users to tag their friends. It is also used in gaming: Microsoft’s Kinect sensor for the Xbox, for example, uses real-time image recognition and an algorithm called random forest to track users’ movements, which allows users to interact with the game by moving only their body and hands, without a joystick. Machine learning is used by virtual reality technology companies to build virtual reality video games, and by mobile phone companies to provide the keyboard voice tool familiar from modern smartphones, which uses machine learning algorithms for voice recognition to convert speech to text. Another example of the use of machine learning is in robotics, e.g. building walking robot dogs that use reinforcement learning algorithms to learn how to walk on their own.
Machine learning searches for patterns in data to enhance the performance of the system and change its actions accordingly, without human intervention. This concept of learning from experience without explicit programming will have a huge impact on the future of technology and computer science in general.
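To make the idea of learning from data without explicit programming concrete, the following is a minimal sketch of one of the simplest possible learners, a one-feature decision rule (often called a decision stump). The decision threshold is never written by the programmer; it is chosen from labelled examples. The scores and labels are invented for illustration and carry no clinical meaning, and the function names are hypothetical.

```python
from typing import List, Tuple


def fit_threshold(values: List[float], labels: List[int]) -> Tuple[float, int]:
    """Learn a one-feature decision rule from labelled examples.

    Returns (threshold, polarity); the rule predicts 1 when
    polarity * (x - threshold) > 0 and 0 otherwise.
    """
    points = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_errors, best_threshold, best_polarity = float("inf"), 0.0, 1
    for t in candidates:
        for polarity in (1, -1):
            # Count training examples that this rule would misclassify.
            errors = sum(
                int((polarity * (x - t) > 0) != bool(y))
                for x, y in zip(values, labels)
            )
            if errors < best_errors:
                best_errors, best_threshold, best_polarity = errors, t, polarity
    return best_threshold, best_polarity


def predict(x: float, threshold: float, polarity: int) -> int:
    """Apply the learned rule to a new value."""
    return int(polarity * (x - threshold) > 0)


# Invented toy data: one test score per subject, label 1 marks the group of interest.
scores = [29, 28, 27, 26, 22, 20, 18, 15]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, polarity = fit_threshold(scores, labels)
print(threshold, polarity)
print(predict(21, threshold, polarity), predict(27, threshold, polarity))
```

If the training examples change, the learned threshold changes with them, with no change to the code; this is the sense in which the behaviour is learned rather than programmed.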
Machine learning is a technology that the future will be built on. With 3.7 billion people having access to the World Wide Web, the amount of data generated each day exceeds 2.5 quintillion bytes (roughly 2.3 billion gigabytes, counting one gigabyte as 2^30 bytes). From 2010 to 2018, the volume of generated data grew an estimated 50-fold, to roughly 40,900 exabytes. These statistics are astonishing, and they show that as data grows there is an essential need for machine learning to process and analyse it.
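The unit conversion behind the daily figure can be checked with a few lines of arithmetic. The sketch below assumes that 2.5 quintillion means 2.5 × 10^18 bytes, and it shows that the result depends on whether a gigabyte is counted as 10^9 bytes (decimal) or 2^30 bytes (binary); the function name is hypothetical.

```python
# 2.5 quintillion bytes, written as an exact integer (2.5 * 10**18).
DAILY_BYTES = 25 * 10**17


def bytes_to_gigabytes(n_bytes: int, binary: bool = True) -> float:
    """Convert bytes to gigabytes.

    binary=True  counts 1 GB as 2**30 bytes (strictly, a gibibyte).
    binary=False counts 1 GB as 10**9 bytes (the SI definition).
    """
    divisor = 2**30 if binary else 10**9
    return n_bytes / divisor


print(f"{bytes_to_gigabytes(DAILY_BYTES, binary=True):,.0f} GB (binary)")
print(f"{bytes_to_gigabytes(DAILY_BYTES, binary=False):,.0f} GB (decimal)")
```

Under the binary convention the daily volume comes to about 2.33 billion gigabytes, and under the decimal convention to exactly 2.5 billion.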
It is true that humans are smarter than computers, but when it comes to remembering, executing complex tasks and analysing data, computers are better than us, and more accurate if designed correctly. The following sections of this chapter discuss the different types of machine learning, the main approaches, and their use for classification and prediction problems.