Prof. Dr. Bernhard Humm, Darmstadt University of Applied Sciences. www.fbi.h-da.de/~b.humm. 18.11.2014
8. Machine Learning
Applied Artificial Intelligence
Prof. Dr. Bernhard Humm
Faculty of Computer Science
Retrospective
Natural Language Processing
• Name and explain different areas of NLP
• What are the “7 levels of language understanding“?
• What is tokenizing, sentence splitting, POS tagging, and parsing?
• What do language resources offer to NLP? Give examples
• What do NLP frameworks offer? Give examples
• What do NLP services offer? Give examples
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
• Services / Product Map
What is Machine Learning (ML)?
Generating a model based on inputs
and using it for making decisions or predictions
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
• Services / Product Map
Applications of ML:
Spam filtering
• Task: classify new e-mails as spam or not spam
(Figure: new e-mails pass through the spam filter and are classified automatically; manually classified e-mails and corrections serve as ML input.)
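The spam filtering task above can be sketched in a few lines. This is only an illustration: the slide does not name a learning approach, so a naive Bayes word-count classifier is assumed here, and the training e-mails, the labels "spam" / "ham", and the function names `train` and `classify` are made up for the example.

```python
import math
from collections import Counter

def train(examples):
    """Build word counts per class from manually classified e-mails (the ML input)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the class with the highest naive Bayes score for a new e-mail."""
    word_counts, label_counts = model
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Log prior of the class
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            # Laplace smoothing avoids log(0) for unseen words
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, manually classified training e-mails
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lecture notes for monday", "ham"),
]
model = train(training)
print(classify("cheap money", model))        # spam
print(classify("lecture on monday", model))  # ham
```

A correction by the user would be fed back by appending the corrected (text, label) pair to `training` and calling `train` again.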
Stock market analysis
• Task: make recommendations on buying and selling stocks
(Figure: the history of stock values serves as ML input for the prediction; based on current stock values, the prediction yields a recommendation that supports the decision.)
Detecting credit card fraud
• Task: Detect fraud in credit card payments
(Figure: CC payments pass through fraud detection and are classified automatically; manually classified payments and corrections serve as ML input.)
Recommender systems
• Task: Recommending customers suitable products
(Figure: the purchasing behaviour of other customers or customer groups serves as ML input; given an order, the recommender system produces a recommendation of related products.)
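One simple way to turn the purchasing behaviour of other customers into recommendations is to count which products were bought together with the ordered ones. The sketch below assumes this co-occurrence approach; the slide does not prescribe it, and the orders and the function name `recommend` are hypothetical.

```python
from collections import Counter

def recommend(order, past_orders, top_n=2):
    """Recommend products that other customers bought together with the ordered ones."""
    counts = Counter()
    for past in past_orders:
        if order & past:                 # past order shares a product with this order
            counts.update(past - order)  # count the other products in that order
    return [item for item, _ in counts.most_common(top_n)]

# Hypothetical purchasing behaviour of other customers
past_orders = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"phone", "case"},
    {"laptop", "mouse"},
]
print(recommend({"laptop"}, past_orders))  # ['mouse', 'bag']
```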
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
Categories of ML tasks
• P.S. Other categorizations / groupings are possible
Machine Learning Task
• Supervised Learning
- Classification
- Regression
• Unsupervised Learning
- Clustering
- Feature selection / extraction
- Topic modeling
• Reinforcement Learning
Categories of ML tasks
Supervised learning
• Given: Example inputs and desired outputs
• Goal: Learn a general rule that maps inputs to outputs
Unsupervised learning
• Given: Data inputs (e.g., documents)
• Goal: Find structure in the inputs
Reinforcement learning
• Setting: An agent interacts with a dynamic environment in which it must achieve a goal
• Goal: Improve the agent's behaviour
Supervised learning subcategories
Classification
• Given: Training inputs (records) which are divided into two or more classes
• Goal: Produce a model to classify new inputs
• Examples: spam filter, fraud detection, …
Regression
• Given: Training data (records) with continuous (not discrete) output values
• Goal: Produce a model to predict output values for new inputs
• Example: stock value prediction
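To make the regression case concrete, the following sketch fits a straight line to training records with continuous output values and predicts the output for a new input. Ordinary least squares is assumed as the fitting method, and the day numbers and prices are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: day number and stock value
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, prices)
predicted = slope * 6 + intercept  # prediction for a new input (day 6)
print(slope, intercept, predicted)  # 2.0 8.0 20.0
```

Classification differs only in the output: the model returns one of a fixed set of classes instead of a continuous value.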
Unsupervised learning subcategories
Clustering
• Given: Set of input records
• Goal: Identify clusters (groups of similar records)
• Example: Customer grouping
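Customer grouping can be sketched with k-means, one common clustering algorithm. The slide does not name an algorithm, so k-means is an assumption here, as are the customer records (age, yearly spend) and the deterministic choice of the first k records as starting centroids.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means; starts from the first k points to stay deterministic."""
    centroids = points[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer records: (age, yearly spend)
customers = [(20, 100), (22, 120), (21, 110), (60, 900), (62, 950), (58, 880)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)  # [(21.0, 110.0), (60.0, 910.0)]
```

No output values are given for the records; the two groups emerge from the similarity of the inputs alone.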
Feature selection / extraction
• Given: Set of input records with attributes ("features")
• Goal: Find a subset of the original attributes that is equally well suited for classification / clustering tasks
Topic modeling
• Given: Set of text documents
• Goal: Find abstract topics that occur in several documents and classify documents accordingly
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
• Services / Product Map
Decision Tree Learning
• Used for supervised learning
(classification, regression)
• Training input: Training data
(records) with output values
(discrete or continuous)
• Learning result: decision tree that
allows classifying / predicting output
values of new data records
• Example (figure): Decision tree for
classifying passengers on the Titanic
as survived / died
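How a decision tree can be learned from training records is sketched below. The sketch assumes an ID3-style procedure that splits on the attribute with the lowest remaining entropy; the slide does not specify the algorithm. The passenger records are invented for illustration and are not real Titanic data.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(records, attributes, target):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    labels = [r[target] for r in records]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def remaining_entropy(attr):
        groups = {}
        for r in records:
            groups.setdefault(r[attr], []).append(r[target])
        return sum(len(g) / len(records) * entropy(g) for g in groups.values())

    best = min(attributes, key=remaining_entropy)  # most informative attribute
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        branches[value] = build_tree(subset, rest, target)
    return (best, branches)

def predict(tree, record):
    """Follow the branches until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree

# Hypothetical passenger records (not real Titanic data)
passengers = [
    {"sex": "female", "age": "adult", "survived": "yes"},
    {"sex": "female", "age": "child", "survived": "yes"},
    {"sex": "female", "age": "adult", "survived": "yes"},
    {"sex": "male", "age": "child", "survived": "yes"},
    {"sex": "male", "age": "adult", "survived": "no"},
    {"sex": "male", "age": "adult", "survived": "no"},
]
tree = build_tree(passengers, ["sex", "age"], "survived")
print(predict(tree, {"sex": "male", "age": "adult"}))  # no
```

For these records the tree first splits on `sex` and then, for male passengers, on `age`. An attribute value that never occurred in training raises a `KeyError` in `predict`; a real tool would handle this case.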
Artificial Neural Networks (ANN)
• Inspired by brain / nervous system:
- Neurons connected via dendrites
- Reduce resistance if fired repeatedly
• Artificial Neuron:
- Weighted inputs
- Function, e.g., weighted sum
- Filter, e.g., threshold
- Output
• Artificial Neural Network (ANN):
- Input layer, output layer, and possibly
intermediate layers of neurons
- Training phase: weights are adjusted via
known cases
- Recognition phase: output is produced for
new cases
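The artificial neuron described above (weighted inputs, weighted sum, threshold) and the two phases can be sketched as follows. The perceptron learning rule is assumed for the training phase, since the slide does not name a training method; the logical AND cases, the learning rate of 1, and the function names are chosen only for illustration.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a threshold filter."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

def train_perceptron(cases, epochs=20, rate=1):
    """Training phase: adjust the weights using known cases (perceptron rule)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in cases:
            error = target - neuron(inputs, weights, bias)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Known cases: logical AND of two inputs
cases = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(cases)

# Recognition phase: produce outputs with the trained weights
outputs = [neuron(inputs, weights, bias) for inputs, _ in cases]
print(outputs)  # [0, 0, 0, 1]
```

A single neuron like this can only separate classes with a straight line; networks with intermediate layers are needed for anything more complex.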
Bayesian Networks
• Directed acyclic graph (DAG) with:
- Nodes: random variables, each with a probability function
- Edges: conditional dependencies
• Example:
- Probability of rain
- Sprinkler is turned on if it hasn't rained for a while
- Grass is wet if it is raining or the sprinkler is turned on
• Bayes Network inference allows answering questions like:
- What is the probability that it is raining, given the grass is wet?
- What is the impact of turning the sprinkler on?
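The first question can be answered by enumerating all combinations of the three variables. The sketch below does this for the rain / sprinkler / wet grass example; the probability values are assumptions taken for illustration (the slide gives none), and the names `joint` and `probability` are hypothetical.

```python
from itertools import product

# Assumed probability tables (not from the slide)
P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}  # P(sprinkler on | rain)
P_WET = {                               # P(grass wet | sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """Joint probability of one full assignment, following the edges of the DAG."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_SPRINKLER[rain] if sprinkler else 1 - P_SPRINKLER[rain]
    p *= P_WET[(sprinkler, rain)] if wet else 1 - P_WET[(sprinkler, rain)]
    return p

def probability(query, evidence):
    """P(query | evidence) by summing the joint over all matching assignments."""
    names = ("rain", "sprinkler", "wet")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=3):
        world = dict(zip(names, values))
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(**world)
            denominator += p
            if all(world[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator

p_rain_given_wet = probability({"rain": True}, {"wet": True})
print(round(p_rain_given_wet, 4))  # 0.3577
```

Enumeration is only feasible for small networks; the number of assignments doubles with every additional binary variable.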
Inductive Logic Programming
• Given:
- Set of logic facts (background knowledge), e.g.,
male(Tom), female(Eve), parent(Tom, Eve)
- Positive and / or negative examples, e.g.,
daughter(Eve, Tom)
• Learning goal:
- General rules that are consistent with the examples and the
background knowledge, e.g.,
parent(p1, p2) and female(p2) → daughter(p2, p1)
(Figure: family tree with parent relations between George, Tom, Mary, Helen, Nancy, and Eve.)
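The notion of a rule being consistent with the examples and the background knowledge can be checked mechanically. The sketch below covers only this consistency check for the daughter rule, not the search for rules that an ILP system performs. The facts and the positive example come from the slide; the negative example daughter(Tom, Eve) and all Python names are assumptions.

```python
# Background knowledge from the slide
female = {"Eve"}
male = {"Tom"}
parent = {("Tom", "Eve")}  # parent(Tom, Eve): Tom is a parent of Eve

# Examples: the positive one is from the slide, the negative one is assumed
positive = {("Eve", "Tom")}  # daughter(Eve, Tom)
negative = {("Tom", "Eve")}  # daughter(Tom, Eve) must not be derived

def derive_daughters(parent, female):
    """Apply the rule: parent(p1, p2) and female(p2) -> daughter(p2, p1)."""
    return {(p2, p1) for (p1, p2) in parent if p2 in female}

derived = derive_daughters(parent, female)

# Consistent: every positive example is derived and no negative example is
consistent = positive <= derived and not (negative & derived)
print(derived, consistent)  # {('Eve', 'Tom')} True
```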
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
WEKA
Tasks supported by WEKA
• Numerous approaches for supervised and unsupervised learning
Preprocess
• Choose and modify the data being acted on
Classify
• Train and test learning schemes that classify or perform regression
Cluster
• Learn clusters for the data
Associate
• Learn association rules for the data
Select attributes
• Select the most relevant attributes in the data
Visualize
• View an interactive 2D plot of the data
WEKA Datasets
• Collection of examples (instances)
• Each instance consists of attributes
• Attribute types:
- Nominal (enumeration)
- Numeric (real or integer number)
- String
• Example:
@relation golfWeatherMichigan_1988/02/10_14days
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute temperature real
@attribute humidity real
@attribute play {yes, no}
@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,80,yes
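To show how the parts of such a dataset fit together, the sketch below reads the example into Python records. It is a deliberately simplified reader written for this example, not WEKA's own loader: it ignores quoting, sparse data, and other features of the format, and the function name `parse_arff` is hypothetical.

```python
ARFF_TEXT = """\
@relation golfWeatherMichigan_1988/02/10_14days
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute temperature real
@attribute humidity real
@attribute play {yes, no}
@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,80,yes
"""

def parse_arff(text):
    """Simplified reader: returns the attribute declarations and one dict per instance."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                # Numeric attributes become floats, nominal ones stay strings
                numeric = kind.lower() in ("real", "numeric", "integer")
                row[name] = float(value) if numeric else value
            rows.append(row)
    return attributes, rows

attributes, rows = parse_arff(ARFF_TEXT)
print(len(rows), rows[0]["outlook"], rows[0]["temperature"])  # 5 sunny 85.0
```

The last attribute, `play`, is the output value that a classifier would learn to predict from the other four.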
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
• Services / Product Map
ML Services Map
ML libraries
• Algorithms for classification, regression, clustering, feature selection / extraction, topic modelling, etc. using different approaches, e.g., decision …
ML development environments / frameworks
• IDEs and frameworks for experimenting with different ML approaches and configuring solutions
ML services
• Web services for experimenting with different ML approaches and configuring solutions
ML Product Map
ML libraries
• Eblearn, OpenNN, aisolver, CURRENNT, …
ML development environments / frameworks
• WEKA, Orange, Shogun, …
ML services
• bigml, wise.io, procog, ersatz, …
ML product map (table)
Product                                | ML library | ML development environment / framework | ML service
Java Neural Network Framework Neuroph  | x          | x                                      |
Fast Artificial Neural Network Library | x          |                                        |
eblearn                                | x          |                                        |
Jaden                                  | x          | x                                      |
OpenNN - Open Neural Networks Library  | x          |                                        |
aisolver                               | x          |                                        |
CURRENNT                               | x          |                                        |
WEKA                                   | x          | x                                      |
Orange                                 | x          | x                                      |
Shogun                                 | x          | x                                      |
scikit-learn                           | x          | x                                      |
bigml                                  |            |                                        | x
wise.io                                |            |                                        | x