Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

(1)

Unsupervised Learning and

Data Mining Unsupervised Learning

and Data Mining

Unsupervised Learning and

Data Mining Unsupervised Learning

and Data Mining

Clustering Clustering

Supervised Learning Supervised Learning

ó

ó Decision trees Decision trees

ó

ó Artificial neural nets Artificial neural nets

ó

ó K-nearest neighbor K-nearest neighbor

ó

ó Support vectors Support vectors

ó ó Linear regression Linear regression

ó ó Logistic regression Logistic regression

ó ó ... ...

Supervised Learning Supervised Learning

ó

ó F(x): true function (usually not known) F(x): true function (usually not known)

ó

ó D: training sample drawn from F(x) D: training sample drawn from F(x)

5757

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 00 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 11 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 00 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 11 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 00 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 11 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 00 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 11

…

(2)

Supervised Learning Supervised Learning

ó ó F(x): true function (usually not known) F(x): true function (usually not known)

ó

ó D: training sample drawn from F(x) D: training sample drawn from F(x)

57

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 00 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 11 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 00 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 11

ó ó G(x): model learned from training sample D G(x): model learned from training sample D

71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0

? ?

ó ó Goal: E<(F(x)-G(x)) Goal: E<(F(x)-G(x)) ² ² > is small (near zero) for > is small (near zero) for future samples drawn from F(x)

future samples drawn from F(x)

Supervised Learning Supervised Learning

Well Defined Goal:

Learn G(x) that is a good approximation Learn G(x) that is a good approximation

to F(x) from training sample D to F(x) from training sample D Know How to Measure Error:

Know How to Measure Error:

Accuracy, RMSE, ROC, Cross Entropy, ...

Clustering

≠

Supervised Learning Clustering

≠

Supervised Learning

Clustering

=

Unsupervised Learning Clustering

=

Unsupervised Learning

(3)

Supervised Learning Supervised Learning

Train Set:

5757

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 0M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 11 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 00 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 11 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 00 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 11 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 00 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 11

…

Test Set:

71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0

? ?

Un-Supervised Learning Un-Supervised Learning

Train Set:

5757

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 00 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 11 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 00 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 11 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 00 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 11 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 00 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 11

…

Test Set:

71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0

? ?

Un-Supervised Learning Un-Supervised Learning

Train Set:

57

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 0M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 11 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 00 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 11 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 00 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 11 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 00 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 00 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 11

…

Test Set:

71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0

? ?

Un-Supervised Learning Un-Supervised Learning

Data Set:

57

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 84,F,210,1,135,105,39,24,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 89,F,135,0,120,95,36,28,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 49,M,195,0,115,85,39,32,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 40,M,205,0,115,90,37,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 74,M,250,1,130,100,38,26,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1 77,F,140,0,125,100,40,30,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1

…

(4)

Supervised vs. Unsupervised Learning Supervised vs. Unsupervised Learning

Supervised Supervised

ó ó y=F(x): true function y=F(x): true function

ó ó D: labeled training set D: labeled training set

ó ó D: {x D: {x

_i_i

, ,y y

_i_i

} }

ó ó y=G(x): model trained to y=G(x): model trained to predict labels D

predict labels D

ó ó Goal: Goal:

E<(F(x)-G(x)) E<(F(x)-G(x))

²²

> > ≈ ≈ 0 0

ó ó Well defined criteria: Well defined criteria:

Accuracy, RMSE, ...

Unsupervised Unsupervised

ó ó Generator: true model Generator: true model

ó ó D: unlabeled data sample D: unlabeled data sample

ó ó D: {x D: {x

_i_i

} }

ó ó Learn Learn

??????????

ó ó Goal: Goal:

??????????

ó

ó Well defined criteria: Well defined criteria:

??????????

What to Learn/Discover?

ó ó Statistical Summaries Statistical Summaries

ó

ó Generators Generators

ó ó Density Estimation Density Estimation

ó ó Patterns/Rules Patterns/Rules

ó

ó Associations Associations

ó

ó Clusters/Groups Clusters/Groups

ó

ó Exceptions/Outliers Exceptions/Outliers

ó

ó Changes in Patterns Over Time or Location Changes in Patterns Over Time or Location

Goals and Performance Criteria?

ó

ó Statistical Summaries Statistical Summaries

ó

ó Generators Generators

ó

ó Density Estimation Density Estimation

ó

ó Patterns/Rules Patterns/Rules

ó ó Associations Associations

ó ó Clusters/Groups Clusters/Groups

ó

ó Exceptions/Outliers Exceptions/Outliers

ó ó Changes in Patterns Over Time or Location Changes in Patterns Over Time or Location

Clustering

(5)

Clustering Clustering

ó ó Given: Given:

–

– Data Set D (training set) Data Set D (training set) –

– Similarity/distance metric/information Similarity/distance metric/information

ó ó Find: Find:

– – Partitioning of data Partitioning of data –

– Groups of similar/close items Groups of similar/close items

Similarity?

ó ó Groups of similar customers Groups of similar customers –

– Similar demographics Similar demographics –

– Similar buying behavior Similar buying behavior –

– Similar health Similar health

ó

ó Similar products Similar products –

– Similar cost Similar cost –

– Similar function Similar function –

– Similar store Similar store –

– … …

ó ó Similarity usually is domain/problem specific Similarity usually is domain/problem specific

Types of Clustering Types of Clustering

ó

ó Partitioning Partitioning –

– K-means clustering K-means clustering –

– K- K-medoids medoids clustering clustering –

– EM (expectation maximization) clustering EM (expectation maximization) clustering

ó ó Hierarchical Hierarchical

– – Divisive clustering (top down) Divisive clustering (top down) – – Agglomerative clustering (bottom up) Agglomerative clustering (bottom up)

ó ó Density-Based Methods Density-Based Methods

– – Regions of dense points separated by sparser regions Regions of dense points separated by sparser regions of relatively low density

of relatively low density

Types of Clustering Types of Clustering

ó

ó Hard Clustering: Hard Clustering:

–

– Each object is in one and only one cluster Each object is in one and only one cluster

ó

ó Soft Clustering: Soft Clustering:

–

– Each object has a probability of being in each cluster Each object has a probability of being in each cluster

(6)

Two Types of Data/Distance Info Two Types of Data/Distance Info

ó

ó N-dim vector space representation and distance metric N-dim vector space representation and distance metric

D1:D1: 5757

, ,

M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 D2:

D2: 78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,078,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 ...

...

Dn

Dn:: 18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,018,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Distance (D1,D2) = ???

ó ó Pairwise Pairwise distances between points (no N-dim space) distances between points (no N-dim space)

««

Similarity/dissimilarity matrix (upper or lower diagonal) Similarity/dissimilarity matrix (upper or lower diagonal)

««

Distance: Distance: 0 = near, 0 = near, ∞ ∞ = far = far

«

Similarity: Similarity: 0 = far, 0 = far, ∞ ∞ = near = near

-- 1 2 3 4 5 6 7 8 9 10 1 - d d d d d d d d d 2 - d d d d d d d d 3 - d d d d d d d 4 - d d d d d d 5 - d d d d d 6 - d d d d 7 - d d d 8 - d d 9 - d

Agglomerative Clustering Agglomerative Clustering

ó ó Put each item in its own cluster (641 singletons) Put each item in its own cluster (641 singletons)

ó ó Find all pairwise Find all pairwise distances between clusters distances between clusters

ó ó Merge the two closest Merge the two closest clusters clusters

ó

ó Repeat until everything is in one cluster Repeat until everything is in one cluster

ó

ó Hierarchical clustering Hierarchical clustering

ó

ó Yields a clustering with each possible # of clusters Yields a clustering with each possible # of clusters

ó

ó Greedy clustering: not optimal for any cluster size Greedy clustering: not optimal for any cluster size

Agglomerative Clustering of Proteins

Agglomerative Clustering of Proteins Merging: Closest Clusters Merging: Closest Clusters

ó

ó Nearest centroids Nearest centroids

ó

ó Nearest medoids Nearest medoids

ó

ó Nearest neighbors (shortest link) Nearest neighbors (shortest link)

ó

ó Nearest average distance (average link) Nearest average distance (average link)

ó ó Smallest greatest distance (maximum link) Smallest greatest distance (maximum link)

ó ó Domain specific similarity measure Domain specific similarity measure –

– word frequency, TFIDF, KL-divergence, ... word frequency, TFIDF, KL-divergence, ...

ó

ó Merge clusters that optimize criterion after merge Merge clusters that optimize criterion after merge

– – minimum mean_point_happiness minimum mean_point_happiness

(7)

Mean Distance Between Clusters Mean Distance Between Clusters

Mean _ Dist (c

₁

,c

₂

) =

Dist (i, j)

j Œc₂ iŒc₁

Â Â

1

jŒc₂ i Œc

Â

₁

Â

Minimum Distance Between Clusters Minimum Distance Between Clusters

Min _ Dist (c

₁

,c

₂

) = MIN

iŒc1, jŒc2

(Dist (i, j))

Mean Internal Distance in Cluster Mean Internal Distance in Cluster

Mean _ Internal _ Dist (c) =

Dist (i, j)

j Œc, i ≠ j

Â Â

iŒc

1

j Œc, i ≠ j

Â Â

i Œc

Mean Point Happiness Mean Point Happiness

†

Mean _ Happiness =

d

_ij

• Dist (i, j)

j≠i

Â

i

Â

d

ij j≠i

Â

i

Â

d

ij

= 1 when cluster(i) = cluster( j) 0 when cluster(i) ≠ cluster( j) Ï Ì

Ó

¸ ˝

˛

(8)

Recursive Clusters

Recursive Clusters Recursive Clusters Recursive Clusters

Recursive Clusters

Recursive Clusters Recursive Clusters Recursive Clusters

(9)

Mean Point Happiness

Mean Point Happiness Mean Point Happiness Mean Point Happiness

Recursive Clusters + Random Noise

Recursive Clusters + Random Noise Recursive Clusters + Random Noise Recursive Clusters + Random Noise

(10)

Clustering Proteins Clustering Proteins

Distance Between Helices Distance Between Helices

ó

ó Vector representation of protein data in 3-D space Vector representation of protein data in 3-D space that gives x,y,z coordinates of each atom in helix that gives x,y,z coordinates of each atom in helix

ó ó Use a program developed by chemists ( Use a program developed by chemists (fortran fortran) to ) to convert 3-D atom coordinates into average atomic convert 3-D atom coordinates into average atomic distances in angstroms between aligned helices distances in angstroms between aligned helices

ó

ó 641 helices 641 helices = = 641 * 640 / 2 641 * 640 / 2

= = 205,120 pairwise 205,120 pairwise distances distances

Agglomerative Clustering of Proteins

(11)

Agglomerative Clustering of Proteins

Agglomerative Clustering of Proteins Agglomerative Clustering of Proteins Agglomerative Clustering of Proteins

Agglomerative Clustering of Proteins

Agglomerative Clustering of Proteins Agglomerative Clustering of Proteins Agglomerative Clustering of Proteins

(12)

Agglomerative Clustering Agglomerative Clustering

ó

ó Greedy clustering Greedy clustering –

– once points are merged, never separated once points are merged, never separated –

– suboptimal w.r.t. clustering criterion suboptimal w.r.t. clustering criterion

ó

ó Combine greedy with iterative refinement Combine greedy with iterative refinement –

– post processing post processing – – interleaved refinement interleaved refinement

Agglomerative Clustering Agglomerative Clustering

ó ó Computational Cost Computational Cost –

– O(N O(N

²²

) just to read/calculate pairwise ) just to read/calculate pairwise distances distances –

– N-1 merges to build complete hierarchy N-1 merges to build complete hierarchy

«

scan pairwise scan pairwise distances to find closest distances to find closest

««

calculate pairwise calculate pairwise distances between clusters distances between clusters

«

fewer clusters to scan as clusters get larger fewer clusters to scan as clusters get larger

– – Overall O(N Overall O(N

³³

) for simple implementations ) for simple implementations

ó ó Improvements Improvements –

– sampling sampling –

– dynamic sampling: add new points while merging dynamic sampling: add new points while merging –

– tricks for updating tricks for updating pairwise pairwise distances distances

(13)

K-Means Clustering K-Means Clustering

ó ó Inputs: data set and k (number of clusters) Inputs: data set and k (number of clusters)

ó

ó Output: each point assigned to one of k clusters Output: each point assigned to one of k clusters

ó ó K-Means Algorithm: K-Means Algorithm:

– – Initialize the k-means Initialize the k-means

«

assign from randomly selected points assign from randomly selected points

«

randomly or equally distributed in space randomly or equally distributed in space

– – Assign each point to nearest mean Assign each point to nearest mean – – Update means from assigned points Update means from assigned points – – Repeat until convergence Repeat until convergence

K-Means Clustering: Convergence K-Means Clustering: Convergence

ó ó Squared-Error Criterion Squared-Error Criterion

ó

ó Converged when SE criterion stops changing Converged when SE criterion stops changing

ó

ó Increasing K reduces SE - can’ Increasing K reduces SE - can ’t determine K by t determine K by finding minimum SE

finding minimum SE

ó

ó Instead, plot SE as function of K Instead, plot SE as function of K

Squared _ Error = (Dist (i, mean(c)))

²

Â

i Œc

Â

c

K-Means Clustering K-Means Clustering

ó

ó Efficient Efficient –

– K << N, so assigning points is O(KN) < O(N K << N, so assigning points is O(KN) < O(N

²²

) ) –

– updating means can be done during assignment updating means can be done during assignment –

– usually # of iterations << N usually # of iterations << N –

– Overall O(NKiterations) closer to O(N) than O(N Overall O(NKiterations) closer to O(N) than O(N

²²

) )

ó ó Gets stuck in local minima Gets stuck in local minima – – Sensitive to initialization Sensitive to initialization

ó ó Number of clusters must be pre-specified Number of clusters must be pre-specified

ó

ó Requires vector space date to calculate means Requires vector space date to calculate means

Soft K-Means Clustering Soft K-Means Clustering

ó

ó Instance of EM (Expectation Maximization) Instance of EM (Expectation Maximization)

ó

ó Like K-Means, except each point is assigned to Like K-Means, except each point is assigned to each cluster with a probability

each cluster with a probability

ó ó Cluster means updated using weighted average Cluster means updated using weighted average

ó

ó Generalizes to Standard_Deviation/Covariance Generalizes to Standard_Deviation/Covariance

ó ó Works well if cluster models are known Works well if cluster models are known

(14)

Soft K-Means Clustering (EM) Soft K-Means Clustering (EM)

–

– Initialize model parameters: Initialize model parameters:

«

means means

««

std_ std_devs devs

«

... ...

–

– Assign points probabilistically to Assign points probabilistically to each cluster

each cluster –

– Update cluster parameters from Update cluster parameters from weighted points

weighted points

– – Repeat until convergence to local Repeat until convergence to local minimum

minimum

What do we do if we can’t calculate cluster means?

-- 1 2 3 4 5 6 7 8 9 10 1 - d d d d d d d d d 2 - d d d d d d d d 3 - d d d d d d d 4 - d d d d d d 5 - d d d d d 6 - d d d d 7 - d d d 8 - d d 9 - d

K-Medoids Clustering K-Medoids Clustering

Medoid (c) = pt Œc s.t. MIN ( Dist(i, pt)

Â

iŒc

⁾

cluster medoid

K-Medoids Clustering K-Medoids Clustering

ó

ó Inputs: data set and k (number of clusters) Inputs: data set and k (number of clusters)

ó

ó Output: each point assigned to one of k clusters Output: each point assigned to one of k clusters

ó ó ó

ó Initialize k medoids Initialize k medoids –

– pick points randomly pick points randomly

ó ó Pick medoid Pick medoid and non- and non-medoid medoid point at random point at random

ó ó Evaluate quality of swap Evaluate quality of swap –

– Mean point happiness Mean point happiness

ó

ó Accept random swap if it improves cluster quality Accept random swap if it improves cluster quality

(15)

Cost of K-Means Clustering Cost of K-Means Clustering

ó

ó n cases; d dimensions; k centers; i iterations n cases; d dimensions; k centers; i iterations

ó

ó compute distance each point to each center: O(ndk) compute distance each point to each center: O(ndk)

ó ó assign each of n cases to closest center: O(nk) assign each of n cases to closest center: O(nk)

ó ó update centers (means) from assigned points: O(ndk) update centers (means) from assigned points: O(ndk)

ó ó repeat i times until convergence repeat i times until convergence

ó

ó overall: O(ndki) overall: O(ndki)

ó

ó much better than O(n much better than O(n

²²

)-O(n )-O(n

³³

) for HAC ) for HAC

ó

ó sensitive to initialization - run many times sensitive to initialization - run many times

ó

ó usually don usually don’ ’t know k - run many times with different k t know k - run many times with different k

ó

ó requires many passes through data set requires many passes through data set

Graph-Based Clustering Graph-Based Clustering

Scaling Clustering to Big Databases Scaling Clustering to Big Databases

ó

ó K-means is still expensive: O(ndkI) K-means is still expensive: O(ndkI)

ó

ó Requires multiple passes through database Requires multiple passes through database

ó

ó Multiple scans may not be practical when: Multiple scans may not be practical when:

–

– database doesn database doesn’ ’t fit in memory t fit in memory –

– database is very large: database is very large:

««

10 10

⁴⁴

-10 -10

⁹⁹

(or more) records (or more) records

«

>10 >10

²²

attributes attributes

–

– expensive join over distributed databases expensive join over distributed databases

Goals Goals

ó

ó 1 scan of database 1 scan of database

ó

ó early termination, on-line, anytime algorithm early termination, on-line, anytime algorithm yields current best answer

Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

Unsupervised Learning and

Data Mining Unsupervised Learning

and Data Mining

Unsupervised Learning and

Data Mining Unsupervised Learning

and Data Mining

Clustering Clustering

Supervised Learning Supervised Learning

ó

ó Decision trees Decision trees

ó

ó Artificial neural nets Artificial neural nets

ó

ó K-nearest neighbor K-nearest neighbor

ó

ó Support vectors Support vectors

ó ó Linear regression Linear regression

ó ó Logistic regression Logistic regression

ó ó ... ...

Supervised Learning Supervised Learning

ó

ó F(x): true function (usually not known) F(x): true function (usually not known)

ó

ó D: training sample drawn from F(x) D: training sample drawn from F(x)

, ,

Supervised Learning Supervised Learning

ó ó F(x): true function (usually not known) F(x): true function (usually not known)

ó

ó D: training sample drawn from F(x) D: training sample drawn from F(x)

, ,

ó ó G(x): model learned from training sample D G(x): model learned from training sample D

? ?

ó ó Goal: E<(F(x)-G(x)) Goal: E<(F(x)-G(x)) 2 2 > is small (near zero) for > is small (near zero) for future samples drawn from F(x)

future samples drawn from F(x)

Supervised Learning Supervised Learning

Well Defined Goal:

Well Defined Goal:

Learn G(x) that is a good approximation Learn G(x) that is a good approximation

to F(x) from training sample D to F(x) from training sample D Know How to Measure Error:

Know How to Measure Error:

Accuracy, RMSE, ROC, Cross Entropy, ...

Accuracy, RMSE, ROC, Cross Entropy, ...

Clustering

≠

Supervised Learning Clustering

≠

Supervised Learning

Clustering

=

Unsupervised Learning Clustering

=

Unsupervised Learning

Supervised Learning Supervised Learning

Train Set:

Train Set:

, ,

Test Set:

Test Set:

? ?

Un-Supervised Learning Un-Supervised Learning

Train Set:

Train Set:

, ,

Test Set:

Test Set:

? ?

Un-Supervised Learning Un-Supervised Learning

Train Set:

Train Set:

, ,

Test Set:

Test Set:

? ?

Un-Supervised Learning Un-Supervised Learning

Data Set:

Data Set:

, ,

Supervised vs. Unsupervised Learning Supervised vs. Unsupervised Learning

Supervised Supervised

ó ó Goal: E<(F(x)-G(x)) Goal: E<(F(x)-G(x)) ² ² > is small (near zero) for > is small (near zero) for future samples drawn from F(x)