machine learning
A Hierarchy of Limitations in
Machine Learning
Momin M. Malik, Data Science Postdoctoral Fellow, Berkman Klein Center
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Objectives
What are the trade-offs associated with
each branch?
When are we justified traveling down the
machine learning (“predictive”) branch?
What are the consequences of using
machine learning when it is not justified?
Outline
1.
Quantitative: Problems of meanings,
measurement, and constructs
2.
Probability-based: Problems of central
tendencies, variability as nuisance
3.
Predictive: Problems of correlation over
causation
4.
Cross-validation: Problems of dependencies
Meaning-making
“During the writing of this book, my first
grandchild was born. The hospital records
document her weight, height, health[;] the
mother’s condition, length of labor, time of birth,
and hospital stay… These are physiological and
institutional metrics. When aggregated across
many babies and mothers, they provide trend
data about the beginning of life—birthing.”
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Meaning-making
“But nowhere in the hospital records will you find
anything about what the birth of Calla Quinn
means. Her existence is documented but not what
she means to our family, what decision-making
process led up to her birth, the experience and
meaning of the pregnancy, the family experience
of the birth process, and the familial, social,
Measurement and constructs
Constructs
: primitives of
social science
–
What we care about
–
Often unobservable (and
hypothetical/subjective,
e.g. friendship)
–
Proxies always give errors
(for binary-valued
constructs: false
negatives and false
positives)
X
Features
Y
“Ground Truth”
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Constructs: Subjective, multifaceted
Validating measurements
Kinds of Validity
Construct Validity
(measurement)
Inference Validity
(studies)
“Translation”
Face
Content
Predictive
Concurrent
Convergent
Discriminant
Criterion
Internal
External
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Kinds of Validity
Construct Validity
(measurement)
Inference Validity
(studies)
“Translation”
Face
Content
Predictive
Concurrent
Convergent
Discriminant
Criterion
Internal
External
(ML: Only external validity)
“Thin description”
“what exactly is thin description? In a thin world, surfaces
should be valid and deep meanings superfluous…
“The [quantitative social science] focus on behavior was a
strategy to make social science into real science, something
more like physics and less subject to values and prejudices
because restricted to observable phenomena of the sort
that could be registered by instruments. The sacrifice of
human meaning seemed not just a price worth paying for
solid results, but the liberating essence of a proper
objective methodology that now would rise above
stubborn tradition and invisible culture.” (Porter, 2012)
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Consequences
The world is, ultimately, “thick”: the same behavior can
have infinitely many different meanings
What is it we ultimately care about? Relating to human
experience? Or “solid results”?
Quantification requires choosing one set of meanings;
nothing subsequent can “unpack” this (it has to be done
again), and there is never one “best” meaning
Quantification solidifies that meaning, which lets us build
upwards
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Probability: signal and noise
“Probability is used in two distinct, although
interrelated, ways in statistics,
phenomenologically to describe haphazard
variability arising in the real world and
epistemologically to represent uncertainty of
knowledge.” (Cox, 1990)
Implies a philosophical commitment: the world
is made up of entities that are interchangeable,
where the important thing is central tendency
The world as a data matrix
“it is striking how absolutely these assumptions contradict
those of the major theoretical traditions of sociology.
Symbolic interactionism rejects the assumption of fixed
entities and makes the meaning of a given occurrence
depend on its location — within an interaction, within an
actor's biography, within a sequence of events.
“Both the Marxian and Weberian traditions deny explicitly
that a given property of a social actor has one and only one
set of causal implications… Marx, Weber, and work deriving
from them in historical sociology all approach social
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Concretely: “Flaw of averages”
Source: Todd Rose. Illustration: Future for Learning.
Consequences
These problems are not necessarily unique to
probability-based modeling
–
SIR equations are “equilibrium” solution
–
Agent-based modeling often has interchangeable agents,
and summarizes outcomes over multiple simulations with
summary statistics
Neither statistics nor machine learning can do
anything with an
n
of 1: cannot account for
individuality, nor do anything with it
Planning to the central tendency punishes outliers
“Prediction” is not prediction!
“It’s not prediction at all! I have not found a
single paper predicting a future result. All of
them claim that a prediction could have
been made; i.e. they are post-hoc analysis
and, needless to say, negative results are
rare to find.” (Gayo-Avello, “I Wanted to
Predict Elections with Twitter and all I got
was this Lousy Paper”, 2012)
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
35
30
25
20
10
5
15
0
0
5
10
15
Chocolate Consumption (kg/yr/capita)
N
obel L
au
reat
es p
er 10 M
illion P
op
ul
ation
Poland
Switzerland
Sweden
Norway
China
Brazil
Greece
Portugal
United
States
Germany
France
Finland
Italy
Australia
The Netherlands
Canada
Belgium
United Kingdom
Ireland
Spain
Austria
Denmark
r=0.791
Japan
“Predictions” are correlations
Cause
is resources
Resources
Consume
chocolate
Science
funding
Nobel
prizes
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Creates two types of modeling!
Informative models may not fit well
Breiman 2001: Information
Shmueli 2010: Explanation
Kleinberg et al 2015: Rain dance
Mullainathan & Spiess, 2017:
β
-hat
Correlations may “predict” well
Breiman, 2001: Prediction
Shmueli, 2010: Prediction
Kleinberg et al., 2015: Umbrella
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
“True” model can predict worse!
100
150
200
0.000
0.005
0.010
0.015
0.020
Mean Squared (test) Error over 1,000 runs
MSE
Density
True
Underspecified
All−subset
Stepwise
Ridge
Lasso
Sometimes, people want causality
“A project I worked on in the late 1970s was the
analysis of delay in criminal cases in state court
systems... A large decision tree was grown, and I
showed it on an overhead and explained it to the
assembled Colorado judges. One of the splits was on
District N which had a larger delay time than the
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Correlations and injustice
Julius C. Chappelle proposed a bill in
Massachusetts to ban charging Black people
more for life insurance
A lawyer opposing the bill “cited statistics from
around the nation showing shorter life spans for
blacks, including 1870 census figures showing a
17.28 death rate for ‘colored people’ against 14.74
for whites. These numbers, Williams argued, and
not any ‘discrimination on the ground of color’
motivated insurers’ rates. It was a ‘matter of
business,’ and any interference, he warned
ominously and presciently, ‘would probably cut
off insurance entirely from the colored race.’”
Correlations and injustice
“Chappelle’s allies noted that Williams’s
statistics, while bleak enough, answered
the wrong question. The question was
not whether blacks in slavery or
adjusting to freedom were poor
insurance risks, or even whether
southern blacks were poor risks. The
question was African Americans’
potential for equality and specifically
the present and future state of
Massachusetts’ African Americans—
about whom no statistics had been
offered by either side.” (Bouk, 2015)
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Sometimes, causality affects
prediction
2.5
5
0
2.5
5
0
2.5
5
0
2.5
ILI pe
rcentage
0
5
Data available as of 4 February 2008
Data available as of 3 March 2008
Data available as of 31 March 2008
Data available as of 12 May 2008
40
43
47
51
3
Week
7
11
15
19
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Real-world testing of ML results
van’t Veer et al.
(2002) found 70
genes correlated with
developing breast
cancer
Of course the
correlations were
optimal, post-hoc. But
did it generalize?
10
20
30
40
50
60
70
T
umours
5
10
15
T
umours
AL080059 Contig 63649RC LOC51203 Contig 46218RC Contig 38288RC AA555029RC Contig 28552RC F
LT1
MMP9 DC13 EXT1 AL137718 PK428 HEC ECT2 GMPS Contig 32185RC UCH37 Contig 35251RC KIAA1067 GNAZ SERF1A OXCT ORC6L L2DTL PRC1 AF052162 COL4A2 KIAA0175 RAB6B Contig 55725RC DCK CEN
P
A
SM20 MCM6 AKAP2 Contig 56457
RC
RFC4 DKFZP564D0462 SLC2
A3
MP1 Contig 40831RC Contig 24252RC FLJ11190 Contig 51464RC IGFBP5 IGFBP5 CCNE2 ESM1 Contig 20217RC NMU LOC57110 Contig 63102RC PECI AP2B1 CFFM4 PECI TGFB3 Contig 46223RC Contig 55377RC HSA250839 GSTM3 BBC3 CEGP1 Contig 48328RC WISP1 ALDH4 KIAA1442 Contig 32125RC FGF18
–1
0
1
–1
0
1
Sporadic breast tumours
patients <55 years
tumour size <5 cm
lymph node negative (LN0)
No distant metastases
>5 years
Distant metastases
<5 years
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
High
Both tests
agree, high
risk
Both tests
agree, low
risk
Low
High
Low
Chemo-therapy is
worse!
Chemo-therapy is
similar
Risk via
correlations
with gene
expression
Implementation testing
“Clinical” risk
Treat with
chemo
Don’t treat
with chemo
???
Cardoso et al., 2016, NEJM
High
Both tests
agree, high
risk
Both tests
agree, low
risk
Low
High
Low
Chemo-therapy is
worse!
Chemo-therapy is
similar
Risk via
correlations
with gene
expression
Implementation testing
“Clinical” risk
Treat with
chemo
Don’t treat
with chemo
Finding: Machine learning
alone would make things
worse. But as a secondary
diagnosis, on average it
catches false positives and
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Implementation testing: Details
Before
experiment
(training data)
High model risk,
low clinical risk:
randomize.
Chemo worse!
Low model risk,
high clinical
risk: chemo
makes no
difference
Baseline Survival (no chemotherapy)
Survival
without
Distant
Metastasis
(%)
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Year
Low clinical, high genomic
Low clinical and genomic
High clinical,
low genomic
High clinical and genomic
Clinicial says high risk, Model says low risk
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Year
Chemotherapy
No chemotherapy
9
0 1 2 3 4 5 6 7 8
80
0
85
90
95
100
Year
Clinicial says low risk, Model says high risk
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Chemotherapy
No chemotherapy
9
0 1 2 3 4 5 6 7 8
85
0
95
80
90
100
Baseline Survival (no chemotherapy)
Survival
without
Distant
Metastasis
(%)
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Year
Low clinical, high genomic
Low clinical and genomic
High clinical,
low genomic
High clinical and genomic
Clinicial says high risk, Model says low risk
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Year
Chemotherapy
No chemotherapy
9
0 1 2 3 4 5 6 7 8
80
0
85
90
95
100
Year
Clinicial says low risk, Model says high risk
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Chemotherapy
No chemotherapy
9
0 1 2 3 4 5 6 7 8
85
0
95
80
90
100
Baseline Survival (no chemotherapy)
Survival
without
Distant
Metastasis
(%)
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Year
Low clinical, high genomic
Low clinical and genomic
High clinical,
low genomic
High clinical and genomic
Clinicial says high risk, Model says low risk
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Year
Chemotherapy
No chemotherapy
9
0 1 2 3 4 5 6 7 8
80
0
85
90
95
100
Year
Clinicial says low risk, Model says high risk
100
80
90
70
60
40
30
10
50
20
0
0
1
2
3
4
5
6
7
8
9
Chemotherapy
No chemotherapy
9
0 1 2 3 4 5 6 7 8
85
0
95
80
90
100
Generalizability through CV
Non-experimentally, generalizability is shown
through cross validation
CV can go wrong in known ways:
–
improper splitting
–
publication bias (Gayo-Avello, 2012)
–
overfitting to the test set (Dwork et al. 2015, Park
2012)
Not systematically acknowledged: dependencies
A Hierarchy of Limitations in ML
https://MominMalik.com/nyucds2020.pdf
Classic argument for CV
Training:
Testing:
The difference is the optimism (Efron, 2004; Rosset & Tibshirani, 2018):
Err(ˆ
µ) =
n
1
E
f
kY
⇤
Y
b
k
2
2
=
n
1
h
tr ⌃ +
kµ
E( b
Y )
k
2
2
+ tr Var
f
( b
Y )
2 tr Cov
f
(Y
⇤
, b
Y )
i
<latexit sha1_base64="aLfMXUg7zKBOl7HJdPyniE2ZK1Q=">AAAO9HicpVdbb9s2FHa6W+etWbo97oVd4aEX15DcZM1LgDZNihZoi6xNc6nlBpRE2UQoUSOpxI7qf7K3Ya/7P9uv2aEkhzLldTcZBo7O952P5xxSIuWnjErlOL+vXPno408+/ezq5+0vvry2+tXa9a8PJM9EQN4EnHFx5GNJGE3IG0UVI0epIDj2GTn0Tx9r/PCMCEl5sq+mKRnGeJTQiAZYgetk7Q9vV4hb3hir3Iuz2W30/RbyVCRwkLuzPJkhb/ckQt57dPzuDrqHvHMaEk0+nnnv3/VP+sjz2o0QRiI1AJ9A3ms6ijG6qxVAXyvs3qqJ3K5U7pbsAyy8hDMaUyVPogUihPZL0mN+VidBYl20ICnoaKyG6GTtptNzigs1Dbcybraqa+/k+rUrXsiDLCaJChiWcuA6qRrmWCgaMDJre5kkKQ5O8Yjkvh9bDrjv1BwDMBMcEznMi2maoQ54QhRxAf9EocK7IIFjKaexD8wYq7G0Me1chg0yFW0Oc5qkmSJJUA4UZQwpjvSco5AKEig2BQMHgkI1KBhjmDAFK6PtCZKQ84DHMU5CL8IxZdOQRDhjsCZkVJmz9kI6WToShJwutkBnJmQkm17FObPcJE7H5KfFnuXpRLdGWoNhIfAUok2ad3IPVrXysZjlnsgYGdxzyWSYO72NVM3yfm+DTGZ2xJiLi1qE5gwrahVojTspJ67tQQ/g+Sru9GTCMn+6/+L5LN/ZdLd/cJqE/pzg7m5u7m40CffnhCdPHrvOgyZhfU5wnPWddWcxsUEQD3MZFeuhXmOuH2Yor5iHONc3Fk7qOGniWIximlQUnl5SK/8yOp4sp4PfpoMEPLUXpBlwiTRC8OSvQuaIHaJEkww+m+ZWLD+C15aFPTPYMxtzDObY2Ks55uevbGzXYLs2doCX5KydNhHefU2idmpi/TnOvT0z3p4tQ5MQ0JSI1LtR/LRpcfbNQtm34x+ZHjyysW2DbdvYjsF2bMx/bsDnNvjCYC9s7NBghzZ2ZLAjGzs22LGNvTXYWxs7N9i5jU0MNrGxqcGmNnZhsAsbCydmJkIbDF5WYIBZ/rLRU8zSMQaGH8PDWtg2wydqTihMGyeppIwnFeUMi7nDJsLpoeSAYWNSnwIquLRtxv7Y5FHaNmOHsEtGaduM17VRXi8dJaxphEs1VC0PtTSPEY4vRylteDkvbBuKnl6Ue522GPUFFtM85ZLqMxdNRl05ximR3RDe96I4iMmuD8EjwbMktPbJdBSljCvL6+sNvNiUAQcdBhPD8JSI3AjNmiAcPkgdhBQLQNbCujGmSbfGRAh1OkjymCAchkURmKEyrjjOhCTm7UbBsCKDroJTguzWa4fNnJ/LZn9wpjgsZFwnN1mFXtNd62Qvhach5iIdawHUQQmncorKljcCI6p0feXVQXCrIKxi6wNUwLkIaYLVkuD6nEFwKPC5DoZ1gwyEcATHLO1sow9dHWRa/qHF0ywekg0vWwqTWvOnulkwNZou1ZTBaWya8JDMtgYBFXCo7UJiNDjtomIHzmIkYUvdcnoPgrhbFLTlMyjlxqYzrKtwXxJxRsJ/reM6The6zFh131/UZdDnRP3f7FItAyL61IuTUU2HJgnMBXQJRPobzdwWZHRfyT/TWv8bKYXhg8yUVTUgI1B/1+qOkXVTZVV+X3+P1HUDRst+mfT+q7bb19pt+Gpy7W+kpnHQ77n3exs/rt98uF19P11tfdv6rnWr5bYetB62nrb2Wm9awcrDlWiFr6SrZ6s/r/6y+mtJvbJSxXzTWrhWf/sT5pOQXg==</latexit>