ePub WU Institutional Repository
Thomas Rusch and Patrick Mair and Kurt Hornik
The STOPS framework for structure-based hyperparameter selection in
multidimensional scaling
Conference or Workshop Item (Published)
(Refereed)
Original Citation:
Rusch, Thomas and Mair, Patrick and Hornik, Kurt (2018) The STOPS framework for structure-based hyperparameter selection in multidimensional scaling.
In: Data Science, Statistics & Visualisation (DSSV2018), 09.07.-11.07., Vienna, Austria.
This version is available at:
http://epub.wu.ac.at/6399/
Available in ePub WU: July 2018
ePub WU, the institutional repository of the WU Vienna University of Economics and Business, is provided by the University Library and the IT-Services. The aim is to enable open access to the scholarly output of the WU.
This document is the publisher-created published version.
The STOPS Framework
for Structure-Based Hyperparameter Selection in
Multidimensional Scaling
Slide Zero
This is joint work with Patrick Mair (Harvard) and Kurt Hornik (WU)
Multidimensional Scaling
The STRESS objective function with (transformed) distances $d^*_{ij}(X)$, (transformed) proximities $\delta^*_{ij}$ and finite weights $w^*_{ij}$ is
$$\sigma(X) = \sum_{i<j} w^*_{ij} \left(\delta^*_{ij} - d^*_{ij}(X)\right)^2$$
which is minimized to find the configuration $X$:
$$\arg\min_{X} \sigma(X)$$
MDS provides an optimal map into continuous space $\mathbb{R}^M$ (objective 1).
We may also be interested in some structural appearance of $X$, e.g., clusters or circumplex (objective 2).
It can happen that what is optimal for objective 1 is not very useful for objective 2.
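To make the objective concrete, the following base R sketch evaluates raw stress for a given configuration. The function name `stress` and the toy data are illustrative assumptions, not part of the original slides or of any package; no transformations are applied here.

```r
# Raw stress: sum over i < j of w_ij * (delta_ij - d_ij(X))^2
# (hypothetical helper for illustration only)
stress <- function(X, delta, w = NULL) {
  d <- as.matrix(dist(X))                 # Euclidean distances d_ij(X)
  delta <- as.matrix(delta)
  if (is.null(w)) w <- matrix(1, nrow(delta), ncol(delta))  # unit weights
  idx <- upper.tri(delta)                 # pairs with i < j
  sum(w[idx] * (delta[idx] - d[idx])^2)
}

# Toy data: four points on the corners of a unit square
X_true <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1))
delta  <- as.matrix(dist(X_true))

# The generating configuration reproduces the proximities exactly
stress_true <- stress(X_true, delta)

# Collapsing the second dimension distorts the distances and raises stress
X_flat <- cbind(X_true[, 1], 0)
stress_flat <- stress(X_flat, delta)
```

Minimizing this quantity over `X` addresses objective 1 only; nothing in it rewards clusters or any other structure in the configuration.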
Motivation: Republican Mantras
“I’m a Republican, because ...” from Mair et al. (2014)
Supporters of the Republican Party have been asked why they are Republican (254 statements)
Natural language data that was scraped and processed ⟹ sparse data matrix (document term matrix)
Objects are the words (we use only words that appeared at least 10 times)
We look for themes in the statements: “Mantras” (words that occur often together)
We use a cosine distance for word co-occurrences and apply standard least squares MDS (SMACOF) for representation.
Motivation: Republican Mantras
[Figure: "Republican Mantras?" MDS configuration plot; axes: Configurations D1, Configurations D2. Plotted words: government, freedom, values, responsibility, country, personal, conservative, life, party, strong, limited, america, free, individual, liberty, right, small, family, taxes, nation, people, american, god, principles, work, defense, great, constitution, fiscal, market, national, founding, hard, military, will, best, low.]
We find a lack of (interesting) structure in the MDS configuration.
Multidimensional Scaling Extensions
More structure is often introduced by using transformations $\delta^*_{ij} = f_{ij}(\delta_{ij})$ and $d^*_{ij}(X) = g_{ij}(d_{ij}(X))$ and weights $w^*_{ij}$.
Many MDS variants are a special case of this general formulation, e.g.,
Metric MDS: $g_{ij}(a) = a$, $f_{ij}(a) = a$
Sammon mapping: $w^*_{ij} = \delta_{ij}^{-1}$
Multiscale: $f_{ij}(a) = g_{ij}(a) = \log(a)$
POST-MDS: $g_{ij}(a) = a^{\kappa}$, $f_{ij}(a) = a^{\lambda}$, $w^*_{ij} = w_{ij}^{\nu}$
ALSCAL: $\kappa = \lambda = 2$
LMDS: Box-Cox transformations for $g_{ij}(\cdot)$, $f_{ij}(\cdot)$
Isomap: $g_{ij}(\cdot)$ isometric distance
Often transformations are parametrized by a hyperparameter vector $\theta$, so $\delta^*_{ij} = f_{ij}(\delta_{ij}; \theta)$ and $d^*_{ij} = g_{ij}(d_{ij}; \theta)$.
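The power transformations listed for POST-MDS can be sketched in base R as follows. This is a minimal illustration under the assumption $\theta = (\kappa, \lambda, \nu)$; the function name `power_stress` is hypothetical and is not the implementation used in any package.

```r
# Stress with power transformations (illustrative sketch):
#   d*_ij = d_ij(X)^kappa, delta*_ij = delta_ij^lambda, w*_ij = w_ij^nu
power_stress <- function(X, delta, kappa = 1, lambda = 1, nu = 1, w = NULL) {
  d <- as.matrix(dist(X))
  delta <- as.matrix(delta)
  if (is.null(w)) w <- matrix(1, nrow(delta), ncol(delta))
  idx <- upper.tri(delta)                 # pairs with i < j
  d_star     <- d[idx]^kappa              # g_ij(a) = a^kappa
  delta_star <- delta[idx]^lambda         # f_ij(a) = a^lambda
  w_star     <- w[idx]^nu                 # w*_ij = w_ij^nu
  sum(w_star * (delta_star - d_star)^2)
}

# Toy configuration and the proximities it generates
X <- rbind(c(0, 0), c(2, 0), c(0, 2))
delta <- dist(X)

ps_metric <- power_stress(X, delta)                         # kappa = lambda = 1
ps_alscal <- power_stress(X, delta, kappa = 2, lambda = 2)  # ALSCAL-type powers
ps_mixed  <- power_stress(X, delta, kappa = 2, lambda = 1)  # mismatched powers
```

With equal powers on both sides the generating configuration still fits perfectly; with mismatched powers it no longer does, which is why the optimal configuration changes with $\theta$.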
Power Stress MDS
Fit ratio MDS with power transformation by setting, e.g., $f(\delta_{ij}) = \delta_{ij}^{20}$.
Structure is clearer but the fit is now worse (0.373 versus 0.401) (essentially fits only $\delta$ very close to the maximum).
Parameters chosen ad hoc, not always clear what is the right $\theta$.
[Figure: "Republican Mantras?!" MDS configuration plot with the 37 word labels; axes: Configurations D1, Configurations D2.]
SLIDE 7 DSSV18, 09-07-2018
Structure Optimized Proximity Scaling
Our suggestion is a framework to systematize this approach: Structure Optimized Proximity Scaling (STOPS).
Idea: Select the parameters for the transformations ($\theta$) in a principled fashion by fit and structure considerations.
This offers a conceptual and computational framework for hyperparameter selection in MDS variants.
Building blocks:
$\theta$-parametrized target function for misfit
Statistics measuring configuration structure (structuredness indices)
Combination of misfit and structure
Algorithm for optimization
STOPS - I
We have the target function that measures misfit (e.g., Stress)
$$\sigma(X, \theta) = L(\Delta^*, D^*(X), \theta)$$
which we minimize to find the configuration $X$ for a given $\theta$:
$$X(\theta) = \arg\min_{X} \sigma(X, \theta)$$
$X(\theta)$ has some structural appearance (C-Structuredness). C-Structuredness changes with different $\theta$.
STOPS - II
Capture $P$ structures in $X(\theta)$ by indices $I_p(X(\theta); \gamma)$, $p = 1, \dots, P$.
Combine $\sigma(X(\theta), \theta)$ and $I_p(X(\theta); \gamma)$ to $\text{stoploss}(X(\theta), \vartheta; \Delta)$.
Two STOPS models
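Additive STOPS (aSTOPS)
$$\text{stoploss}(X(\theta), \vartheta; \Delta) = v_0 \cdot \sigma(X(\theta), \theta) + \sum_{p=1}^{P} v_p I_p(X(\theta); \gamma)$$
Multiplicative STOPS (mSTOPS)
$$\text{stoploss}(X(\theta), \vartheta; \Delta) = \sigma(X(\theta), \theta)^{v_0} \cdot \prod_{p=1}^{P} I_p(X(\theta); \gamma)^{v_p}$$
$v_0$: stress weight (redundant); $v_1, \dots, v_P$: structuredness weights; $\gamma$: (optional) metaparameters for structuredness indices; $\vartheta \subseteq \{\theta, v_0, \dots, v_P\}$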
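The two combination rules can be written as short R functions. This is a sketch only: the names `astops` and `mstops`, the default weights and the numeric inputs are illustrative assumptions, and the stress and index values would in practice come from a fitted configuration $X(\theta)$.

```r
# Additive STOPS: v0 * stress + sum_p v_p * I_p  (hypothetical helper)
astops <- function(stress, indices, v0 = 1, v = rep(-1, length(indices))) {
  v0 * stress + sum(v * indices)
}

# Multiplicative STOPS: stress^v0 * prod_p I_p^v_p  (hypothetical helper)
mstops <- function(stress, indices, v0 = 1, v = rep(-1, length(indices))) {
  stress^v0 * prod(indices^v)
}

# Made-up inputs: one stress value and two structuredness indices
stress_val <- 0.25
indices    <- c(0.9, 0.3)
weights    <- c(-0.5, -0.5)   # negative weights reward structure when minimizing

loss_add  <- astops(stress_val, indices, v0 = 1, v = weights)
loss_mult <- mstops(stress_val, indices, v0 = 1, v = weights)
```

Because stoploss is minimized, a structure that should be encouraged enters with a negative weight in the additive form and a negative exponent in the multiplicative form.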
Structures and Indices
C-Structuredness indices capture the essence of a particular structure in a configuration. Some examples:
C-Association: Pairwise nonlinear association between principal axes (pairwise maximal maximal information coefficient; Reshef et al., 2011)
C-Clusteredness: A clustered appearance (normed OPTICS Cordillera; Rusch et al., 2018)
C-Complexity: Complexity of the functional relationship between any principal axes (pairwise maximal minimum cell number; Reshef et al., 2011)
C-Manifoldness: Points lie close to a smooth submanifold (maximal correlation; Sarmanov, 1958)
Optimization-I
We need to find
$$\arg\min_{\vartheta} \text{stoploss}(X(\theta), \vartheta; \Delta)$$
This can be seen as a profile method.
We use a nested algorithm:
1 First solve for $X(\theta) = \arg\min_{X} \sigma(X, \theta)$
2 Then minimize $\text{stoploss}(X(\theta), \vartheta; \Delta)$ over $\vartheta$
Advantages:
For finding $X(\theta)$ we can use standard solutions (reasonably good)
The inner part (1.) allows computationally flexible specifications of the MDS method
$I_p(X)$ depends directly only on $X(\theta)$
Dimensionality of the outer problem is usually not very high
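The nested scheme can be illustrated with a small base R sketch. Several stand-ins are assumed here and none of them is the method of the slides: classical scaling (`cmdscale`) replaces the stress-minimizing inner solver, a plain grid search replaces the outer optimizer, $\theta$ is reduced to a single power `lambda` on the proximities, and `toy_index` is a hypothetical structuredness index invented for this example.

```r
# Step 1 (inner): configuration for a fixed lambda.
# cmdscale is a stand-in for a proper stress minimizer such as SMACOF.
inner_fit <- function(delta, lambda, ndim = 2) {
  delta_star <- as.matrix(delta)^lambda
  X <- cmdscale(as.dist(delta_star), k = ndim)
  d <- as.matrix(dist(X))
  idx <- upper.tri(d)
  # normalized stress of the fitted configuration
  stress <- sqrt(sum((delta_star[idx] - d[idx])^2) / sum(delta_star[idx]^2))
  list(X = X, stress = stress)
}

# Hypothetical structure index in [0, 1]: large when nearest neighbours are
# much closer than the average pair of points (a crude proxy for clusters).
toy_index <- function(X) {
  d <- as.matrix(dist(X))
  avg <- mean(d[upper.tri(d)])
  diag(d) <- Inf
  1 - mean(apply(d, 1, min)) / avg
}

# Step 2 (outer): additive stoploss as a function of lambda only
outer_loss <- function(lambda, delta, v0 = 1, v1 = -0.5) {
  fit <- inner_fit(delta, lambda)
  v0 * fit$stress + v1 * toy_index(fit$X)
}

# Toy data: two well-separated groups of ten points each
set.seed(1)
pts <- rbind(matrix(rnorm(20, 0, 0.3), ncol = 2),
             matrix(rnorm(20, 3, 0.3), ncol = 2))
delta <- dist(pts)

# Grid search over lambda as a simple stand-in for the outer optimizer
grid <- seq(0.5, 3, by = 0.25)
losses <- vapply(grid, outer_loss, numeric(1), delta = delta)
best_lambda <- grid[which.min(losses)]
```

Each outer evaluation requires a complete inner fit, so the cost of the outer search is dominated by the inner solver.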
Optimization-II
Difficulties when optimizing over $\vartheta$:
Inner minimization is very costly
For stoploss we essentially only have function evaluations
Estimation of Step 1 may be noisy (premature termination, local minimum)
This suggests solving Step 2 with Efficient Global Optimization, also known as Bayesian Optimization.
One samples the “best” candidate for evaluation given a surrogate model and the current knowledge.
Optimization-III
Bayesian Optimization:
Choose a (flexible) surrogate model (prior)
Evaluate the target function at some candidate values (data)
Update the prior with the function evaluations (posterior)
Maximize an acquisition function over the posterior surface
This suggests a candidate parameter combination
Evaluate at the candidate and repeat
We use Expected Improvement for acquisition and a Treed Gaussian Process with Jumps to Linear Models (Gramacy, 2007) or Kriging (Roustant et al., 2012) for the surrogate model.
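For a Gaussian posterior, Expected Improvement has a closed form, sketched below for a minimization problem. The function name and the posterior means and standard deviations are made-up illustrative values; in practice they would be supplied by the fitted surrogate model.

```r
# Expected Improvement for minimization under a Gaussian posterior:
#   EI = (f_best - mu) * pnorm(z) + sd * dnorm(z),  z = (f_best - mu) / sd
# (illustrative helper; mu and sd must have the same length)
expected_improvement <- function(mu, sd, f_best) {
  ei <- numeric(length(mu))
  pos <- sd > 0
  z <- (f_best - mu[pos]) / sd[pos]
  ei[pos] <- (f_best - mu[pos]) * pnorm(z) + sd[pos] * dnorm(z)
  # with zero posterior uncertainty the improvement is deterministic
  ei[!pos] <- pmax(f_best - mu[!pos], 0)
  ei
}

# Hypothetical surrogate predictions at three candidate parameter values
mu     <- c(-0.30, -0.36, -0.34)   # posterior means of stoploss
sd     <- c(0.01, 0.02, 0.05)      # posterior standard deviations
f_best <- -0.35                    # best stoploss observed so far

ei <- expected_improvement(mu, sd, f_best)
next_candidate <- which.max(ei)    # candidate to evaluate next
```

In this toy case the third candidate is selected even though the second has the lower predicted mean, because its larger posterior uncertainty leaves more room for improvement.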
R Package stops
All of this is implemented in the R package stops
High level function for STOPS: stops(delta,loss,...)
Prespecified MDS models (argument loss) are strain, SMACOF (smacofSym), sammon mapping, elastic scaling, SMACOF on a sphere (smacofSphere), sstress, rstress, powerstress, Sammon mapping and elastic scaling with powers (powersammon, powerelastic). Planned: Isomap and LMDS
Optimization with Bayesian optimization (kriging, tgp) and some more (including simulated annealing SANN or a particle swarm algorithm pso).
Features various c-structuredness indices
S3 methods: plot, summary, print, coef, residuals, plot3d, plot3dstatic
Example: Republicans
Misfit: Power Stress MDS
Structuredness: C-Clusteredness and C-Manifoldness
Optimization with treed Gaussian process prior with jump to linear models (for 20 steps)
R> resc <- stops(dt.dist,loss="powermds",
+ structures=c("cmanifoldness","cclusteredness"))
R> resc
Call: stops(dis = dt.dist, loss = "powermds", theta = c(1, 1), structures = c("cmanifoldness", "cclusteredness"), strucpars = strucpars, optimmethod = "tgp", lower = c(0.5, 0.3), upper = c(3, 10), verbose = 5, type = "additive", itmax = 20)

Model: additive STOPS with powermds loss function and theta parameters= 1.871 3.191 1
Number of objects: 37
MDS loss value: 0.2513
C-Structuredness Indices: cmanifoldness 0.9738 cclusteredness 0.3117
Structure optimized loss (stoploss): -0.3914
MDS loss weight: 1 c-structuredness weights: -0.5 -0.5
Number of iterations of tgp optimization: 20
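As a consistency check, the reported stoploss can be recomputed by hand from the printed values, assuming the additive form with stress weight 1 and structuredness weights of -0.5 each:

```latex
\begin{aligned}
\text{stoploss} &= v_0 \cdot \sigma + v_1 I_1 + v_2 I_2 \\
                &= 1 \cdot 0.2513 + (-0.5) \cdot 0.9738 + (-0.5) \cdot 0.3117 \\
                &= 0.2513 - 0.4869 - 0.15585 \\
                &= -0.39145
\end{aligned}
```

This agrees with the printed value of -0.3914 up to rounding of the displayed inputs.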
Example: Republicans
[Figure: contour plot over kappa and lambda; contour levels range from -0.35 to -0.305; tick marks at 1.6 to 2.2 on one axis and 2.5 to 4.0 on the other.]
Example: Republicans
[Figure: "Republican Mantras!" configuration plot; axes: Configurations D1, Configurations D2.]
Example: Republicans
[Figure: "Configuration Plot"; axes: Configurations D1, Configurations D2.]
Example: Republicans
[Figure: "Republican Mantras!" configuration plot with the 37 word labels; axes: Configurations D1, Configurations D2.]
Example: Republicans
[Figure: "Republican Mantras!" configuration plot with the 37 word labels and a group legend: Fiscalcon, Traditional, Neocon+Liberalist, Paleo+Populist, Unclustered; axes: Configurations D1, Configurations D2.]
Summary and Outlook
STOPS
A conceptual and computational framework for hyperparameter optimization in MDS based on structure considerations
Outlook
More models and (perhaps?) more structures
Extend to other dimension reduction techniques (e.g., the Gifi system)
References
Borg, I., Groenen, P. (2005). Modern multidimensional scaling: Theory and applications, 2nd Edition, Springer, New York.
Gramacy, R. B. (2007). tgp: an R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models. Journal of Statistical Software, 19(9), 1–46.
Mair, P., Rusch, T., Hornik, K. (2014) The grand old party - A party of values? SpringerPlus, 3:697.
Reshef, D., Reshef, Y., Finucane, H., Grossman, S., McVean, G., Turnbaugh, P., Lander, E., Mitzenmacher, M., & Sabeti, P. (2011) Detecting novel associations in large data sets. Science, 334, 1518–1524.
Roustant, O., Ginsbourger, D., & Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodelling and optimization. Journal of Statistical Software, 51(1), 1–54.
Rusch, T., Hornik, K., Mair, P. (2018) Assessing and quantifying clusteredness: The OPTICS Cordillera. Journal of Computational and Graphical Statistics, 27(1), 220–233.
Rusch, T., Mair, P., Hornik, K. (in preparation). Structure based hyperparameter selection for dimensionality reduction: The STOPS framework for Structure Optimized Proximity Scaling.
Sarmanov, O. (1958). Maximum correlation coefficient (symmetric case). Doklady Akad. Nauk SSSR, 120, 715–718.
Thank You for Your Attention
Thomas Rusch
Competence Center for Empirical Research Methods
WU Vienna University of Economics and Business
Welthandelsplatz 1, 1020 Vienna, Austria
email: [email protected]
URL: http://wu.ac.at/methods/team/dr-thomas-rusch
License
Please attribute Thomas Rusch, Patrick Mair and Kurt Hornik. Except where otherwise noted, this work is licensed under CC-BY-SA:
https://creativecommons.org/licenses/by-sa/4.0/