• No results found

Representation Learning of Collider Events

N/A
N/A
Protected

Academic year: 2021

Share "Representation Learning of Collider Events"

Copied!
37
0
0

Loading.... (view fulltext now)

Full text

(1)

Representation Learning

of Collider Events

Jack Collins

IAS Program on High Energy

(2)
(3)
(4)

Conclusions

I have been training Variational Autoencoders to reconstruct jets or collider events using Earth Movers Distance as the reconstruction metric.

The learnt representation: • Is scale dependent • Is orthogonalized

• Is hierarchically organized by scale

• Has fractal dimension which relates to that of the data manifold

This is because:

• The VAE is trained to be parsimonious with information • The metric space is physically meaningful and structured

(5)

The Plain Autoencoder

(6)

The Plain Autoencoder

(7)

The Plain Autoencoder:

a toy example

AE

(1D latent space)

(8)

The Plain Autoencoder:

a toy example

(9)

The Plain Autoencoder:

a toy example

1.The AE learns some

dense packing

of the

data space

2.The latent representation is

highly

coupled with

the expressiveness of the

network architecture

of the encoder and

decoder

(10)

The Variational Autoencoder

Loss =

𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

Τ

2𝛽

2

− σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

(11)

The Variational Autoencoder:

Information and the loss function

Loss = 𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

− 2𝛽

2

σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

Loss =

𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

Τ

2𝛽

2

− σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

(12)

Loss = 𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

− 2𝛽

2

σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

Loss =

𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

Τ

2𝛽

2

− σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

1) 𝜷 is dimensionful!

The same dimension as the distance metric,

e.g. GeV.

The Variational Autoencoder:

(13)

The Variational Autoencoder:

Information and the loss function

Loss = 𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

− 2𝛽

2

σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

𝜷 → ∞

No info encoded in latent space

𝜷 ≪ Lengthscale

(14)

The Variational Autoencoder:

Information and the loss function

Loss = 𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

− 2𝛽

2

σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

𝜷 → ∞

No info encoded in latent space

𝜷 ≪ Lengthscale

Info encoded in latent space

2) 𝜷 is the cost for encoding information

The encoder will only encode information about the input to the extent that its usefulness

for

reconstruction

is sufficient to justify the

cost

.

(15)

The Variational Autoencoder:

Information and the loss function

Loss =

𝒙

𝑜𝑢𝑡

− 𝒙

𝑖𝑛

2

Τ

2𝛽

2

− σ

𝑖

1

2

1 + log 𝜎

𝑖

2

− 𝜇

𝑖

2

− 𝜎

𝑖

2

3) 𝜷 is the distance resolution in

reconstruction space

The stochasticity of the latent sampling will smear the

reconstruction at scale ~ 𝛽

(16)

The Variational Autoencoder:

Bananas

Train VAE with a 10 dim latent space on this 2D dataset

(17)

The Variational Autoencoder:

Bananas

Latent Space Reconstruction Space

𝜷 < 𝟎. 𝟏

(18)

The Variational Autoencoder:

Bananas

Latent Space Reconstruction Space

𝜷 < 𝟎. 𝟏

Size = Τ

𝛽

𝜎

The VAE is doing non-linear PCA

(19)

The Variational Autoencoder:

Bananas

Latent Space Reconstruction Space

𝜷 < 𝟎. 𝟏

Size = Τ

𝛽

𝜎

The VAE is doing non-linear PCA

(20)

The Variational Autoencoder:

Bananas

Latent Space Reconstruction Space

𝟎. 𝟏 < 𝜷 < 𝟏. 𝟎

(21)

The Variational Autoencoder:

Bananas

Latent Space Reconstruction Space

𝜷 ≫ 𝟏

(22)

β

The Variational Autoencoder:

Dimensionality

𝐷

1

= ෍

𝑖

𝑑 𝐾𝐿

𝑖

𝑑 log 𝛽

𝐷

2

=

𝑑 ∆𝒙

2

𝑑𝛽

2

The scaling of KL with beta suggests a notion of dimensionality, that relates to how tightly small gaussians are being packed into the latent space.

A similar notion of dimensionality can be derived from the packing of gaussians into the data-space.

(23)

Distance between Jets:

EMD: Cost to transform one jet into another = Energy * distance

arXiv:1902.02346

Video taken from https://energyflow.network/docs/emd/, Eric Metediov, Patrick Komiske III, Jesse Thaler

Defines a metric space in which

jets or collider events form a

geometric manifold.

In practice I use a tractable approximation to EMD called Sinkhorn distance

(24)

Jet VAE

{𝑝

𝑇 𝑖

, 𝑦

𝑖

, 𝜑

𝑖

}

PFN

Large

Latent

space

Dense

{𝑝

𝑇 𝑖

, 𝑦

𝑖

, 𝜑

𝑖

}

Sinkhorn distance ≈ EMD

50 particles 1-100 particles

(25)

W Jets

(26)

W Jets

(27)

W Jets

(28)

Exploring the Learnt Representation:

Top Jets

(29)

Exploring the Learnt Representation:

Top Jets

𝜷 = 𝟒𝟎 𝑮𝒆𝑽

Latent Dimension 3 La ten t Dimension 4
(30)

Exploring the Learnt Representation:

Top Jets

(31)

Exploring the Learnt Representation:

Dimensionality

(32)
(33)

A Mixed Sample

VAE structure

𝑥, 𝑦

Dense

Continuous

{𝑧

𝑖

}

Dense

𝑥, 𝑦

Categorical

𝑐

1

, 𝑐

2

Dense

𝑃(𝑐|𝑥)

𝑃(𝑧|𝑥, 𝑐)

𝑃(𝑥|𝑐, 𝑧)

(34)

A Mixed Sample

VAE structure

Learnt Classifier

Ca

tego

ry 1

Ca

tegor

y 2

(35)

Conclusions

VAE latent spaces learn concrete representations of the manifolds on which they are trained.

A meaningful distance metric which encodes interesting physics at different scales leads to a meaningful learnt representation which encodes interesting physics at different scales.

For a sufficiently simple manifold, the VAE learnt representation is: • Orthogonalized

Hierarchically organized

Has a scale-dependent fractal dimension which directly relates to that of the true data manifold

(36)

The Variational Autoencoder:

Orthogonalization and Organization is Information-Efficient

vs

Orthogonalization:

vs

Organization:

(37)

Exploring the Learnt Representation:

Top Jets

https://energyflow.network/docs/emd/

References

Related documents

A MoU will be established between the Australian Energy Market Commission (AEMC), Australian Energy Regulator (AER) and the Australian Consumer and Competition Commission (ACCC),

To segment the consumers, the Hierarchical cluster analysis using Ward method was used to segment the respondents on the premise that consumers display different

This feature provides the Security Manager with finely tuned control tools for Web, Mail and FTP connections, including Anti-Virus checking for transferred files, access control

training services market, the relationships between the management strategies of education and training providers and the performance thereof, and the characteristics and

Class Presentation – Working in your assigned groups, prepare a class presentation on your selected group work method.. The presentation will describe the ideology and application of

What process-related measures, with respect to organizational goals, can be defined to sat- isfy the information needs of stakeholders in the context of software development

Clothing: Wear appropriate protective clothing to minimize contact with skin.. Respirators: Follow the OSHA respirator regulations found in 29 CFR 1910.134 or European

Our major findings are: (i) MRE11 is important for 5 0 -strand resection of ‘clean’ DNA ends; (ii) in the absence of MRE11, even the first nucleotide at the 5 0 -end cannot