### Recovering Nanostructures at Scale

Abhinav Sarje Computational Research Division Lawrence Berkeley National Laboratory

06.24.15 OLCF User Meeting

### Nanoparticle Systems

• Materials (natural or artificial) made up of nanoparticles.

• Sizes ranging from 1 nanometer to 1000s nanometers.

• Wide variety of applications in optical, electronic and biomedical fields.

E.g.:

• Inorganic nanomaterials in optoelectronics.

• Organic material based nano-devices such as Organic Photovoltaics (OPVs), OLEDs.

• Chemical catalysts, drug design and discovery, biological process dynamics. Importance of structural information:

• Nanomaterials exhibit shape and size-dependent properties, unlike bulk materials which have constant physical properties regardless of size.

• Nanoparticle characterization is necessary to establish understanding and control of material synthesis and applications.

### Nanoparticle Systems

• Materials (natural or artificial) made up of nanoparticles.

• Sizes ranging from 1 nanometer to 1000s nanometers.

• Wide variety of applications in optical, electronic and biomedical fields.

E.g.:

• Inorganic nanomaterials in optoelectronics.

• Organic material based nano-devices such as Organic Photovoltaics (OPVs), OLEDs.

• Chemical catalysts, drug design and discovery, biological process dynamics. Importance of structural information:

• Nanomaterials exhibit shape and size-dependent properties, unlike bulk

materials which have constant physical properties regardless of size.

• Nanoparticle characterization is necessary to establish understanding

### X-ray Scattering at Synchrotrons

### X-Ray Scattering: Complex Examples

Gratings

OPVs

### Computational Problems in Structure Recovery: Inverse Modeling

Start

### Computational Problems in Structure Recovery: Inverse Modeling

Start

Initial guess

forward simulation

### Computational Problems in Structure Recovery: Inverse Modeling

Start Initial guess forward simulation compute error w.r.t. experimental data### Computational Problems in Structure Recovery: Inverse Modeling

Start Initial guess forward simulation compute error w.r.t. experimental data tune model parameters### Computational Problems in Structure Recovery: Inverse Modeling

Start Initial guess forward simulation compute error w.r.t. experimental data tune model parameters### Computational Problems in Structure Recovery: Inverse Modeling

Start Initial guess forward simulation compute error w.r.t. experimental data tune model parameters End### Need for High-Performance Computing

Data generation and analysis gap:

• High measurement rates of current state-of-the-art light beam detectors.

• Wait for days for analyzing data with previous softwares.

• Extremely inefficient utilization of facilities due to mismatch.

### Need for High-Performance Computing

High computational and accuracy requirements:

• Errors are proportional to the resolutions of various computational

discretization.

• Higher resolutions require higher computational power.

• Example:

• O(107)toO(1015)kernel computations for one simulation.

• O(102_{)}_{experiments per material sample.}

• O(10)toO(103)forward simulations for inverse modeling per scattering pattern.

### Need for High-Performance Computing

Science Gap:

• Beam-line scientists lack access to high-performance algorithms and

codes.

• In-house developed codes limited in compute capabilities and

performance.

• Also, they are extremely slow – wait for days and weeks to obtain basic

### Forward Simulations: Computing Scattered Light Intensities

Given:

1 a sample structure model, and

2 experimental configuration,

simulate scattering patterns.

### Inverse Modeling

Forward simulation kernel: computing the scattered light intensities. E.g.

• FFT computations (SAXS)

• Complex form factor and structure factor computations (GISAXS)

Various inverse modeling algorithms:

• Reverse Monte-Carlosimulations for SAXS.

• Sophisticated optimization algorithms for GISAXS. • Gradient based: LMVM (Limited-Memory Variable-Metric.) • Derivative-free trust region-based: POUNDerS.

### Inverse Modeling

Forward simulation kernel: computing the scattered light intensities. E.g.

• FFT computations (SAXS)

• Complex form factor and structure factor computations (GISAXS)

Various inverse modeling algorithms:

• Reverse Monte-Carlosimulations for SAXS.

• Sophisticated optimization algorithms for GISAXS.

• Gradient based: LMVM (Limited-Memory Variable-Metric.)

• Derivative-free trust region-based: POUNDerS. • Stochastic: Particle Swarm Optmization.

Introduction Motivation X-Ray Scattering Inverse Modeling Reverse Monte-Carlo Simulations LMVM & POUNDerS PSO Conclusions

### Reverse Monte Carlo Simulations: Validation

Actual Models

Models

Recovered Models

Introduction Motivation X-Ray Scattering Inverse Modeling Reverse Monte-Carlo Simulations LMVM & POUNDerS PSO Conclusions

### Reverse Monte Carlo Simulations: Validation

Actual Models

Initial Models

### Reverse Monte Carlo Simulations: Validation

Actual Models Initial Models Recovered Models### Reverse Monte Carlo Simulations: Strong and Weak Scaling

Titan 103 104 105 106 107 108 1 2 4 8 16 32 64_{ 128}

_{ 256}

_{ 512}Time [ms]

Number of Nodes (GPUs)

t=1024, s=2048
t=1024, s=512
t=128, s=512 104
105
106
1 2 4 8 _{ 16} _{ 32} _{ 64}
128 256 512 _{ 1024}
Time [ms]

Number of Nodes (GPUs)

t=2p, s=2048
t=p, s=2048
t=2p, s=512
t=p, s=512
Hopper
104
105
106
107
108
109
1010
24 48 96
192 384 768 _{ 1536} _{ 3072} _{ 6144}
12288
Time [ms]
Number of Cores
t=1024, s=2048
t=1024, s=512
t=128, s=2048
t=128, s=512
104
105
106
107
24 48 96
192 384 768 _{ 1536} _{ 3072} _{ 6144}
12288
Time [ms]
Number of Cores
t=2p, s=2048
t=p, s=2048
t=p, s=512
t=p/2, s=512

### Limited-Memory Variable-Metric and POUNDerS

• Methods from the optimization package TAO.

• LMVM is a gradient-based method.

Introduction Motivation X-Ray Scattering Inverse Modeling Reverse Monte-Carlo Simulations LMVM & POUNDerS PSO Conclusions

### Limited-Memory Variable-Metric and POUNDerS: Two Parameter Case

A Single Cylindrical Nanoparticle

X-Ray Scattering Pattern

16 18 20 22 24 x1 16 18 20 22 x2 0 20 40 60 80 100 120

Objective Function Map

16 18 20 22 24 x1 16 18 20 22 24 x2 15 30 45 60 75 90 105 LMVM Convergence Map 16 18 20 22 24 x1 16 18 20 22 24 x2 3 6 9 12 15 18 21 24 27

### Limited-Memory Variable-Metric and POUNDerS: Two Parameter Case

A Single Cylindrical

Nanoparticle X-Ray Scattering Pattern

16 18 20 22 24 x1 16 18 20 22 x2 0 20 40 60 80 100 120

Objective Function Map

16 18 20 22 24 x1 16 18 20 22 24 x2 15 30 45 60 75 90 105 LMVM Convergence Map 16 18 20 22 24 x1 16 18 20 22 24 x2 3 6 9 12 15 18 21 24 27

### Limited-Memory Variable-Metric and POUNDerS: Two Parameter Case

A Single Cylindrical

Nanoparticle X-Ray Scattering Pattern

16 18 20 22 24 x1 16 18 20 22 24 x2 0 20 40 60 80 100 120 140 160 180

Objective Function Map

16 18 20 22 24 x1 16 18 20 22 x2 15 30 45 60 75 LMVM Convergence Map 16 18 20 22 24 x1 16 18 20 22 x2 3 6 9 12 15 18 21

### Limited-Memory Variable-Metric and POUNDerS: Two Parameter Case

A Single Cylindrical

Nanoparticle X-Ray Scattering Pattern

16 18 20 22 24 x1 16 18 20 22 24 x2 0 20 40 60 80 100 120 140 160 180

Objective Function Map

16 18 20 22 24 x1 16 18 20 22 24 x2 15 30 45 60 75 90 105 LMVM Convergence Map 16 18 20 22 24 x1 16 18 20 22 24 x2 3 6 9 12 15 18 21 24 27

### Limited-Memory Variable-Metric and POUNDerS: Six Parameter Case

Pyramidal Nanoparticles forming a

Lattice

X-Ray Scattering Pattern 36 37 38 39 40 41 42 43 44x1 46 48 50 52 x3 6.0 7.5 9.0 10.5 12.0 13.5

Objective Function Map

LMVM does not converge

36 37 38 39 40 41 42 43 44 x1 46 48 50 52 54 x3 12 14 16 18 20 22 24 26 28

### Limited-Memory Variable-Metric and POUNDerS: Six Parameter Case

Pyramidal Nanoparticles forming a

Lattice

X-Ray Scattering Pattern

36 37 38 39 40 41 42 43 44 x1 46 48 50 52 x3 6.0 7.5 9.0 10.5 12.0 13.5

Objective Function Map

LMVM does not converge

36 37 38 39 40 41 42 43 44 x1 46 48 50 52 54 x3 12 14 16 18 20 22 24 26 28

### Limited-Memory Variable-Metric and POUNDerS: Six Parameter Case

Pyramidal Nanoparticles forming a

Lattice

X-Ray Scattering Pattern 36 37 38 39 40 41 42 43 44x1 46 48 50 52 54 x3 6.0 7.5 9.0 10.5 12.0 13.5 15.0 16.5 18.0

Objective Function Map

LMVM does not converge

36 37 38 39 40 41 42 43 44 x1 46 48 50 52 x3 12 14 16 18 20 22 24

### Limited-Memory Variable-Metric and POUNDerS: Six Parameter Case

Pyramidal Nanoparticles forming a

Lattice

X-Ray Scattering Pattern 36 37 38 39 40 41 42 43 44x1 46 48 50 52 54 x3 6.0 7.5 9.0 10.5 12.0 13.5 15.0 16.5 18.0

Objective Function Map

LMVM does not converge

36 37 38 39 40 41 42 43 44 x1 46 48 50 52 54 x3 12 14 16 18 20 22 24 26 28

### Particle Swarm Optimization

• Stochastic method.

• Multiple agents,“particle swarm”, search for optimal points in the

parameter space.

### Particle Swarm Optimization

• Stochastic method.

• Multiple agents,“particle swarm”, search for optimal points in the

parameter space.

### Particle Swarm Optimization: Fitting X-Ray Scattering Data

2 4 6 8 10 12 14 16 18 20 Number of agents 1 4 9 16 25 36 49 64 81 100Parameter space volume

80 160 240 320 400 480 560 640 Function evaluations Fitting 2 Parameters 2 4 6 8 10 12 14 16 18 20 Number of agents 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

Parameter space volume

100 200 300 400 500 600 700 800 900 Function evaluations Fitting 6 Parameters

### Particle Swarm Optimization: Performance

0 4 16 36 64 100 144

Search Space Volume ([x1]x[x2])

0 5 10 15 20 25 30

Iterations for Convergence

15 agents 20 agents

Convergence w.r.t. Search Space Volume 4 16 64 256 Number of GPUs 102 103 104 Time [s] sample 1 sample 2

### Particle Swarm Optimization: Agents vs. Generations

0 10 20 30 40 50 Generation 10-12 10-11 10-10 10-9 10-8 10-7 10-6 10-5 10-4 10-3 10-2 1010-1 0 Relative Error 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0 10 20 30 40 Generation 20 40 60 80 100 Num. Agents 10-11 10-10 10-9 10-8 10-7 10-6 10-5 10-4 10-3 10-2 10-1 Relative Error### An Ongoing Work

We saw that:

• GPUs have brought data analysis time from days and weeks to just

minutes and seconds.

• Derivative-based methods converge only for simple cases.

• Trust-region-based methods are very sensitive to initial guess.

• PSO does not require an initial guess.

• PSO is robust, nearly always converging, but expensive.

• Need better optimization algorithms.

• Opening gates to much more sophisticated analyses.

• We are applying machine learning for feature and structural

### An Ongoing Work

We saw that:

• GPUs have brought data analysis time from days and weeks to just

minutes and seconds.

• Derivative-based methods converge only for simple cases.

• Trust-region-based methods are very sensitive to initial guess.

• PSO does not require an initial guess.

• PSO is robust, nearly always converging, but expensive.

• Need better optimization algorithms.

Near future:

• Opening gates to much more sophisticated analyses.

• We are applying machine learning for feature and structural

### Our Current Core Team

• **Alexander Hexemer**,Advanced Light Source, Berkeley Lab.

• **Dinesh Kumar**,Advanced Light Source, Berkeley Lab.

• **Xiaoye S. Li**,Computational Research Division, Berkeley Lab.

• **Abhinav Sarje**,Computational Research Division, Berkeley Lab.

### Our Current Core Team

• **Alexander Hexemer**,Advanced Light Source, Berkeley Lab.

• **Dinesh Kumar**,Advanced Light Source, Berkeley Lab.

• **Xiaoye S. Li**,Computational Research Division, Berkeley Lab.

• **Abhinav Sarje**,Computational Research Division, Berkeley Lab.

• **Singanallur Venkatakrishnan**,Advanced Light Source, Berkeley Lab.

### Further Information

1 **HipGISAXS: a high-performance computing code for simulating**
**grazing-incidence X-ray scattering data**. Journal of Applied
Crystallogrpahy, vol. 46, pp. 1781–1795, 2013.

2 **Massively parallel X-ray scattering simulations**. Supercomputing (SC),

2012.

3 **Tuning HipGISAXS on multi- and many-core supercomputers**.

Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, 2013.

4 **High performance inverse modeling with Reverse Monte Carlo**
**simulations**. International Conference on Parallel Processing, 2014.

### Acknowledgments

• Supported by the Office of Science of the U.S. Department of Energy

under Contract No. DE-AC02-05CH11231.

• Also supported by DoE Early Career Research grant awarded to

Alexander Hexemer.

• Used resources of the Oak Ridge Leadership Computing Facility at the

Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

• Used resources of the National Energy Research Scientific Computing

Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Thank you!

Web: **http://portal.nersc.gov/project/als/hipgisaxs**