Introduction: big data and partial differential equations


Preface for the special issue

Big Data and Partial Differential Equations

Yves van Gennip and Carola-Bibiane Schönlieb

September 21, 2017

Partial differential equations (PDEs) are equations involving an unknown function of several independent variables and its partial derivatives up to a certain order. Since PDEs express continuous change, they have long been used to formulate a myriad of dynamical physical and biological phenomena: heat flow, optics, electrostatics and -dynamics, elasticity, fluid flow and many more. Many of these PDEs can be derived in a variational way, i.e. via minimization of an ‘energy’ functional. In this globalized and technologically advanced age, PDEs are also extensively used for modeling social situations (e.g. models for opinion formation, mathematical finance, crowd motion) and tasks in engineering (such as models for semiconductors, networks, and signal and image processing tasks).

In recent years in particular, there has been increasing interest from applied analysts in applying models and techniques from variational methods and PDEs to problems in data science. This issue of the European Journal of Applied Mathematics highlights some recent developments in this young and growing area. It gives a taste of endeavours in this realm in two exemplary contributions on PDEs on graphs [1, 2] and one on probabilistic domain decomposition for numerically solving large-scale PDEs [3].

The graph framework. Applied mathematics research on graphs in the context of data science starts with the observation that many kinds of discrete data can be represented as a weighted graph or network. This representation is convenient when developing data processing methods as it provides a mathematical structure that one can work with. In 2012 a paper by Andrea Bertozzi and Arjuna Flenner kicked off a boom in applied mathematics research on this topic [4]. In this paper the authors use graph versions of the Ginzburg-Landau functional for data clustering, data classification and image segmentation. Minimization of the classical continuum Ginzburg-Landau functional,

\[ F(u) := \varepsilon \int_\Omega |\nabla u|^2 \, dx + \frac{1}{\varepsilon} \int_\Omega W(u) \, dx, \]

provides a model for phase separation. Here $W(u) = u^2(1-u)^2$ is a double well potential with minima at $u = 0$ and $u = 1$, and $u$ describes the relative presence of the two phases $\{u \approx 0\}$ and $\{u \approx 1\}$ in the domain $\Omega$. When $F$ is minimized under some suitable constraints on $u$ (e.g. a mass constraint of the form $\int_\Omega u \, dx = M$) and for small values of the parameter $\varepsilon$, $u$ will take values close to 0 and 1, with transitions between those values occurring in small regions of width $O(\varepsilon)$.
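The derivative of the double well potential enters the Allen-Cahn type gradient flow discussed further on. It can be computed directly from the definition of $W$; the short derivation below is a routine calculation added here for illustration only.

```latex
\begin{align*}
W(u)  &= u^2 (1-u)^2, \\
W'(u) &= 2u(1-u)^2 - 2u^2(1-u) = 2u(1-u)(1-2u).
\end{align*}
```

Hence $W'$ vanishes exactly at $u = 0$, $u = \tfrac{1}{2}$ and $u = 1$, with $W(0) = W(1) = 0$ at the two wells and $W(\tfrac{1}{2}) = \tfrac{1}{16}$ at the barrier between them.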

In [4] the graph functional

\[ f(u) := \sum_{i,j \in V} \omega_{ij} (u_i - u_j)^2 + \frac{1}{\varepsilon} \sum_{i \in V} W(u_i) \]

was introduced. This is a functional whose input argument $u$ is a function on the nodes of a graph, instead of on a continuum set $\Omega \subset \mathbb{R}^n$, and which serves as a graph counterpart to $F$. Here $V$ is the node set of the graph, $\omega_{ij}$ is a nonnegative weight on the edge between nodes $i$ and $j$ in a finite, simple, undirected graph


mass constraint or an additional data fidelity term of the form $\sum_{i:\ \text{training data}} (u_i - u_i^{\text{training}})^2$ to cluster or classify the nodes of a graph into two groups (‘phases’ where $u \approx 0$ and $u \approx 1$) based on the pairwise node similarity encoded in the edge weights $\omega_{ij}$. By treating the pixels of an image as nodes in a graph, data classification can be used for image segmentation as well.
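To make concrete how the graph functional rewards labellings that follow the edge weights, the following minimal Python sketch evaluates $f$ on a small weighted graph. The four-node graph, the value of ε and all function names are hypothetical choices made for this illustration; they are not taken from [4].

```python
import numpy as np


def double_well(u):
    """Double well potential W(u) = u^2 (1 - u)^2 with minima at 0 and 1."""
    return u**2 * (1.0 - u) ** 2


def graph_ginzburg_landau(u, weights, eps):
    """Evaluate f(u) = sum_{i,j} w_ij (u_i - u_j)^2 + (1/eps) sum_i W(u_i).

    `weights` is assumed to be a symmetric, nonnegative matrix with zero diagonal.
    """
    diff = u[:, None] - u[None, :]          # diff[i, j] = u_i - u_j
    dirichlet = np.sum(weights * diff**2)   # sum over all ordered pairs (i, j)
    potential = np.sum(double_well(u)) / eps
    return dirichlet + potential


# Hypothetical toy graph: two strongly linked pairs joined by one weak edge.
weights = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.1, 0.0],
        [0.0, 0.1, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
)
eps = 0.1

good_split = np.array([0.0, 0.0, 1.0, 1.0])  # separates the pairs, cuts only the weak edge
bad_split = np.array([0.0, 1.0, 0.0, 1.0])   # alternating labels, cuts every edge
undecided = np.full(4, 0.5)                  # no phase separation at all

f_good = graph_ginzburg_landau(good_split, weights, eps)
f_bad = graph_ginzburg_landau(bad_split, weights, eps)
f_undecided = graph_ginzburg_landau(undecided, weights, eps)
print(f_good, f_undecided, f_bad)
```

In this toy example the labelling that cuts only the weak edge has the lowest energy, the constant state $u \equiv \tfrac{1}{2}$ is penalised by the double well term, and the alternating labelling that cuts both strong edges is the most expensive.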

Interesting mathematical questions that could arise from such a model are:

1. Can we find graph analogues of properties of the continuum functional?

2. Is the continuum functional a limit of the graph functionals in some sense?

3. What can we say about the resulting algorithm and its usage for data analysis/image processing?

4. Are there other network problems that can be tackled by a PDE-inspired approach?

5. Are there other PDE/variational systems that have interesting network analogues? And if the inspiring PDEs are related, are their graph analogues related?

Some of these questions have been considered in the state-of-the-art literature, with some highlights reported in the following.

1. Does $f$ have similar properties to $F$? In [5] the authors proved that $f$ Γ-converges [6, 7], when $\varepsilon \to 0$, to the graph total variation functional

\[ TV(u) := \frac{1}{2} \sum_{i,j \in V} \omega_{ij} |u_i - u_j|, \]

which has as domain the set of node functions $u$ which take values in $\{0,1\}$. This mirrors the well-known continuum result [8, 9]. Moreover, for such $\{0,1\}$-valued functions $u$, $TV(u)$ reduces to the graph cut [10] of the node partition $V_0 = \{i : u_i = 0\}$, $V_1 = \{i : u_i = 1\}$, i.e. the sum of the edge weights $\omega_{ij}$ corresponding to edges that have one node in $V_0$ and the other in $V_1$.

2. Furthermore, when $f$ or $TV$ are defined on certain graphs for which a sensible continuum limit can be defined, they Γ-converge to the continuum total variation in the continuum limit, e.g. on 4-regular graphs obtained by ever finer discretisations of the flat torus [5] and on point clouds obtained by sampling ever more points from an underlying subset of $\mathbb{R}^n$ [11, 12, 13]. In the latter context these limit results can be interpreted as consistency results that show that the discrete model defined on the samples is asymptotically consistent with a continuum model.

3. Minimization of $f$ is in practice (approximately) achieved either by solving a gradient flow equation of Allen-Cahn type,

\[ \frac{du_i}{dt} = -\sum_{j \in V} \omega_{ij} (u_i - u_j) - \frac{1}{\varepsilon} W'(u_i) \]

(plus additional terms coming from a mass constraint or fidelity term) or by a graph version of the threshold dynamics (or MBO) scheme [14]:

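\[ u^{k+1} = \begin{cases} 0, & \text{if } \tilde u(\tau) < \frac{1}{2}, \\ 1, & \text{if } \tilde u(\tau) \geq \frac{1}{2}, \end{cases} \]

where $\tilde u(t)$ solves

\[ \begin{cases} \dfrac{d\tilde u_i}{dt} = -\sum_{j \in V} \omega_{ij} (\tilde u_i - \tilde u_j), \\ \tilde u(0) = u^k. \end{cases} \]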

In the (spectral) graph theory literature [10, 15] $(\Delta u)_i := \sum_{j \in V} \omega_{ij} (u_i - u_j)$ is known as the unnormalised or combinatorial graph Laplacian of $u$. The equations above can also be formulated and solved with normalised versions of the graph Laplacian.


The construction of the underlying graph in the first place can pose a significant computational problem, especially when the number of data points (and thus nodes in the graph) is very large. Matrix completion techniques such as the Nyström extension [17, 18] and fast eigenvalue computation algorithms such as the Rayleigh-Chebyshev algorithm [19] make such computations feasible.

This graph Ginzburg-Landau method has found many applications, for example in data clustering and classification and image segmentation [4, 16, 20], and has also been extended to deal with clustering and classification into more than two classes [21, 22, 23, 24, 25]. Recent papers prove convergence of the graph Allen-Cahn algorithm (both the spectrally untruncated and truncated versions) and extend the method to non-smooth potentials and hypergraphs [26, 27].

This shows that such PDE-driven techniques can provide fast approximate alternatives for combinatorial problems whose exact solution is computationally too expensive.

4. Another example of such a problem is the computation of a maximum cut in graphs, i.e. finding a partition of the node set into two sets such that the sum of the edge weights corresponding to edges with one node in each set is maximal. If the graph is bipartite, this corresponds to partitioning the node set according to the bipartite structure. The exact solution of this classical problem is known to be computationally infeasible for large graphs. Work currently in preparation introduces a fast approximate solution method for this problem using an adaptation of the graph Ginzburg-Landau functional $f$ [28].

5. The continuum counterparts of both the graph Allen-Cahn equation and graph MBO scheme from point 3 can be viewed as approximating mean curvature flow [29, 30, 31, 32, 33]. This suggests that graph curvature and graph mean curvature flow are interesting concepts to consider as well. In [34] the authors introduced both. The graph curvature of a node set $S$ is given by

\[ \kappa_i := \begin{cases} \sum_{j \in S^c} \omega_{ij}, & \text{if } i \in S, \\ -\sum_{j \in S} \omega_{ij}, & \text{if } i \in S^c, \end{cases} \]

and the related graph mean curvature flow has a variational formulation along the lines of [35, 36, 37] which leads to a time discrete evolution of node subsets $S$ (given an initial set $S_0$),

\[ S_{n+1} \in \operatorname{argmin}_{\hat S} F(\hat S, S_n), \]

where

\[ F(\hat S, S_n) := \sum_{i \in \hat S,\, j \in \hat S^c} \omega_{ij} + \frac{1}{\delta t} \sum_{i \in \hat S} d_i \, sd^n_i. \]

Here $\delta t > 0$ is the time step, $d_i$ is the degree of node $i$ and $sd^n_i$ is the signed graph distance from node $i$ to the boundary of node set $S_n$. In [34] the authors started studying the very interesting question whether the graph Allen-Cahn equation, graph MBO scheme and graph mean curvature flow are as intimately connected as their continuum counterparts, but establishing such connections is still mostly an open problem.
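The statements in points 1 and 5 about graph total variation, graph cut and graph curvature can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions: the five-node weighted graph, the node set $S$ and the function names are hypothetical, and the curvature is implemented exactly as in the formula displayed above. It also illustrates an observation that follows directly from the definitions, namely that summing $\kappa_i$ over $i \in S$ gives the cut of $S$.

```python
import numpy as np


def graph_total_variation(u, weights):
    """TV(u) = (1/2) * sum_{i,j} w_ij |u_i - u_j|."""
    return 0.5 * np.sum(weights * np.abs(u[:, None] - u[None, :]))


def graph_cut(in_set, weights):
    """Sum of the weights of edges with one node in S and the other in S^c."""
    return np.sum(weights[np.ix_(in_set, ~in_set)])


def graph_curvature(in_set, weights):
    """kappa_i = sum_{j in S^c} w_ij if i in S, and -sum_{j in S} w_ij if i in S^c."""
    to_complement = weights[:, ~in_set].sum(axis=1)
    to_set = weights[:, in_set].sum(axis=1)
    return np.where(in_set, to_complement, -to_set)


# Hypothetical symmetric weight matrix on five nodes (zero diagonal).
weights = np.array(
    [
        [0.0, 2.0, 1.0, 0.0, 0.0],
        [2.0, 0.0, 0.0, 3.0, 0.0],
        [1.0, 0.0, 0.0, 1.0, 4.0],
        [0.0, 3.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 4.0, 1.0, 0.0],
    ]
)

in_set = np.array([True, True, False, False, False])  # S = {0, 1}
u = in_set.astype(float)                              # indicator function of S

tv = graph_total_variation(u, weights)
cut = graph_cut(in_set, weights)
kappa = graph_curvature(in_set, weights)
print(tv, cut, kappa)
```

For this graph the only edges leaving $S$ are those between nodes 0 and 2 (weight 1) and between nodes 1 and 3 (weight 3), so the total variation of the indicator function and the cut both equal 4, which is also the sum of the curvatures of the two nodes in $S$.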

Other current work studies a graph version [39] of the Ohta-Kawasaki functional, which was originally introduced as a variational model for pattern formation in diblock copolymers [38].

The research on these novel methods has shown that new PDE-inspired graph procedures can efficiently (approximately) solve complex graph problems, while at the same time offering fertile ground for proving theoretical connections between the various graph problems (inspired by similar connections their continuum counterparts have) and between the graph problems and their continuum analogues.
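As an illustration of such a PDE-inspired graph procedure, the following Python sketch implements a bare-bones version of the graph MBO scheme from point 3: diffuse the current labelling with the unnormalised graph Laplacian for a time $\tau$, then threshold at $\tfrac{1}{2}$. It is a minimal sketch under several assumptions: the diffusion step is solved through a full eigendecomposition (a choice made here for brevity, feasible only for small graphs), no mass constraint or fidelity term is included, and the six-node graph, the value of $\tau$ and the function names are hypothetical.

```python
import numpy as np


def graph_laplacian(weights):
    """Unnormalised graph Laplacian L = D - W, so (L u)_i = sum_j w_ij (u_i - u_j)."""
    return np.diag(weights.sum(axis=1)) - weights


def graph_mbo(u0, weights, tau, max_iter=50):
    """Alternate graph diffusion for time tau with thresholding at 1/2."""
    eigvals, eigvecs = np.linalg.eigh(graph_laplacian(weights))
    decay = np.exp(-tau * eigvals)  # exact solution operator of du/dt = -L u
    u = u0.astype(float)
    for _ in range(max_iter):
        diffused = eigvecs @ (decay * (eigvecs.T @ u))  # diffusion started from u^k
        u_next = (diffused >= 0.5).astype(float)        # thresholding step
        if np.array_equal(u_next, u):                   # stationary labelling reached
            break
        u = u_next
    return u


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by a weak edge.
weights = np.zeros((6, 6))
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 0.05),
]
for i, j, w in edges:
    weights[i, j] = weights[j, i] = w

u0 = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 1.0])  # nodes 2 and 5 start with the "wrong" label
labels = graph_mbo(u0, weights, tau=1.0)
print(labels)
```

In this toy example the diffusion step averages the labels within each tightly connected triangle, so the two initially mislabelled nodes are corrected and the final labelling separates the two triangles along the weak edge.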

Paper [1] relates to question 2 above. Its authors apply similar ideas to those in [11, 12, 13] to prove a consistency result for empirical risk minimization. If a function $u : D \to \{0,1\}$ acts as a classifier for points that are sampled from $D \subset \mathbb{R}^n$ according to a distribution $\nu$, its empirical risk is

\[ R(u) := \int_{D \times \{0,1\}} \]


In [1] the authors prove a consistency result for a regularized empirical risk functional, which consists of the empirical risk regularized by a graph total variation term. They also find different regimes for the regularization parameter associated with the total variation term, which relate to the notions of overfitting and underfitting of the data.

Paper [2] is related to question 5 above. It proposes a graph version of the game $p$-Laplacian [40] as an interpolation between the graph (2-)Laplacian and the graph $\infty$-Laplacian when $2 \leq p \leq \infty$ and as an interpolation between the graph 1-Laplacian and the graph 2-Laplacian when $1 \leq p < 2$. It proves the existence and uniqueness of game $p$-harmonic functions with given Dirichlet ‘boundary’ conditions on a subset of the nodes and relates the graph game $p$-Laplacian to a tug-of-war game. It also shows the results of numerical experiments in which the graph game $p$-Laplacian is used for semi-supervised segmentation, clustering and image inpainting.

Numerical solution of large-scale PDEs. When developing algorithms for data-driven applications the scalability of computational methods is essential. In the context of partial differential equations, domain decomposition methods are used to divide a large domain into several smaller subdomains in such a way that the solution to the equation on the full domain can be found (or approximated) via the solutions on the subdomains [41, 42, 43, 44, 45, 46]. The initial equation restricted to the subdomains defines a sequence of new local problems which are computationally cheaper to solve. A principal motivation behind this approach is the formulation of PDE solvers which can be easily parallelised. As an example, let us consider a coercive, elliptic and self-adjoint differential operator $L$ and the boundary-value problem

\[ Lu = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \]

for $\Omega \subset \mathbb{R}^n$, with $n > 1$ and $f$ sufficiently nice.

Figure 1: Overlapping domain decomposition of $\Omega$ into subdomains $\Omega_1$ and $\Omega_2$ with interfaces $\Gamma_1$ and $\Gamma_2$. Example setup from [44].

The basic Schwarz alternating algorithm [44] to solve this equation proceeds as follows. Starting with an initial guess $u^0$ for the solution, we iterate for $k = 0, 1, \ldots$, for subdomains $\Omega_1, \Omega_2$ with interfaces $\Gamma_1, \Gamma_2$ as in Figure 1:

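\[ L u_1^{k+1} = f \ \text{in } \Omega_1, \qquad u_1^{k+1} = u^k|_{\Gamma_1} \ \text{on } \Gamma_1, \qquad u_1^{k+1} = 0 \ \text{on } \partial\Omega_1 \setminus \Gamma_1, \]

and

\[ L u_2^{k+1} = f \ \text{in } \Omega_2, \qquad u_2^{k+1} = u^k|_{\Gamma_2} \ \text{on } \Gamma_2, \qquad u_2^{k+1} = 0 \ \text{on } \partial\Omega_2 \setminus \Gamma_2, \]

and the iterate $u^{k+1}$ on the whole domain defined by

\[ u^{k+1}(x) = \begin{cases} u_2^{k+1}(x), & \text{if } x \in \Omega_2, \\ u_1^{k+1}(x), & \text{if } x \in \Omega \setminus \Omega_2. \end{cases} \]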

Under certain conditions it can be shown that the iterates converge to the true solution $u$ on $\Omega$. Now, the idea is to distribute the solution of the two subproblems on $\Omega_1$ and $\Omega_2$ to two processors and compute $u_1^k$ and $u_2^k$ in parallel, exchanging information on the solution at the interfaces of the two subdomains between the two processors after every iteration.
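The iteration described above can be sketched in a few lines of Python for a one-dimensional model problem. This is a toy illustration under stated assumptions rather than the setting of [44]: it takes $L = -d^2/dx^2$ on $(0,1)$ with $f \equiv 1$ (so the exact solution is $u(x) = x(1-x)/2$), a finite difference discretisation, and two overlapping intervals whose interface nodes `ia` and `ib` are hypothetical choices. Both subdomain solves read their interface data from the previous iterate, so they are independent within one iteration.

```python
import numpy as np


def solve_dirichlet(f_interior, h, left, right):
    """Finite difference solve of -u'' = f with Dirichlet values at both ends."""
    m = f_interior.size
    matrix = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = f_interior.copy()
    rhs[0] += left / h**2     # boundary value moved to the right-hand side
    rhs[-1] += right / h**2
    return np.linalg.solve(matrix, rhs)


def schwarz_step(u, f, h, ia, ib):
    """One iteration; both subdomain solves use interface data from the old iterate u."""
    n = u.size - 1
    # Omega_1 = (x_0, x_ib): zero at x_0, old iterate at the interface node x_ib.
    u1 = solve_dirichlet(f[1:ib], h, 0.0, u[ib])
    # Omega_2 = (x_ia, x_n): old iterate at the interface node x_ia, zero at x_n.
    u2 = solve_dirichlet(f[ia + 1 : n], h, u[ia], 0.0)
    u_new = np.zeros_like(u)
    u_new[1 : ia + 1] = u1[:ia]  # nodes outside Omega_2 take the Omega_1 solution
    u_new[ia + 1 : n] = u2       # interior nodes of Omega_2 take the Omega_2 solution
    return u_new


n = 40
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.ones(n + 1)
exact = 0.5 * x * (1.0 - x)

u = np.zeros(n + 1)  # initial guess u^0 = 0
errors = []
for _ in range(60):
    u = schwarz_step(u, f, h, ia=15, ib=25)
    errors.append(np.max(np.abs(u - exact)))

print(errors[0], errors[-1])
```

In a parallel implementation the two calls to `solve_dirichlet` inside `schwarz_step` would be dispatched to different processors, with only the two interface values exchanged between iterations. The size of the overlap (here the nodes between `ia` and `ib`) controls how quickly the error decays.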

In [3] the authors review a particular class of domain decomposition methods which is called probabilistic domain decomposition (PDD), pioneered by Acebron et al. in [48]. The main idea is to use a stochastic representation of the PDE (via the so-called Feynman-Kac formula), then compute the solution in a few sampled points on the interfaces between the subdomains via Monte Carlo, and then use an efficient deterministic PDE solver for the solution of the PDE on each subdomain with fixed boundary conditions coming from the previously computed Monte Carlo simulation. This probabilistic setup yields a parallelisation strategy for solving PDEs that shows very good scalability properties. Indeed, since the solutions of the PDE on the subdomains are completely independent of each other, the PDD method can solve the subproblems fully in parallel and hence does not require communication between processors. Also, the choice of the PDE solver on each subdomain is flexible, so each subproblem can potentially be solved very efficiently. The paper [3] also serves as an introduction to the concept of PDDs and their various usage areas in data-driven applications.
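The three stages of the PDD idea (stochastic representation, Monte Carlo at the interface, independent deterministic solves) can be illustrated with a one-dimensional toy problem in Python. This is a sketch under strong simplifying assumptions and not the method of [3] or [48]: the PDE is $-u'' = 1$ on $(0,1)$ with zero boundary values, the continuous Feynman-Kac representation is replaced by its discrete analogue (a symmetric random walk on the grid nodes that accumulates $\tfrac{h^2}{2} f$ at every visited interior node until it exits), and the grid size, interface node, sample count and function names are hypothetical.

```python
import numpy as np


def feynman_kac_estimate(start, f, h, num_walks, rng):
    """Monte Carlo estimate of u at grid node `start` for -u'' = f, u = 0 at both ends."""
    n = f.size - 1
    position = np.full(num_walks, start)
    total = np.zeros(num_walks)
    alive = np.ones(num_walks, dtype=bool)
    while alive.any():
        total[alive] += 0.5 * h**2 * f[position[alive]]  # running cost at current node
        position[alive] += rng.choice([-1, 1], size=int(alive.sum()))
        alive = (position > 0) & (position < n)          # walkers stop at node 0 or n
    return total.mean()


def solve_dirichlet(f_interior, h, left, right):
    """Deterministic finite difference solve of -u'' = f with Dirichlet end values."""
    m = f_interior.size
    matrix = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = f_interior.copy()
    rhs[0] += left / h**2
    rhs[-1] += right / h**2
    return np.linalg.solve(matrix, rhs)


rng = np.random.default_rng(0)
n, interface = 20, 10
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.ones(n + 1)

# Stage 1: Monte Carlo estimate of the solution at the single interface node.
u_interface = feynman_kac_estimate(interface, f, h, num_walks=4000, rng=rng)

# Stage 2: two independent subdomain solves with the interface value held fixed.
u = np.zeros(n + 1)
u[interface] = u_interface
u[1:interface] = solve_dirichlet(f[1:interface], h, 0.0, u_interface)
u[interface + 1 : n] = solve_dirichlet(f[interface + 1 : n], h, u_interface, 0.0)

exact = 0.5 * x * (1.0 - x)
max_error = np.max(np.abs(u - exact))
print(u_interface, max_error)
```

Once the interface value is known, the two subdomain solves share no data and could run on separate processors without communication. In this sketch the accuracy of the final solution is limited by the statistical error of the Monte Carlo estimate at the interface, which decreases as the number of random walks grows.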

Conclusion. The contributions in this issue are examples of modern research topics in partial differential equations that arise in data-driven applications. They nicely show the various facets of this topic, from PDEs as inspiration for and tools to prove consistency of graph-based data processing methods to the development of efficient and scalable numerical methods for solving PDEs for large-scale and high-dimensional data. We believe we are only at the beginning of this exciting and important new area of research. The increasing number of open questions in all areas of data science will make mathematical frameworks, like the one provided by PDEs, more and more attractive. PDEs and variational methods have an important role to play in finding answers to these questions and in the development of adaptive, rigorous and efficient data processing and machine learning methods.

Acknowledgement. This preface is an adaptation and extension of the “extended abstract” [49] by YvG. Moreover, CBS acknowledges support from the Leverhulme Trust project Breaking the non-convexity barrier, the EPSRC grant EP/M00483X/1, EPSRC centre EP/N014588/1, the Cantab Capital Institute for the Mathematics of Information, the CHiPS (Horizon 2020 RISE project grant), the Global Alliance project ‘Statistical and Mathematical Theory of Imaging’ and the Alan Turing Institute.

References

[1] N.G. Trillos and R. Murray, A new analytical approach to consistency and overfitting in regularized empirical risk minimization, EJAM 2017.


[3] F. Bernal, G. dos Reis, and G. Smith, Hybrid PDE solver for data-driven problems and modern branching, EJAM 2017.

[4] A.L. Bertozzi and A. Flenner, Diffuse interface models on graphs for analysis of high dimensional data, Multiscale Modeling and Simulation 10(3) (2012), 1090–1118.

[5] Y. van Gennip and A.L. Bertozzi, Γ-convergence of graph Ginzburg-Landau functionals, Advances in Differential Equations 17(11/12) (2012), 1115–1180.

[6] G. Dal Maso, An introduction to Γ-convergence, Birkhäuser, 1993.

[7] A. Braides, Γ-convergence for beginners, Oxford University Press, 2002.

[8] L. Modica and S. Mortola, Un esempio di Γ-convergenza, Bollettino dell’Unione Matematica Italiana 5(14-B) (1977), 285–299.

[9] L. Modica, The gradient theory of phase transitions and the minimal interface criterion, Archive for Rational Mechanics and Analysis 98(2) (1987), 123–142.

[10] U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17(4) (2007), 395–416.

[11] N. Garcia Trillos and D. Slepčev, Continuum limit of total variation on point clouds, Archive for Rational Mechanics and Analysis 220(1) (2016), 193–241.

[12] N. Garcia Trillos, D. Slepčev, J. von Brecht, T. Laurent, and X. Bresson, Consistency of Cheeger and Ratio Graph Cuts, Journal of Machine Learning Research 17 (2016), 1–46.

[13] M. Thorpe and F. Theil, Asymptotic Analysis of the Ginzburg-Landau Functional on Point Clouds, to appear, arXiv preprint arXiv:1604.04930.

[14] B. Merriman, J.K. Bence, and S.J. Osher, Motion of multiple junctions: a level set approach, Journal of Computational Physics 112(2) (1994), 334–363.

[15] F. Chung, Spectral Graph Theory, American Mathematical Society, 1997.

[16] E. Merkurjev, T. Kostic, and A.L. Bertozzi, An MBO scheme on graphs for segmentation and image processing, SIAM Journal on Imaging Sciences 6(4) (2013), 1903–1930.

[17] E.J. Nyström, Über die Praktische Auflösung von Linearen Integralgleichungen mit Anwendungen auf Randwertaufgaben der Potentialtheorie, Commentationes Physico-Mathematicae 4(15) (1928), 1–52.

[18] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, Spectral grouping using the Nyström method, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2) (2004), 214–225.

[19] C.R. Anderson, A Rayleigh–Chebyshev procedure for finding the smallest eigenvalues and associated eigenvectors of large sparse Hermitian matrices, Journal of Computational Physics 229(19) (2010), 7477–7487.

[20] L. Calatroni, Y. van Gennip, C.-B. Schönlieb, H. Rowland, and A. Flenner, Graph clustering, variational image segmentation methods and Hough transform scale detection for object measurement in images, Journal of Mathematical Imaging and Vision 57(2) (2017), 269–291.

[21] C. Garcia-Cardona, A. Flenner, and A.G. Percus, Multiclass diffuse interface models for semi-supervised learning on graphs, Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2013) (2013), 78–86.


[23] C. Garcia-Cardona, E. Merkurjev, A.L. Bertozzi, A. Flenner, and A.G. Percus, Multiclass data segmentation using diffuse interface methods on graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (2014), 1600–1613.

[24] C. Garcia-Cardona, A. Flenner, and A.G. Percus, Multiclass semi-supervised learning on graphs using Ginzburg-Landau functional minimization, Advances in Intelligent Systems and Computing 318 (2015), 119–135.

[25] Z. Meng, E. Merkurjev, A. Koniges, and A.L. Bertozzi, Hyperspectral Image Classification Using Graph Clustering Methods, Image Processing On Line 7 (2017), 218–245.

[26] X. Luo and A.L. Bertozzi, Convergence of the graph Allen–Cahn scheme, Journal of Statistical Physics 167(3) (2017), 934–958.

[27] J. Bosch, S. Klamt, and M. Stoll, Generalizing diffuse interface methods on graphs: non-smooth potentials and hypergraphs, arXiv preprint arXiv:1611.06094.

[28] B. Keetch and Y. van Gennip, A Max-Cut approximation using a graph based MBO scheme, in preparation.

[29] K.A. Brakke, The motion of a surface by its mean curvature, Princeton University Press (1978).

[30] L. Bronsard and R.V. Kohn, Motion by mean curvature as the singular limit of Ginzburg-Landau dynamics, Journal of Differential Equations 90(2) (1991), 211–237.

[31] L.C. Evans, Convergence of an algorithm for mean curvature motion, Indiana University Mathematics Journal 42(2) (1993), 533–557.

[32] G. Barles, H.M. Soner, and P.E. Souganidis, Front propagation and phase field theory, SIAM Journal on Control and Optimization 31(2) (1993), 439–469.

[33] G. Barles and C. Georgelin, A simple proof of convergence for an approximation scheme for computing motions by mean curvature, SIAM Journal on Numerical Analysis 32(2) (1995), 484–500.

[34] Y. van Gennip, N. Guillen, B. Osting, and A.L. Bertozzi, Mean Curvature, Threshold Dynamics, and Phase Field Theory on Finite Graphs, Milan Journal of Mathematics 82(1) (2014), 3–65.

[35] F. Almgren, J.E. Taylor, and L. Wang, Curvature-driven flows: a variational approach, SIAM Journal on Control and Optimization 31(2) (1993), 387–438.

[36] S. Luckhaus and T. Sturzenhecker, Implicit time discretization for the mean curvature flow equation, Calculus of Variations and Partial Differential Equations 3(2) (1995), 253–271.

[37] J.E. Taylor, Anisotropic interface motion, Mathematics of Microstructure Evolution 4 (1996), 135–148.

[38] T. Ohta and K. Kawasaki, Equilibrium Morphology of Block Copolymer Melts, Macromolecules 19 (1986), 2621–2632.

[39] Y. van Gennip, An MBO scheme for minimizing the graph Ohta-Kawasaki functional, in preparation.

[40] Y. Peres and S. Sheffield, Tug-of-war with noise: A game-theoretic view of the p-Laplacian, Duke Mathematical Journal 145(1) (2008), 91–120.

[41] P.L. Lions, On the Schwarz alternating method, I. In First international symposium on domain decomposition methods for partial differential equations (1988), 1–42.


[43] J. Xu, Iterative methods by space decomposition and subspace correction, SIAM Review 34(4) (1992), 581–613.

[44] T.F. Chan and T.P. Mathew, Domain decomposition algorithms, Acta Numerica 3 (1994), 61–143.

[45] A. Quarteroni and A. Valli, Domain decomposition methods for partial differential equations, Numerical Mathematics and Scientific Computation, Oxford Science Publications, The Clarendon Press, Oxford University Press, New York (1999).

[46] B. Smith, P. Bjorstad, and W. Gropp, Domain decomposition: parallel multilevel methods for elliptic partial differential equations, Cambridge University Press (2004).

[47] Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization 22(2) (2012), 341–362.

[48] J.A. Acebron, M.P. Busico, P. Lanucara, and R. Spigler, Domain decomposition solution of elliptic boundary-value problems via Monte Carlo and quasi-Monte Carlo methods, SIAM J. Sci. Comput. 27(2) (2005).

[49] Y. van Gennip, Using evolving interface techniques to solve network problems, in Mathematisches Forschungsinstitut Oberwolfach Report: Emerging Developments in Interfaces and Free Boundaries