arXiv: v4 [math.AT] 25 May 2021


Decorated Merge Trees for Persistent Topology

Justin Curry · Haibin Hang · Washington Mio · Tom Needham · Osman Berat Okutan


Abstract This paper introduces decorated merge trees (DMTs) as a novel invariant for persistent spaces. DMTs combine $\pi_0$ and $H_n$ information into a single data structure that distinguishes filtrations that merge trees and persistent homology cannot distinguish alone. Three variants of DMTs, which emphasize category theory, representation theory and persistence barcodes, respectively, offer different advantages in terms of theory and computation. Two notions of distance for DMTs, an interleaving distance and a bottleneck distance, are defined, and a hierarchy of stability results that both refine and generalize existing stability results is proved here. To overcome some of the computational complexity inherent in these distances, we provide a novel use of Gromov-Wasserstein couplings to compute optimal merge tree alignments for a combinatorial version of our interleaving distance which can be tractably estimated. We introduce computational frameworks for generating, visualizing and comparing decorated merge trees derived from synthetic and real data. Example applications include comparison of point clouds, interpretation of persistent homology of sliding window embeddings of time series, visualization of topological features in segmented brain tumor images and topology-driven graph alignment.

JC would like to thank Rachel Levanger for discussions dating back to 2017 when the module-theoretic and lift-module-theoretic approaches to DMTs were first considered. JC would also like to thank Gabriel Bainbridge for teaching him about the category of parameterized objects during the summer of 2020. Gabe’s use of the parameterized category is set to appear in [4]. Finally, JC would like to acknowledge NSF Grant CCF-1850052 and NASA Contract 80GRC020C0016 for supporting his research. WM acknowledges partial support by NSF grant DMS-1722995. TN would like to thank Facundo Mémoli for useful feedback on an earlier draft of the paper. HH would like to acknowledge NSF grant DMS-1854683.

J. Curry
SUNY Albany

H. Hang
University of Delaware

W. Mio, T. Needham, O. Okutan
Florida State University


Keywords Topological Data Analysis · Persistent Homology · Merge Trees · Interleaving Distance

Contents

1 Introduction

2 Preliminary Definitions

3 Decorated Merge Trees Three Different Ways

4 Continuity and Stability of Barcode Decorated Merge Trees

5 Representations of Tree Posets and Lift Decorations

6 Computing Interleaving Distances

7 Algorithmic Details and Examples

8 Discussion

A Interval Topology

B Comparison and Existence of Merge Trees

C Algorithm Pseudocode

1 Introduction

In this paper we introduce a new set of tools for Topological Data Analysis (TDA) based on the concept of Decorated Merge Trees (DMTs). Not only do these new tools have a rich underlying theory that spans category theory and metric geometry, but they also provide topological signatures for datasets such as point clouds, time series, grayscale images and networks which are more informative and interpretable than standard persistent homology barcodes. Ultimately, the DMTs introduced here surpass state-of-the-art answers to the question “What is the most sensitive, stable and computable invariant of a filtration?” Figure 1 illustrates the main construction of the paper with a simple example. Here, two point clouds with different coarse topological structure are depicted. Their traditional TDA signatures (degree-0 and degree-1 Vietoris-Rips persistence diagrams) do not, however, clearly distinguish the point clouds. Our DMT construction illustrates the multiscale topology of each point cloud by overlaying a merge tree (capturing multiscale connectivity) with a degree-1 persistent homology barcode. This depicts not only the multiscale homological ($H_1$) data of each point cloud, but also the (topological) location of each degree-1 feature in the dataset. This paper formalizes the DMT construction from several perspectives and extends classical lines of inquiry in the TDA literature (metric stability, decomposability and practical computational aspects) to this novel setting. We now provide an outline of the paper with an overview of our main results.


Fig. 1: Decorated Merge Trees. The left column shows two point clouds. Their degree-0 and degree-1 Vietoris-Rips persistence diagrams in the middle column are essentially the same, despite clear topological differences in the point clouds. The decorated merge trees (DMTs) in the third column clearly distinguish the datasets topologically by fusing degree-0 and degree-1 information to track the topological location of the degree-1 features. Each red bar corresponds to a degree-1 persistent feature and its placement in the merge tree indicates the connected components over which the feature persists.

The categorical decorated merge tree, the most abstract of the three variants described above, is the one most compatible with defining interleavings, the tool historically used to show that merge trees and persistent homology barcodes are stable to perturbations. In Section 4, we introduce an interleaving distance between decorated merge trees and establish several bounds that set decorated merge trees apart from existing TDA methods. In particular, let $f, g : X \to \mathbb{R}$ be functions whose sublevel sets are locally connected. For every homological degree $n$, one obtains categorical decorated merge trees $\tilde{F}_n$ and $\tilde{G}_n$, as well as classical merge trees and persistence modules, associated to the sublevel set filtrations of $f$ and $g$. At a high level, Theorem 3, Theorem 4 and Theorem 5 allow one to extract the following statement:

Theorem The interleaving distance between $\tilde{F}_n$ and $\tilde{G}_n$ is stable with respect to the $\ell^\infty$-distance between $f$ and $g$, and is more sensitive than (i.e., lower bounded by) both the interleaving distance between the associated merge trees and the interleaving distance between the associated persistence modules.


can always restrict a pointwise finite-dimensional¹ functor $F : M \to \mathbf{vect}$ to the principal up-set at a point $p \in M$ to obtain an $\mathbb{R}$-module and thus a barcode. This motivates the definition of the barcode decorated merge tree, which in the simplest setting attaches a persistent homology barcode to each leaf node of the merge tree. We show in Theorem 2 that a barcode decorated merge tree can be understood as a Lipschitz map from the merge tree to the space of barcodes. Barcode decorated merge trees are amenable to a theory of matchings (Definition 37) and thus to a new decorated bottleneck distance (Definition 38), offering a tractably approximable metric for comparing these rich invariants. One of the central results of the paper is Theorem 4, which organizes these new distances along with existing ones into a new hierarchy of inequalities.

In Section 5, representation-theoretic aspects of DMTs take center stage. In general, one cannot hope for indecomposables as simple as the interval modules that appear in the barcode decompositions of standard persistent homology; see Example 13 for an illustration. However, Theorem 7 provides a natural condition that guarantees that a DMT decomposes into simple DMTs with totally ordered support. These real interval decomposable merge trees are especially convenient, as Theorem 8 proves that on the class of real interval decomposable merge trees, the map taking a DMT to a barcode DMT is injective. This gives a positive solution to a topological inverse problem, a recently active subfield in TDA [57]. We also give methods for generating real interval decomposable DMTs directly from data, with theoretical guarantees of their correctness (Propositions 9 and 11).

Section 6 shifts the focus to computational aspects of the interleaving distance between decorated merge trees. Inspired by work of Gasparovic et al. [34], we reformulate the computation of the distance between leaf decorated merge trees as the search for an alignment between nodes of the trees which is optimal with respect to a certain cost function (Proposition 13). This reformulation allows us to introduce a method for estimating the metric via a convex relaxation which can be solved within the Gromov-Wasserstein framework from optimal transport theory [50]. Our algorithm is novel even when estimating the classical interleaving distance between (undecorated) merge trees, but has close connections to other recent advances in the literature [53, 51, 46].

The computational focus continues in Section 7, where algorithms for computing and visualizing decorated merge trees from synthetic and real datasets are described. These algorithms are applied in several computational examples. The accuracy of our merge tree interleaving distance estimation algorithm is established empirically in Section 7.1. Decorated merge trees are produced from point clouds and are shown to give more informative topological summaries (Section 7.2) and better classification (Section 7.5) of point cloud data than standard methods based on persistence diagrams. Our methods are also applied to network data in Section 7.6 to produce topological summaries of Glioblastoma Multiforme tumor segmentations and in Section 7.7 to compute topology-driven node correspondences between networks.


particular, to Appendix B, which contains a result on the existence of merge trees that is theoretically fundamental but falls outside the narrative of the main body of the paper. Appendix A contains technical results on the topology of merge trees and Appendix C contains pseudocode for some of the main DMT algorithms. The Python code used to produce the figures and experiments for the paper is publicly available under an open source license at https://github.com/trneedham/Decorated-Merge-Trees.

2 Preliminary Definitions

Topological data analysis (TDA) studies data using the invariants of algebraic topology from a perspective geared toward practical computation. Typically the invariants of algebraic topology are formulated using the language of categories and functors, because this language allows us to abstract away certain implementation details. However, in order to actually compute these invariants, we must commit ourselves to a concrete implementation, along with an algorithm for extracting these invariants. For example, we can define connected components in terms of an equivalence relation, but an imperative algorithm for determining equivalence may require the concrete implementation of a simplicial complex along with the UNION-FIND algorithm.
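The UNION-FIND approach mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a finite graph (for instance the 1-skeleton of a simplicial complex, which has the same connected components) given by a vertex count and an edge list, and returns $\pi_0$ as a list of vertex sets. The names `UnionFind` and `connected_components` are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Walk to the root, pointing each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Merge the classes of x and y; return False if they already agree.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


def connected_components(num_vertices, edges):
    """Return pi_0 of a finite graph as a list of frozensets of vertices."""
    uf = UnionFind(num_vertices)
    for u, v in edges:
        uf.union(u, v)
    classes = {}
    for v in range(num_vertices):
        classes.setdefault(uf.find(v), set()).add(v)
    return [frozenset(c) for c in classes.values()]


# Vertices 0..4 with edges 0-1, 1-2 and 3-4 form two components.
comps = connected_components(5, [(0, 1), (1, 2), (3, 4)])
print(sorted(sorted(c) for c in comps))  # [[0, 1, 2], [3, 4]]
```

Each `union` call realizes one generating instance of the equivalence relation, and `find` returns a canonical representative of a class, which is exactly the data needed to list the components.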

In this section, we review the constructions of TDA with a twin vision, the concrete and the categorical, so that the reader can understand the variations on our central object of study, the decorated merge tree, in the next section. This object ties together connected component ($\pi_0$) and homological ($H_n$) information in a single data structure that is adapted for persistence. By working with a categorical perspective, we also show that the DMT construction sits firmly within the framework introduced by Bubenik, de Silva and Scott [10]. As such, we begin by reviewing the notion of a generalized persistence module introduced by those authors, before reviewing the concrete examples pertinent to our paper.

Definition 1 (cf. [10]) A pre-ordered set $(P,\le)$ is a set $P$ together with a reflexive and transitive relation $\le$. Reflexive means that $p \le p$ for all $p \in P$ and transitive means that whenever $p \le q$ and $q \le r$ then $p \le r$. A partially ordered set, or poset, is a pre-ordered set where the relation $\le$ is also antisymmetric, i.e. $p \le q$ and $q \le p$ jointly imply that $p = q$.
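For a finite set, the axioms of Definition 1 can be checked by brute force. The sketch below is illustrative only: the relation is passed as a Python predicate `leq(p, q)`, and the helper names `is_preorder` and `is_poset` are hypothetical.

```python
from itertools import product


def is_preorder(elements, leq):
    """Check reflexivity and transitivity of the relation leq on a finite set."""
    reflexive = all(leq(p, p) for p in elements)
    transitive = all(
        leq(p, r)
        for p, q, r in product(elements, repeat=3)
        if leq(p, q) and leq(q, r)
    )
    return reflexive and transitive


def is_poset(elements, leq):
    """A poset is a preorder that is also antisymmetric."""
    antisymmetric = all(
        p == q for p, q in product(elements, repeat=2) if leq(p, q) and leq(q, p)
    )
    return is_preorder(elements, leq) and antisymmetric


divides = lambda p, q: q % p == 0      # divisibility on positive integers
abs_leq = lambda p, q: abs(p) <= abs(q)  # compare absolute values only

print(is_poset([1, 2, 3, 6], divides))   # True
print(is_preorder([-1, 1, 2], abs_leq))  # True
print(is_poset([-1, 1, 2], abs_leq))     # False: -1 and 1 compare both ways
```

The second relation shows the gap between the two notions: it is a preorder, but $-1 \le 1$ and $1 \le -1$ without $-1 = 1$.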

From a categorical perspective, a pre-ordered set is a category which is both small (its collections of objects and morphisms are both sets) and thin (there is at most one morphism between any two objects). Under this perspective, the elements of $P$ are to be viewed as the objects of a category where there is a unique morphism from $p$ to $q$ if and only if $p \le q$.

Let $(P,\le)$ be a pre-ordered set and let $\mathbf{C}$ be any category. A functor $F : (P,\le) \to \mathbf{C}$ is called a generalized persistence module over $P$ valued in $\mathbf{C}$. We will usually reserve the term “module” for when $\mathbf{C} = \mathbf{Vect}_k$, the category of vector spaces over a field $k$, or some variant of this abelian category. A functor $F : (P,\le) \to \mathbf{Vect}_k$ will also be called a $P$-module.


[Figure 2 panels: $X$, $X_s$, $X_t$ and $Y$, $Y_s$, $Y_t$, with their $H_0$ and $H_1$ barcodes]

Fig. 2: A motivating example for the need for decorated merge trees. Two subsets X and Y of $\mathbb{R}^2$ along with two times in their offset filtrations are shown. Degree-0 and degree-1 persistent homology fail to distinguish them, as the number of components and the number of holes are the same across all stages in the filtration. This is witnessed by the barcodes, which are drawn to the right in red and are identical in both examples. The barcode oriented vertically under $H_0$ has two intervals, reflecting the fact that two features are “born” at time 0 and one feature “dies” at time L, corresponding to the merge event of the two components. The barcode under $H_1$ has two intervals as well, reflecting the two loops that are born at time 0 and die at time R, corresponding to the radius of the two loops. The goal of this paper is to introduce a new tool, the decorated merge tree, that distinguishes these examples by substituting degree-0 persistent homology with the merge tree, which is then decorated with higher persistent homology barcodes.

a functor requires that these morphisms compose correctly in the sense that whenever $p \le q \le r$ is an ordered triple we have

$$F(p \le r) = F(q \le r) \circ F(p \le q).$$
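For a finite chain $0 < 1 < \cdots < n$ and $\mathbf{C} = \mathbf{Vect}_k$, the composition law can be made concrete: it suffices to give matrices for the consecutive relations $i \le i+1$ and to define every other structure map as their composite. The sketch below assumes integer matrices stored as nested lists (a stand-in for $k$-linear maps) with all dimensions positive; `steps[i]` is a hypothetical name for the matrix of $F(i \le i+1)$. It then checks the composition law for every triple $i \le j \le k$.

```python
def matmul(A, B):
    """Product of an (m x p) and a (p x q) matrix stored as nested lists."""
    return [[sum(A[r][t] * B[t][c] for t in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def structure_map(steps, dims, i, j):
    """F(i <= j) as the composite of the consecutive maps F(m <= m+1), i <= m < j."""
    M = identity(dims[i])
    for m in range(i, j):
        M = matmul(steps[m], M)  # steps[m] has shape dims[m+1] x dims[m]
    return M


# A module on 0 < 1 < 2 with dimensions 1, 2, 1: a class born at 0 maps to zero at 2.
dims = [1, 2, 1]
steps = [[[1], [0]], [[0, 1]]]

n = len(dims) - 1
for i in range(n + 1):
    for j in range(i, n + 1):
        for k in range(j, n + 1):
            lhs = structure_map(steps, dims, i, k)
            rhs = matmul(structure_map(steps, dims, j, k), structure_map(steps, dims, i, j))
            assert lhs == rhs  # F(i <= k) = F(j <= k) o F(i <= j)

print(structure_map(steps, dims, 0, 2))  # [[0]]
```

Defining all structure maps as composites makes the composition law hold automatically (by associativity of matrix multiplication), which is why a module over a finite chain is determined by its consecutive maps.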

Historically, the “categorified” perspective on persistence was restricted to the study of functors out of posets that are subsets of $(\mathbb{R},\le)$; see [11] for example. These include the poset of integers $\mathbb{Z}$, the poset of natural numbers $\mathbb{N}$, or the totally ordered set on $n + 1$ elements

$$\mathbf{n} := \{0 < 1 < \cdots < n\}.$$

We will prefer to use the term “persistent” in the setting where $P = \mathbb{R}$, although one can speculate about persistence for other posets as well.

2.1 Persistent Spaces and Filtrations


Definition 2 A persistent space is a functor $F : (\mathbb{R},\le) \to \mathbf{Top}$. Comparing with Definition 1, this means that to each $s \in \mathbb{R}$ a topological space $F(s)$ is given and to each ordered pair $s \le t$ a continuous map $F(s \le t) : F(s) \to F(t)$ is also given. These maps are required to preserve composition in the sense noted above, namely, that for an ordered triple $r \le s \le t$ we have $F(r \le t) = F(s \le t) \circ F(r \le s)$. When each map $F(s \le t)$ is an injection, we will call such a persistent space a filtration.

We now describe several situations where persistent spaces arise, focusing on the case of filtrations.

Example 1 (Offset Filtration) Given a subset $Z$ of a metric space $X$, we can consider the offset filtration of $Z$ to be the persistent space given by

$$F_Z(s) := \{x \in X \mid d(x, Z) := \inf_{z \in Z} d(x, z) \le s\},$$

which we equip with the subspace topology. Clearly, if $s \le t$ then $F_Z(s) \subseteq F_Z(t)$ and the inclusion map $F_Z(s \le t)$ is continuous. In Figure 2, two subsets of $\mathbb{R}^2$ with the Euclidean metric are shown along with two “times” in the offset filtration.

Example 2 (Sublevel-Set Filtration) Given a continuous function $f : X \to \mathbb{R}$, one can consider the sublevel-set filtration, which is a persistent space where

$$F(s) := f^{-1}(-\infty, s] = \{x \in X \mid f(x) \le s\}.$$

By equipping each sublevel set with the subspace topology, the inclusion maps $F(s) \subseteq F(t)$ are continuous and define the maps $F(s \le t)$ for the persistent space.

The astute reader will notice that the sublevel-set filtration includes the offset filtration as a special case.

Example 3 (Offset Function) Given a subset $Z$ of a metric space $X$, we define the offset function $f_Z : X \to \mathbb{R}$ to be

$$f_Z(x) := \inf_{z \in Z} d(x, z).$$

The offset filtration $F_Z$ can be viewed as the sublevel-set filtration of $f_Z$.
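For a finite $Z$ on the real line, the offset filtration can be explored numerically. The sketch below assumes that $X$ is a closed interval $[a, b]$ and approximates the number of components of $f_Z^{-1}(-\infty, t]$ by sampling a uniform grid. It is only an approximation (gaps narrower than the grid spacing are missed), and all names are hypothetical. With $Z = \{0, 2L\}$, as in Figure 4, it reports two components below $t = L$ and one above.

```python
def offset_function(x, Z):
    """f_Z(x) = inf over z in Z of |x - z|, for a finite subset Z of the real line."""
    return min(abs(x - z) for z in Z)


def count_runs(mask):
    """Number of maximal runs of True values, i.e. components of a subset of a 1D grid."""
    return sum(1 for i, m in enumerate(mask) if m and (i == 0 or not mask[i - 1]))


def sublevel_components(Z, t, a, b, num=401):
    """Approximate the number of components of {x in [a, b] : f_Z(x) <= t} on a grid."""
    grid = [a + (b - a) * k / (num - 1) for k in range(num)]
    return count_runs([offset_function(x, Z) <= t for x in grid])


L = 1.0
Z = [0.0, 2 * L]  # two points at distance 2L
for t in (0.5 * L, 1.5 * L):
    print(t, sublevel_components(Z, t, -1.0, 3.0))  # 0.5 2, then 1.5 1
```

Evaluating at increasing thresholds $t$ traces the sublevel-set filtration of $f_Z$, which is the offset filtration $F_Z$ restricted to $[a, b]$.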

Sublevel-set filtrations can be extended to certain non-continuous functions on simplicial complexes which arise frequently in topological data analysis.


2.2 Connected Components, Reeb Graphs and Merge Trees

Just as we view individual spaces through topological lenses, so we can view persistent spaces and filtrations through topological lenses. Arguably the most primitive lens is $\pi_0$, the connected components functor, so the most basic invariant of a persistent space is the persistent set of components. This is a categorical generalization of the classical merge tree, which we define as the Reeb graph associated to the epigraph of a function. In this section we review each of these constructions, with an eye towards why we want to work with the categorical generalization used here.

Definition 3 (The Connected Components Functor) For any topological space $X$, the set $\pi_0(X)$ is the set of equivalence classes of $X$ given by the relation $x \sim y$ if and only if there is a connected subset of $X$ containing them both. Because the continuous image of a connected set is connected, $\pi_0 : \mathbf{Top} \to \mathbf{Set}$ specifies a functor, the connected components functor, from the category of topological spaces and continuous maps to the category of sets and set maps.

It should be noted that the set of equivalence classes $\pi_0(X)$ actually carries a quotient topology that is inherited from the topology on $X$. Typically when studying a space $X$ in isolation, we will ignore this topology and think of $\pi_0(X)$ as a set, or as a space with the discrete topology. However, there are situations where tracking $\pi_0(X)$ as it varies among a continuous family of spaces $X$ is of interest. One setting in which this occurs is when considering the collection of level sets $f^{-1}(s)$ of a function as $s$ varies among real values $s \in \mathbb{R}$. The Kronrod-Reeb graph construction, which one might argue is one of the first persistent topological invariants, imposes a topology on the collection $\{\pi_0(f^{-1}(s))\}$ in a rather explicit way.

Definition 4 (Kronrod-Reeb Graph) Given a function $f : X \to \mathbb{R}$ we define the Kronrod-Reeb graph, often referred to simply as the Reeb graph, to be the quotient space given by the equivalence relation that identifies two points $x, y \in X$ if

– $x$ and $y$ have the same value under $f$, i.e. $f(x) = f(y) = s$, and
– $x$ and $y$ are in the same connected component of $f^{-1}(s)$.

We note that since f is constant on equivalence classes we have the following commutative diagram of spaces and maps.

$$X \xrightarrow{\;q\;} R_f := X/\!\sim\ \xrightarrow{\;\tilde{f}\;} \mathbb{R}, \qquad f = \tilde{f} \circ q.$$


[Figure 3 panels: graph of the offset function; epigraph associated to the offset function; apply Reeb graph construction; merge tree associated to the offset function]

Fig. 3: We consider a two-point subset Z, indicated in blue, inside a gray interval with the Euclidean distance. The offset function measures the distance from a point p to the closest point in Z. The graph of the offset function is drawn in bold green and the epigraph is filled in above. The Reeb graph construction, which identifies two points in the same connected component of a level set, produces the classical notion of a merge tree, drawn to the right.

one-dimensional cell complex; see [66] for a modern treatment. We will mostly be concerned with Reeb graphs that track connected components for a sublevel set filtration. This is accomplished via the following auxiliary construction.

Definition 5 The epigraph of a function $f : X \to \mathbb{R}$ is the subspace of $X \times \mathbb{R}$ that lies above the graph of $f$, i.e.

$$E_f = \{(x, t) \in X \times \mathbb{R} \mid f(x) \le t\}.$$

Note that we can always embed $X$ into its epigraph via $\iota_f(x) := (x, f(x))$. Moreover, there is a natural map $\pi_f : E_f \to \mathbb{R}$ given by projecting onto the second factor. These two maps fit together to give an evident factorization of $f : X \to \mathbb{R}$:

$$f = \pi_f \circ \iota_f : X \to E_f \to \mathbb{R}.$$

Definition 6 (The Merge Tree as a Reeb Graph) Given a function $f : X \to \mathbb{R}$, the classical merge tree $M_f$ is the Reeb graph of the epigraph projection $\pi_f : E_f \to \mathbb{R}$. That is to say, the merge tree is the space that fits in the middle of the following commutative diagram of spaces and continuous maps, which comes with a canonical projection map to the real line.

$$E_f \xrightarrow{\;q\;} M_f := E_f/\!\sim\ \xrightarrow{\;\tilde{\pi}_f\;} \mathbb{R}, \qquad \pi_f = \tilde{\pi}_f \circ q.$$
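In practice, merge trees are usually computed combinatorially rather than through the quotient $E_f/\!\sim$. The sketch below illustrates one common approach under stated assumptions; it is not the paper's algorithm. It assumes $f$ is given by values on the vertices of a finite graph, that an edge is present from height $\max(f(u), f(v))$ onward (a lower-star convention assumed here), and it processes vertices in increasing order with union-find. Each returned node is a pair `(height, children)`: leaves are local minima and internal nodes record merges. The unbounded root branch above the last merge is left implicit, and the name `merge_tree` is hypothetical.

```python
def merge_tree(values, edges):
    """Leaves and merge nodes of the sublevel-set merge tree of a vertex function.

    Returns a list of nodes (height, children), where children are indices
    into the same list; leaves have children == ().
    """
    n = len(values)
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        if u != v:
            adjacency[u].append(v)
            adjacency[v].append(u)

    parent = {}  # union-find over vertices already in the sublevel set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    top = {}  # component root -> index of the highest tree node on that branch
    nodes = []
    for v in sorted(range(n), key=lambda w: values[w]):
        parent[v] = v
        roots = sorted({find(u) for u in adjacency[v] if u in parent})
        if not roots:
            # v is a local minimum: a new component (leaf) is born.
            top[v] = len(nodes)
            nodes.append((values[v], ()))
        elif len(roots) == 1:
            # v extends an existing component; the tree gains no node.
            parent[v] = roots[0]
        else:
            # v joins several components: record a merge at height values[v].
            children = tuple(top.pop(r) for r in roots)
            for r in roots[1:]:
                parent[r] = roots[0]
            parent[v] = roots[0]
            top[roots[0]] = len(nodes)
            nodes.append((values[v], children))
    return nodes


# A path 0-1-2-3-4 with interior peaks at vertices 1 and 3.
print(merge_tree([0, 2, 1, 3, 0.5], [(0, 1), (1, 2), (2, 3), (3, 4)]))
# [(0, ()), (0.5, ()), (1, ()), (2, (0, 2)), (3, (3, 1))]
```

In the example, three leaves are born at heights 0, 0.5 and 1; the first and third merge at height 2, and that branch absorbs the remaining leaf at height 3.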


2.3 Persistent Sets and Generalized Merge Trees

The classical definition of the merge tree is primarily adapted to summarizing the connected components of the sublevel sets of a continuous function on a topological space. By passing through the Reeb graph construction, the merge tree construction leaves one wondering about the quotient topology on the set $M_f$; alternative characterizations of this topology are explored in Appendices A and B. For many of our purposes, the topology is not as interesting as the fact that $M_f$ naturally carries the structure of a poset. In this section we present the notion of a generalized merge tree, which we can associate to any persistent space, whether or not it is a sublevel set filtration. This construction first comes from considering the persistent set of components, which we now define.

Definition 7 (cf. Definition 2.2 of [23] and [13]) A functor $S : (\mathbb{R},\le) \to \mathbf{Set}$ is called a persistent set. If $F : (\mathbb{R},\le) \to \mathbf{Top}$ is a persistent space, then we can define the persistent set associated to $F$ as the composition of $F$ with the connected components functor $\pi_0 : \mathbf{Top} \to \mathbf{Set}$, i.e.

$$\pi_0 \circ F : (\mathbb{R},\le) \to \mathbf{Set}, \qquad s \rightsquigarrow \pi_0(F(s)).$$

Remark 1 (Classifying Clustering Schemes) In the paper [13], Carlsson and Mémoli also introduce the term “persistent set” to formalize various notions of hierarchical clustering, most notably single linkage clustering. Their use of the term is slightly different from our own and we highlight this difference now. For Carlsson and Mémoli, a persistent set is a functor from $\mathbb{R}_{\ge 0}$ to the category of partitions of a finite set $X$, written $\mathrm{Part}(X)$. The objects of this category are partitions of $X$ and the morphisms are refinements of partitions. Our language is slightly more general because $\pi_0(F(s))$ refers to the partition of $F(s)$ into connected components and we make no assumption about the relation of the spaces $F(s)$ and $F(t)$ other than the fact that they are connected by a continuous map $F(s \le t)$; in particular, the map $\pi_0(F(s \le t))$ need not be a surjection. It should be noted that Carlsson and Mémoli use the term dendrogram to refer to a persistent set (in their sense) where the partition eventually becomes the one-block partition $\{X\}$, which corresponds to the assumption that in a persistent space $F$, the space $F(t)$ is connected for large enough $t$.

Example 6 In Figure 4 we have reconsidered the merge tree associated to the offset function in purely categorical terms. The offset filtration has a naturally associated persistent set of connected components, obtained simply by applying $\pi_0$.

Although a persistent set S may feel abstract when defined as a functor, it has a concrete realization as a single poset.

Definition 8 (The Display Poset) Associated to any persistent set $S : (\mathbb{R},\le) \to \mathbf{Set}$ is its display poset, which is simply the disjoint union across all the sets that appear in the definition of $S$, i.e.

$$\mathcal{S} := \coprod_{t \in \mathbb{R}} S(t) := \bigcup_{t \in \mathbb{R}} S(t) \times \{t\}.$$

The poset structure on $\mathcal{S}$ is defined by declaring


[Figure 4 panels: persistent space associated to the offset filtration ($X_0$, $X_{L/2}$, $X_L$, $X_{3L/2}$); apply $\pi_0$ functor; persistent set associated to the offset filtration ($\pi_0(X_0)$, $\pi_0(X_{L/2})$, $\pi_0(X_L)$, $\pi_0(X_{3L/2})$)]

Fig. 4: Two blue points separated by distance 2L in a gray interval are shown to the left. Their offsets at t = L/2, L, 3L/2 are shown as blue slices in the salmon-colored offset filtration; black arrows indicate inclusion of these offsets. Applying the connected components functor $\pi_0$ yields a persistent set, drawn to the right as green dots connected by black arrows. Although the maps between offsets are inclusions, the number of components drops from two to one at t = L.

Definition 9 (The Generalized Merge Tree) The display poset associated to the persistent set of components $\pi_0 \circ F$ is called the generalized merge tree of $F$ and is denoted by $(M_F, \le)$. As a set, we have

$$M_F := \coprod_{t \in \mathbb{R}} \pi_0(F(t)) := \bigcup_{t \in \mathbb{R}} \pi_0(F(t)) \times \{t\}.$$

If $i \in \pi_0(F(s))$ denotes a connected component of $F(s)$ and $j \in \pi_0(F(t))$ denotes a connected component of $F(t)$, then

$$(i, s) \le (j, t) \quad \text{if and only if} \quad s \le t \ \text{ and } \ \pi_0(F(s \le t))(i) = j.$$
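When a persistent set is only sampled at finitely many times $t_0 < t_1 < \cdots < t_m$, Definitions 8 and 9 can be implemented directly. In the sketch below (an illustration with hypothetical names), `sets[k]` lists the components at time `times[k]` and `maps[k]` is a dictionary encoding the map from time `times[k]` to `times[k+1]`; the order relation composes these maps exactly as in the definition of $(i, s) \le (j, t)$. The example mimics Figure 4 with $L = 1$: two components present at $t = 0$ and $t = L/2$ merge into one by $t = L$.

```python
def display_poset(times, sets, maps):
    """Elements and order relation of the display poset of a sampled persistent set.

    times: increasing list of sample times; sets[k]: labels of S(times[k]);
    maps[k]: dict giving S(times[k] <= times[k+1]).
    """
    elements = [(i, t) for k, t in enumerate(times) for i in sets[k]]
    index = {t: k for k, t in enumerate(times)}

    def push_forward(i, k, l):
        # Compose consecutive maps to evaluate S(times[k] <= times[l]) at i.
        for m in range(k, l):
            i = maps[m][i]
        return i

    def leq(a, b):
        (i, s), (j, t) = a, b
        return s <= t and push_forward(i, index[s], index[t]) == j

    return elements, leq


# Figure 4 with L = 1: components "a" and "b" merge into "c" at t = L.
times = [0.0, 0.5, 1.0, 1.5]
sets = [["a", "b"], ["a", "b"], ["c"], ["c"]]
maps = [{"a": "a", "b": "b"}, {"a": "c", "b": "c"}, {"c": "c"}]

elements, leq = display_poset(times, sets, maps)
print(len(elements))                # 6
print(leq(("a", 0.0), ("c", 1.5)))  # True
print(leq(("a", 0.0), ("b", 0.5)))  # False
```

Storing only consecutive maps mirrors the chain case: the relation between any two sampled times is recovered by composition.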

We conclude this section by showing how the classical construction of a merge tree specifies a persistent set and thus embeds into the generalized framework defined here.

Example 7 Let $f : X \to \mathbb{R}$ be a function. The epigraph construction defines a persistent space

$$E : (\mathbb{R},\le) \to \mathbf{Top}, \qquad s \rightsquigarrow \pi_f^{-1}(s) := \{(x, s) \mid f(x) \le s\}.$$

Evidently, if $s \le t$ then $E(s \le t)$ is the inclusion of sublevel sets. This persistent space is, up to isomorphism, the sublevel set filtration $F$ associated to $f$, so $E \cong F$. By applying $\pi_0$ we have the persistent set of components

$$\pi_0 \circ E : (\mathbb{R},\le) \to \mathbf{Set}, \qquad s \rightsquigarrow \tilde{\pi}_f^{-1}(s).$$


2.4 Persistent Homology Modules

Although $\pi_0$ expresses connectedness of a space, and thus serves as a proxy for clustering in data, homology ($H_n$) is a richer invariant that captures “holes” in varying dimensions, the dimension being indexed by the natural number $n$. For example, in degree 1, $H_1$ detects non-contractible “loops” in a space, and in degree 2, $H_2$ detects unfilled “cavities.” A sphere $S^2$ and a torus $T^2$ are both connected, so $\pi_0(S^2) \cong \pi_0(T^2)$, but they are distinguished by their homology groups in degree 1: every loop drawn on the surface of the sphere may be contracted down to a point, which is not the case for the torus. Consequently, we have another collection of invariants that we can associate to a persistent space, called persistent homology.

Definition 10 If $F : (\mathbb{R},\le) \to \mathbf{Top}$ is a persistent space, then for any integer $n \ge 0$ we have the $n$th persistent homology module

$$F_n := H_n \circ F : (\mathbb{R},\le) \to \mathbf{Vect}_k, \qquad s \rightsquigarrow H_n(F(s); k).$$

Example 8 In Figure 2, two subsets $X$ and $Y$ of $\mathbb{R}^2$ with the Euclidean metric are drawn in the top left of the figure. Their offset filtrations are shown for two time steps in the filtration. Homology converts these offsets into vector spaces, whose bases are drawn as a barcode to the right. The barcode is defined precisely below in Definition 14.
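For a filtration of a graph (a one-dimensional simplicial complex), the pointwise dimensions of $F_0(s)$ and $F_1(s)$ can be computed without linear algebra: $\dim H_0$ is the number of components, and the Euler characteristic gives $\dim H_1 = E - V + \dim H_0$ over any field $k$. The sketch below assumes each vertex and edge comes with an appearance time and that no edge appears before its endpoints. It only computes $\dim H_n(F(s); k)$ at single times; the ranks of the maps $F_n(s \le t)$, and hence the barcode, require matrix reduction, which is not shown. The name `graph_betti_numbers` is hypothetical.

```python
def graph_betti_numbers(vertex_times, edge_times, s):
    """(dim H_0, dim H_1) of the subgraph present at time s of a graph filtration."""
    vertices = [v for v, tv in vertex_times.items() if tv <= s]
    edges = [e for e, te in edge_times.items() if te <= s]

    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    b0 = len(vertices)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge merges two components
            parent[ru] = rv
            b0 -= 1
    b1 = len(edges) - len(vertices) + b0  # Euler characteristic: V - E = b0 - b1
    return b0, b1


# A square whose four vertices appear at time 0; the last edge closes a loop at time 2.
vertex_times = {0: 0, 1: 0, 2: 0, 3: 0}
edge_times = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 2}
for s in (0, 1, 2):
    print(s, graph_betti_numbers(vertex_times, edge_times, s))
# 0 (4, 0)
# 1 (1, 0)
# 2 (1, 1)
```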

2.5 Indecomposables, Barcodes and Persistence Diagrams

The success of topological data analysis does not come from simply defining functors modeled on posets, but rather from having tractable invariants that can be associated to these functors. One of the central objects of study in persistent homology is the barcode (or, equivalently, the persistence diagram), which is a finite combinatorial way of encoding the ranks of the maps $F_n(s \le t) : H_n(F(s); k) \to H_n(F(t); k)$ for any pair of real numbers $s \le t$. The existence of the barcode is guaranteed by representation theory, and we now briefly review this result.

Definition 11 Given two generalized persistence modules $F, G : (P,\le) \to \mathbf{Vect}_k$, we can form their direct sum $F \oplus G$ by taking pointwise direct sums, $(F \oplus G)(p) = F(p) \oplus G(p)$. Similarly, the $k$-linear map associated to the relation $p \le q$ is given by the direct sum of the maps, i.e. $(F \oplus G)(p \le q) := F(p \le q) \oplus G(p \le q)$. The ability to sum generalized persistence modules is what makes the category of persistence modules $\mathrm{Fun}(P, \mathbf{Vect}_k)$ an additive category. The existence of a zero object, kernels and cokernels is what makes it also an abelian category. The structure of an additive category is all we need to define indecomposable representations of a poset.


It is a fact that indecomposable R-modules have a very special form. This motivates the next definition.

Definition 13 An $\mathbb{R}$-interval module is a functor $k_I : (\mathbb{R},\le) \to \mathbf{vect}_k$ that is supported on an interval $I \subset \mathbb{R}$, with $k_I(s) = k$ if and only if $s \in I$, and otherwise $k_I(s)$ is the zero vector space. We define $k_I(s \le t) = \mathrm{id}_k$ if and only if $s, t \in I$, and otherwise $k_I(s \le t)$ is the zero map. This definition has a straightforward generalization to the notion of a $P$-interval module $k_I$, where $I \subseteq P$ is an interval in the sense that if $p, q \in I$ and $p \le r \le q$ then $r \in I$ as well.
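On a finite poset, the interval condition of Definition 13 and the ranks of an interval module can be checked directly. This is a sketch with hypothetical names: it implements only the convexity condition stated above, and represents the poset by its elements together with a predicate `leq`.

```python
from itertools import product


def is_interval(I, P, leq):
    """Convexity: p, q in I and p <= r <= q imply r in I."""
    I = set(I)
    return all(
        r in I
        for p, q in product(I, repeat=2)
        for r in P
        if leq(p, r) and leq(r, q)
    )


def interval_module_rank(I, leq, s, t):
    """Rank of k_I(s <= t): 1 (the identity of k) if s, t lie in I, else 0."""
    return 1 if s in I and t in I and leq(s, t) else 0


chain = range(6)
usual = lambda a, b: a <= b
divides = lambda a, b: b % a == 0

print(is_interval({1, 2, 3}, chain, usual))          # True
print(is_interval({1, 3}, chain, usual))             # False: 2 is missing
print(is_interval({1, 6}, [1, 2, 3, 6], divides))    # False: 1 | 2 | 6
print(interval_module_rank({1, 2, 3}, usual, 1, 3))  # 1
```

The divisibility example shows that convexity depends on the order, not on numeric closeness: $\{1, 6\}$ fails because $2$ sits between $1$ and $6$ in that poset.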

One of the central results in the theory of TDA is the following theorem of Crawley-Boevey, which provides a simple criterion guaranteeing the existence of barcodes, which are defined precisely below.

Theorem 1 (Thm. 1.1 of [21]) Any $\mathbb{R}$-module of finite-dimensional vector spaces, written $F : (\mathbb{R},\le) \to \mathbf{vect}_k$, is isomorphic to a direct sum of interval modules, unique up to permutation of the factors appearing in the direct sum decomposition.

Definition 14 A barcode $B = \{(I, m_I)\}$ is a multiset of intervals in the real line, i.e. $I \subseteq \mathbb{R}$ is an interval and $m_I \in \mathbb{N}$ indicates its multiplicity.

Let Barcodes denote the set of all barcodes.

Remark 2 It is a fact that a barcode can be viewed as a functor from $(\mathbb{R},\le)$ to the category of sets and matchings. This means that the collection of barcodes can also be viewed as a category. See [6] for more details.

A direct sum of interval modules is determined up to isomorphism by its multiset of intervals. We then have the following immediate corollary of Theorem 1:

Corollary 1 Any pointwise finite dimensional R-module F has a uniquely associ-ated barcode B(F).

2.6 Bottleneck Distance

The canonical metric on the space of barcodes is the bottleneck distance. We recall its definition here and fix notation for later computations. A matching of barcodes $B$ and $B'$ is a bijection $\xi$ between subsets $\mathrm{dom}(\xi) \subset B$ and $\mathrm{ran}(\xi) \subset B'$ (such a $\xi$ is also commonly referred to as a partial bijection between $B$ and $B'$). The cost of a matching $\xi$ is

$$\max\left\{ \max_{I \in \mathrm{dom}(\xi)} \|I - \xi(I)\|_\infty,\ \max_{I \in B \setminus \mathrm{dom}(\xi)} \|I\|_\Delta,\ \max_{I' \in B' \setminus \mathrm{ran}(\xi)} \|I'\|_\Delta \right\},$$

where, for $I$ and $I'$ with endpoints $b \le d$ and $b' \le d'$, respectively,

$$\|I - I'\|_\infty := \max\{|b - b'|, |d - d'|\} \quad \text{and} \quad \|I\|_\Delta := \frac{d - b}{2}.$$


[Figure 5 panels: $X$ with its merge tree and $H_1$ barcode (DMT for $X$); $Y$ with its merge tree and $H_1$ barcode (DMT for $Y$)]

Fig. 5: The two subsets X and Y of $\mathbb{R}^2$ are recalled from Figure 2. The persistent sets, which track the evolution of connected components ($\pi_0$) across the filtration, are drawn as merge trees in green above the subsets X and Y. The barcodes for degree-1 persistent homology are drawn in red next to the branch of the merge tree where birth and death occur. Intuitively, what distinguishes X and Y is the fact that for X, both $H_1$ features (“loops”) are born and die on a single component, whereas for Y the two $H_1$ features are born on separate connected components. Neither the merge tree nor the barcodes distinguish these two subspaces, but together they do. This is the motivation for decorated merge trees (DMTs).

Definition 15 The bottleneck distance between $B$ and $B'$ is

$$d_B(B, B') := \inf\{\varepsilon \ge 0 \mid \text{there exists an } \varepsilon\text{-matching of } B \text{ and } B'\},$$

where an $\varepsilon$-matching is a matching of cost at most $\varepsilon$.

A matching realizing the bottleneck distance will be called an optimal matching. As a technical point, we observe that the distance between a barcode and the same barcode adjoined with finitely many intervals whose left and right endpoints are equal (i.e., points on the diagonal in a persistence diagram) is zero. We therefore identify barcodes up to this equivalence relation. Since an interval in a barcode is allowed to have an endpoint at infinity, it is possible for $d_B(B, B') = \infty$; i.e., bottleneck distance is an extended metric.
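Because an $\varepsilon$-matching exists exactly when some matching has cost at most $\varepsilon$, for finite barcodes $d_B$ is the minimum cost over the finitely many partial matchings. The brute-force sketch below enumerates all of them and is only meant to make the cost formula concrete: it is exponential in the number of bars, whereas practical implementations rely on a bipartite matching formulation. Barcodes are assumed to be lists of `(birth, death)` pairs (repeated pairs encode multiplicity), `math.inf` marks an infinite endpoint, and two equal infinite endpoints are treated as being at distance 0. All names are hypothetical.

```python
import math
from itertools import combinations, permutations


def _gap(x, y):
    # |x - y|, treating two equal (possibly infinite) endpoints as distance 0.
    return 0.0 if x == y else abs(x - y)


def sup_norm(I, J):
    """||I - J||_inf = max(|b - b'|, |d - d'|)."""
    return max(_gap(I[0], J[0]), _gap(I[1], J[1]))


def half_length(I):
    """||I||_Delta = (d - b) / 2, the cost of leaving I unmatched."""
    return (I[1] - I[0]) / 2


def matching_cost(B1, B2, pairs):
    """Cost of the partial matching that pairs B1[i] with B2[j] for (i, j) in pairs."""
    used1 = {i for i, _ in pairs}
    used2 = {j for _, j in pairs}
    costs = [sup_norm(B1[i], B2[j]) for i, j in pairs]
    costs += [half_length(I) for i, I in enumerate(B1) if i not in used1]
    costs += [half_length(J) for j, J in enumerate(B2) if j not in used2]
    return max(costs, default=0.0)


def bottleneck_brute_force(B1, B2):
    """Minimum matching cost over all partial matchings (tiny barcodes only)."""
    best = math.inf
    for k in range(min(len(B1), len(B2)) + 1):
        for dom in combinations(range(len(B1)), k):
            for ran in permutations(range(len(B2)), k):
                best = min(best, matching_cost(B1, B2, list(zip(dom, ran))))
    return best


print(bottleneck_brute_force([(0, 4), (1, 2)], [(0, 3.5)]))      # 0.5
print(bottleneck_brute_force([(0, math.inf)], [(1, math.inf)]))  # 1
print(bottleneck_brute_force([(0, math.inf)], []))               # inf
```

The last example illustrates why $d_B$ is an extended metric: an infinite bar can only be matched to another infinite bar, and leaving it unmatched costs $\infty$.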

3 Decorated Merge Trees Three Different Ways


while the overlaid barcodes (drawn in red) summarize degree-1 homology and the components where the degree-1 features live. The richer invariants of DMTs are able to distinguish these spaces.

Although the construction of the decorated merge tree may seem intuitive, it turns out that there are multiple ways of tracing births and deaths of homological features along an evolving set of components. To this end we provide three different definitions of a decorated merge tree, along with adjectives to distinguish these.

1. The categorical decorated merge tree relies on the definition of a category of parameterized vector spaces pVect. This definition fits squarely within the framework of generalized persistence modules as it is defined in terms of a functor from (R,ď) to pVect. As we will see later in the paper, defining the interleaving distance and proving various stability theorems comes for free with this definition.

2. The concrete decorated merge tree takes the perspective that the underlying (generalized) merge tree, along with its poset structure $(M_F, \leq)$, should define the domain of a functor to $\mathbf{Vect}$, where the homology of each component at each time is recorded. This perspective has the advantage that it displays DMTs via the representation theory of tree-like posets, which is traditionally considered a wild object of study. In Section 5, we consider some special situations under which this representation theory can be tamed.

3. The barcode decorated merge tree takes a perspective that is reminiscent of the persistent homology transform (PHT). As a reminder, the PHT takes in a (tame) subset of $\mathbb{R}^d$ and associates to each point on the unit sphere $S^{d-1}$ the barcode obtained by filtering in that direction. In similar fashion, the barcode DMT associates to each point in the generalized merge tree the barcode obtained by restricting the filtration to the line that starts at that point and stretches to infinity. In other words, a barcode DMT is simply a map

$$B_F : M_F \to \mathrm{Barcodes}.$$

This perspective connects with other TDA techniques [43,45] that study persistent spaces indexed by higher-dimensional posets, such as $(\mathbb{R}^2, \leq_x \times \leq_y)$, by restricting to chains in that poset and looking at the barcodes associated to these restrictions.

The first perspective is in some ways the cleanest definition, but it comes at the expense of abstraction. We proceed with this definition first, as we have a chain of decreasing abstraction

categorical → concrete → barcode

that allows us to move from each type of DMT to the next in the above order. There is, however, an equivalence between the concrete and categorical DMT constructions, which may comfort the reader who finds the categorical construction opaque.

3.1 The Categorical Decorated Merge Tree


component. This is expressed in the well-known fact that if $X \cong \coprod_i X_i := \bigcup_i X_i \times \{i\}$, then $H_n(X) \cong \bigoplus_i H_n(X_i)$.

Moreover, if $f : X \to Y$ is a continuous map of spaces, then we can view $f$ as a map between disjoint unions that sends each factor in the domain to a unique factor in the codomain. In other words, the continuous map

$$f = \coprod_i f_i : \coprod_{i \in \pi_0(X)} X_i \to \coprod_{j \in \pi_0(Y)} Y_j$$

can be parameterized by the underlying map of sets $\pi_0(f) : \pi_0(X) \to \pi_0(Y)$. This indicates that we can parameterize maps inside of a persistent space along its associated persistent set of connected components. This requires some further restrictions on the topological spaces involved, such as local (path) connectedness. First we isolate an important categorical construction.
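As a finite, combinatorial illustration of this parameterization, the following Python sketch models spaces as finite graphs (a hypothetical encoding, not the paper's setting). It computes connected components with a union-find structure and derives the set map $\pi_0(f)$ from a vertex map $f$. The vertex map is assumed to send each edge to an edge or to a single vertex, so that it is continuous on geometric realizations. All function names are hypothetical.

```python
def components(vertices, edges):
    """Map each vertex of a finite graph to a representative of its component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in vertices}


def pi0_map(X, Y, f):
    """Induced set map pi_0(f) : pi_0(X) -> pi_0(Y).

    X and Y are (vertices, edges) pairs; f is a dict on vertices that is assumed
    to send each edge to an edge or to a single vertex.
    """
    cx, cy = components(*X), components(*Y)
    induced = {}
    for v, comp in cx.items():
        target = cy[f[v]]
        # connectivity forces each component of X to land in a single component of Y
        assert induced.setdefault(comp, target) == target
    return induced


X = ([0, 1, 2, 3], [(0, 1), (2, 3)])             # two disjoint edges
Y = (["a", "b", "c"], [("a", "b"), ("b", "c")])  # a path
f = {0: "a", 1: "b", 2: "b", 3: "c"}
induced = pi0_map(X, Y, f)
print(len(induced), len(set(induced.values())))  # 2 1
```

Both components of X are carried into the single component of Y. This is the discrete analog of the map of sets $\pi_0(f)$ described above.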

Definition 16 (The Category of Parameterized Objects) Let $\mathcal{C}$ be a category. The category of discretely parameterized objects in $\mathcal{C}$, written $p\mathcal{C}$, has for objects functors $I : S \to \mathcal{C}$, where $S$ is a set viewed as a discrete category, i.e. the only morphisms in $S$ are identity morphisms. The functor $I$ amounts to a choice of object of $\mathcal{C}$ for each $s \in S$. We will refer to such a functor as an $S$-parameterized object. A morphism from an $S$-parameterized object $I : S \to \mathcal{C}$ to a $T$-parameterized object $J : T \to \mathcal{C}$ consists of a map of sets $m : S \to T$ and a natural transformation from the functor $I$ to the pullback of $J$ along $m$, i.e. a morphism is a natural transformation $\alpha : I \Rightarrow m^*J$, where $m^*J := J \circ m$.
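Composition in $p\mathcal{C}$ is not spelled out above, but the natural choice follows from the definition. Given $(m, \alpha) : I \to J$ and $(m', \alpha') : J \to K$, the composite has set map $m' \circ m$ and component $\alpha'_{m(s)} \circ \alpha_s : I(s) \to K(m'(m(s)))$ at each $s \in S$. The following is a minimal Python sketch for $\mathcal{C} = \mathbf{Vect}$. It assumes finite-dimensional spaces, with linear maps represented by NumPy matrices. The names `PMorphism` and `compose` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PMorphism:
    """A morphism (m, alpha) in pVect between finitely parameterized vector spaces."""
    m: dict      # set map S -> T
    alpha: dict  # s -> matrix of I(s) -> J(m(s)), shape (dim J(m(s)), dim I(s))


def compose(second: PMorphism, first: PMorphism) -> PMorphism:
    """Composite second ∘ first: set map m' ∘ m and components alpha'_{m(s)} @ alpha_s."""
    m = {s: second.m[t] for s, t in first.m.items()}
    alpha = {s: second.alpha[first.m[s]] @ first.alpha[s] for s in first.m}
    return PMorphism(m, alpha)


# I = {x: k^2} -> J = {a: k} -> K = {u: k^2}, a composite of the shape k^2 -> k -> k^2.
first = PMorphism(m={"x": "a"}, alpha={"x": np.array([[1.0, 0.0]])})
second = PMorphism(m={"a": "u"}, alpha={"a": np.array([[1.0], [0.0]])})
comp = compose(second, first)
print(comp.m, np.linalg.matrix_rank(comp.alpha["x"]))  # {'x': 'u'} 1
```

The composite component has rank 1, so it cannot be the identity on $k^2$. This is the phenomenon behind the non-isomorphism in Example 9.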

Remark 3 To get a handle on the notion of isomorphism in the category $p\mathbf{Vect}$, the reader is encouraged to consider Example 9.

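We note that if $\mathcal{C}$ has coproducts, then $p\mathcal{C}$ participates in the following diagram of categories and functors:

$$\mathbf{Set} \xleftarrow{\ \mathrm{dom}\ } p\mathcal{C} \xrightarrow{\ \mathrm{cop}\ } \mathcal{C}$$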

The functor dom sends any $S$-parameterized object $I : S \to \mathcal{C}$ to the underlying parameterizing set $S$. The functor cop sends the diagram $I : S \to \mathcal{C}$ to its colimit, which is the coproduct in this case. Before exploiting the above diagram to its fullest potential, we state a result that we have been implicitly using in our discussion.

Lemma 1 Let $p\mathbf{Top}_c$ denote the category of discretely parameterized connected spaces and let $\mathbf{Top}_{lc}$ denote the category of locally connected spaces. The coproduct functor induces an equivalence between these categories:

$$\mathrm{cop} : p\mathbf{Top}_c \to \mathbf{Top}_{lc}, \quad \text{where } (I : S \to \mathbf{Top}_c) \rightsquigarrow \coprod_{s \in S} I(s).$$


Proof Recall that a locally connected space is by definition one where for each open set $U \subseteq X$ and point $x \in U$, there exists a connected open set $V$ with $x \in V \subseteq U$. A consequence of this definition is that if $X_i$ is a connected component of $X$, then $X_i$ is open in $X$. Consequently, a locally connected space $X$ is homeomorphic to the disjoint union of its components. Recognizing that the disjoint union is the same thing as the coproduct in the category $\mathbf{Top}$, this implies that every locally connected space is naturally homeomorphic to the coproduct of its components.

To complete the proof, it suffices to show that $\mathrm{cop} : p\mathbf{Top}_c \to \mathbf{Top}_{lc}$ is full, faithful and essentially surjective [63, Thm. 1.5.9]. Essential surjectivity holds because every locally connected space is homeomorphic to the coproduct of its components. Full and faithful means that if $I : S \to \mathbf{Top}_c$ and $J : T \to \mathbf{Top}_c$ are two parameterized connected spaces, then the map

$$\mathrm{Hom}_{p\mathbf{Top}_c}(I, J) \to \mathrm{Hom}_{\mathbf{Top}_{lc}}\Big(\coprod_{s \in S} I(s), \coprod_{t \in T} J(t)\Big)$$

is surjective and injective, respectively. To show surjectivity (fullness), we have to show that every continuous map

$$f : \coprod_{s \in S} I(s) \to \coprod_{t \in T} J(t)$$

is realized by some morphism $(m, \alpha)$ in $p\mathbf{Top}_c$. Here connectivity of each $I(s)$ is an essential part of the hypothesis, because it allows us to associate to each $s \in S$ a unique $t \in T$ so that $f(I(s)) \subseteq J(t)$. This specifies the map of sets $m : S \to T$. The restriction of the continuous map $f$ to each $I(s)$ specifies the components of a natural transformation $\alpha : I \Rightarrow m^*J$. Injectivity (faithfulness) is not difficult to see: if $(m, \alpha)$ and $(n, \beta)$ are two morphisms that induce the same map between the disjoint unions, then they are equal. Recalling the set-theoretic definition of the disjoint union, this means that

$$\coprod_s \alpha_s = \coprod_s \beta_s : \bigcup_{s \in S} I(s) \times \{s\} \to \bigcup_{t \in T} J(t) \times \{t\},$$

and in particular that $m = n$ and $\alpha = \beta$.

One consequence of Lemma 1 is that we can define another functor that serves as a sort of “inverse” to cop, up to natural isomorphism.

Definition 17 The parameterized by components functor $\mathrm{pbc} : \mathbf{Top}_{lc} \to p\mathbf{Top}_c$


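An alternative proof of Lemma 1 is to show that

$$\mathrm{pbc} \circ \mathrm{cop} \cong \mathrm{id}_{p\mathbf{Top}_c} \quad \text{and} \quad \mathrm{cop} \circ \mathrm{pbc} \cong \mathrm{id}_{\mathbf{Top}_{lc}}.$$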

In fact, the above two identities give the definition of an equivalence of categories used in more modern treatments of category theory [63, Def. 1.5.4]. Checking these two identities tends to be lengthier than the proof of Lemma 1 given above. The content of [63, Thm. 1.5.9] is that these two proofs are in fact equivalent; the details can be checked by reading [63, Thm. 1.5.9] and substituting the particular categories and functors used here. Instead, we will leverage the above identities to provide our first refinement of persistent spaces into functors $(\mathbb{R}, \leq) \to p\mathbf{Top}_c$. This is the heart of the definition of a categorical decorated merge tree.

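Lemma 2 Any persistent space $F : (\mathbb{R}, \leq) \to \mathbf{Top}_{lc}$ has an associated persistently parameterized space

$$\tilde{F} := \mathrm{pbc} \circ F : (\mathbb{R}, \leq) \to p\mathbf{Top}_c.$$

The functor $\tilde{F}$ fits into the following diagram, which commutes up to natural isomorphism. In particular, $\mathrm{cop} \circ \tilde{F} \cong F$.

[Diagram: the functors $\tilde{F} : (\mathbb{R}, \leq) \to p\mathbf{Top}_c$ and $F : (\mathbb{R}, \leq) \to \mathbf{Top}_{lc}$, related by $\mathrm{cop} : p\mathbf{Top}_c \to \mathbf{Top}_{lc}$ up to natural isomorphism, together with $\mathrm{dom} : p\mathbf{Top}_c \to \mathbf{Set}$ and $\pi_0 : \mathbf{Top}_{lc} \to \mathbf{Set}$.]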

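Proof We start with the last statement, about commutativity holding up to natural isomorphism. The natural isomorphism $\mathrm{cop} \circ \mathrm{pbc} \cong \mathrm{id}_{\mathbf{Top}_{lc}}$ from the remark above can be restricted to the image of $F$ to yield

$$\mathrm{cop} \circ \mathrm{pbc} \circ F \cong F \iff \mathrm{cop} \circ \tilde{F} \cong F.$$

Explicitly, this means that for every $s \in \mathbb{R}$ the spaces $\mathrm{cop} \circ \tilde{F}(s)$ and $F(s)$ are homeomorphic, naturally in $s$. Since homeomorphic spaces have isomorphic sets of components, we know that the persistent set $\pi_0 \circ F$ (whose display poset is the generalized merge tree) and the persistent set $\mathrm{dom} \circ \tilde{F}$ are isomorphic as well.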

We are now in a position to define the categorical decorated merge tree of a persistent (locally connected) space.

Definition 18 Given a persistent space $F : (\mathbb{R}, \leq) \to \mathbf{Top}_{lc}$, where every space is locally connected, we first apply Lemma 2 to obtain a persistently parameterized space $\tilde{F} : (\mathbb{R}, \leq) \to p\mathbf{Top}_c$. Applying singular homology in degree $n$ then leads to the categorical decorated merge tree in degree $n$:


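[Figure 6 diagram: $I : S = \{x, y\} \to \mathbf{Vect}$ with $I(x) = k^2$, $I(y) = 0$; $J : T = \{a, b\} \to \mathbf{Vect}$ with $J(a) = J(b) = k$; candidate set maps $m$ and $n$ with components $\alpha_x : k^2 \to (m^*J)(x)$ and $\beta_a : k \to (n^*I)(a)$; conclusion $I \not\cong J$.]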

Fig. 6: Associated to the offset filtrations of the two subsets X and Y of $\mathbb{R}^2$ from Figure 5 are two categorical decorated merge trees $\tilde{F}_1$ and $\tilde{G}_1$, as defined in Definition 18. This figure shows the mechanics of checking whether the parameterized vector spaces at offset 0, $\tilde{F}_1(0) = I : S \to \mathbf{Vect}$ and $\tilde{G}_1(0) = J : T \to \mathbf{Vect}$, are isomorphic. They are not, which proves that our categorical decorated merge tree can distinguish these spaces. See Example 9 for more details.

Example 9 (Our Motivating Example, Reconsidered) In Figure 5 we considered the offset filtrations F and G associated to two different subsets X and Y of the plane, respectively. Following Definition 18, we can associate two categorical decorated merge trees in degree 1, $\tilde{F}_1$ and $\tilde{G}_1$, to X and Y. To verify that $\tilde{F}_1 \not\cong \tilde{G}_1$, it suffices to show that their values at filtration value 0, which we denote by $I : \{x, y\} \to \mathbf{Vect}$ and $J : \{a, b\} \to \mathbf{Vect}$, are not isomorphic in the category $p\mathbf{Vect}$.

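Recall that two parameterized vector spaces $I : S \to \mathbf{Vect}$ and $J : T \to \mathbf{Vect}$ are isomorphic if there are set maps $m : S \to T$ and $n : T \to S$ and natural transformations $\alpha : I \Rightarrow m^*J$ and $\beta : J \Rightarrow n^*I$ satisfying

$$m^*\beta \circ \alpha = \mathrm{id}_I \quad \text{and} \quad n^*\alpha \circ \beta = \mathrm{id}_J;$$

in particular,

$$n \circ m = \mathrm{id}_S \quad \text{and} \quad m \circ n = \mathrm{id}_T.$$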

Consider the parameterized vector spaces at 0 in our example, $I : \{x, y\} \to \mathbf{Vect}$ and $J : \{a, b\} \to \mathbf{Vect}$, where $I(x) = k^2$, $I(y) = 0$, $J(a) = k$ and $J(b) = k$. We can easily show that no isomorphism is possible. Any bijection between $S = \{x, y\}$ and $T = \{a, b\}$ sends $x$ to $a$ or $b$. It therefore forces the component at $x$ of $m^*\beta \circ \alpha$ to be a composite of linear maps

$$k^2 \to k \to k^2,$$

which has rank at most 1 and so can never be the identity on $k^2$.
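For finite-dimensional parameterized vector spaces, the conditions above say that $m$ is a bijection with inverse $n$ and that each component $\alpha_s : I(s) \to J(m(s))$ is invertible with inverse $\beta_{m(s)}$. An isomorphism therefore exists exactly when some bijection matches dimensions. The following Python sketch checks this by brute force for Example 9. It assumes a hypothetical encoding in which parameterized spaces are dictionaries from parameters to dimensions.

```python
from itertools import permutations


def pvect_isomorphic(I, J):
    """I, J: dicts mapping parameters to dimensions of finite-dimensional spaces.

    Returns True iff some bijection m: S -> T satisfies dim I(s) == dim J(m(s)).
    """
    S, T = list(I), list(J)
    if len(S) != len(T):
        return False
    for image in permutations(T):
        m = dict(zip(S, image))
        if all(I[s] == J[m[s]] for s in S):
            return True
    return False


I = {"x": 2, "y": 0}
J = {"a": 1, "b": 1}
print(pvect_isomorphic(I, J))  # False, as argued in Example 9
```

The search over permutations is only for transparency. Comparing the sorted lists of dimensions gives the same answer.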

3.2 The Concrete Decorated Merge Tree

The categorical notion of a decorated merge tree boils down to the following sequence of assignments: to each real number $s \in \mathbb{R}$ a set $I(s)$ is assigned, and then to each element $i \in I(s)$ a (homology) vector space is assigned. This process is reminiscent of specifying an element of $\mathrm{Hom}(A, \mathrm{Hom}(B, C))$, which amounts to assigning to each element of $A$ a map from $B$ to $C$. The reader might then find it useful to consider the adjunction between products and exponentials obtained by currying:

$$\mathrm{Hom}(A, \mathrm{Hom}(B, C)) \cong \mathrm{Hom}(A \times B, C).$$


In this section we work with, in essence, the right-hand side of this isomorphism, where $A \times B$ is replaced with the generalized merge tree $M_F$ and $C$ is replaced with the category of vector spaces. The analog of currying in this section is concretization, the namesake of this form of decorated merge trees.
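On the level of objects, the passage between the two sides of the currying isomorphism can be sketched in a few lines of Python. As a simplifying assumption, a decoration is recorded only by the dimension of each vector space, and the maps between spaces are ignored. The dictionaries passed to `uncurry` and `curry` (hypothetical names) stand in for $s \mapsto (i \mapsto V)$ and $(i, s) \mapsto V$, respectively.

```python
def uncurry(categorical):
    """{s: {i: dim}}  ->  {(i, s): dim}, i.e. concretization on the level of objects."""
    return {(i, s): dim for s, fiber in categorical.items() for i, dim in fiber.items()}


def curry(concrete):
    """{(i, s): dim}  ->  {s: {i: dim}}, the inverse passage."""
    categorical = {}
    for (i, s), dim in concrete.items():
        categorical.setdefault(s, {})[i] = dim
    return categorical


cat = {0: {"x": 2, "y": 0}, 1: {"z": 1}}
con = uncurry(cat)
print(con)  # {('x', 0): 2, ('y', 0): 0, ('z', 1): 1}
```

One caveat: a time $s$ with an empty parameterizing set leaves no trace in the concrete form. As a result, `curry` only inverts `uncurry` when every fiber is nonempty.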

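Definition 19 Let $G : (\mathbb{R}, \leq) \to p\mathcal{C}$ be a functor from $(\mathbb{R}, \leq)$ to the category of discretely parameterized objects in $\mathcal{C}$; see Definition 16. By recording the parameterizing sets of $G$ across all real values $s \in \mathbb{R}$, we obtain a persistent set $\mathrm{dom} \circ G : (\mathbb{R}, \leq) \to \mathbf{Set}$, which has a display poset, in the sense of Definition 8,

$$M_G := \coprod_{s \in \mathbb{R}} \mathrm{dom}(G(s)) = \bigcup_{s \in \mathbb{R}} \mathrm{dom}(G(s)) \times \{s\}.$$

We define the concretization of $G$ to be the uniquely associated functor $\mathbf{G} : (M_G, \leq) \to \mathcal{C}$, where $(i, s) \in M_G \rightsquigarrow G(s)(i) \in \mathcal{C}$.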

Recall that the poset relation $(i, s) \leq (j, t)$ in $M_G$ means that the morphism from $G(s)$ to $G(t)$ in $p\mathcal{C}$ has a set map $m$ sending $i \in \mathrm{dom}(G(s))$ to $j \in \mathrm{dom}(G(t))$. The natural transformation $\alpha : G(s) \Rightarrow m^*G(t)$ then includes the data of a morphism in $\mathcal{C}$ from $G(s)(i)$ to $G(t)(j)$. Functoriality of $G$ ensures that these morphisms compose correctly, which shows that $\mathbf{G}$ is indeed a functor. We now specialize the above construction to categorical decorated merge trees.

Definition 20 (The Concrete Decorated Merge Tree) Given a persistent space

$$F : (\mathbb{R}, \leq) \to \mathbf{Top}_{lc},$$

we denote its generalized merge tree by $(M_F, \leq)$, which is defined as the display poset of $\pi_0 \circ F$. The concrete decorated merge tree in degree $n$ is the $(M_F, \leq)$-module, or tree module for short,

$$\mathbf{F}_n : (M_F, \leq) \to \mathbf{Vect}_k, \quad \text{where } (i, s) \rightsquigarrow H_n(F(s)_i; k),$$

that records the $n$th homology of the $i$th component of $F(s)$. It is equivalently defined as the concretization of the categorical decorated merge tree in degree $n$; cf. Definition 18.
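The order on a display poset such as $M_F$ can be made explicit for a persistent set sampled at finitely many integer times. The Python sketch below assumes that the persistent set is given by its sets at times $0, \dots, N-1$ and by the set maps between consecutive times, a hypothetical encoding of our own. Then $(i, s) \leq (j, t)$ exactly when $s \leq t$ and the composite map carries $i$ to $j$.

```python
def push(maps, i, s, t):
    """Image of i in the set at time t under the structure map (s <= t).

    maps[k] is a dict encoding the set map from time k to time k + 1.
    """
    for k in range(s, t):
        i = maps[k][i]
    return i


def leq(maps, p, q):
    """Order of the display poset: (i, s) <= (j, t) iff s <= t and i is carried to j."""
    (i, s), (j, t) = p, q
    return s <= t and push(maps, i, s, t) == j


def display_poset(sets):
    """Elements (i, s) of the disjoint union of the sets at each time s."""
    return [(i, s) for s, S in enumerate(sets) for i in sorted(S)]


# Two components x and y that merge into z at time 2.
sets = [{"x", "y"}, {"x", "y"}, {"z"}]
maps = [{"x": "x", "y": "y"}, {"x": "z", "y": "z"}]
print(leq(maps, ("y", 0), ("z", 2)), leq(maps, ("x", 1), ("y", 1)))  # True False
```

In this toy merge tree the principal up set of $(y, 0)$ is the chain $(y, 0) \leq (y, 1) \leq (z, 2)$. This is the kind of totally ordered set on which a tree module can be restricted.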

Remark 4 The definition of a concrete decorated merge tree can be abstracted in a way that does not refer to the particular homology construction. That is, we can define a concrete decorated merge tree more generally to be a functor $\mathbf{F} : (M_F, \leq) \to \mathbf{Vect}$ on a generalized merge tree, considered as a poset category. When dealing with theoretical aspects of decorated merge trees, we tend to work with these more general objects. This generalized object is also sometimes referred to as a tree module.


3.3 The Barcode Decorated Merge Tree

To a persistent space $F : (\mathbb{R}, \leq) \to \mathbf{Top}_{lc}$ we have already shown how to associate two (equivalent) devices to record homology in degree $n$ as it varies across components and filtration values $s \in \mathbb{R}$: the categorical decorated merge tree of Definition 18,

$$\tilde{F}_n : (\mathbb{R}, \leq) \to p\mathbf{Vect},$$

and the concrete decorated merge tree of Definition 20,

$$\mathbf{F}_n : (M_F, \leq) \to \mathbf{Vect}.$$

Unfortunately, unlike ordinary persistent homology modules, neither of these devices has a simple summary such as a barcode or persistence diagram. This is due to the fact that the underlying poset $(M_F, \leq)$ is not totally ordered. However, if one considers the restriction of a tree module $\mathbf{F}_n$ to the principal up set at a point $p = (i, s) \in M_F$, then we do obtain a module indexed by a totally ordered set, and we can call its barcode the “barcode at $p$.” This motivates the following definition, but we will also indicate another procedure for attaching barcodes to points on a merge tree that does not follow this basic idea. As such, we must isolate the following general definition.

Definition 21 (Barcode Decorated Merge Tree) A map from a generalized merge tree to the set of barcodes, written

$$B : (M_F, \leq) \to \mathrm{Barcodes},$$

is called a barcode decorated merge tree. We say that a barcode decorated merge tree is determined by restriction if whenever $(i, s) =: p \leq q := (j, t) \in M_F$ we have that

$$B(q) = B(p) \cap [t, \infty),$$

where each interval of $B(p)$ is intersected with $[t, \infty)$ and empty intersections are discarded.

If the generalized merge tree has leaf nodes, in the sense that every maximal chain in $(M_F, \leq)$ has a minimal element, then we call a barcode decorated merge tree that is determined by restriction a leaf-decorated merge tree.

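Definition 22 (Restricting a Tree Module) Suppose $(M_F, \leq)$ is the generalized merge tree associated to the persistent set $\pi_0 \circ F$. Given a tree module $\mathbf{F} : (M_F, \leq) \to \mathbf{Vect}$ and a point $p = (i, s) \in M_F$, we define the restriction of $\mathbf{F}$ to the principal up set $U_p$ to be the $\mathbb{R}$-module

$$\mathbf{F}|_{U_p} : (\mathbb{R}, \leq) \to \mathbf{Vect}, \quad \text{where, for } s \leq t, \ \mathbf{F}|_{U_p}(t) = \mathbf{F}\big((\pi_0 \circ F)(s \leq t)(i), t\big).$$

For $r < s$ we define $\mathbf{F}|_{U_p}(r) = 0$.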
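For a tree module sampled at integer times $0, \dots, N-1$, the restriction of Definition 22 amounts to following the branch of $p$ upward and collecting the vector spaces and linear maps met along the way. The following Python sketch assumes a hypothetical encoding:
- `comp_maps[t]` is the set map of $\pi_0 \circ F$ from time $t$ to $t + 1$,
- `dims[(c, t)]` is the dimension of the tree module at $(c, t)$,
- `lin_maps[(c, t)]` is the matrix of the internal map from $(c, t)$ to its image at time $t + 1$.

```python
import numpy as np


def restrict_to_up_set(comp_maps, dims, lin_maps, p, N):
    """Return (chain_dims, chain_maps) for the restriction of a tree module to U_p.

    The result is a persistence module on times 0..N-1 that is zero before p's time.
    """
    i, k = p
    branch, c = {}, i
    for t in range(k, N):          # follow p upward through the merge tree
        branch[t] = c
        if t < N - 1:
            c = comp_maps[t][c]
    chain_dims = [dims[(branch[t], t)] if t >= k else 0 for t in range(N)]
    chain_maps = []
    for t in range(N - 1):
        if t >= k:
            chain_maps.append(np.asarray(lin_maps[(branch[t], t)], dtype=float))
        else:                      # zero maps before the point p exists
            chain_maps.append(np.zeros((chain_dims[t + 1], chain_dims[t])))
    return chain_dims, chain_maps


# Leaves x and y merge into z at time 2; H_1 is k^2 on x's branch and 0 on y's branch.
comp_maps = [{"x": "x", "y": "y"}, {"x": "z", "y": "z"}]
dims = {("x", 0): 2, ("y", 0): 0, ("x", 1): 2, ("y", 1): 0, ("z", 2): 1}
lin_maps = {("x", 0): np.eye(2), ("y", 0): np.zeros((0, 0)),
            ("x", 1): [[1.0, 0.0]], ("y", 1): np.zeros((1, 0))}
print(restrict_to_up_set(comp_maps, dims, lin_maps, ("y", 1), 3)[0])  # [0, 0, 1]
```

Restricting at the two points gives different chain modules even though they share the merged component $z$. This is exactly the information the barcode decoration keeps track of.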

Proposition 1 Assume that the generalized merge tree $(M_F, \leq)$ has leaves, in the sense that every maximal chain has a minimal element. To any pointwise finite-dimensional tree module $\mathbf{F} : (M_F, \leq) \to \mathbf{vect}$, we can associate a leaf-decorated merge tree

$$B_{\mathbf{F}} : (M_F, \leq) \to \mathrm{Barcodes}, \quad \text{where } B_{\mathbf{F}}(p) = \mathrm{BC}(\mathbf{F}|_{U_p}).$$
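Once a restriction $\mathbf{F}|_{U_p}$ is sampled at integer times, its barcode can be read off from ranks of composite maps. The Python sketch below uses the standard inclusion-exclusion formula for modules indexed by a finite chain, which is a swapped-in computational technique and not a method prescribed in this paper. The multiplicity of the interval $[i, j]$ is $r(i, j) - r(i-1, j) - r(i, j+1) + r(i-1, j+1)$, where $r(i, j)$ is the rank of the composite map from index $i$ to index $j$ and out-of-range ranks are zero. The encoding (`dims`, `maps`) is hypothetical. A discrete interval $[i, j]$ corresponds to $[t_i, t_{j+1})$ in continuous time.

```python
import numpy as np


def rank(M):
    """Rank of a matrix; zero-size matrices have rank 0 (guarded for NumPy)."""
    return 0 if M.size == 0 else int(np.linalg.matrix_rank(M))


def chain_barcode(dims, maps):
    """Barcode of a module V_0 -> V_1 -> ... -> V_{N-1} indexed by a finite chain.

    dims[k] = dim V_k; maps[k] is the (dims[k+1] x dims[k]) matrix of V_k -> V_{k+1}.
    Returns {(i, j): multiplicity} for intervals alive exactly at indices i..j.
    """
    N = len(dims)

    def r(i, j):
        # rank of the composite V_i -> V_j, with out-of-range ranks set to zero
        if i < 0 or j >= N:
            return 0
        M = np.eye(dims[i])
        for k in range(i, j):
            M = maps[k] @ M
        return rank(M)

    bars = {}
    for i in range(N):
        for j in range(i, N):
            mult = r(i, j) - r(i - 1, j) - r(i, j + 1) + r(i - 1, j + 1)
            if mult:
                bars[(i, j)] = mult
    return bars


# Two classes born at index 0: one dies after index 1, the other after index 2.
dims = [2, 2, 1, 0]
maps = [np.eye(2), np.array([[1.0, 0.0]]), np.zeros((0, 1))]
print(chain_barcode(dims, maps))  # {(0, 1): 1, (0, 2): 1}
```

Applying `chain_barcode` to the restriction at every point of a sampled merge tree would give a discretized version of the leaf-decorated merge tree $B_{\mathbf{F}}$. As noted later in the text, computing a barcode for every leaf is the costly part.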


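[Figure 7 panels: the tree modules $\mathbf{F}_1 = I_1 \oplus I_2$ and $\mathbf{G}_1 = J_1 \oplus J_2$ on a merge tree with leaves $v$ and $w$; for each, the barcodes “viewed” from $v$ and $w$, namely $B_{\mathbf{F}_1}(v)$, $B_{\mathbf{F}_1}(w)$ and $B_{\mathbf{G}_1}(v)$, $B_{\mathbf{G}_1}(w)$, shown after pushing forward to $\mathbb{R}$.]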

Fig. 7: The barcode decorated merge tree is obtained by restricting a tree module to the principal up set of each leaf node, one leaf at a time, and computing the barcode of each restriction. In this figure, two non-isomorphic tree modules are shown to have identical barcodes when “viewed” from each of their leaf nodes. This example proves that the map sending a tree module to its barcode decoration is not injective.

Definition 23 (Barcode Transform) The map $B_{\mathbf{F}}$ whose existence is asserted by Proposition 1 is referred to as the barcode transform of $\mathbf{F}$.

Proof (Proof of Proposition 1) Since the tree module is pointwise finite-dimensional, its restriction to each principal up set $U_p$ is a pointwise finite-dimensional $\mathbb{R}$-module. Crawley-Boevey’s Theorem 1 then implies that this restricted module has a barcode. The barcode decoration is determined by restriction because, for any pair of comparable points $p \leq q$, the restriction to $U_q$ can be obtained by further restricting the module on $U_p$.

We have the following immediate corollary.

Corollary 2 If $F : (\mathbb{R}, \leq) \to \mathbf{Top}$ is a persistent space whose associated generalized merge tree $M_F$ has leaves and whose associated concrete decorated merge tree in degree $n \geq 0$, $\mathbf{F}_n : (M_F, \leq) \to \mathbf{vect}$, is pointwise finite-dimensional, then $F$ has an associated leaf-decorated merge tree in degree $n$, $B_{\mathbf{F}_n}$.

Remark 5 The barcode transform associates an (indexed) ensemble of barcodes to a filtered space through a certain “slicing” operation (i.e., slicing along up sets from leaves). This operation is analogous to several other recent methods in the TDA literature; we provide a few examples here. The persistent homology transform [74,24] associates to an embedded simplicial complex a barcode for each direction, obtained by using projection onto the direction axis as a filtration function. A fibered barcode [45] is a collection of barcodes associated to a multiparameter persistence module by slicing the parameter space by affine lines. The barcode embedding [28,


There are two main problems with the barcode decorated merge tree construction. The first is that it is expensive to compute. Even for leaf-decorated merge trees, a barcode must be computed for every leaf node, even though perhaps only certain branches have interesting bars associated to them. One way to sidestep this problem is to consider lift-decorated merge trees, as defined below in Section 5.4. The second problem is that two non-isomorphic tree modules can produce identical leaf-decorated merge trees; such an example is considered in Figure 7. Nonetheless, we have found that leaf-decorated merge trees are computable and produce informative summaries in practice; see the numerical examples in Section 7.

4 Continuity and Stability of Barcode Decorated Merge Trees

At this point, a barcode decorated merge tree has been defined to be a map from the (generalized) merge tree of a persistent space or filtration $F$ to the set of all possible barcodes, i.e.

$$B : M_F \to \mathrm{Barcodes}.$$

The barcode associated to a point $p = (i, s)$ in the merge tree is the barcode of the concrete decorated merge tree $\mathbf{F}_n$ restricted to the up set of $p$ in $M_F$. We would like to prove that this map is continuous, at least when the codomain Barcodes is equipped with the bottleneck distance. This requires that we review the notion of interleavings, which is the primary way that constructions in topological data analysis are shown to be continuous.

Additionally, we want to prove that the barcode decorated merge tree construction is stable under perturbations of functional data. This requires the comparison of concrete DMTs whose underlying merge trees are different. The first definition of interleavings of merge trees [54] can be adapted to handle this. However, the categorical perspective on DMTs outlined in Definition 18 suggests an elegant workaround. By considering functors from the totally ordered poset $(\mathbb{R}, \leq)$ into the category of parameterized vector spaces $p\mathbf{Vect}$, we can define interleavings of categorical DMTs in terms of interleavings of functors from $(\mathbb{R}, \leq)$ to $p\mathbf{Vect}$. By applying the forgetful functor from $p\mathbf{Vect}$ to $\mathbf{Set}$, this automatically determines an interleaving of the underlying merge trees. This interleaving distance also leads naturally to an upper bound for a new distance on barcode decorated merge trees that we call the decorated bottleneck distance. The hierarchy of distances that organizes these structures is one of the main theoretical contributions of this paper.

4.1 Interleavings of Generalized Persistence Modules

One of the benefits of having three different perspectives on decorated merge trees is that they offer different ways of expressing continuity results. In order to define interleavings of functors over a poset we need a notion of a shift operation or translation of a poset.

Definition 24 A translation of a poset $(P, \leq)$ is a poset map $\tau : P \to P$ where for all $p \in P$ we have $p \leq \tau(p)$. We remind the reader that a map of posets $\tau : P \to P$


Although a single translation is enough to define interleavings indexed by $\mathbb{N}$, it is more common to work with a whole family of translations indexed by $\mathbb{R}_{\geq 0}$. Traditionally, this allows one to study topological phenomena at scales indexed by the non-negative real numbers. We present one variation on this notion, although there are many others [26,68].

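Definition 25 A strict $[0, \infty)$-action or strict flow on a poset $(P, \leq)$ is a one-parameter collection of translations that satisfies a strict composition law. This means that a map

$$\sigma(\bullet, \bullet) : [0, \infty) \times P \to P$$

is a strict flow if

– for any $\varepsilon \in [0, \infty)$, the map $\sigma(\varepsilon, \bullet) =: \sigma_\varepsilon : P \to P$ is a translation, and
– for any pair $\varepsilon, \varepsilon' \in [0, \infty)$, we have $\sigma_{\varepsilon'} \circ \sigma_\varepsilon = \sigma_{\varepsilon + \varepsilon'}$.

We additionally require $\sigma_0 = \mathrm{id}_P$, since the composition law by itself only forces $\sigma_0 \circ \sigma_0 = \sigma_0$. We will refer to the image of any $p \in P$ under the translation $\sigma_\varepsilon$ as $p^\varepsilon$. Generally speaking, we will fix a single strict flow throughout this paper so as not to cause notational confusion between an element that has been shifted, e.g. $p^\varepsilon$, and the shift map itself, e.g. $\sigma_\varepsilon$.

Example 10 The totally ordered set $(\mathbb{R}, \leq)$ has an obvious strict flow $\sigma_{\mathbb{R}}$ given by sending $t \in \mathbb{R}$ to $t + \varepsilon$ for any $\varepsilon \geq 0$. We call this the standard flow on $\mathbb{R}$.

Definition 26 (Shifted Functors) A strict flow $\sigma$ allows us to act on a functor $F : P \to \mathcal{C}$ in the obvious way: given any $\varepsilon \in [0, \infty)$, we define the $\varepsilon$-shift of $F$, written $F^\varepsilon : P \to \mathcal{C}$, via pre-composition by $\sigma_\varepsilon : P \to P$, i.e.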

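$$F^\varepsilon := F \circ \sigma_\varepsilon : P \to \mathcal{C}, \quad \text{where } F^\varepsilon(p) := F(p^\varepsilon) \text{ and } F^\varepsilon(p \leq q) := F(p^\varepsilon \leq q^\varepsilon).$$

Of course, the $\varepsilon$-shift operation is functorial in the sense that it sends natural transformations to natural transformations. Recall that a natural transformation of functors $\alpha : F \Rightarrow G$ consists of a collection of morphisms indexed by $p \in P$ that connect the objects assigned to each $p \in P$, i.e. $\alpha(p) : F(p) \to G(p)$, and which make the following square commute for every pair $p \leq q$:

$$\begin{array}{ccc} F(p) & \xrightarrow{\ F(p \leq q)\ } & F(q) \\ \downarrow{\scriptstyle \alpha(p)} & & \downarrow{\scriptstyle \alpha(q)} \\ G(p) & \xrightarrow{\ G(p \leq q)\ } & G(q) \end{array}$$

Since a natural transformation is indexed by elements of $P$, we can use the translation to index the natural transformation between the shifts of $F$ and $G$, i.e. $\alpha^\varepsilon : F^\varepsilon \Rightarrow G^\varepsilon$ is defined by letting $\alpha^\varepsilon(p) := \alpha(p^\varepsilon)$.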


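Definition 27 (The Internal $\varepsilon$-Shift Natural Transformation) If $\sigma(\bullet, \bullet)$ is a strict flow on $P$ and $F : P \to \mathcal{C}$ is any functor indexed by $P$, then for any $\varepsilon \in [0, \infty)$ we have the internal $\varepsilon$-shift of $F$:

$$\eta^\varepsilon_F : F \Rightarrow F^\varepsilon, \quad \text{where } \eta^\varepsilon_F(p) : F(p) \to F(p^\varepsilon) \text{ is } F(p \leq p^\varepsilon).$$

We reserve the notation $\eta^\varepsilon_F$ for this particular natural transformation so that there is no confusion between the $\varepsilon$-shift of a natural transformation and this particular family of natural transformations associated to $F$. Fortunately, the $\varepsilon$-shift of this internal natural transformation composes with the internal $\varepsilon$-shift to give the internal $2\varepsilon$-shift, i.e.

$$(\eta^\varepsilon_F)^\varepsilon \circ \eta^\varepsilon_F = \eta^{2\varepsilon}_F.$$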
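The identity above can be checked pointwise. The derivation uses only functoriality of $F$, the definition $\alpha^\varepsilon(p) := \alpha(p^\varepsilon)$, and the strict composition law $\sigma_\varepsilon \circ \sigma_\varepsilon = \sigma_{2\varepsilon}$, so that $(p^\varepsilon)^\varepsilon = p^{2\varepsilon}$:

$$\begin{aligned} \big((\eta^\varepsilon_F)^\varepsilon \circ \eta^\varepsilon_F\big)(p) &= \eta^\varepsilon_F(p^\varepsilon) \circ \eta^\varepsilon_F(p) \\ &= F(p^\varepsilon \leq p^{2\varepsilon}) \circ F(p \leq p^\varepsilon) \\ &= F(p \leq p^{2\varepsilon}) = \eta^{2\varepsilon}_F(p). \end{aligned}$$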

We can now present the notion of an $\varepsilon$-interleaving of functors.

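Definition 28 Fix a poset $P$ and a strict flow $\sigma(\bullet, \bullet) : [0, \infty) \times P \to P$. Two functors $F, G : P \to \mathcal{C}$ are $\varepsilon$-interleaved if there are natural transformations $\varphi : F \Rightarrow G^\varepsilon$ and $\psi : G \Rightarrow F^\varepsilon$ such that

$$\psi^\varepsilon \circ \varphi = \eta^{2\varepsilon}_F \quad \text{and} \quad \varphi^\varepsilon \circ \psi = \eta^{2\varepsilon}_G.$$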

Definition 29 Given two functors $F, G : (P, \leq) \to \mathcal{C}$, we define their interleaving distance as

$$d_I(F, G) = \inf\{\varepsilon > 0 \mid F \text{ and } G \text{ are } \varepsilon\text{-interleaved}\}.$$

Example 11 (Interleavings of Persistence Modules) We recall that a persistence module is a functor $F : (\mathbb{R}, \leq) \to \mathbf{Vect}$. Using the standard flow $\sigma_{\mathbb{R}}$ on $\mathbb{R}$ that sends $t \in \mathbb{R}$ to $t + \varepsilon$, we can use the above definition to recover the classical interleaving distance of persistence modules as introduced in the landmark paper [15].

Example 12 (Interleavings of Persistent Sets) Suppose $F, G : (\mathbb{R}, \leq) \to \mathbf{Top}$ are two persistent spaces. These could arise as the sublevel set filtrations of two functions $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$, for example. Using the standard flow $\sigma_{\mathbb{R}}$ on $\mathbb{R}$, we can define the interleaving distance between the associated persistent sets to be the interleaving distance between the functors $\pi_0 \circ F$ and $\pi_0 \circ G$, which are both functors from $(\mathbb{R}, \leq)$ to $\mathbf{Set}$.
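To see Definition 28 at work in the setting of Example 12, the Python sketch below checks whether a proposed pair $(\varphi, \psi)$ is an $\varepsilon$-interleaving of two persistent sets. It relies on several simplifying assumptions that are ours, not the paper's:
- both persistent sets are sampled at the integer times $0, \dots, N-1$, so $\varepsilon$ is a non-negative integer;
- they are constant, with identity maps, after time $N-1$;
- all maps are given as dictionaries.

Only the naturality of $\varphi$ and $\psi$ and the two interleaving identities are checked, and the function names are hypothetical.

```python
def push(maps, x, s, t):
    """Apply the structure map (s <= t); maps[k] sends the set at time k to time k + 1.

    Times at or beyond the last index are treated as constant (identity maps).
    """
    for k in range(s, min(t, len(maps))):
        x = maps[k][x]
    return x


def is_interleaving(F, Fm, G, Gm, phi, psi, eps):
    """Check that phi: F => G^eps and psi: G => F^eps form an eps-interleaving.

    F, G: lists of sets at times 0..N-1; Fm, Gm: lists of N-1 dicts (structure maps);
    phi[t]: dict F(t) -> G(t + eps); psi[t]: dict G(t) -> F(t + eps), with times
    beyond N-1 clipped to N-1.
    """
    N = len(F)

    def clip(t):
        return min(t, N - 1)

    for t in range(N):
        for x in F[t]:
            # naturality of phi: phi(t+1) o F(t <= t+1) == G(t+eps <= t+1+eps) o phi(t)
            if t + 1 < N and phi[t + 1][Fm[t][x]] != push(Gm, phi[t][x], t + eps, t + 1 + eps):
                return False
            # interleaving identity: psi^eps o phi == eta^{2 eps}_F
            if psi[clip(t + eps)][phi[t][x]] != push(Fm, x, t, t + 2 * eps):
                return False
        for y in G[t]:
            if t + 1 < N and psi[t + 1][Gm[t][y]] != push(Fm, psi[t][y], t + eps, t + 1 + eps):
                return False
            if phi[clip(t + eps)][psi[t][y]] != push(Gm, y, t, t + 2 * eps):
                return False
    return True


# F: components a, b merge at time 2; G: they already merge at time 1.
F, Fm = [{"a", "b"}, {"a", "b"}, {"c"}], [{"a": "a", "b": "b"}, {"a": "c", "b": "c"}]
G, Gm = [{"a", "b"}, {"c"}, {"c"}], [{"a": "c", "b": "c"}, {"c": "c"}]
phi = [{"a": "c", "b": "c"}, {"a": "c", "b": "c"}, {"c": "c"}]
psi = [{"a": "a", "b": "b"}, {"c": "c"}, {"c": "c"}]
print(is_interleaving(F, Fm, G, Gm, phi, psi, eps=1))  # True
```

Deciding whether some interleaving exists for a given $\varepsilon$ would require a search over candidate maps. The sketch only verifies a proposed pair.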

Example 12 generalizes the original definition of interleavings of merge trees found in [54], but some of the differences are worth considering; this is taken up in Section 4.3. Before that, we show a simple continuity result using the machinery just developed.

4.2 Continuity of the Decorated Merge Tree


Definition 30 Suppose $(M_F, \leq)$ is the generalized merge tree of a persistent space $F$. We say that two points $(i, s)$ and $(j, t)$ in $M_F$ are related at time $r$ if the shift maps of the original persistent set $\pi_0 \circ F$ carry them to a common point, i.e. the equality

$$(\pi_0 \circ F)(s \leq r)(i) = (\pi_0 \circ F)(t \leq r)(j)$$

holds. Note that necessarily $r \geq \max(s, t)$.

Definition 31 Let our generalized merge tree $M_F$ be connected, i.e. any two points $(i, s)$ and $(j, t)$ are related at some time $r$. We define the $\ell^p$ metric on $(M_F, \leq)$ for $1 \leq p < \infty$ to be

$$d^p_{M_F}((i, s), (j, t)) = \big(|s - r_{\inf}|^p + |t - r_{\inf}|^p\big)^{1/p},$$

where

$$r_{\inf} := \inf\{r \in \mathbb{R} \mid (i, s) \text{ and } (j, t) \text{ are related at } r\}.$$

We define the $\ell^\infty$ metric on $(M_F, \leq)$ to be

$$d^\infty_{M_F}((i, s), (j, t)) = \max\{|s - r_{\inf}|, |t - r_{\inf}|\},$$

where $r_{\inf}$ is defined as above.
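For a persistent set sampled at finitely many times, $r_{\inf}$ and the $\ell^p$ metrics can be computed by pushing both points forward until their images agree. The Python sketch below is a discretized stand-in for Definition 31, not the paper's implementation. It assumes that the persistent set is given by set maps between consecutive sample times, so $r_{\inf}$ is replaced by the first sample time at which the two points are related. All names are hypothetical.

```python
import math


def push(maps, i, k, r):
    """Image of element i at time index k under the structure map to index r >= k."""
    for step in range(k, r):
        i = maps[step][i]
    return i


def r_inf(times, maps, p, q):
    """First sampled time at which p = (i, k) and q = (j, l) are related."""
    (i, k), (j, l) = p, q
    for r in range(max(k, l), len(times)):
        if push(maps, i, k, r) == push(maps, j, l, r):
            return times[r]
    return math.inf  # not related within the sampled window


def merge_tree_distance(times, maps, p, q, exponent=1):
    """Discretized l^p distance between points of a merge tree (exponent may be math.inf)."""
    r = r_inf(times, maps, p, q)
    if math.isinf(r):
        return math.inf
    a, b = abs(times[p[1]] - r), abs(times[q[1]] - r)
    if math.isinf(exponent):
        return max(a, b)
    return (a ** exponent + b ** exponent) ** (1 / exponent)


times = [0.0, 1.0, 2.0, 3.0]
maps = [{"x": "x", "y": "y"}, {"x": "x", "y": "x"}, {"x": "x"}]
p, q = ("x", 0), ("y", 0)
print(merge_tree_distance(times, maps, p, q, 1), merge_tree_distance(times, maps, p, q, math.inf))
# 4.0 2.0
```

Here the two leaves merge at time 2, so $d^1 = 2 + 2$ and $d^\infty = \max(2, 2)$. The factor of 2 between them matches the worst case of the norm comparison on $\mathbb{R}^2$.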

Using general estimates for $\ell^p$ norms on $\mathbb{R}^2$, one sees that the $\ell^p$ metrics are bi-Lipschitz equivalent and therefore induce the same topology. Moreover, we have the following characterization when $M_F$ comes from a sublevel set filtration.

Proposition 2 Let $f : X \to \mathbb{R}$ be a continuous map such that the associated merge tree $M_f$ has finitely many leaves. Then the topology induced by each $\ell^p$ metric coincides with the quotient space topology.

We defer the proof of the proposition to Appendix A, since it relies on other technical results about merge tree topologies.
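The norm estimates behind the bi-Lipschitz equivalence mentioned above are elementary, and they also account for the constants in Theorem 2 below. Write $a = |s - r_{\inf}|$ and $b = |t - r_{\inf}|$. Monotonicity of $\ell^p$ norms and Hölder's inequality give, for $1 \leq p < \infty$,

$$d^\infty_{M_F} = \max(a, b) \leq d^p_{M_F} \leq d^1_{M_F} = a + b \leq 2^{1 - 1/p}\,(a^p + b^p)^{1/p} = 2^{1 - 1/p}\, d^p_{M_F},$$

together with $a + b \leq 2\max(a, b) = 2\, d^\infty_{M_F}$. Consequently, a map that is 1-Lipschitz with respect to $d^1_{M_F}$ is $2^{1 - 1/p}$-Lipschitz with respect to $d^p_{M_F}$ and 2-Lipschitz with respect to $d^\infty_{M_F}$.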

Theorem 2 (Continuity of Barcode DMTs) Assume that $(M_F, \leq)$ is a connected generalized merge tree associated to a persistent set $\pi_0 \circ F$. Endow $M_F$ with the extended metric $d^p_{M_F}$ and Barcodes with the bottleneck distance $d_B$. For any pointwise finite-dimensional tree module $\mathbf{F} : (M_F, \leq) \to \mathbf{vect}$, the associated barcode decorated merge tree $B_{\mathbf{F}} : M_F \to \mathrm{Barcodes}$ is $2^{1 - 1/p}$-Lipschitz for $p \in [1, \infty)$ and 2-Lipschitz for $p = \infty$.

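Proof Suppose that $p = (i, s)$ and $q = (j, t)$ are carried to $z = (k, r)$ via $\pi_0 \circ F$. This means that $d^1_{M_F}(p, q) \leq (r - s) + (r - t)$. There is an obvious $\varepsilon_1 := r - s$ interleaving between the $\mathbb{R}$-modules $\mathbf{F}|_{U_p}$ and $\mathbf{F}|_{U_z}$. To see this, note that there is a natural morphism of $\mathbb{R}$-modules $\mathbf{F}|_{U_z} \to \mathbf{F}|_{U_p}$ given by the zero map up to, but not including, $r$; for real values greater than or equal to $r$, this morphism is the identity map. The other morphism that participates in the interleaving is given by the internal morphisms from $\mathbf{F}|_{U_p}$ to $\mathbf{F}|_{U_z}^{\varepsilon_1}$. This proves that there is an $\varepsilon_1$-interleaving between these restrictions. The exact same argument shows that there is an $\varepsilon_2 := r - t$ interleaving between the restrictions $\mathbf{F}|_{U_q}$ and $\mathbf{F}|_{U_z}$. The triangle inequality for the interleaving distance proves that there is at most an $\varepsilon_1 + \varepsilon_2$ interleaving between $\mathbf{F}|_{U_p}$ and $\mathbf{F}|_{U_q}$. By Lesnick’s isometry theorem [44], this implies that the bottleneck distance between $\mathrm{BC}(\mathbf{F}|_{U_p})$ and $\mathrm{BC}(\mathbf{F}|_{U_q})$ is at most $\varepsilon_1 + \varepsilon_2 = (r - s) + (r - t)$. Since this bound holds for every time $r$ at which $p$ and $q$ are related, taking the infimum over such $r$ gives $d_B(B_{\mathbf{F}}(p), B_{\mathbf{F}}(q)) \leq d^1_{M_F}(p, q)$. This proves that the map is 1-Lipschitz with respect to $d^1_{M_F}$ and the remaining cases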


4.3 Interleavings of Merge Trees

In [54] the merge tree interleaving distance was defined for merge trees that come from functional data. This meant that interleavings were defined in terms of a shift operation on the epigraphs and continuous maps between the associated merge trees. In this section we review this construction and show how it is subsumed under the more general framework used here.

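First, we recall that if $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$ are continuous functions, one can consider their merge trees to be the Reeb graphs associated to their epigraphs $\pi_f : E_f \to \mathbb{R}$ and $\pi_g : E_g \to \mathbb{R}$, i.e.

$$\tilde{\pi}_f : M_f \to \mathbb{R} \quad \text{and} \quad \tilde{\pi}_g : M_g \to \mathbb{R}.$$

This was the construction outlined in Definition 6.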

To summarize the construction of [54], we note that the epigraph $E_f$ supports an action $\rho_f : [0, \infty) \times E_f \to E_f$ of the additive semigroup $[0, \infty)$ of non-negative real numbers given by $\rho_f(\varepsilon, x, t) = (x, t + \varepsilon)$. This operation looks like a strict flow, but we have not specified a poset structure on the epigraph, so we will use the term “action.” As before, for a fixed $\varepsilon \geq 0$, we let $\rho^\varepsilon_f : E_f \to E_f$ be the mapping given by $(x, t) \mapsto \rho_f(\varepsilon, x, t)$.

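Under the action $\rho_f$, the projection map $\pi_f : E_f \to \mathbb{R}$ is equivariant with respect to the standard flow on $\mathbb{R}$, i.e. $\pi_f \circ \rho_f(\varepsilon, x, t) = \pi_f(x, t) + \varepsilon = t + \varepsilon$. Since $\rho^\varepsilon_f$ maps level sets of $\pi_f$ to level sets of $\pi_f$, $\rho_f$ induces an action on the corresponding Reeb graph, $\eta_f : [0, \infty) \times M_f \to M_f$. Letting $\eta^\varepsilon_f : M_f \to M_f$ be the mapping given by $t \mapsto \eta_f(\varepsilon, t)$, we have that $\pi_f \circ \rho^\varepsilon_f = \eta^\varepsilon_f \circ \pi_f$.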

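All of the preceding paragraph can be summarized by saying that the following diagram commutes:

[Diagram: $\rho^\varepsilon_f : E_f \to E_f$, $\eta^\varepsilon_f : M_f \to M_f$ and $\sigma^\varepsilon_{\mathbb{R}} : \mathbb{R} \to \mathbb{R}$, connected by the maps $\pi_f$ out of $E_f$ and $\tilde{\pi}_f : M_f \to \mathbb{R}$.]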

We now review the construction of interleavings given in [54] and provide inline comments on how it can be cast in terms of natural transformations of functors and the general definition of interleavings given in [10], which we use here.

Definition 32 ($\varepsilon$-maps) Let $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$ be two continuous functions and let $M_f$ and $M_g$ be their associated merge trees. For $\varepsilon \geq 0$, we define an $\varepsilon$-map to be a continuous (with respect to quotient space topologies) map $\alpha : M_f \to M_g$ such that

$$\tilde{\pi}_g \circ \alpha([x], t) = t + \varepsilon$$

for all points $([x], t) \in M_f$. In other words, an $\varepsilon$-map carries components of $f^{-1}(-\infty, t]$ to components of $g^{-1}(-\infty, t + \varepsilon]$. This can also be phrased as saying that an $\varepsilon$-map is a continuous map $\alpha$ that makes the following diagram commute.


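Remark 6 ($\varepsilon$-maps are Natural Transformations) Given two arbitrary functions $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$, not necessarily continuous, we note that their associated sublevel set filtrations determine persistent spaces $F, G : (\mathbb{R}, \leq) \to \mathbf{Top}$, where $F(t) := f^{-1}(-\infty, t]$ and $G$ is defined similarly. Recall that the persistent set $\pi_0 \circ F$ is the functor that assigns to each $t \in \mathbb{R}$ the set of components of $F(t)$, i.e. the components of the sublevel set $f^{-1}(-\infty, t]$. The map $\pi_0 \circ F(s \leq t)$ is the map on components induced by the inclusion $f^{-1}(-\infty, s] \subseteq f^{-1}(-\infty, t]$. The action $\eta^\varepsilon_f$ outlined above is exactly the internal shift natural transformation

$$\eta^\varepsilon_F : \pi_0 \circ F \Rightarrow (\pi_0 \circ F)^\varepsilon, \quad \text{where } \eta^\varepsilon_F(t) : \pi_0 \circ F(t) \to \pi_0 \circ F(t + \varepsilon) \text{ is } \pi_0 \circ F(t \leq t + \varepsilon).$$

The exact same comment applies to $\eta^\varepsilon_g$: it can be viewed as the internal shift natural transformation $\eta^\varepsilon_G : \pi_0 \circ G \Rightarrow (\pi_0 \circ G)^\varepsilon$. An $\varepsilon$-map $\alpha : M_f \to M_g$ carries components of $f^{-1}(-\infty, t]$ to components of $g^{-1}(-\infty, t + \varepsilon]$. This is exactly the statement that $\alpha$ defines a natural transformation

$$\alpha : \pi_0 \circ F \Rightarrow (\pi_0 \circ G)^\varepsilon := \pi_0 \circ G \circ \sigma^\varepsilon_{\mathbb{R}}.$$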

The upshot of the above remark is that we never needed to consider the action $\rho_f$ on the epigraph: the induced action $\eta_f$ on the merge tree is entirely specified by using the standard flow $\sigma_{\mathbb{R}}$ to shift the persistent set $\pi_0 \circ F$.

In [54] the notion of an $\varepsilon$-interleaving is couched in terms of $\varepsilon$-compatibility. We review this notion now.

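Definition 33 ($\varepsilon$-compatible, cf. [54]) Two $\varepsilon$-maps $\alpha : M_f \to M_g$ and $\beta : M_g \to M_f$ are said to be $\varepsilon$-compatible if $\beta \circ \alpha = \eta^{2\varepsilon}_f$ and $\alpha \circ \beta = \eta^{2\varepsilon}_g$.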

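Lemma 3 If $\alpha : M_f \to M_g$ and $\beta : M_g \to M_f$ are $\varepsilon$-maps that are $\varepsilon$-compatible, then they induce an $\varepsilon$-interleaving between the persistent sets $\pi_0 \circ F$ and $\pi_0 \circ G$ of the sublevel set filtrations of $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$.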

Proof In Remark 6 we already made the connection that $\alpha$ is equivalent to specifying a natural transformation $\alpha : \pi_0 \circ F \Rightarrow (\pi_0 \circ G)^\varepsilon$. Likewise, $\beta$ is tantamount to a natural transformation $\beta : \pi_0 \circ G \Rightarrow (\pi_0 \circ F)^\varepsilon$, without referring to continuity. By comparing with Definition 28, it is now clear that setting $\varphi := \alpha$ and $\psi := \beta$ and enforcing the $\varepsilon$-compatibility condition implies the interleaving condition defined there.

In [54], Morozov, Beketayev and Weber define the interleaving distance between two merge trees $M_f$ and $M_g$ to be

$$\theta^{MBW}_I(M_f, M_g) := \inf\{\varepsilon \geq 0 \mid M_f \text{ and } M_g \text{ have } \varepsilon\text{-compatible maps}\}.$$

We relax this notion slightly to use the more modern definitions found in [23,55].

Definition 34 The interleaving distance between merge trees $M_f$ and $M_g$ associated to $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$ is defined to be the interleaving distance between the persistent sets $\pi_0 \circ F$ and $\pi_0 \circ G$. Here $F : (\mathbb{R}, \leq) \to \mathbf{Top}$ is defined by $F(t) := f^{-1}(-\infty, t]$ and $G : (\mathbb{R}, \leq) \to \mathbf{Top}$ is defined by $G(t) := g^{-1}(-\infty, t]$. We thus introduce the special notation

$$\theta_I(M_f, M_g) := d_I(\pi_0 \circ F, \pi_0 \circ G).$$

(29)

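Remark 7 By Lemma 3, every pair of $\varepsilon$-compatible maps induces an $\varepsilon$-interleaving of persistent sets. The converse may fail, because an $\varepsilon$-interleaving of persistent sets does not necessarily guarantee continuity of the corresponding maps between the merge trees $M_f$ and $M_g$. Hence

$$\theta_I(M_f,M_g)\le\theta^{MBW}_I(M_f,M_g).$$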

In the next proposition we provide hypotheses under which these two interleaving distances agree. This fills a gap in the literature that is not usually remarked upon and helps bridge [54] and [10], for example. We prove that the poset structure on the generalized merge tree suffices to specify continuous maps between the merge trees, equipped with the topology inherited from the Reeb graph construction, provided the domains of the functions are compact and the merge trees have finitely many leaves.

Proposition 3 Suppose $f : X\to\mathbb{R}$ and $g : Y\to\mathbb{R}$ are continuous maps defined on compact spaces $X$ and $Y$ such that their merge trees $M_f$ and $M_g$, each defined as the Reeb graph of the corresponding epigraph, have finitely many leaves. Let $F$ and $G$ denote the sublevel set filtrations of $f$ and $g$, viewed as persistent spaces where $F(t) := f^{-1}(-\infty,t]$ and $G(s) := g^{-1}(-\infty,s]$. Every $\varepsilon$-interleaving of the persistent sets $\pi_0\circ F$ and $\pi_0\circ G$ defines a pair of $\varepsilon$-compatible maps between the merge trees $M_f$ and $M_g$. Consequently,

$$\theta^{MBW}_I(M_f,M_g) = \theta_I(M_f,M_g) = d_I(\pi_0\circ F,\pi_0\circ G).$$

Proof Suppose $\varphi : \pi_0\circ F\Rightarrow\pi_0\circ G^\varepsilon$ and $\psi : \pi_0\circ G\Rightarrow\pi_0\circ F^\varepsilon$ specify an $\varepsilon$-interleaving. Now consider the display posets of $\pi_0\circ F$ and $\pi_0\circ G$, written $M_F$ and $M_G$ to distinguish them from the merge trees $M_f$ and $M_g$. It is obvious that $M_F$ and $M_f$ are identical as posets, and similarly for $M_G$ and $M_g$. Following the discussion in Appendix A, we can equip $M_F$ and $M_G$ with the interval topology (see Definition 57). By Lemma 10 the principal up set of any point is closed in this topology. Since $M_F$ and $M_G$ have finitely many leaves, the principal up sets at the leaf nodes form a finite closed cover of each. It is easy to see that the map $\varphi : M_F\to M_G$ is continuous when restricted to the up set of any leaf node in $M_F$. Since a map is continuous whenever its restriction to each member of a finite closed cover is continuous (the pasting lemma), we conclude that $\varphi : M_F\to M_G$ is continuous with respect to the interval topology. A completely symmetric argument proves that $\psi : M_G\to M_F$ is continuous with respect to the interval topology as well. Now by Proposition 15 the interval topology and the quotient topology coincide, so $\varphi$ and $\psi$ can be viewed as $\varepsilon$-maps (with the continuity assumption) that are $\varepsilon$-compatible. The claim now follows.

4.4 Stability of Decorated Merge Trees for Functional Data


Definition 35 An $\varepsilon$-interleaving of $\mathbb{R}$-spaces $f : X\to\mathbb{R}$ and $g : Y\to\mathbb{R}$ is a pair of continuous maps $\Phi : X\to Y$ and $\Psi : Y\to X$ along with homotopies $H_X : X\times[0,1]\to X$ and $H_Y : Y\times[0,1]\to Y$ connecting the identity maps $\mathrm{id}_X$ and $\mathrm{id}_Y$ with $\Psi\circ\Phi$ and $\Phi\circ\Psi$, respectively. We further require that the following four properties hold for $\Phi$, $\Psi$, $H_X$ and $H_Y$:

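1. $\Phi(X_{\le s})\subseteq Y_{\le s+\varepsilon}$ for all $s\in\mathbb{R}$
2. $\Psi(Y_{\le s})\subseteq X_{\le s+\varepsilon}$ for all $s\in\mathbb{R}$
3. $f\circ H_X(x,t)\le f(x)+2\varepsilon$ for all $x\in X$ and $t\in[0,1]$
4. $g\circ H_Y(y,t)\le g(y)+2\varepsilon$ for all $y\in Y$ and $t\in[0,1]$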
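Properties 1 and 2 are equivalent to pointwise inequalities: $\Phi(X_{\le s})\subseteq Y_{\le s+\varepsilon}$ for every $s$ holds exactly when $g(\Phi(x))\le f(x)+\varepsilon$ for every $x\in X$ (for one direction, take $s = f(x)$). A minimal Python sketch, assuming $X$ and $Y$ are finite sets and $f$, $g$, $\Phi$, $\Psi$ are stored as dicts, computes the smallest $\varepsilon$ for which a given pair $(\Phi,\Psi)$ satisfies properties 1 and 2, and checks the sublevel-set form directly. It ignores the homotopy conditions 3 and 4, so it only gives a lower bound on the $\varepsilon$ of any interleaving built from that pair. The function names are hypothetical.

```python
from typing import Dict, Hashable, Set

Point = Hashable


def min_eps_properties_1_2(f: Dict[Point, float], g: Dict[Point, float],
                           Phi: Dict[Point, Point], Psi: Dict[Point, Point]) -> float:
    """Smallest eps >= 0 with g(Phi(x)) <= f(x) + eps for all x and f(Psi(y)) <= g(y) + eps for all y."""
    forward = max(g[Phi[x]] - f[x] for x in f)
    backward = max(f[Psi[y]] - g[y] for y in g)
    return max(0.0, forward, backward)


def satisfies_properties_1_2(f: Dict[Point, float], g: Dict[Point, float],
                             Phi: Dict[Point, Point], Psi: Dict[Point, Point],
                             eps: float) -> bool:
    """Check properties 1 and 2 on sublevel sets; letting s range over the
    values of f (resp. g) suffices, since sublevel sets only change there."""
    def sub(h: Dict[Point, float], s: float) -> Set[Point]:
        return {p for p, value in h.items() if value <= s}

    prop_1 = all({Phi[x] for x in sub(f, s)} <= sub(g, s + eps) for s in f.values())
    prop_2 = all({Psi[y] for y in sub(g, s)} <= sub(f, s + eps) for s in g.values())
    return prop_1 and prop_2


f = {"x1": 0.0, "x2": 1.0}
g = {"y": 0.5}
Phi = {"x1": "y", "x2": "y"}
Psi = {"y": "x1"}
eps = min_eps_properties_1_2(f, g, Phi, Psi)
print(eps, satisfies_properties_1_2(f, g, Phi, Psi, eps))   # 0.5 True
print(satisfies_properties_1_2(f, g, Phi, Psi, 0.4))         # False
```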

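Definition 36 The functional interleaving distance between $\mathbb{R}$-spaces $X_f := (f : X\to\mathbb{R})$ and $Y_g := (g : Y\to\mathbb{R})$ is defined as

$$\delta_I(X_f,Y_g) := \inf\{\varepsilon\ge 0 \mid X_f \text{ and } Y_g \text{ are } \varepsilon\text{-interleaved}\}.$$

If no interleaving exists, we set $\delta_I(X_f,Y_g) = \infty$.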

The functional interleaving distance is a special case of the homotopy type distance introduced in [33]. This metric was used in [37] to develop persistent homology of $\mathbb{R}$-spaces with different base spaces. A similar metric on $\mathbb{R}$-spaces, called the homotopy interleaving distance, was defined in [7], but the exact connection between the homotopy type distance and the homotopy interleaving distance remains to be studied. The following proposition is proved as part of [33, Proposition 2.11], but we include a proof here for the convenience of the reader.

Proposition 4 If $f : X\to\mathbb{R}$ and $g : Y\to\mathbb{R}$ are $\mathbb{R}$-spaces where we further assume that $X = Y$, then the functional interleaving distance is bounded above by the sup norm, i.e.

$$\delta_I(X_f,Y_g)\le\|f-g\|_\infty.$$

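Proof Let $\varepsilon > \|f-g\|_\infty$. We need to show that there exists an $\varepsilon$-interleaving of the $\mathbb{R}$-spaces $f : X\to\mathbb{R}$ and $g : Y\to\mathbb{R}$. Since $X = Y$ we can take $\Phi$ and $\Psi$ to be the identity maps. The homotopies $H_X$ and $H_Y$ can be taken to be constant at the identity map $\mathrm{id}_X = \mathrm{id}_Y$. All that needs to be checked is whether properties (1) and (2) of Definition 35 hold. In this special setting this amounts to requiring that for each $s\in\mathbb{R}$ the sublevel sets $F(s) := f^{-1}(-\infty,s]$ and $G(s+\varepsilon) := g^{-1}(-\infty,s+\varepsilon]$ satisfy the containments $F(s)\subseteq G(s+\varepsilon)$ and $G(s)\subseteq F(s+\varepsilon)$. This is obvious, but worth repeating. If $x\in F(s)$, then by definition $f(x)\le s$. Since $\|f-g\|_\infty < \varepsilon$, we must have $g(x) < f(x)+\varepsilon\le s+\varepsilon$. This proves the containment $F(s)\subseteq G(s+\varepsilon)$. The reverse containment $G(s)\subseteq F(s+\varepsilon)$ follows by the exact same reasoning. Properties (3) and (4) of Definition 35 are tautological, since the homotopies are constant: $f(H_X(x,t)) = f(x)\le f(x)+2\varepsilon$, and likewise for $g$. Hence $\delta_I(X_f,Y_g)\le\varepsilon$ for every $\varepsilon > \|f-g\|_\infty$, and taking the infimum over such $\varepsilon$ completes the proof of the proposition.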
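The containments used in this proof can be sanity-checked numerically. The following is a minimal Python sketch, assuming $X$ is a finite set and $f$, $g$ are stored as dicts (the helper names are hypothetical). With $\Phi = \Psi = \mathrm{id}$, it tests $F(s)\subseteq G(s+\varepsilon)$ and $G(s)\subseteq F(s+\varepsilon)$; as in the proof, it suffices to let $s$ range over the values of $f$ and $g$, because sublevel sets only change there. Note that failure just below $\|f-g\|_\infty$ concerns only the identity maps; other choices of $\Phi$ and $\Psi$ could give a smaller $\delta_I$.

```python
from typing import Dict, Hashable, Set


def sublevel(h: Dict[Hashable, float], s: float) -> Set[Hashable]:
    """The sublevel set h^{-1}(-inf, s] of a function on a finite set."""
    return {x for x, value in h.items() if value <= s}


def sup_norm(f: Dict[Hashable, float], g: Dict[Hashable, float]) -> float:
    """||f - g||_inf for two functions on the same finite set."""
    return max(abs(f[x] - g[x]) for x in f)


def identity_interleaves(f: Dict[Hashable, float], g: Dict[Hashable, float],
                         eps: float) -> bool:
    """Properties (1) and (2) of Definition 35 with Phi = Psi = identity:
    F(s) <= G(s + eps) and G(s) <= F(s + eps) at every level s where a sublevel set changes."""
    levels = set(f.values()) | set(g.values())
    return all(sublevel(f, s) <= sublevel(g, s + eps) and sublevel(g, s) <= sublevel(f, s + eps)
               for s in levels)


f = {"a": 0.0, "b": 1.0, "c": 2.0}
g = {"a": 0.25, "b": 0.75, "c": 2.0}
norm = sup_norm(f, g)
print(norm)                                      # 0.25
print(identity_interleaves(f, g, norm + 0.01))   # True
print(identity_interleaves(f, g, norm - 0.01))   # False
```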

With the above definition in place, we can now state our first stability result. For this result we assume that $F$ and $G$ are the persistent spaces obtained by considering the sublevel set filtrations of two continuous functions $f : X\to\mathbb{R}$ and