Lagrangian Function on the Finite State Space Statistical Bundle

Giovanni Pistone ¹

¹ de Castro Statistics, Collegio Carlo Alberto, Piazza Vincenzo Arbarello 8, 10122 Torino, Italy;

[email protected]

Abstract: The statistical bundle is the set of couples $(Q, W)$ of a probability density $Q$ and a random variable $W$ such that $\mathrm{E}_Q[W] = 0$. On a finite state space, we assume $Q$ to be a probability density with respect to the uniform probability and give an affine atlas of charts such that the resulting manifold is a model for Information Geometry. Velocity and acceleration of a one-dimensional statistical model are computed in this set-up. The Euler-Lagrange equations are derived from the Lagrange action integral. An example of Lagrangian using minus the entropy as potential energy is briefly discussed.

Keywords: Information Geometry; Statistical Bundle; Lagrangian function

1. Introduction

The set-up of classical Lagrangian Mechanics is a finite-dimensional Riemannian manifold. For example, see the monographs by V.I. Arnold [1, Ch. III-IV], R. Abraham and J.E. Marsden [2, Ch. 3], and J.E. Marsden and T.S. Ratiu [3, Ch. 7]. Classical Information Geometry, as it was first defined in the monograph by S.-I. Amari and H. Nagaoka [4], views parametric statistical models as a manifold endowed with a dually-flat connection. In a recent paper, M. Leok and J. Zhang [5] have pointed out the natural relation between these two topics and have given a wide overview of the mathematical structures involved.

In the present paper, we take up the same research program with two further qualifications. First, we assume a non-parametric approach by considering the full set of positive probability functions on a finite set, as it was done, for example, in our review paper [6]. The discussion is restricted here to a finite state space to avoid difficult technical problems. Second, we consider a specific expression of the tangent space of the statistical manifold, which is a Hilbert bundle that we call the statistical bundle. Our aim is to emphasize the basic statistical intuition of the geometric quantities involved. Because of that, we choose to use systematically the language of non-parametric differential geometry as it is developed in S. Lang's monograph [7].

We use here our version of Information Geometry, see the review paper [6]. Preliminary versions of this paper have been presented at the SigmaPhy2017 Conference held in Corfu, Greece, Jul. 10-14, 2017, and at a seminar held at Collegio Carlo Alberto, Moncalieri, on Sep. 5, 2017. In these early versions we did not refer to the work of Leok and Zhang, of which we were unaware at that time.

In Sec. 2 we review the definition and properties of the statistical bundle, and of the affine atlas that endows it with both a manifold structure and a natural family of transports between the fibers. In Sec. 3 we develop the formalism of the tangent space of the statistical bundle and derive the expression of the velocity and the acceleration of a one-dimensional statistical model in the given affine atlas. The derivation of the Euler-Lagrange equations, together with a relevant example, is discussed in Sec. 4.

2. Statistical bundle

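We consider a finite sample space $\Omega$, with $\#\Omega = N$. The probability simplex is $\Delta(\Omega)$ and $\Delta^\circ(\Omega)$ is its interior. The uniform probability on $\Omega$ is denoted $\mu$, $\mu(x) = \frac{1}{N}$, $x \in \Omega$. The maximal exponential family $\mathcal{E}(\mu)$ is the set of all strictly positive probability densities of $(\Omega, \mu)$. The expected value of $f \colon \Omega \to \mathbb{R}$ with respect to the density $P \in \mathcal{E}(\mu)$ is denoted

$$\mathrm{E}_P[f] = \mathrm{E}_\mu[fP] = \frac{1}{N} \sum_{x \in \Omega} f(x) P(x) .$$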

In [6,8,9] we made the case for the statistical bundle being the key structure of Information Geometry. The statistical bundle with base $\Omega$ is

$$S\mathcal{E}(\mu) = \{ (Q, V) \mid Q \in \mathcal{E}(\mu),\ \mathrm{E}_Q[V] = 0 \} .$$

The statistical bundle is a semi-algebraic subset of $\mathbb{R}^{2N}$, i.e., it is defined by algebraic equations and strict inequalities. It is trivially a real manifold. At each $Q \in \mathcal{E}(\mu)$ the fiber $S_Q\mathcal{E}(\mu)$ is endowed with the scalar product

$$(V_1, V_2) \mapsto \langle V_1, V_2 \rangle_Q = \mathrm{E}_Q[V_1 V_2] = \mathrm{Cov}_Q(V_1, V_2) .$$
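The definitions above can be sketched numerically. The following is a minimal illustration, assuming densities and random variables are stored as plain Python lists indexed by the points of $\Omega$; the function names `expect`, `center`, and `inner` and the sample values are hypothetical and not part of the paper.

```python
def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x), with Q a density w.r.t. the uniform mu."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def center(q, f):
    """Subtract E_Q[f], so that the result lies in the fiber S_Q E(mu)."""
    m = expect(q, f)
    return [fx - m for fx in f]

def inner(q, v1, v2):
    """<V1, V2>_Q = E_Q[V1 V2] for V1, V2 in the fiber at Q."""
    return expect(q, [a * b for a, b in zip(v1, v2)])

# Hypothetical data on N = 3 points; the values of q average to 1 under mu.
q = [0.6, 0.9, 1.5]
f, g = [1.0, 2.0, 4.0], [0.0, 1.0, -1.0]
v1, v2 = center(q, f), center(q, g)

# Covariance of the uncentered variables, for comparison with <v1, v2>_Q.
covariance = expect(q, [a * b for a, b in zip(f, g)]) - expect(q, f) * expect(q, g)
```

On centered variables the scalar product `inner(q, v1, v2)` coincides with `covariance`, which is the identity stated in the displayed formula.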

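We add to this structure a special affine atlas of charts in order to show a structure of affine manifold which is of interest in the statistical applications. The exponential atlas of the statistical manifold $S\mathcal{E}(\mu)$ is the collection of charts given for each $P \in \mathcal{E}(\mu)$ by

$$s_P \colon S\mathcal{E}(\mu) \ni (Q, V) \mapsto \left( s_P(Q), {}^{e}\mathbb{U}_{Q}^{P} V \right) \in S_P\mathcal{E}(\mu) \times S_P\mathcal{E}(\mu) , \tag{1}$$

where (with a slight abuse of notation)

$$s_P(Q) = \log\frac{Q}{P} - \mathrm{E}_P\left[ \log\frac{Q}{P} \right] , \qquad {}^{e}\mathbb{U}_{Q}^{P} V = V - \mathrm{E}_P[V] . \tag{2}$$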

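As $s_P(P, V) = (0, V)$, we say that $s_P$ is the chart centered at $P$. If $s_P(Q) = U$, it is easy to derive the exponential form of $Q$ as a density with respect to $P$, namely $Q = \mathrm{e}^{U - \mathrm{E}_P\left[\log\frac{P}{Q}\right]} \cdot P$. As $\mathrm{E}_\mu[Q] = 1$, then

$$1 = \mathrm{E}_P\left[ \mathrm{e}^{U - \mathrm{E}_P\left[\log\frac{P}{Q}\right]} \right] = \mathrm{E}_P\left[ \mathrm{e}^{U} \right] \mathrm{e}^{-\mathrm{E}_P\left[\log\frac{P}{Q}\right]} ,$$

so that the cumulant function $K_P$ is defined on $S_P\mathcal{E}(\mu)$ by

$$K_P(U) = \log \mathrm{E}_P\left[ \mathrm{e}^{U} \right] = \mathrm{E}_P\left[ \log\frac{P}{Q} \right] = D(P \,\|\, Q) ,$$

that is, $K_P(U)$ is the expression in the chart at $P$ of the Kullback-Leibler divergence $Q \mapsto D(P \,\|\, Q)$, and we can write

$$Q = \mathrm{e}^{U - K_P(U)} \cdot P = e_P(U) .$$

The patch centered at $P$ is

$$s_P^{-1} = e_P \colon (S_P\mathcal{E}(\mu))^2 \ni (U, W) \mapsto \left( e_P(U), {}^{e}\mathbb{U}_{P}^{e_P(U)} W \right) \in S\mathcal{E}(\mu) .$$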

In statistical terms, the random variable $\log(Q/P)$ is the relative point-wise information about $Q$ relative to the reference $P$, while $s_P(Q)$ is the deviation from its mean value at $P$. The expression of the other divergence in the chart centered at $P$ is

$$D(Q \,\|\, P) = \mathrm{E}_Q\left[ \log\frac{Q}{P} \right] = \mathrm{E}_Q[U - K_P(U)] = \mathrm{E}_Q[U] - K_P(U) .$$

The equation above shows that the two divergences are convex conjugate functions in the proper charts, see [10].
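The chart, the patch, and the two divergence identities can be checked on a small example. The sketch below assumes list-based densities with respect to the uniform probability; the names `chart`, `cumulant`, `patch`, and `kl` and the sample densities are illustrative choices, not notation from the paper.

```python
import math

def expect(p, f):
    """E_P[f] = (1/N) * sum_x f(x) P(x)."""
    return sum(fx * px for fx, px in zip(f, p)) / len(p)

def chart(p, q):
    """s_P(Q) = log(Q/P) - E_P[log(Q/P)]."""
    lr = [math.log(qx / px) for qx, px in zip(q, p)]
    m = expect(p, lr)
    return [x - m for x in lr]

def cumulant(p, u):
    """K_P(U) = log E_P[exp(U)]."""
    return math.log(expect(p, [math.exp(ux) for ux in u]))

def patch(p, u):
    """e_P(U) = exp(U - K_P(U)) * P."""
    k = cumulant(p, u)
    return [math.exp(ux - k) * px for ux, px in zip(u, p)]

def kl(a, b):
    """D(A || B) = E_A[log(A/B)]."""
    return expect(a, [math.log(ax / bx) for ax, bx in zip(a, b)])

# Hypothetical densities on N = 3 points (each averages to 1 under mu).
p = [0.5, 1.0, 1.5]
q = [1.2, 1.2, 0.6]
u = chart(p, q)          # coordinate of Q in the chart centered at P
q_back = patch(p, u)     # the patch should return Q
```

With these values, `cumulant(p, u)` agrees with `kl(p, q)` and `expect(q, u) - cumulant(p, u)` agrees with `kl(q, p)`, up to floating-point error.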

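The change of chart from the chart centered at $P_1$ to the chart centered at $P_2$ is

$$\begin{aligned}
s_{P_2} \circ e_{P_1}(U, W) &= s_{P_2}\left( e_{P_1}(U), {}^{e}\mathbb{U}_{P_1}^{e_{P_1}(U)} W \right) \\
&= s_{P_2}\left( \mathrm{e}^{U - K_{P_1}(U)} \cdot P_1 ,\ W - \mathrm{E}_{e_{P_1}(U)}[W] \right) \\
&= \left( U - K_{P_1}(U) + \log\frac{P_1}{P_2} - \mathrm{E}_{P_2}\left[ U - K_{P_1}(U) + \log\frac{P_1}{P_2} \right] ,\ W - \mathrm{E}_{e_{P_1}(U)}[W] - \mathrm{E}_{P_2}\left[ W - \mathrm{E}_{e_{P_1}(U)}[W] \right] \right) \\
&= \left( {}^{e}\mathbb{U}_{P_1}^{P_2} U + s_{P_2}(P_1) ,\ {}^{e}\mathbb{U}_{P_1}^{P_2} W \right) ,
\end{aligned}$$

so that the exponential atlas is indeed affine. Notice that the linear part is ${}^{e}\mathbb{U}_{P_1}^{P_2}$.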
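The affine form of the change of chart can be verified numerically on the first component. This is a minimal sketch under the same list-based conventions; `chart` and `patch` are hypothetical helper names and the two reference densities are arbitrary.

```python
import math

def expect(p, f):
    """E_P[f] = (1/N) * sum_x f(x) P(x)."""
    return sum(fx * px for fx, px in zip(f, p)) / len(p)

def chart(p, q):
    """s_P(Q) = log(Q/P) - E_P[log(Q/P)]."""
    lr = [math.log(qx / px) for qx, px in zip(q, p)]
    m = expect(p, lr)
    return [x - m for x in lr]

def patch(p, u):
    """e_P(U) = exp(U - K_P(U)) * P, with K_P(U) = log E_P[exp(U)]."""
    k = math.log(expect(p, [math.exp(ux) for ux in u]))
    return [math.exp(ux - k) * px for ux, px in zip(u, p)]

p1 = [0.5, 1.0, 1.5]     # hypothetical reference densities
p2 = [1.2, 1.2, 0.6]
raw = [1.0, -2.0, 0.5]
m1 = expect(p1, raw)
u = [x - m1 for x in raw]            # U in the fiber at P1

lhs = chart(p2, patch(p1, u))        # s_{P2}(e_{P1}(U))
m2 = expect(p2, u)
shift = chart(p2, p1)                # s_{P2}(P1), the constant part
rhs = [ux - m2 + sx for ux, sx in zip(u, shift)]   # linear part plus shift
```

The lists `lhs` and `rhs` agree entry by entry, which illustrates that the transition map is the centering at $P_2$ followed by a translation.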

3. The tangent space of the statistical bundle

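Let us compute the expression of the velocity at time $t$ of a smooth curve $t \mapsto \gamma(t) = (Q(t), W(t)) \in S\mathcal{E}(\mu)$ in the chart centered at $P$. The expression of the curve is

$$\gamma_P(t) = s_P(\gamma(t)) = \left( s_P(Q(t)), {}^{e}\mathbb{U}_{Q(t)}^{P} W(t) \right) ,$$

and hence we have, by denoting the derivative in $\mathbb{R}^N$ by the dot,

$$\frac{d}{dt} s_P(Q(t)) = \frac{d}{dt} \left( \log\frac{Q(t)}{P} - \mathrm{E}_P\left[ \log\frac{Q(t)}{P} \right] \right) = \frac{\dot Q(t)}{Q(t)} - \mathrm{E}_P\left[ \frac{\dot Q(t)}{Q(t)} \right] = {}^{e}\mathbb{U}_{Q(t)}^{P} \frac{\dot Q(t)}{Q(t)} , \tag{3}$$

and

$$\frac{d}{dt} {}^{e}\mathbb{U}_{Q(t)}^{P} W(t) = \frac{d}{dt} \left( W(t) - \mathrm{E}_P[W(t)] \right) = \dot W(t) - \mathrm{E}_P\left[ \dot W(t) \right] = {}^{e}\mathbb{U}_{Q(t)}^{P} \left( \dot W(t) - \mathrm{E}_{Q(t)}\left[ \dot W(t) \right] \right) . \tag{4}$$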

If we define the velocity of $t \mapsto Q(t) = \mathrm{e}^{U(t) - K_P(U(t))} \cdot P$ to be

$$\overset{\star}{Q}(t) = \frac{\dot Q(t)}{Q(t)} = \frac{d}{dt} \log Q(t) = \dot U(t) - dK_P(U(t))[\dot U(t)] \in S_{Q(t)}\mathcal{E}(\mu) ,$$

then $t \mapsto (Q(t), \overset{\star}{Q}(t))$ is a curve in the statistical bundle whose expression in the chart centered at $P$ is $t \mapsto (U(t), \dot U(t))$. The velocity as defined above is nothing else than the score function of the one-dimensional statistical model, see e.g. the textbook by B. Efron and T. Hastie [11, §4.2]. The variance of the score function, i.e. the squared norm of $\overset{\star}{Q}(t)$ in $S_{Q(t)}\mathcal{E}(\mu)$, is classically known as the Fisher information at $t$.
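As an illustration of the velocity as a score, the sketch below takes the one-dimensional exponential family $Q(t) = e_P(tV)$, for which $\dot U(t) = V$, and compares the closed form $V - \mathrm{E}_{Q(t)}[V]$ with a central finite difference of $\log Q(t)$. The curve, the step size, and the helper names are assumptions made for the example.

```python
import math

def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def patch(p, u):
    """e_P(U) = exp(U - K_P(U)) * P."""
    k = math.log(expect(p, [math.exp(ux) for ux in u]))
    return [math.exp(ux - k) * px for ux, px in zip(u, p)]

p = [0.5, 1.0, 1.5]          # hypothetical reference density
v = [1.0, -0.5, 0.0]         # direction in the fiber at P: E_P[v] = 0

def curve(t):
    """Exponential family Q(t) = e_P(t V)."""
    return patch(p, [t * vx for vx in v])

def velocity_fd(t, h=1e-6):
    """Central finite difference of log Q(t)."""
    qp, qm = curve(t + h), curve(t - h)
    return [(math.log(a) - math.log(b)) / (2 * h) for a, b in zip(qp, qm)]

t = 0.7
q = curve(t)
m = expect(q, v)
score = [vx - m for vx in v]                     # closed-form velocity
fisher = expect(q, [s * s for s in score])       # squared norm in S_Q E(mu)
```

Here `fisher` is the variance of $V$ under $Q(t)$, which is the Fisher information of this particular model at $t$.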

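We define the second statistical bundle to be

$$S^2\mathcal{E}(\mu) = \{ (Q, W, X, Y) \mid (Q, W) \in S\mathcal{E}(\mu),\ X, Y \in S_Q\mathcal{E}(\mu) \} ,$$

with charts

$$s_P(Q, V, X, Y) = \left( s_P(Q, V), {}^{e}\mathbb{U}_{Q}^{P} X, {}^{e}\mathbb{U}_{Q}^{P} Y \right) .$$

We can identify the second bundle with the tangent space of the first bundle as follows. For each curve $t \mapsto \gamma(t) = (Q(t), W(t))$ in the statistical bundle, define its velocity at $t$ to be

$$\overset{\star}{\gamma}(t) = \left( Q(t), W(t), \overset{\star}{Q}(t), \dot W(t) - \mathrm{E}_{Q(t)}\left[ \dot W(t) \right] \right) .$$

The definition is justified because $t \mapsto \overset{\star}{\gamma}(t)$ is a curve in the second statistical bundle and its expression in the chart at $P$ has the last two components equal to the values given in Eqs. (3) and (4).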

In particular, consider the curve $t \mapsto \chi(t) = (Q(t), \overset{\star}{Q}(t))$. Its velocity is

$$\overset{\star}{\chi}(t) = \left( Q(t), \overset{\star}{Q}(t), \overset{\star}{Q}(t), \overset{\star\star}{Q}(t) \right) ,$$

where the acceleration $\overset{\star\star}{Q}(t)$ is

$$\overset{\star\star}{Q}(t) = \frac{d}{dt} \frac{\dot Q(t)}{Q(t)} - \mathrm{E}_{Q(t)}\left[ \frac{d}{dt} \frac{\dot Q(t)}{Q(t)} \right] = \frac{\ddot Q(t)}{Q(t)} - \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \right] \right) . \tag{5}$$

It should be noted that the acceleration has been defined without explicitly mentioning the relevant connection. In fact, the connection here is implicitly defined by the transports ${}^{e}\mathbb{U}_{P}^{Q}$, which is unusual in Differential Geometry, but is quite natural from the probabilistic point of view, see P. Gibilisco and G. Pistone [12]. We shall see below that the non-parametric approach to Information Geometry allows one to define a dual transport, hence a dual connection as it was in [4]. Because of that, we could have defined other types of acceleration together with the one we have defined. Namely, we could consider an exponential acceleration ${}^{e}D^2 Q(t) = \overset{\star\star}{Q}(t)$, a mixture acceleration ${}^{m}D^2 Q(t) = \ddot Q(t)/Q(t)$, and a Riemannian acceleration

$${}^{0}D^2 Q(t) = \frac{1}{2} \left( {}^{e}D^2 Q(t) + {}^{m}D^2 Q(t) \right) = \frac{\ddot Q(t)}{Q(t)} - \frac{1}{2} \left( \left( \frac{\dot Q(t)}{Q(t)} \right)^2 - \mathrm{E}_{Q(t)}\left[ \left( \frac{\dot Q(t)}{Q(t)} \right)^2 \right] \right) , \tag{6}$$

each acceleration being associated with a specific connection, see the review paper [6]. We do not further discuss the different second order geometries associated to the statistical bundle in this paper.
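The three accelerations can be computed directly from $Q$, $\dot Q$, and $\ddot Q$. The sketch below evaluates them on a mixture curve $Q(t) = (1-t)Q_0 + tQ_1$, chosen here only because its derivatives are available in closed form; the function name `accelerations` and the endpoint densities are hypothetical.

```python
def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def accelerations(q, qd, qdd):
    """Return (exponential, mixture, Riemannian) accelerations from Q, Q', Q''."""
    vel = [a / b for a, b in zip(qd, q)]             # velocity Q'/Q
    sq = [v * v for v in vel]
    msq = expect(q, sq)
    m_acc = [a / b for a, b in zip(qdd, q)]          # mixture: Q''/Q
    e_acc = [m - (s - msq) for m, s in zip(m_acc, sq)]          # Eq. (5)
    r_acc = [m - 0.5 * (s - msq) for m, s in zip(m_acc, sq)]    # Eq. (6)
    return e_acc, m_acc, r_acc

# Mixture curve between two hypothetical densities, evaluated at t = 0.3.
q0 = [0.5, 1.0, 1.5]
q1 = [1.2, 1.2, 0.6]
t = 0.3
q = [(1 - t) * a + t * b for a, b in zip(q0, q1)]
qd = [b - a for a, b in zip(q0, q1)]     # first derivative, constant in t
qdd = [0.0, 0.0, 0.0]                    # second derivative vanishes

e_acc, m_acc, r_acc = accelerations(q, qd, qdd)
```

On this curve the mixture acceleration vanishes, so the Riemannian acceleration is half of the exponential one, consistent with the average in Eq. (6).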

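Example 1 (Boltzmann-Gibbs). Let us compare the formalism we have introduced above with standard computations in Statistical Physics. The Boltzmann-Gibbs distribution gives to the point $x \in \Omega$ the probability $\mathrm{e}^{-(1/\theta)H(x)}/Z(\theta)$, with $Z(\theta) = \sum_{x \in \Omega} \mathrm{e}^{-(1/\theta)H(x)}$ and $\theta > 0$, see Landau and Lifshits [13, Ch. 3]. As a curve in $\mathcal{E}(\mu)$, it is $Q(\theta) = N \mathrm{e}^{-(1/\theta)H}/Z(\theta)$ because of the reference to the uniform probability. The velocity defined above becomes in this case $\overset{\star}{Q}(\theta) = \theta^{-2}(H - \mathrm{E}_\theta[H])$, while the acceleration of Eq. (5), obtained by differentiating the velocity in $\theta$ and centering at $Q(\theta)$, is $\overset{\star\star}{Q}(\theta) = -2\theta^{-3}(H - \mathrm{E}_\theta[H])$. Notice that we have the equation $\theta \overset{\star\star}{Q}(\theta) + 2\overset{\star}{Q}(\theta) = 0$.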
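The Boltzmann-Gibbs example lends itself to a finite-difference check. The sketch below builds $Q(\theta)$ for a hypothetical three-level energy function, compares the closed-form velocity with a numerical derivative of $\log Q(\theta)$, and evaluates the acceleration through the first form of Eq. (5), that is, the $\theta$-derivative of the velocity centered at $Q(\theta)$. The energy values, the temperature, and the function names are assumptions of the example.

```python
import math

def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def gibbs(energy, theta):
    """Q(theta) = N exp(-H/theta) / Z(theta), a density w.r.t. the uniform mu."""
    w = [math.exp(-e / theta) for e in energy]
    z = sum(w)
    return [len(w) * wx / z for wx in w]

def score(energy, theta):
    """Closed-form velocity theta^-2 (H - E_theta[H])."""
    m = expect(gibbs(energy, theta), energy)
    return [(e - m) / theta ** 2 for e in energy]

def velocity_fd(energy, theta, h=1e-6):
    """Central finite difference of log Q(theta)."""
    qp, qm = gibbs(energy, theta + h), gibbs(energy, theta - h)
    return [(math.log(a) - math.log(b)) / (2 * h) for a, b in zip(qp, qm)]

def acceleration_fd(energy, theta, h=1e-4):
    """Eq. (5), first form: theta-derivative of the velocity, centered at Q(theta)."""
    sp, sm = score(energy, theta + h), score(energy, theta - h)
    d = [(a - b) / (2 * h) for a, b in zip(sp, sm)]
    m = expect(gibbs(energy, theta), d)
    return [x - m for x in d]

energy = [0.0, 1.0, 3.0]     # hypothetical energy levels H(x)
theta = 2.0
vel = score(energy, theta)
acc = acceleration_fd(energy, theta)
ratios = [a / v for a, v in zip(acc, vel)]   # state-wise acceleration / velocity
```

The entries of `ratios` are equal across states when the acceleration is proportional to the velocity, so inspecting them gives a quick numerical read on the linear relation between the two quantities.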

Following the original construction of Amari's Information Geometry [4], we have defined on the statistical bundle a manifold structure which is both an affine and a Riemannian manifold. The base manifold $\mathcal{E}(\mu)$ is actually a Hessian manifold with respect to any of the convex functions $K_P(U) = \log \mathrm{E}_P\left[\mathrm{e}^U\right]$, $U \in S_P\mathcal{E}(\mu)$, see [14]. Many computations are actually performed using the Hessian structure. The following equations are easily checked and frequently used:

$$\mathrm{E}_{e_P(U)}[H] = dK_P(U)[H] ; \tag{7}$$

$${}^{e}\mathbb{U}_{P}^{e_P(U)} H = H - dK_P(U)[H] ; \tag{8}$$

$$d^2K_P(U)[H_1, H_2] = \left\langle {}^{e}\mathbb{U}_{P}^{e_P(U)} H_1, {}^{e}\mathbb{U}_{P}^{e_P(U)} H_2 \right\rangle_{e_P(U)} ; \tag{9}$$

$$d^3K_P(U)[H_1, H_2, H_3] = \mathrm{E}_{e_P(U)}\left[ {}^{e}\mathbb{U}_{P}^{e_P(U)} H_1 \cdot {}^{e}\mathbb{U}_{P}^{e_P(U)} H_2 \cdot {}^{e}\mathbb{U}_{P}^{e_P(U)} H_3 \right] . \tag{10}$$
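Equations (7) and (9) can be checked by differentiating the cumulant function numerically. The following is a minimal sketch, assuming central finite differences with hand-picked step sizes; `dK_fd` and `d2K_fd` are hypothetical helper names and the data are arbitrary.

```python
import math

def expect(p, f):
    """E_P[f] = (1/N) * sum_x f(x) P(x)."""
    return sum(fx * px for fx, px in zip(f, p)) / len(p)

def cumulant(p, u):
    """K_P(U) = log E_P[exp(U)]."""
    return math.log(expect(p, [math.exp(ux) for ux in u]))

def patch(p, u):
    """e_P(U) = exp(U - K_P(U)) * P."""
    k = cumulant(p, u)
    return [math.exp(ux - k) * px for ux, px in zip(u, p)]

def dK_fd(p, u, h, eps=1e-6):
    """Central difference approximation of dK_P(U)[H]."""
    up = [ux + eps * hx for ux, hx in zip(u, h)]
    um = [ux - eps * hx for ux, hx in zip(u, h)]
    return (cumulant(p, up) - cumulant(p, um)) / (2 * eps)

def d2K_fd(p, u, h1, h2, eps=1e-4):
    """Four-point approximation of d^2 K_P(U)[H1, H2]."""
    def k(a, b):
        return cumulant(p, [ux + a * x + b * y for ux, x, y in zip(u, h1, h2)])
    return (k(eps, eps) - k(eps, -eps) - k(-eps, eps) + k(-eps, -eps)) / (4 * eps * eps)

p = [0.5, 1.0, 1.5]          # hypothetical reference density
u = [0.4, -0.2, 0.0]         # point of the fiber at P
h1 = [3.0, 0.0, -1.0]        # directions in the fiber at P
h2 = [1.0, 1.0, -1.0]

q = patch(p, u)
mean1, mean2 = expect(q, h1), expect(q, h2)
cov = expect(q, [(a - mean1) * (b - mean2) for a, b in zip(h1, h2)])
```

The value `dK_fd(p, u, h1)` approximates `mean1` as in Eq. (7), and `d2K_fd(p, u, h1, h2)` approximates `cov` as in Eq. (9).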

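We have defined a centering operation that can be thought of as a transport among fibers,

$${}^{e}\mathbb{U}_{P}^{Q} \colon S_P\mathcal{E}(\mu) \to S_Q\mathcal{E}(\mu) ,$$

whose adjoint is ${}^{m}\mathbb{U}_{Q}^{P} V = \frac{Q}{P} V$. In fact, ${}^{m}\mathbb{U}_{Q}^{P}$ is the adjoint of ${}^{e}\mathbb{U}_{P}^{Q}$:

$$\left\langle {}^{e}\mathbb{U}_{P}^{Q} U, V \right\rangle_Q = \mathrm{E}_Q\left[ (U - \mathrm{E}_Q[U]) V \right] = \mathrm{E}_Q[UV] = \mathrm{E}_P\left[ U \frac{Q}{P} V \right] = \left\langle U, {}^{m}\mathbb{U}_{Q}^{P} V \right\rangle_P .$$

Moreover, if $U, V \in S_P\mathcal{E}(\mu)$, then

$$\left\langle {}^{e}\mathbb{U}_{P}^{Q} U, {}^{m}\mathbb{U}_{P}^{Q} V \right\rangle_Q = \left\langle {}^{e}\mathbb{U}_{Q}^{P}\, {}^{e}\mathbb{U}_{P}^{Q} U, V \right\rangle_P = \langle U, V \rangle_P .$$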
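The adjoint relation between the two transports is easy to test numerically. In the sketch below, `e_transport` and `m_transport` are illustrative names for the centering and the density-ratio transports, and the densities and random variables are hypothetical.

```python
def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def inner(q, a, b):
    """<A, B>_Q = E_Q[A B]."""
    return expect(q, [x * y for x, y in zip(a, b)])

def e_transport(q, u):
    """Exponential transport into the fiber at Q: U - E_Q[U]."""
    m = expect(q, u)
    return [ux - m for ux in u]

def m_transport(q, p, v):
    """Mixture transport from the fiber at Q to the fiber at P: (Q/P) V."""
    return [qx / px * vx for qx, px, vx in zip(q, p, v)]

p = [0.5, 1.0, 1.5]          # hypothetical densities on N = 3 points
q = [1.2, 1.2, 0.6]
u = [3.0, 0.0, -1.0]         # U in the fiber at P: E_P[u] = 0
v = [1.0, -1.0, 0.0]         # V in the fiber at Q: E_Q[v] = 0

lhs = inner(q, e_transport(q, u), v)         # <eU U, V>_Q
rhs = inner(p, u, m_transport(q, p, v))      # <U, mU V>_P
```

The two pairings `lhs` and `rhs` coincide, and `m_transport(q, p, v)` is centered at $P$, so it does land in the fiber $S_P\mathcal{E}(\mu)$.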

Example 2 (Entropy flow). This example is taken from [8]. In the scalar field $\mathcal{H}(Q) = -\mathrm{E}_Q[\log Q]$ there is no dependence on the fiber. If $t \mapsto Q(t) = \mathrm{e}^{V(t) - K_P(V(t))} \cdot P$ is a smooth curve in $\mathcal{E}(\mu)$ expressed in the chart centered at $P$, then we can write

$$\begin{aligned}
\mathcal{H}(Q(t)) &= -\mathrm{E}_{Q(t)}[V(t) - K_P(V(t)) + \log P] \\
&= K_P(V(t)) - \mathrm{E}_{Q(t)}[V(t) + \log P + \mathcal{H}(P)] + \mathcal{H}(P) \\
&= K_P(V(t)) - dK_P(V(t))[V(t) + \log P + \mathcal{H}(P)] + \mathcal{H}(P) ,
\end{aligned} \tag{11}$$

where the argument of the last expectation belongs to the fiber $S_P\mathcal{E}(\mu)$ and we have expressed the expected value as a derivative by using Eq. (7).

Using again Eq. (7), and also Eq. (9), we compute the derivative of the entropy along the given curve as

$$\begin{aligned}
\frac{d}{dt} \mathcal{H}(Q(t)) &= \frac{d}{dt} K_P(V(t)) - \frac{d}{dt} dK_P(V(t))[V(t) + \log P + \mathcal{H}(P)] \\
&= dK_P(V(t))[\dot V(t)] - d^2K_P(V(t))[V(t) + \log P + \mathcal{H}(P), \dot V(t)] - dK_P(V(t))[\dot V(t)] \\
&= -\mathrm{E}_{Q(t)}\left[ {}^{e}\mathbb{U}_{P}^{Q(t)} (V(t) + \log P) \cdot {}^{e}\mathbb{U}_{P}^{Q(t)} \dot V(t) \right] .
\end{aligned}$$

We use now the equations

$$V(t) + \log P = \log Q(t) + K_P(V(t)) , \qquad {}^{e}\mathbb{U}_{P}^{Q(t)} \left( \log Q(t) + K_P(V(t)) \right) = \log Q(t) + \mathcal{H}(Q(t)) ,$$

and ${}^{e}\mathbb{U}_{P}^{Q(t)} \dot V(t) = \overset{\star}{Q}(t)$, to obtain

$$\frac{d}{dt} \mathcal{H}(Q(t)) = -\left\langle \log Q(t) + \mathcal{H}(Q(t)), \overset{\star}{Q}(t) \right\rangle_{Q(t)} .$$

We have identified the gradient of the entropy in the statistical bundle,

$$\operatorname{grad} \mathcal{H}(Q) = -(\log Q + \mathcal{H}(Q)) . \tag{12}$$

Notice that the previous computation could have been done using the exponential family $Q(t) = e_P(tV)$. See in [8] the computation of the gradient flow.
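The gradient formula of Eq. (12) can be tested along the exponential family $Q(t) = e_P(tV)$ mentioned at the end of the example. The sketch below compares a finite-difference derivative of the entropy with the pairing of the gradient and the velocity; the helper names and the data are hypothetical.

```python
import math

def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def patch(p, u):
    """e_P(U) = exp(U - K_P(U)) * P."""
    k = math.log(expect(p, [math.exp(ux) for ux in u]))
    return [math.exp(ux - k) * px for ux, px in zip(u, p)]

def entropy(q):
    """H(Q) = -E_Q[log Q], for Q a density w.r.t. the uniform mu."""
    return -expect(q, [math.log(qx) for qx in q])

def grad_entropy(q):
    """Eq. (12): grad H(Q) = -(log Q + H(Q))."""
    h = entropy(q)
    return [-(math.log(qx) + h) for qx in q]

p = [0.5, 1.0, 1.5]          # hypothetical reference density
v = [1.0, -0.5, 0.0]         # direction in the fiber at P

def curve(t):
    """Exponential family Q(t) = e_P(t V)."""
    return patch(p, [t * vx for vx in v])

t, h = 0.7, 1e-6
q = curve(t)
m = expect(q, v)
velocity = [vx - m for vx in v]                          # velocity of the curve at t
fd = (entropy(curve(t + h)) - entropy(curve(t - h))) / (2 * h)
pairing = expect(q, [g * s for g, s in zip(grad_entropy(q), velocity)])
```

The finite difference `fd` and the scalar product `pairing` agree to the accuracy of the difference scheme.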

In the next section, we extend the computation illustrated in the example above to scalar fields on the statistical bundle.

4. Lagrangian function

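A Lagrangian function is a smooth scalar field on the statistical bundle, $L \colon S\mathcal{E}(\mu) \ni (Q, W) \mapsto L(Q, W) \in \mathbb{R}$. At each fixed density $Q \in \mathcal{E}(\mu)$, the partial mapping

$$S_Q\mathcal{E}(\mu) \ni W \mapsto L(Q, W) \tag{13}$$

is defined on the vector space $S_Q\mathcal{E}(\mu)$, hence we can use the ordinary derivative, which is called in this case the fiber derivative,

$$d_2 L(Q, W)[H_2] = \left. \frac{d}{dt} L(Q, W + tH_2) \right|_{t=0} , \qquad H_2 \in S_Q\mathcal{E}(\mu) . \tag{14}$$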

Example 3 (Running example I). If

$$L(Q, W) = \frac{1}{2} \langle W, W \rangle_Q + \kappa \mathcal{H}(Q) , \qquad \kappa \geq 0 , \tag{15}$$

then $d_2 L(Q, W)[H_2] = \langle W, H_2 \rangle_Q$. The example is suggested by the form of the classical Lagrangian function in mechanics, where the first term is the kinetic energy and $-\kappa \mathcal{H}(Q)$ is the potential energy.
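The fiber derivative of the running example can be illustrated with a difference quotient in the fiber variable only. This is a minimal sketch; the function name `lagrangian`, the value of `kappa`, and the data are assumptions of the example.

```python
import math

def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def center(q, f):
    """Project f onto the fiber at Q by subtracting E_Q[f]."""
    m = expect(q, f)
    return [fx - m for fx in f]

def inner(q, a, b):
    """<A, B>_Q = E_Q[A B]."""
    return expect(q, [x * y for x, y in zip(a, b)])

def entropy(q):
    """H(Q) = -E_Q[log Q]."""
    return -expect(q, [math.log(qx) for qx in q])

def lagrangian(q, w, kappa):
    """Eq. (15): L(Q, W) = (1/2) <W, W>_Q + kappa * H(Q)."""
    return 0.5 * inner(q, w, w) + kappa * entropy(q)

q = [0.6, 0.9, 1.5]                  # hypothetical density
w = center(q, [1.0, 2.0, 4.0])       # point of the fiber at Q
h2 = center(q, [0.0, 1.0, -1.0])     # direction in the fiber at Q
kappa, eps = 0.5, 1e-3

wp = [a + eps * b for a, b in zip(w, h2)]
wm = [a - eps * b for a, b in zip(w, h2)]
fiber_derivative = (lagrangian(q, wp, kappa) - lagrangian(q, wm, kappa)) / (2 * eps)
```

Since $L$ is quadratic in $W$, the central difference `fiber_derivative` reproduces `inner(q, w, h2)` up to rounding, and the entropy term plays no role because $Q$ is held fixed.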

As the statistical bundle $S\mathcal{E}(\mu)$ is non-trivial, the computation of the partial derivative of the Lagrangian with respect to the first variable requires some care. We want to compute the expression of the total derivative in a chart of the affine atlas defined in Eqs. (1) and (2).

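Let $t \mapsto \gamma(t) = (Q(t), W(t))$ be a smooth curve in the statistical bundle. In the chart centered at $P$ we have

$$Q(t) = \mathrm{e}^{U(t) - K_P(U(t))} \cdot P = e_P(U(t)) , \qquad W(t) = {}^{e}\mathbb{U}_{P}^{e_P(U(t))} V(t) ,$$

with $t \mapsto \gamma_P(t) = (U(t), V(t))$ being a smooth curve in $(S_P\mathcal{E}(\mu))^2$. Let us compute the velocity of variation of the Lagrangian $L$ along the curve $\gamma$:

$$\frac{d}{dt} L(\gamma(t)) = \frac{d}{dt} L(Q(t), W(t)) = \frac{d}{dt} L\left( e_P(U(t)), {}^{e}\mathbb{U}_{P}^{e_P(U(t))} V(t) \right) = \frac{d}{dt} L_P(U(t), V(t)) ,$$

with $L_P(U, V) = L\left( e_P(U), {}^{e}\mathbb{U}_{P}^{e_P(U)} V \right)$. It follows that

$$\frac{d}{dt} L(Q(t), W(t)) = d_1 L_P(U(t), V(t))[\dot U(t)] + d_2 L_P(U(t), V(t))[\dot V(t)] . \tag{16}$$

If we write $Q = e_P(U)$ and $W = {}^{e}\mathbb{U}_{P}^{e_P(U)} V$, then we have

$$d_2 L_P(U, V)[H_2] = \left. \frac{d}{dt} L_P(U, V + tH_2) \right|_{t=0} = \left. \frac{d}{dt} L\left( Q, W + t\, {}^{e}\mathbb{U}_{P}^{Q} H_2 \right) \right|_{t=0} = d_2 L(Q, W)\left[ {}^{e}\mathbb{U}_{P}^{Q} H_2 \right] , \tag{17}$$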

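where $d_2 L$ is the fiber derivative of $L$. As $\dot U(t) = {}^{e}\mathbb{U}_{Q(t)}^{P} \overset{\star}{Q}(t)$ and ${}^{e}\mathbb{U}_{P}^{e_P(U(t))} \dot V(t) = \overset{\star}{W}(t)$, it follows from Eqs. (16) and (17) that

$$\frac{d}{dt} L(Q(t), W(t)) = d_1 L_P(U(t), V(t))\left[ {}^{e}\mathbb{U}_{Q(t)}^{P} \overset{\star}{Q}(t) \right] + d_2 L(Q(t), W(t))\left[ \overset{\star}{W}(t) \right] .$$

In the equation above the first term in the RHS does not depend on $P$ because the LHS and the second term of the RHS do not depend on $P$. Hence we define the first partial derivative of the Lagrangian function to be

$$d_1 L(Q, W)[H_1] = d_1 L_P(U, V)\left[ {}^{e}\mathbb{U}_{e_P(U)}^{P} H_1 \right] , \qquad H_1 \in S_Q\mathcal{E}(\mu) , \tag{18}$$

so that the derivative of $L$ along $\gamma$ becomes

$$\frac{d}{dt} L(Q(t), W(t)) = d_1 L(Q(t), W(t))\left[ \overset{\star}{Q}(t) \right] + d_2 L(Q(t), W(t))\left[ \overset{\star}{W}(t) \right] . \tag{19}$$

In particular, if $W(t) = \overset{\star}{Q}(t)$, then

$$\frac{d}{dt} L(Q(t), \overset{\star}{Q}(t)) = d_1 L(Q(t), \overset{\star}{Q}(t))\left[ \overset{\star}{Q}(t) \right] + d_2 L(Q(t), \overset{\star}{Q}(t))\left[ \overset{\star\star}{Q}(t) \right] ,$$

see Eq. (5).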

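Example 4 (Running example II). With the Lagrangian of Eq. (15), we have

$$\begin{aligned}
L_P(U, V) &= \frac{1}{2} \left\langle {}^{e}\mathbb{U}_{P}^{e_P(U)} V, {}^{e}\mathbb{U}_{P}^{e_P(U)} V \right\rangle_{e_P(U)} - \kappa \mathrm{E}_{e_P(U)}[U - K_P(U) + \log P] \\
&= \frac{1}{2} d^2K_P(U)[V, V] + \kappa \left( K_P(U) - dK_P(U)[U + \log P + \mathcal{H}(P)] + \mathcal{H}(P) \right) ,
\end{aligned}$$

see Eqs. (9) and (11). The first partial derivative is

$$\begin{aligned}
d_1 L_P(U, V)[H_1] &= \frac{1}{2} d^3K_P(U)[V, V, H_1] + \kappa \left( dK_P(U)[H_1] - d^2K_P(U)[U + \log P + \mathcal{H}(P), H_1] - dK_P(U)[H_1] \right) \\
&= \frac{1}{2} d^3K_P(U)[V, V, H_1] - \kappa\, d^2K_P(U)[U + \log P + \mathcal{H}(P), H_1] \\
&= \frac{1}{2} \mathrm{E}_Q\left[ W^2\, {}^{e}\mathbb{U}_{P}^{e_P(U)} H_1 \right] - \kappa \mathrm{E}_Q\left[ (\log Q + \mathcal{H}(Q))\, {}^{e}\mathbb{U}_{P}^{e_P(U)} H_1 \right] \\
&= \mathrm{E}_Q\left[ \left( \frac{1}{2} \left( W^2 - \mathrm{E}_Q\left[ W^2 \right] \right) - \kappa (\log Q + \mathcal{H}(Q)) \right) {}^{e}\mathbb{U}_{P}^{e_P(U)} H_1 \right] ,
\end{aligned}$$

where we have used Eqs. (9) and (10) together with ${}^{e}\mathbb{U}_{P}^{e_P(U)} (U + \log P + \mathcal{H}(P)) = \log Q + \mathcal{H}(Q)$. We have found that

$$d_1 L(Q, W)[H_1] = \left\langle \frac{1}{2} \left( W^2 - \mathrm{E}_Q\left[ W^2 \right] \right) - \kappa (\log Q + \mathcal{H}(Q)), H_1 \right\rangle_Q , \qquad H_1 \in S_Q\mathcal{E}(\mu) , \tag{20}$$

and also

$$d_1 L(Q(t), \overset{\star}{Q}(t))\left[ \overset{\star}{Q}(t) \right] = \left\langle \frac{1}{2} \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_Q\left[ \overset{\star}{Q}(t)^2 \right] \right) - \kappa (\log Q + \mathcal{H}(Q)), \overset{\star}{Q}(t) \right\rangle_Q .$$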

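The total derivative of Eq. (19) along $t \mapsto (Q(t), \overset{\star}{Q}(t))$ becomes

$$\frac{d}{dt} L(Q(t), \overset{\star}{Q}(t)) = \left\langle \frac{1}{2} \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_Q\left[ \overset{\star}{Q}(t)^2 \right] \right) - \kappa (\log Q + \mathcal{H}(Q)), \overset{\star}{Q}(t) \right\rangle_Q + \left\langle \overset{\star}{Q}(t), \overset{\star\star}{Q}(t) \right\rangle_Q .$$

Notice that Eq. (12) shows that one of the terms in the equations above is $\operatorname{grad} \mathcal{H}(Q)$.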
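Equation (20) can be compared with a numerical derivative of the chart expression $L_P$ in its first argument, following the definition in Eq. (18). The sketch below is only an illustration under stated assumptions: list-based densities, a central difference with a hand-picked step, and hypothetical names such as `lagrangian_in_chart`.

```python
import math

def expect(q, f):
    """E_Q[f] = (1/N) * sum_x f(x) Q(x)."""
    return sum(fx * qx for fx, qx in zip(f, q)) / len(q)

def center(q, f):
    """Centering at Q, i.e. the exponential transport into the fiber at Q."""
    m = expect(q, f)
    return [fx - m for fx in f]

def patch(p, u):
    """e_P(U) = exp(U - K_P(U)) * P."""
    k = math.log(expect(p, [math.exp(ux) for ux in u]))
    return [math.exp(ux - k) * px for ux, px in zip(u, p)]

def entropy(q):
    """H(Q) = -E_Q[log Q]."""
    return -expect(q, [math.log(qx) for qx in q])

def lagrangian_in_chart(p, u, v, kappa):
    """L_P(U, V) = L(e_P(U), V - E_{e_P(U)}[V]) for the Lagrangian of Eq. (15)."""
    q = patch(p, u)
    w = center(q, v)
    return 0.5 * expect(q, [x * x for x in w]) + kappa * entropy(q)

p = [0.5, 1.0, 1.5]                  # hypothetical reference density
u = [0.4, -0.2, 0.0]                 # chart coordinates in the fiber at P
v = [3.0, 0.0, -1.0]
kappa, eps = 0.5, 1e-6

q = patch(p, u)
w = center(q, v)
h1 = center(q, [1.0, 0.0, 2.0])      # direction H1 in the fiber at Q
h = center(p, h1)                    # H1 transported to the fiber at P

up = [a + eps * b for a, b in zip(u, h)]
um = [a - eps * b for a, b in zip(u, h)]
fd = (lagrangian_in_chart(p, up, v, kappa) - lagrangian_in_chart(p, um, v, kappa)) / (2 * eps)

# Right-hand side of Eq. (20).
sq = [x * x for x in w]
msq = expect(q, sq)
hq = entropy(q)
field = [0.5 * (s - msq) - kappa * (math.log(qx) + hq) for s, qx in zip(sq, q)]
formula = expect(q, [f * x for f, x in zip(field, h1)])
```

The difference quotient `fd` and the closed form `formula` agree to the accuracy of the finite-difference scheme.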

5. Action integral

If $[0, 1] \ni t \mapsto Q(t)$ is a smooth curve in the exponential manifold, then the action integral

$$A(Q) = \int_0^1 L(Q(t), \overset{\star}{Q}(t))\, dt$$

is well defined. We consider the expression of $Q$ in the chart centered at $P$, $Q(t) = \mathrm{e}^{U(t) - K_P(U(t))} \cdot P$.

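Given $\phi \in C^1([0, 1])$ with $\phi(0) = \phi(1) = 0$, for each $\delta \in \mathbb{R}$ and $H \in S_P\mathcal{E}(\mu)$ we define the perturbed curve

$$Q_\delta(t) = \mathrm{e}^{(U(t) + \delta\phi(t)H) - K_P(U(t) + \delta\phi(t)H)} \cdot P .$$

We have $Q_\delta(0) = Q(0)$, $Q_\delta(1) = Q(1)$, and

$$\overset{\star}{Q}_\delta(t) = \dot U(t) + \delta\dot\phi(t)H - \mathrm{E}_{Q_\delta(t)}\left[ \dot U(t) + \delta\dot\phi(t)H \right] ,$$

whose expression in the chart centered at $P$ is $\dot U(t) + \delta\dot\phi(t)H$.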

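Let us consider the variation in $\delta$ of the action integral. We apply Eq. (19) to the smooth curve in $S\mathcal{E}(\mu)$ given by

$$\delta \mapsto (Q_\delta(t), \overset{\star}{Q}_\delta(t)) ,$$

where $t$ is fixed. As

$$\frac{d}{d\delta} \log Q_\delta(t) = \frac{d}{d\delta} (U(t) + \delta\phi(t)H) - \mathrm{E}_{Q_\delta(t)}\left[ \frac{d}{d\delta} (U(t) + \delta\phi(t)H) \right] = \phi(t) \left( H - \mathrm{E}_{Q_\delta(t)}[H] \right)$$

and

$${}^{e}\mathbb{U}_{P}^{Q_\delta(t)} \frac{d}{d\delta} \left( \dot U(t) + \delta\dot\phi(t)H \right) = \dot\phi(t) \left( H - \mathrm{E}_{Q_\delta(t)}[H] \right) ,$$

we obtain, integrating by parts and using $\phi(0) = \phi(1) = 0$ in the last step,

$$\begin{aligned}
\frac{d}{d\delta} A(Q_\delta) &= \int_0^1 \frac{d}{d\delta} L(Q_\delta(t), \overset{\star}{Q}_\delta(t))\, dt \\
&= \int_0^1 \left( \phi(t)\, d_1 L(Q_\delta(t), \overset{\star}{Q}_\delta(t))\left[ H - \mathrm{E}_{Q_\delta(t)}[H] \right] + \dot\phi(t)\, d_2 L(Q_\delta(t), \overset{\star}{Q}_\delta(t))\left[ H - \mathrm{E}_{Q_\delta(t)}[H] \right] \right) dt \\
&= \int_0^1 \phi(t) \left( d_1 L(Q_\delta(t), \overset{\star}{Q}_\delta(t))\left[ H - \mathrm{E}_{Q_\delta(t)}[H] \right] - \frac{d}{dt} d_2 L(Q_\delta(t), \overset{\star}{Q}_\delta(t))\left[ H - \mathrm{E}_{Q_\delta(t)}[H] \right] \right) dt .
\end{aligned}$$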

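If $t \mapsto Q(t)$ is a critical curve of the action integral, then $\left. \frac{d}{d\delta} A(Q_\delta) \right|_{\delta=0} = 0$, hence for all $\phi$ and $H$ we have

$$\int_0^1 \phi(t) \left( d_1 L(Q(t), \overset{\star}{Q}(t))\left[ H - \mathrm{E}_{Q(t)}[H] \right] - \frac{d}{dt} d_2 L(Q(t), \overset{\star}{Q}(t))\left[ H - \mathrm{E}_{Q(t)}[H] \right] \right) dt = 0 .$$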

This, in turn, implies that for each $t \in [0, 1]$ and $H \in S_{Q(t)}\mathcal{E}(\mu)$ the Euler-Lagrange equation holds:

$$d_1 L(Q(t), \overset{\star}{Q}(t))[H] - \frac{d}{dt} d_2 L(Q(t), \overset{\star}{Q}(t))[H] = 0 . \tag{22}$$

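Example 5 (Running example III). For the Lagrangian of Eq. (15) we can use Eq. (20) in the form

$$d_1 L(Q(t), \overset{\star}{Q}(t))\left[ H - \mathrm{E}_{Q(t)}[H] \right] = \left\langle \frac{1}{2} \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \right] \right) - \kappa \left( \log Q(t) + \mathcal{H}(Q(t)) \right), H - \mathrm{E}_{Q(t)}[H] \right\rangle_{Q(t)} ,$$

with $H \in S_P\mathcal{E}(\mu)$. For the other term, we have

$$d_2 L(Q(t), \overset{\star}{Q}(t))\left[ H - \mathrm{E}_{Q(t)}[H] \right] = \left\langle \overset{\star}{Q}(t), H - \mathrm{E}_{Q(t)}[H] \right\rangle_{Q(t)} = d^2K_P(U(t))[\dot U(t), H] ,$$

whose derivative is

$$\begin{aligned}
\frac{d}{dt} d^2K_P(U(t))[\dot U(t), H] &= d^3K_P(U(t))[\dot U(t), \dot U(t), H] + d^2K_P(U(t))[\ddot U(t), H] \\
&= \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \left( H - \mathrm{E}_{Q(t)}[H] \right) \right] + \mathrm{E}_{Q(t)}\left[ \overset{\star\star}{Q}(t) \left( H - \mathrm{E}_{Q(t)}[H] \right) \right] \\
&= \mathrm{E}_{Q(t)}\left[ \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \right] \right) \left( H - \mathrm{E}_{Q(t)}[H] \right) \right] + \mathrm{E}_{Q(t)}\left[ \overset{\star\star}{Q}(t) \left( H - \mathrm{E}_{Q(t)}[H] \right) \right] .
\end{aligned}$$

Dropping the generic $H$, the Euler-Lagrange equation becomes

$$\overset{\star\star}{Q}(t) + \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \right] \right) = \frac{1}{2} \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \right] \right) - \kappa \left( \log Q(t) + \mathcal{H}(Q(t)) \right) ,$$

that is

$$\overset{\star\star}{Q}(t) + \frac{1}{2} \left( \overset{\star}{Q}(t)^2 - \mathrm{E}_{Q(t)}\left[ \overset{\star}{Q}(t)^2 \right] \right) = -\kappa \left( \log Q(t) + \mathcal{H}(Q(t)) \right) .$$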

The equation above has been derived using the exponential affine geometry of the statistical bundle and involves $\overset{\star\star}{Q}(t)$. However, by using Eqs. (5), (6), and (12), we find the equivalent form

$${}^{0}D^2 Q(t) = \kappa \operatorname{grad} \mathcal{H}(Q(t)) .$$

6. Discussion

We have shown that the research program consisting in applying to Statistics concepts taken from Classical Mechanics makes sense, even if no practical application has been produced in this paper. Some simple examples have been discussed in order to show clearly that the language from classical mechanics is indeed suggestive when applied to typical concepts in Statistics such as Fisher score and statistical entropy. The derivation of the Euler-Lagrange equations is classically done in the set-up of Riemannian geometry, while here we have used the affine structure of Information Geometry. The present provisional results prompt a generalization to non-finite sample spaces and the development

Acknowledgments: The Author gratefully thanks Hiroshi Matsuzoe (Nagoya Institute of Technology, JP), Lamberto Rondoni (Politecnico di Torino, IT), Antonio Scarfone (CNR and Politecnico di Torino, IT), Tatsuaki Wada (Ibaraki University, JP), for their interesting comments on early versions of this piece of research. He thanks two anonymous referees for their useful and enlightening comments. He acknowledges the support of de Castro Statistics, Collegio Carlo Alberto, and of GNAMPA-INdAM.

Conflicts of Interest: The author declares no conflict of interest.

References

1. Arnold, V.I. Mathematical methods of classical mechanics; Vol. 60, Graduate Texts in Mathematics, Springer-Verlag, New York, 1989; pp. xvi+516. Translated from the 1974 Russian original by K. Vogtmann and A. Weinstein, Corrected reprint of the second (1989) edition.

2. Abraham, R.; Marsden, J.E. Foundations of mechanics; Benjamin/Cummings Publishing Co., Inc., Advanced Book Program, Reading, Mass., 1978; pp. xxii+m–xvi+806. Second edition, revised and enlarged, With the assistance of Tudor Raţiu and Richard Cushman.

3. Marsden, J.E.; Ratiu, T.S. Introduction to mechanics and symmetry, second ed.; Vol. 17, Texts in Applied Mathematics, Springer-Verlag, New York, 1999; pp. xviii+582. A basic exposition of classical mechanical systems.

4. Amari, S.; Nagaoka, H. Methods of information geometry; American Mathematical Society, 2000; pp. x+206. Translated from the 1993 Japanese original by Daishi Harada.

5. Leok, M.; Zhang, J. Connecting Information Geometry and Geometric Mechanics. Entropy 2017, 19, 518.

6. Pistone, G. Nonparametric information geometry. In Geometric science of information; Nielsen, F.; Barbaresco, F., Eds.; Springer, Heidelberg, 2013; Vol. 8085, Lecture Notes in Comput. Sci., pp. 5–36. First International Conference, GSI 2013 Paris, France, August 28-30, 2013 Proceedings.

7. Lang, S. Differential and Riemannian manifolds, third ed.; Vol. 160, Graduate Texts in Mathematics, Springer-Verlag, 1995; pp. xiv+364.

8. Pistone, G. Examples of the application of nonparametric information geometry to statistical physics. Entropy 2013, 15, 4042–4065.

9. Lods, B.; Pistone, G. Information Geometry Formalism for the Spatially Homogeneous Boltzmann Equation. Entropy 2015, 17, 4323–4363.

10. Pistone, G.; Rogantin, M. The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli 1999, 5, 721–760.

11. Efron, B.; Hastie, T. Computer age statistical inference; Vol. 5, Institute of Mathematical Statistics (IMS) Monographs, Cambridge University Press, New York, 2016; pp. xix+475. Algorithms, evidence, and data science.

12. Gibilisco, P.; Pistone, G. Connections on non-parametric statistical manifolds by Orlicz space geometry. IDAQP 1998, 1, 325–347.

13. Landau, L.D.; Lifshits, E.M. Course of Theoretical Physics. Statistical Physics, 3rd ed.; Vol. V, Butterworth-Heinemann, 1980.

14. Shima, H. The geometry of Hessian structures; World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2007; pp. xiv+246.