• No results found

Learning Theory for Vector-Valued Distribution Regression

N/A
N/A
Protected

Academic year: 2021

Share "Learning Theory for Vector-Valued Distribution Regression"

Copied!
63
0
0

Loading.... (view fulltext now)

Full text

(1)

Learning Theory for Vector-Valued Distribution

Regression

Zolt´an Szab´o (Gatsby Unit, UCL)

Joint work with

◦ Bharath K. Sriperumbudur (Department of Statistics, PSU), ◦ Barnab´as P´oczos (ML Department, CMU),

◦ Arthur Gretton (Gatsby Unit, UCL)

CMStatistics, London, UK December 12, 2015

(2)

Outline

Motivation: application + theory. Problem formulation.

Results: computational & statistical tradeoffs. Numerical examples.

(3)

The task

Samples: {(xi, yi)}ℓi=1. Find f ∈ H such that f (xi) ≈ yi.

Distribution regression:

xi-s are distributions,

available only through samples: {xi,n}Nn=1i , labelled bags.

(4)

The task

Samples: {(xi, yi)}ℓi=1. Find f ∈ H such that f (xi) ≈ yi.

Distribution regression:

xi-s are distributions,

(5)

Motivation (application): aerosol prediction

Bag := pixels of a multispectral satellite image over an area. Label of a bag := aerosol value.

Relevance: climate research, sustainability.

Engineered methods [Wang et al., 2012]: 100 × RMSE = 7.5 − 8.5. Using distribution regression?

(6)

Wider context

Context:

machine learning: multi-instance learning,

statistics: point estimation tasks (without analytical formula).

Applications:

(7)

Several algorithmic approaches

1 Parametric fit: Gaussian, MOG, exp. family

[Jebara et al., 2004, Wang et al., 2009, Nielsen and Nock, 2012].

2 Kernelized Gaussian measures:

[Jebara et al., 2004, Zhou and Chellappa, 2006].

3 (Positive definite) kernels:

[Cuturi et al., 2005, Martins et al., 2009, Hein and Bousquet, 2005].

4 Divergence measures (KL, R´enyi, Tsallis, . . . ):

[P´oczos et al., 2011, Kandasamy et al., 2015].

5 Set metrics: Hausdorff metric [Edgar, 1995]; variants

[Wang and Zucker, 2000, Wu et al., 2010, Zhang and Zhou, 2009, Chen and Wu, 2012].

(8)

Motivation: theory

MIL dates back to [Haussler, 1999, G¨artner et al., 2002].

Sensible methods in regression: require density estimation [P´oczos et al., 2013, Oliva et al., 2014, Reddi and P´oczos, 2014, Sutherland et al., 2015] + assumptions:

(9)

Input-output requirements

Input: distributions on ’structured’ D domains (kernels).

(10)

Input-output requirements

Input: distributions on ’structured’ D domains (kernels). Output:

(11)

Input-output requirements

Input: distributions on ’structured’ D domains (kernels). Output:

simplest case: Y = R, but

dependenciesmight matter: Y = Rd (or separable Hilbert).

(12)

Kernel, RKHS

k : D × D → R kernel on D, if

∃ϕ : D → H(ilbert space) feature map, k(a, b) = hϕ(a), ϕ(b)iH (∀a, b ∈ D).

(13)

Kernel, RKHS

k : D × D → R kernel on D, if

∃ϕ : D → H(ilbert space) feature map, k(a, b) = hϕ(a), ϕ(b)iH (∀a, b ∈ D).

Kernel examples: D = Rd (p > 0, θ > 0)

k(a, b) = (ha, bi + θ)p: polynomial, k(a, b) = e−ka−bk22/(2θ

2)

: Gaussian, k(a, b) = e−θka−bk1: Laplacian.

In the H = H(k) RKHS (∃!): ϕ(u) = k(·, u).

(14)

RKHS: evaluation point of view

Let H ⊂ RD be a Hilbert space.

Consider for fixed x ∈ D the δx : f ∈ H 7→ f (x) ∈ R map.

The evaluation functional is linear:

(15)

RKHS: evaluation point of view

Let H ⊂ RD be a Hilbert space.

Consider for fixed x ∈ D the δx : f ∈ H 7→ f (x) ∈ R map.

The evaluation functional is linear:

δx(αf + βg ) = αδx(f ) + βδx(g ).

Def.: H is called RKHS if δx is continuous for ∀x ∈ D.

(16)

RKHS: reproducing point of view

Let H ⊂ RD be a Hilbert space.

k : D × D → is called a reproducing kernel of H if for ∀x ∈ D, f ∈ H

1 k(·, x) ∈ H, 2 hf , k(·, x)i

(17)

RKHS: reproducing point of view

Let H ⊂ RD be a Hilbert space.

k : D × D → is called a reproducing kernel of H if for ∀x ∈ D, f ∈ H 1 k(·, x) ∈ H, 2 hf , k(·, x)i H = f (x) (reproducing property). Specifically, ∀x, y ∈ D k(x, y ) = hk(·, x), k(·, y)iH.

(18)

RKHS: positive-definite point of view

Let us given a k : D × D → R symmetric function. k is called positive definite if ∀n ≥ 1, ∀(a1, . . . , an) ∈ Rn,

(x1, . . . , xn) ∈ Dn n X i ,j=1 aiajk(xi, xj) = aTGa≥ 0, where G = [k(xi, xj)]ni ,j=1.

(19)

Kernel: example domains (D)

Euclidean space (D = Rd), graphs, texts, time series,

dynamical systems, distributions!

(20)

Problem formulation (Y = R)

Given:

labelled bags ˆz = {(ˆxi, yi)}ℓi=1,

ith bag: ˆx

i= {xi,1, . . . , xi,N} i.i .d.

∼ xi ∈ P (D), yi ∈ R.

(21)

Problem formulation (Y = R)

Given:

labelled bags ˆz = {(ˆxi, yi)}ℓi=1,

ith bag: ˆx

i= {xi,1, . . . , xi,N} i.i .d.

∼ xi ∈ P (D), yi ∈ R.

Task: find a P (D) → R mapping based on ˆz.

Construction: mean embedding (µx)

P(D)−−−−→µ=µ(k) X ⊆ H

| {z }

2 =two-stage sampling

= H(k)

(22)

Problem formulation (Y = R)

Given:

labelled bags ˆz = {(ˆxi, yi)}ℓi=1,

ith bag: ˆx

i= {xi,1, . . . , xi,N} i.i .d.

∼ xi ∈ P (D), yi ∈ R.

Task: find a P (D) → R mapping based on ˆz.

Construction: mean embedding (µx) +ridge regression

P(D)−−−−→µ=µ(k) X ⊆ H | {z } 2 =two-stage sampling = H(k)−−−−−−−→f∈H=H(K ) R | {z } 1 =Hilbert → R regression .

(23)

1

: Hilbert → R regression, well-specified case

Regression function, expected risk (assume for a moment: fρ∈ H):

fρ(µx) =

Z

Ry dρ(y |µ

x), R [f ] = E(x,y )|f (µx) − y|2.

(24)

1

: Hilbert → R regression, well-specified case

Regression function, expected risk (assume for a moment: fρ∈ H):

fρ(µx) = Z Ry dρ(y |µ x), R [f ] = E(x,y )|f (µx) − y|2. Ridge estimator: fzλ = arg min f∈H 1 ℓ ℓ X i=1 |f (µxi) − yi| 2 + λ kf k2H, (λ > 0).

(25)

1

: Hilbert → R regression, well-specified case

Regression function, expected risk (assume for a moment: fρ∈ H):

fρ(µx) = Z Ry dρ(y |µ x), R [f ] = E(x,y )|f (µx) − y|2. Ridge estimator: fzλ = arg min f∈H 1 ℓ ℓ X i=1 |f (µxi) − yi| 2 + λ kf k2H, (λ > 0). Excess risk: E(fzλ, fρ) = R[fzλ] − R[fρ].

(26)

1

: Hilbert → R regression

Known [Caponnetto and De Vito, 2007]: if ρ(µx, y ) ∈ P(b, c),

then the best/achieved rate E(fzλ, fρ) = Op

 ℓ−bc+1bc



(27)

1

: Hilbert → R regression

Known [Caponnetto and De Vito, 2007]: if ρ(µx, y ) ∈ P(b, c),

then the best/achieved rate E(fzλ, fρ) = Op  ℓ−bc+1bc  (1 < b, c ∈ (1, 2]). ρ ∈ P(b, c): T = Z X K (·, µ a)K∗(·, µa)dρX(µa) : H → H.

Eigenvalues of T decay as λn= O(n−b). fρ∈ Im

 Tc−12

 . Intuition: 1/b – effective input dimension, c – smoothness of fρ.

(28)

Our question

Can we reach this Op

 ℓ−bc+1bc  minimax rate? N =? fλ ˆ z = arg min f∈H 1 ℓ ℓ X i=1 |f (µˆxi) − yi| 2 + λ kf k2H, fzλ= arg min f∈H 1 ℓ ℓ X i=1 |f (µxi) − yi| 2 + λ kf k2H.

(29)

2

: mean embedding, µ

xi

→ µ

ˆxi

k : D × D → R kernel; canonical feature map: ϕ(u) = k(·, u).

Mean embedding of a distribution x, ˆxi ∈ P(D):

µx = Z Dk(·, u)dx(u) ∈ H(k), µxˆi = Z Dk(·, u)dˆx i(u) = 1 N N X n=1 k(·, xi ,n).

(30)

2

: mean embedding, µ

xi

→ µ

ˆxi

k : D × D → R kernel; canonical feature map: ϕ(u) = k(·, u).

Mean embedding of a distribution x, ˆxi ∈ P(D):

µx = Z Dk(·, u)dx(u) ∈ H(k), µxˆi = Z Dk(·, u)dˆx i(u) = 1 N N X n=1 k(·, xi ,n).

Linear K ⇒ set kernel:

K (µ , µ ) =µ , µ = 1

N

X

(31)

2

: mean embedding, µ

xi

→ µ

ˆxi

k : D × D → R kernel; canonical feature map: ϕ(u) = k(·, u).

Mean embedding of a distribution x, ˆxi ∈ P(D):

µx = Z Dk(·, u)dx(u) ∈ H(k), µxˆi = Z Dk(·, u)dˆx i(u) = 1 N N X n=1 k(·, xi ,n). Nonlinear K example: K (µxˆi, µˆxj) = e −kµˆxi−µˆxjk2H 2σ2 .

(32)

2

: ridge regression ⇒ analytical solution

Given: training sample: ˆz, test distribution: t. Prediction on t: (fˆzλ◦ µ)(t) = k(K + ℓλIℓ)−1[y1; . . . ; yℓ], K = [K (µˆxi, µˆxj)] ∈ R ℓ×ℓ, k= [K (µˆx1, µt), . . . , K (µˆxℓ, µt)] ∈ R 1×ℓ. (1) (2) (3) ⇒ K (µ , µ ) =K (·, µ ), K (·, µ′) matter.

(33)

1

-

2

: Why can we get consistency? – intuition

Convergence of the mean embedding: kµx− µˆxkH(k)= Op  1 √ N  . H¨older property of K (0 < L, 0 < h ≤ 1): kK (·, µx)−K (·, µˆx)kH(K )≤ L kµx − µˆxkhH(k). fˆzλ depends ’nicely’ on µˆx.

(34)

1

-

2

: Proof idea

By decomposing the excess risk, concentration, on P(b, c) we get E(fˆzλ, fρ) ≤ logh(ℓ) Nhλ  1 λ22 + 1 + 1 ℓλ1+1b  | {z } 2 =two-stage sampling + λc+ 1 ℓ2λ+ 1 ℓλ1b | {z } 1 = H → R regression → 0, s.t. ℓλb+1b ≥ 1,log(ℓ) λ2h ≤ N.

(35)

1

-

2

: Proof idea

By decomposing the excess risk, concentration, on P(b, c) we get E(fˆzλ, fρ) ≤ logh(ℓ) Nhλ  1 λ22 + 1 + 1 ℓλ1+1b  | {z } 2 =two-stage sampling + λc+ 1 ℓ2λ+ 1 ℓλ1b | {z } 1 = H → R regression → 0, s.t. ℓλb+1b ≥ 1,log(ℓ) λ2h ≤ N.

Let N = ℓahlog(ℓ) ⇒ 1st term + constraints simplify.

a > 0: needed, i.e. N > log(ℓ).

Bias-variance trick with constraint-checking ⇒

(36)

Computational & statistical tradeoff (W)

If a ≤ b(c + 1)bc + 1 , then Efˆzλ, fρ  = Op  ℓ−c+1ac  with λ = ℓ−c+1a , a ≥ b(c + 1) bc + 1 then E  fˆzλ, fρ  = Op  ℓ−bc+1bc  with λ = ℓ−bc+1b .

(37)

Computational & statistical tradeoff (W)

If a ≤ b(c + 1)bc + 1 , then Efˆzλ, fρ  = Op  ℓ−c+1ac  with λ = ℓ−c+1a , a ≥ b(c + 1) bc + 1 then E  fˆzλ, fρ  = Op  ℓ−bc+1bc  with λ = ℓ−bc+1b .

Meaning (a-dependence, N = ℓhalog(ℓ)):

’Smaller a’ = computational saving, but reduced statistical efficiency.

(38)

Computational & statistical tradeoff (W)

If a ≤ b(c + 1)bc + 1 , then Efˆzλ, fρ  = Op  ℓ−c+1ac  with λ = ℓ−c+1a , a ≥ b(c + 1) bc + 1 then E  fˆzλ, fρ  = Op  ℓ−bc+1bc  with λ = ℓ−bc+1b .

Meaning (a-dependence, N = ℓhalog(ℓ)):

’Smaller a’ = computational saving, but reduced statistical efficiency.

(39)

Computational & statistical tradeoff (W)

If a ≤ b(c + 1)bc + 1 , then Efˆzλ, fρ  = Op  ℓ−c+1ac  with λ = ℓ−c+1a , a ≥ b(c + 1)bc + 1 then Efˆzλ, fρ  = Op  ℓ−bc+1bc  with λ = ℓ−bc+1b .

Meaning (h-dependence, N = ℓah log(ℓ)):

smoother K kernel is rewarding = bag-size reduction; see smoothness of fρ.

(40)

Computational & statistical tradeoff (W)

If a ≤ b(c + 1) bc + 1 , then E  fˆzλ, fρ  = Op  ℓ−c+1ac  with λ = ℓ−c+1a , a ≥ b(c + 1)bc + 1 then Efˆzλ, fρ  = Op  ℓ−bc+1bc  with λ = ℓ−bc+1b . Meaning (c-dependence):

c 7→ b(c + 1)bc + 1 decreasing: smaller bags are enough for easier problems.

(41)

Misspecified setting

Relevant case: fρ∈ L2ρX\H.

fρ: difficulty parameter = s ∈ (0, 1], larger s = easier problem.

Proof idea:

∞-D exponential family fitting [Sriperumbudur et al., 2014], ridge solution.

(42)

Computational & statistical tradeoff (M)

Let N = ℓ2ah log(ℓ) (a > 0). If a ≤ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+12sa  with λ = ℓ−s+1a , a ≥ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+22s  with λ = ℓ−s+21 .

(43)

Computational & statistical tradeoff (M)

Let N = ℓ2ah log(ℓ) (a > 0). If a ≤ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+12sa  with λ = ℓ−s+1a , a ≥ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+22s  with λ = ℓ−s+21 . Meaning (a-dependence):

’Smaller a’ = computational saving, but reduced statistical efficiency.

(44)

Computational & statistical tradeoff (M)

Let N = ℓ2ah log(ℓ) (a > 0). If a ≤ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+12sa  with λ = ℓ−s+1a , a ≥ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+22s  with λ = ℓ−s+21 . Meaning (a-dependence):

’Smaller a’ = computational saving, but reduced statistical efficiency.

(45)

Computational & statistical tradeoff (M)

Let N = ℓ2ah log(ℓ) (a > 0). If a ≤ s + 1 s + 2, then E  fˆzλ, fρ  = Op  ℓ−s+12sa  with λ = ℓ−s+1a , a ≥ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+22s  with λ = ℓ−s+21 . Meaning (h-dependence): h 7→ 2a

h is decreasing: smoother K kernel is rewarding.

(46)

Computational & statistical tradeoff (M)

Let N = ℓ2ah log(ℓ) (a > 0). If a ≤ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+12sa  with λ = ℓ−s+1a , a ≥ s + 1s + 2, then Efˆzλ, fρ  = Op  ℓ−s+22s  with λ = ℓ−s+21 .

Meaning (s-dependence): s 7→ s + 22s is increasing, i.e easier task

= better rate,

s → 0: arbitrary slow rate.

(47)

Optimality of the rate (M)

Our rate: r (ℓ) = ℓ−s+22s – range space assumption (s).

One-stage sampled optimal rate: ro(ℓ) = ℓ−

2s

2s+1 [Steinwart et al., 2009],

range-space assumption + eigendecay constraint, D: compact metric, Y = R. 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 Smoothness (s) Rate −log l[ro(l)]=2s/(2s+1) −logl[r(l)]=2s/(s+2)

(48)

Blanket assumptions: both settings

D: separable, topological domain.

k:

bounded: supu∈Dk(u, u) ≤ Bk ∈ (0, ∞),

continuous.

K : bounded; H¨older continuous: ∃L > 0, h ∈ (0, 1] such that kK (·, µa) − K (·, µb)kH≤ L kµa− µbk

h H.

(49)

H¨older K examples (other than the linear K when h=1)

In case of compact metric D, universal k:

KG Ke KC e−kµa−µbk 2 H 2θ2 e− kµa−µbkH 2θ2  1 + kµa− µbk2H/θ2 −1 h = 1 h = 12 h = 1 Kt Ki  1 + kµa− µbkθH −1  kµa− µbk2H+ θ2 −12 h = θ2 (θ ≤ 2) h = 1

Functions of kµa− µbkH ⇒ computation: similar to set kernel.

(50)

Vector-valued output: similarly

K (µa, µb) ∈ L(Y ).

Prediction on a new test distribution (t): (fˆzλ◦ µ)(t) = k(K + ℓλIℓ)−1[y1; . . . ; yℓ], K= [K (µˆxi, µxˆj)] ∈ L(Y ) ℓ×ℓ, k= [K (µˆx1, µt), . . . , K (µˆxℓ, µt)] ∈ L(Y ) 1×ℓ. (4) (5) (6)

(51)

Demo

Supervised entropy learning:

0 1 2 3 −2 −1 0 1 2 RMSE: MERR=0.75, DFDR=2.02 rotation angle (β) entropy true MERR DFDR MERR DFDR 1 2 3 4 RMSE

(52)

Demo

Supervised entropy learning:

0 1 2 3 −2 −1 0 1 2 RMSE: MERR=0.75, DFDR=2.02 rotation angle (β) entropy true MERR DFDR MERR DFDR 1 2 3 4 RMSE

(53)

Summary

Problem: distribution regression (k). Contribution:

computational & statistical tradeoff analysis,

specifically, the set kernel is consistent: 16-year-old open question, minimax optimal rate is achievable: sub-quadratic bag size.

(54)

Summary

Problem: distribution regression (k). Contribution:

computational & statistical tradeoff analysis,

specifically, the set kernel is consistent: 16-year-old open question, minimax optimal rate is achievable: sub-quadratic bag size.

Code in ITE, analysis submitted to JMLR:

https://bitbucket.org/szzoli/ite/ http://arxiv.org/abs/1411.2066.

(55)

Thank you for the attention!

Acknowledgments: This work was supported by the Gatsby Charitable Foundation, and by NSF grants IIS1247658 and IIS1250350. A part of the work was carried out while Bharath K. Sriperumbudur was a research fellow in the Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge, UK.

(56)

Caponnetto, A. and De Vito, E. (2007).

Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368. Chen, Y. and Wu, O. (2012).

Contextual Hausdorff dissimilarity for multi-instance clustering. In International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pages 870–873.

Cuturi, M., Fukumizu, K., and Vert, J.-P. (2005). Semigroup kernels on measures.

Journal of Machine Learning Research, 6:11691198. Edgar, G. (1995).

(57)

Multi-instance kernels.

In International Conference on Machine Learning (ICML), pages 179–186.

Haussler, D. (1999).

Convolution kernels on discrete structures.

Technical report, Department of Computer Science, University of California at Santa Cruz.

(http://cbse.soe.ucsc.edu/sites/default/files/ convolutions.pdf).

Hein, M. and Bousquet, O. (2005).

Hilbertian metrics and positive definite kernels on probability measures.

In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 136–143.

Jebara, T., Kondor, R., and Howard, A. (2004). Probability product kernels.

Journal of Machine Learning Research, 5:819–844.

(58)

Kandasamy, K., Krishnamurthy, A., P´oczos, B., Wasserman, L., and Robins, J. M. (2015).

Influence functions for machine learning: Nonparametric estimators for entropies, divergences and mutual informations. Technical report, Carnegie Mellon University.

http://arxiv.org/abs/1411.4342.

Martins, A. F. T., Smith, N. A., Xing, E. P., Aguiar, P. M. Q., and Figueiredo, M. A. T. (2009).

Nonextensive information theoretical kernels on measures. Journal of Machine Learning Research, 10:935–975. Nielsen, F. and Nock, R. (2012).

A closed-form expression for the Sharma-Mittal entropy of exponential families.

(59)

Fast function to function regression.

In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 717–725.

Oliva, J., P´oczos, B., and Schneider, J. (2013). Distribution to distribution regression.

International Conference on Machine Learning (ICML; JMLR W&CP), 28:1049–1057.

Oliva, J. B., Neiswanger, W., P´oczos, B., Schneider, J., and Xing, E. (2014).

Fast distribution to real regression.

International Conference on Artificial Intelligence and Statistics (AISTATS; JMLR W&CP), 33:706–714.

P´oczos, B., Rinaldo, A., Singh, A., and Wasserman, L. (2013). Distribution-free distribution regression.

International Conference on Artificial Intelligence and Statistics (AISTATS; JMLR W&CP), 31:507–515. P´oczos, B., Xiong, L., and Schneider, J. (2011).

(60)

Nonparametric divergence estimation with applications to machine learning on distributions.

In Uncertainty in Artificial Intelligence (UAI), pages 599–608. Reddi, S. J. and P´oczos, B. (2014).

k-NN regression on functional data with incomplete observations.

In Conference on Uncertainty in Artificial Intelligence (UAI), pages 692–701.

Sriperumbudur, B. K., Fukumizu, K., Kumar, R., Gretton, A.,

and Hyv¨arinen, A. (2014).

Density estimation in infinite dimensional exponential families. Technical report.

(61)

Sutherland, D. J., Oliva, J. B., P´oczos, B., and Schneider, J. (2015).

Linear-time learning on distributions with approximate kernel embeddings.

Technical report, Carnegie Mellon University. http://arxiv.org/abs/1509.07553.

Wang, F., Syeda-Mahmood, T., Vemuri, B. C., Beymer, D., and Rangarajan, A. (2009).

Closed-form Jensen-R´enyi divergence for mixture of Gaussians and applications to group-wise shape registration.

Medical Image Computing and Computer-Assisted Intervention, 12:648–655.

Wang, J. and Zucker, J.-D. (2000).

Solving the multiple-instance problem: A lazy learning approach.

In International Conference on Machine Learning (ICML), pages 1119–1126.

(62)

Wang, Z., Lan, L., and Vucetic, S. (2012).

Mixture model for multiple instance regression and applications in remote sensing.

IEEE Transactions on Geoscience and Remote Sensing, 50:2226–2237.

Wu, O., Gao, J., Hu, W., Li, B., and Zhu, M. (2010). Identifying multi-instance outliers.

In SIAM International Conference on Data Mining (SDM), pages 430–441.

Zhang, M.-L. and Zhou, Z.-H. (2009).

Multi-instance clustering with applications to multi-instance prediction.

(63)

IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:917–929.

References

Related documents

2 2 In reaching its decision, the appellate court proved willing to make the distinction which the district court would not; namely, that although certain actions

In addition, it is also important to men- tion that although Islam-based parties tend to be “the vote-seek- ers” in the electoral arena, the fact demonstrated that Islam-based

Click Save changes The Appointment Schedule Settings window closes, and the default view is updated with your column

Write three sentences using a SINGULAR indefinite pronoun...

As per your request, we have prepared the attached Preliminary Budget Estimate for the subject project as related to our scope of work.. We have attached five

Based on a primary survey of both urban and rural handloom weaver clusters in Ethiopia, one of the country’s most important rural nonfarm sectors, this paper examines the

(i) either on his own behalf or on behalf of any other person, deal in securities of a company listed on any stock exchange when in possession of any unpublished price

OAR 345-021-0010(1)(a)(H) If the applicant is a limited liability company, it shall give: (i) The full name, official designation, mailing address, email address and telephone