Learning graph convolutional network for blind mesh visual quality assessment

(1)

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. Digital Object Identifier 10.1109/ACCESS.2017.DOI

Learning graph convolutional network

for blind mesh visual quality assessment

ILYASS ABOUELAZIZ1, ALADINE CHETOUANI2, MOHAMMED EL HASSOUNI3, HOCINE

CHERIFI4_{, LONGIN JAN LATECKI}5_,

1_{L@bisen, Yncrea Ouest, 20 rue Cuirassé Bretagne, 29200, Brest, France (e-mail: [email protected] )} 2_{Laboratoire PRISME, Université d’Orléans, Orleans, France (e-mail: [email protected] )}

3_{FLSH, Mohammed V University in Rabat, Rabat, Morocco(e-mail: [email protected] )} 4_{LIB EA 7534, University of Burgundy, Dijon, France (e-mail: [email protected] )}

5_{Department of Computer and Information Sciences, Temple University, Philadelphia, USA (e-mail: [email protected] )}

Corresponding author: Ilyass Abouelaziz (e-mail: [email protected]).

ABSTRACT

This paper proposes a new method for blind mesh visual quality assessment (MVQA) based on a graph convolutional network. For that, we address the node classification problem to predict the perceived visual quality. First, two matrices represent the 3D mesh: a graph adjacency matrix and a feature matrix. Both matrices are used as input to a shallow graph convolutional network. The network consists of two convolutional layers followed by a max-pooling layer to provide the final feature representation. With this structure, the Softmax classifier predicts the quality score category without the reference mesh’s availability. Experiments are conducted on four publicly available databases constructed explicitly for the mesh quality assessment task. We investigate several perceptual and visual features to select the most effective combination. Comparisons with the state-of-the-art alternative methods show the effectiveness of the proposed framework.

INDEX TERMS Mesh visual quality assessment, graph convolutional networks, mesh graph representa-tion, geometric attributes.

I. INTRODUCTION

1

R

ECENTLY, the issue of perceptual 3D MVQA has 2

become an essential field of study since 3D models 3

are widely used in a diversity of applications. A 3D object 4

can undergo several geometric transformations resulting in 5

various distortion types that affect the object’s perceived 6

quality. The perceived visual quality can be estimated by 7

human observers (subjective visual quality assessment) or 8

by implementing automatic metrics that try to mimic an 9

ideal human observer (objective visual quality assessment) 10

[20]–[23]. One can classify objective methods according to 11

the availability of the reference object. In Full-Reference 12

methods (FR), the original mesh is known [3]–[7]. In No-13

Reference (NR) or Blind methods, no information about 14

the reference is available [8]–[12]. Finally, in Reduced-15

Reference (RR) methods only part of the reference is avail-16

able (i.e., features extracted from the reference) [13]–[17]. 17

Most quality evaluation methods rely on the quality pre-18

diction technical tool and the feature type such as geometric 19

(angles, curvature), topological (mesh connection), spatial 20

(vertices and edges), frequency (Wavelet Transform [66], 21

Discrete cosine transform [67]), spectral (Laplace-Beltrami 22

operator [68]), etc. 23

One can distinguish two main categories of methods for 24

mesh quality assessment. The first category is based on 25

logistic regression to estimate the quality scores. A plethora 26

of such methods have been proposed in the literature and 27

have shown good performances [18]–[20], [25], [27]–[30]. 28

The second category relies on machine learning to predict 29

quality [31], [32]. More precisely, quality score estimation 30

is tackled into a classification context, or eventually, re-31

gression or regression by classification. In a previous work, 32

we have been the first to investigate Convolutional Neural 33

Network (CNN) as one of the most recent and relevant deep 34

learning tools to estimate the perceived visual quality of 3D 35

meshes [33]. 36

Intuitively, to handle simply CNN in our context, 2D im-37

ages are rendered from multiple views of the 3D mesh. Then, 38

each image is split into small patches, which are learned 39

(2)

[24], 3D visual saliency is adopted in [62] to select the most 41

relevant regions with high saliency level before using patches 42

from these regions to feed the CNN. This method shows good 43

performances and successfully predicts the perceived visual 44

quality. However, the perceptual mechanisms applied to the 45

images generated from the 3D data make the evaluation view-46

dependent. As demonstrated by Rogowitz and Rushmeier 47

[35], the main problem of the image-based metrics is that, 48

in general, the perceived degradation of still images may 49

not be adequate to evaluate the perceived degradation of the 50

equivalent 3D model. Recently, a new blind method using 51

a set of spatial and spectral features has been proposed 52

by Lin et al. [37]. Inspired by Graph Signal Processing 53

(GSP) theory, features are extracted from the Graph Fourier 54

Transform (GFT) derived from the adjacency matrix, and 55

combined with the spatial features. The quality score predic-56

tion is estimated by random forest regression. The method 57

achieves good performances and competitive quality scores. 58

Unfortunately, it relies only on spatial and spectral features. 59

It does not incorporate perceptual information, which is an 60

essential feature to assess the mesh visual quality. However, 61

several transforms can be adopted instead of GFT, or to get 62

straight to the point, graph-based deep learning solutions can 63

be a new lead in 3D MVQA. 64

In this context, we propose here a model-based method 65

using a shallow Graph Convolutional Network (GCN) to 66

process the 3D model itself directly. GCN has been mainly 67

applied for node classification tasks in which the convolution 68

representation vector for a node function is the only feature 69

to classify that node. For the mesh quality assessment task, 70

the 3D model is represented by a graph, and the graph-71

based convolution vector of nodes can be used to predict 72

the perceived visual quality. Although graph and hypergraph 73

representations have been widely used in image and video 74

processing [38]–[44]. To the best of our knowledge, it is the 75

first work where GCN is used to estimate the quality of 3D 76

meshes. 77

Our contributions are summarized as follows: 78

• Development of a model-based method to blindly assess 79

the perceived visual quality of a distorted mesh. 80

• Use a shallow GCN model that predicts the quality from 81

a graph representation of the 3D mesh and handcrafted 82

features. 83

• In-depth analysis of the performance contribution of 84

the considered handcrafted features, including visual 85

saliency features. 86

The remainder of this paper is organized as follows: In 87

Section II, we introduce the required important notions. We 88

give in Section III a detailed description of the proposed 89

method using the GCN. In Section IV, we present the ex-90

perimental setup. Results of a comparative evaluation are 91

presented in Section V. Finally, we conclude and give some 92

perspectives in Section. VI. 93

II. BACKGROUND

94

In this section, we provide a brief introduction to the required 95

background including the CNNs and essential notions about 96

graph theory. 97

A. GRAPH

98

• Graphs: A graph G(V, E) is defined by a set of vertices 99

V = {v1, v2, ..., vn} and a set of edges E ⊆ V × V . 100

We denote n and m the number of vertices and edges, 101

respectively. 102

• Adjacency matrix: The adjacency matrix A of size 103

n × n representing a graph is defined as Ai,j = 1 if the 104

vertices viand vjare connected by an edge (i.e. adjacent 105

vertices), otherwise Ai,j= 0. Each node and edge may 106

have attribute values which are considered as features 107

of the graph. The term attribute value is used instead 108

of label to make the distinction with the concept of 109

labeling in graph-theory. A walk is a sequence of nodes 110

in a graph, in which consecutive nodes are connected 111

by an edge. A path is a walk with distinct nodes. We 112

denote d(u; v) the distance between u and v, that is, the 113

length of the shortest path between u and v. N1(v) is 114

the 1-neighborhood of a node, that is, all nodes that are 115

adjacent to v. 116

• Graph Laplacian: The graph Laplacian operator L is 117

defined as L = D − A, where A is the adjacency matrix 118

and D is the degree matrix with Dii =PjAij. For a 119

graph ascending from a regular mesh, the graph Lapla-120

cian corresponds to the standard stencil approximation 121

of the continuous Laplacian. A normalized form of the 122

Laplacian is used and is defined as follows: 123

Lnorm_{= D}−1/2_LD−1/2_{= I − D}−1/2_AD−1/2 ₍₁₎ It is noteworthy that the eigenvectors of L and Lnorm 124

are not similar since the two matrices are different. 125

B. FROM CONVOLUTIONAL NEURAL NETWORKS TO

126

GRAPH CONVOLUTIONAL NETWORKS

127

Convolutional neural networks offer an efficient architecture 128

to extract significant statistical patterns in large-scale and 129

high-dimensional datasets. The ability of CNNs to learn 130

local stationary structures and compose them to form multi-131

scale hierarchical patterns has led to breakthroughs in image, 132

video, and sound recognition tasks [48]. Precisely, CNNs 133

extract the local stationarity property of the input data or 134

signals by revealing local features shared across the data 135

domain. These similar features are identified with localized 136

convolutional filters or kernels, which are learned from the 137

data. Convolutional filters are shift- or translation-invariant 138

filters. In other words, they are able to recognize identical 139

features independently of their spatial locations. Localized 140

kernels or compactly supported filters refer to filters that 141

extract local features independently of the input data size, 142

with a support size. The latter can be much smaller than 143

(3)

Conv oluti on 1 Convolut ion 2 Max -pooli ng Softmax Excellent Excellent Excellent Excellent Excellent Input Data

A) Learning data preparation

B) Graph convolutional network

FIGURE 1. The overall scheme of the proposed method.

many researchers. They have been successfully employed in 145

various computer vision applications allowing them to reach 146

high performances. One of their main advantages over classi-147

cal neural networks is that they adequately consider the input 148

data’s spatial structure. Moreover, CNNs allow the critical 149

property of weights sharing between the convolutional layers, 150

which restricts the number of parameters to learn. In quality 151

assessment, using CNNs has shown notable improvement in 152

correlation with human judgment. Some datasets can natu-153

rally be modeled as scalar data defined on the vertices of 154

graphs [36]. For example, computer networks, transporta-155

tion (road, rail, airplane) networks, or social networks can 156

be described by graphs, with the vertices corresponding to 157

individual computers, cities, or people, respectively. These 158

data can be structured with graphs, which are universal rep-159

resentations of heterogeneous pairwise relationships. Graphs 160

can encode complex geometric structures and be studied with 161

strong mathematical tools such as spectral graph theory [50]. 162

A generalization of CNNs to graphs is not straightforward 163

as the convolution and pooling operators are only defined 164

for regular grids. It makes the extension a challenging issue, 165

theoretically and implementation-wise. The major bottleneck 166

of generalizing CNNs to graphs is the definition of localized 167

graph filters, which are efficient to evaluate and learn. 168

III. PROPOSED METHOD

169

Fig. 1 presents the block diagram of the proposed method. 170

The first module refers to the learning data preparation. It 171

consists of converting the 3D mesh to a graph representation. 172

Then, the adjacency matrix, which indicates whether pairs 173

of nodes are connected or not, is computed. In parallel, hand-174

crafted features are extracted to represent multiple perceptual 175

aspects of the 3D mesh. Both the features extracted and the 176

adjacency matrix are used as input data for a GCN to estimate 177

the perceived quality based on deformation level. 178

A. LEARNING DATA PREPARATION

179

The first step of the proposed method relies on preparing 180

the learning data. It consists of a graph represented by an 181

adjacency matrix A of size n × n and a n × p feature matrix, 182

with p is the number of features per node. 183

1) Graph network representation and adjacency matrix

184

The proposed framework’s main idea is to use graphs to rep-185

resent the 3D model (Triangular mesh). This representation 186

allows us to take advantage of graph properties to manipulate 187

the mesh itself and conceive a model-based method to assess 188

the visual quality blindly. Triangular meshes are a set of 189

nodes connected by edges to form triangular faces. A graph is 190

a data structure consisting of vertices and edges. Each vertex 191

is linked to other vertices by an edge, so-called neighbors. 192

There are different ways to represent a graph. In this work, 193

it is characterized by an adjacency matrix as discussed in 194

Sec. II-A. Fig. 2 depicts the graph representation of a 3D 195

mesh and its corresponding adjacency matrix. 196

2) Handcrafted features

197

We use three geometric features: curvature, angles, and 198

Laplacian of Gaussian curvatureand one perceptual feature: 199

Saliency. 200

• Curvature describes the deviation of the surface from 201

being flat. In this work, we use the maximum curvature 202

amplitude Cmaxand the minimum curvature amplitude 203

Cmin. The mean curvature Cmean= (Cmax+Cmin)/2 204

(4)

1 2 4 3 5 7 6 8 10 9 1 2 3 4 5 6 7 8 910 … n 1 0 1 1 1 0 0 1 0 0 1 … 2 1 0 1 1 1 0 0 0 0 0 … 3 1 1 0 0 1 1 1 0 0 0 … 4 1 1 0 0 0 0 0 0 0 1 … 5 0 1 1 0 0 1 0 0 0 0 … 6 0 0 1 0 1 0 1 0 1 0 … 7 1 0 1 0 0 1 0 0 0 1 … 8 0 0 0 0 0 0 0 0 0 1 … 9 0 0 0 0 0 1 0 0 0 0 … 10 1 0 0 1 0 0 1 1 0 0 … 0 n 0 3D object Graph network Adjacency matrix

FIGURE 2. Graph representation and adjacency matrix from 3D object.

Cmin are also computed. In the next, all the curvature 206

features will be considered as one feature labeled by C. 207

• Laplacian of Gaussian curvature corresponds to the 208

Laplacian operator applied to the Gaussian curvature 209

field. This feature is labeled here as LoG. 210

• Angle is computed for each edge and corresponds to 211

the angle between normal vectors of its adjacent faces. 212

It represents the structural aspect of the mesh. The 213

computed angle values are averaged to obtain a scalar 214

value for each vertex. This feature is labeled here as An. 215

• Saliency is a perceptual concept that describes the atten-216

tion of the Human Visual System to some regions due 217

to their specificity’s (curvature, orientation, and so on). 218

In this work, we use the mesh visual saliency method 219

proposed in [61]. This feature is labeled here as S. 220

Fig. 3 shows the used features described above. Mean 221

and Gaussian curvature are depicted separately to give more 222

visibility about the difference between them. 223 Gaussian curvature Curvature Saliency Dihedral angle Laplacian of gaussian C Cgauss An LoG S 3D object

Feature vector of size n

Feature matrix of size f * n

n

f

FIGURE 3. Handcrafted features preparation.

B. GRAPH CONVOLUTIONAL NETWORK

224

In this section, we provide a detailed description of the GCN 225

implemented in the proposed method f (X, A), where A is an 226

adjacency matrix and X is a feature matrix. 227

A multi-layer GCN is used with the layer-wise propagation 228

rule defined as: 229 H(l+1) = σ( eD−1/2A eeD−1/2H(l)W(l)) (2) with 230 e A = A + IN (3) and 231 e Dii = X j e Aij (4)

and σ(.) is the activation function defined as ReLU(.) = 232

max(0, .). eA is the adjacency matrix of the undirected graph. 233

IN is the identity matrix. W(l) is a trainable weight matrix. 234

H(l) _{∈ R}N ×D _{is the matrix of activations in the l}th_layer; 235

H(0) _{= X. This rule is motivated by an approximation} 236

presented in [60]. 237

We note that the multiplication with A means that, for every 238

node, we sum up all the feature vectors of all neighboring 239

nodes but not the node itself (unless there are self-loops in 240

the graph). This is fixed by enforcing self-loops in the graph 241

by considering eA (adding the identity matrix to A). 242

1) Graph convolution

243

Let G = {V, E} denotes the graph representing the distorted 244

mesh. vi∈ V and (vi, vj) ∈ E are the sets of nodes and edges 245

of G, respectively. The graph convolution is the process of 246

filtering a signal x ∈ RN with a filter fθdefined as follows: 247 Y = fθ∗ x = θ(IN + D− 1 2AD− 1 2)x (5)

Using a deep model, the operator IN + D− 1

2AD−12 may

248

cause numerical instabilities. Thus, a normalization is intro-249 duced 250 IN + D− 1 2_AD−12 → e_D−12_{A e}_e_D−12 ₍₆₎

Finally, the generalized graph convolution is defined as 251

follows: 252

Y = eD−12_{A e}_e_D−12_XΘ ₍₇₎ where X ∈ RN ×Pis the feature matrix of size N ×P with N 253

the number of nodes and P the number of features for each 254

(5)

matrix of filter parameters and Y ∈ RN ×F is the convolved 256

matrix. 257

258

Convolution is a key process in a graph convolutional 259

network, allowing to highlight local receptive features. When 260

applying convolution on graphs, two important steps are 261

considered: 262

• Determine the neighboring nodes around each given 263

graph node for the convolution process to build locally 264

connected subsets of the global graph. These subsets can 265

be obtained using search strategies. We use the breadth-266

first search to enlarge the neighborhood nodes on the 267

graph. The breadth-first search crosses an entire level of 268

child nodes first, before passing through the grandchil-269

dren nodes. In addition, it goes through a whole level 270

of grandchildren nodes before passing through great 271

grandchildren nodes and so on ... 272

Fig. 4(a) shows the graph representing the 3D object and 273

Fig. 4(b) shows the neighborhood sets of each source 274

node. 275

• Arrange the order of execution of the convolution pro-276

cess on each set. Indeed, it is crucial to reasonably sort 277

the nodes in the subsets to ensure that the same rules 278

are used to convolve the elements thus convolution can 279

better activate features for each node. To this end, the 280

L2 similarity method is used. To do so, we use nodes 281

features to compute the similarity between them. The 282

neighbor node with the higher level of similarity is ex-283

ecuted first. For example, in the convolution order pro-284

vided in Fig. 4(c), node 3 is the most similar to 6, then, 285

we find node 9, and finally node 1 is the least similar. We 286

can denote this by: sim(3, 6) > sim(9, 6) > sim(9, 1). 287

Thus, the order of execution in the first subset is 3-9-1. 288

This process is repeated for each subset in the graph. 289

Fig. 4(c) shows the final convolution order for each 290 subset. 291 8 2 1 7 6 9 3 4 5 8 2 1 7 6 9 3 4 5 3 9 1 6 4 8 8 2 6 6 9 1

(a) 3D shape presented by a graph

(b) Subgraphs: sets of nodes neighborhood

(c) Convolution order

FIGURE 4. Convolution on graph. (a) 3D object represented by a graph; (b) subsets searched by a breadth-first search (c) Convolution order of each subset.

2) Graph coarsening and pooling

292

The pooling operation requires meaningful neighborhoods on 293

graphs, where similar vertices are clustered together. Doing 294

this for multiple layers is equivalent to a multi-scale clus-295

tering of the graph that preserves local geometric structures. 296

There exist many clustering techniques, we are interested 297

here in multilevel clustering algorithms where each level 298

produces a coarser graph. In this work, we make use of the 299

coarsening phase of the Graclus multilevel clustering algo-300

rithm [45]. It uses a greedy algorithm to compute successive 301

coarser versions of a given graph. 302

For each coarsening level Fig. 5(a), the algorithm picks and 303

matchs an unmarked vertex vi with one of its unmarked 304

neighbors vj. These latter are chosen because this maximizes 305

the local normalized cut defined as Aij(1/di+1/dj). A is the 306

adjacency matrix, di and dj are the weights corresponding 307

to the vertices viand vjrespectively. After that, the vertices 308

are marked, and the coarsened weights correspond to the 309

sum of their weights. This operation is performed for all 310

the nodes in the graph. The coarsening operation allows 311

dividing the number of nodes by 2. As a result, the coarsened 312

vertices and the original vertices are not arranged. However, 313

it is possible to arrange the vertices, so the graph pooling 314

operation becomes efficient. We note that, for a pooling of 315

size 4, two coarsening levels of size 2 are needed. Fig. 5 316

illustrates an example of graph pooling. 317

After coarsening, each node has either two children, if it was 318

matched at the finer level, or one, if it was not, i.e the node 319

was a singleton. From the coarsest to finest level, fake nodes, 320

i.e. disconnected nodes, are added to pair with the singletons 321

such that each node has two children. Let us carry out a max-322

pooling of size 4 on the finest G0 graph given as input of 323

size n0= 12. Two coarsening levels of size 2 are needed: let 324

Graclus gives G1of size n1 = 6 and G2of size n2 = 3 the 325

coarsest graph. Fake nodes (in red) are added to G1(1 node) 326

and G1(4 nodes) to pair with the singletons (in yellow), such 327

that each node has exactly two children. 328 2 1 5 9 6 3 4 8 10 1 3 ₅ 6 4 2 1 2 ₃ 1 2 3 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10 11 12 11 7 12 G0 G1 G2 (a) Coarsening (b) Pooling

FIGURE 5. Graph coarsening and pooling.

3) Node classification

329

In the following, we consider a two-layer GCN for node 330

classification on a graph with a symmetric adjacency matrix 331

A. We first calculate ˆA = ˜D−1/2A ˜˜D−1/2in a pre-processing 332

step. The forward model then takes the simple form: 333

(6)

Here, W0∈ RP ×H _{is an input-to-hidden weight matrix for a} 334

hidden layer with H feature maps. W1_{∈ R}H×F _{is a} hidden-335

to-output weight matrix. The softmax activation function, 336 defined as: 337 softmax(xi) = 1 zexp(xi) (9) with z = P

iexp(xi) is applied row-wise. For multiclass 338

classification, we then evaluate the cross-entropy error over 339

all labeled examples: 340 L = X l∈YL F X f =1 Ylfln Zlf (10)

where YLis the set of node indices that have labels, and F is 341

the number of filters. Zlf is the result of the model for each 342

node l and each filter f . 343

The neural network weights W(0)and W(1)are trained using 344

gradient descent. 345

C. TRAINING AND QUALITY SCORE ESTIMATION

346

To estimate the visual quality of a distorted mesh, we rely 347

on a classification process. We consider five classes based on 348

the subjective Mean Opinion Score (MOS) for each database 349

(very bad, bad, medium, good, and excellent quality). 350

The training process follows the leave-one-out cross-351

validation (LOOCV) according to the following process: 352

• We build a training regression model using all the exist-353

ing 3D objects in the repository except one object and 354

its distorted versions. 355

• The excluded subset is then used for the test using the 356

constructed training model. 357

• The process is repeated for each 3D object in the repos-358

itory. 359

It is noteworthy that we generate one training model for each 360

excluded object. In addition, for the cross-dataset evaluation 361

( Section. V-C) we train one model using one database and 362

the other databases are used for the test. 363

The negative log- following the model described in likeli-364

hood is minimized over the training set using the Stochastic 365

Gradient Descent (SGD). The back-propagation algorithm is 366

used to compute the gradients. At the same time, dropout 367

is employed to avoid over-fitting. For the test phase, the 368

excluded objects serve as a test set, and the trained network 369

classifies the given distorted meshes according to their visual 370

quality. Finally, comparisons with the subjective labels are 371

performed using correlation measurements. 372

IV. EXPERIMENTAL SETUP

373

In the following, we describe the experimental protocol, 374

including the evaluation criteria. We also report a brief de-375

scription of the databases. 376

A. DATASETS

377

The goal of MVQA is to provide quality predictions corre-378

lated with human observer’s opinions. Therefore, a dataset 379

of distorted meshes graded by human observers is needed 380

to evaluate the algorithms. As the meshes geometric aspect 381

significantly influences the evaluation process, care must be 382

taken when choosing the dataset. It must contain meshes 383

that reflect adequate diversity in their content, and generated 384

distortions should reflect a broad range of mesh degradation. 385

To comply with this argument, four datasets are used in the 386

experiments. These databases, specially designed for quality 387

metrics evaluation, are made of original and distorted mesh. 388

Furthermore, the MOS values are provided for each dataset. 389

Fig. 6 shows the reference objects of the four databases 390

briefly described below. 391

(a)

(b)

(c)

(d)

FIGURE 6. The reference models from: the LIRIS masking database (a), the general-purpose database (b) and the UWB compression database (c)and the IEETA simplification database (d).

• LIRIS/EPFL General-Purpose database [46]: This 392

database, created at the Ecole Polytechnique Federale 393

de Lausanne, contains four reference meshes and 84 394

distorted models (88 models in total). Two types of 395

distortion are applied, smoothing and noise addition, 396

either locally or globally, on the reference mesh. 397

• LIRIS Masking database [47]: This database, originally 398

from the University of Lyon, consists of four refer-399

ence meshes and 24 distorted models with local noise 400

addition. This database’s specific objective is to test 401

the ability of MVQA methods in capturing the visual 402

masking effect. 403

• UWB compression database [2] includes five reference 404

models and 63 distorted models. From Twelve to thir-405

teen distorted versions, obtained by a compression al-406

gorithm, are associated with each reference model. 407

• The IEETA simplification database [59] comprises 35 408

models: five reference meshes and six simplified models 409

for each reference mesh. The simplified models are 410

obtained using three simplification algorithms with two 411

(7)

Even though some reference objects are the same in some 413

databases, their distortions are completely different regarding 414

the type and the level of distortions, which provides an 415

important diversity to evaluate MVQ assessment methods. 416

For instance, we have two similar objects (Armadillo and 417

Dyno ) in the LIRIS masking database Fig. 6 (a) and the 418

general-purpose database Fig. 6 (b). In the first database, 419

distortions are obtained by adding only local noise. However, 420

in the second database, distortions are obtained by smoothing 421

and adding global noise with different levels. 422

B. VALIDATION PROTOCOL

423

The correlation between the scores of an objective MVQ 424

method and the MOS provided by human subjects allows 425

measuring its performance. Two types of correlation coef-426

ficients are commonly used. The Pearson linear correlation 427

coefficient (rp), which measures the prediction accuracy, and 428

the Spearman rank-order correlation coefficient (rs), which 429

measures the prediction monotonicity [64]. The Spearman 430

rank-order correlation coefficient (rp) depends only on the 431

rank of the models objective scores. If the rank of objective 432

scores is similar to the rank of MOSs, a high correlation value 433

is obtained, regardless of the distance between the objec-434

tive score and the corresponding MOS. For both criteria, a 435

higher value indicates better prediction performance. These 436

measures are defined as follows: 437

rp=

Pn

i=1(Qsi− ¯Qs)(MOSi−MOS)¯ q

Pn

i=1(Qsi− ¯Qs)2 q

Pn

i=1(MOSi−MOS)¯ 2 (11)

rs= 1 − Pn

i=1(rank(MOSi) − rank(Qsi))2

n(n2_{− 1)} (12)

where n denotes the numbers of distortions in a given 438

database. The mean opinion scores provided in the database 439

are defined by MOSi, and Qsiis the objective quality score 440

obtained by a given method. 441

442

The relation between the values of the scores obtained 443

by the objective method and the mean opinion scores is 444

non-linear, and they are not easy to interpret by users. It is 445

highly recommended to introduce a psychometric fitting to 446

handle this non-linearity. Several functions can be used to fit 447

the objective scores such as cubic polynomial function [69] 448

and logistic function [70]. Most compared methods in this 449

work used a cumulative Gaussian psychometric function as 450

in [25], [28]. We use the same function to guarantee a fair 451

comparison. The cumulative Gaussian psychometric function 452 [65] is defined by: 453 p(a, b, X) = √1 2π Z ∝ a+bX exp − t 2 2 dt (13)

where X is the objective score, a and b are two parameters 454

to be determined. These parameters are estimated for each 455

database using the objective values of the distorted versions 456

and the corresponding MOS with the help of the curve fitting 457

toolbox under Matlab. 458

V. EXPERIMENTAL RESULTS

459

Here, we report the main results on the influence of the mesh 460

features on objective quality effectiveness. Then, we present 461

the results of a comparative evaluation with the state-of-the-462

art methods using four subjective databases. Finally, a cross-463

database assessment is performed to test the generalization 464

ability of the proposed method. 465

A. INFLUENCE OF THE GEOMETRIC ATTRIBUTES

466

In this section, we evaluate the impact on the performance of 467

the extracted handcrafted features. Several sets used as input 468

to the GCN have been tested. It should be noted that results 469

only take into account three geometric features: curvature 470

(including mean and Gaussian curvature), angles and Lapla-471

cian of Gaussian, and one perceptual attribute (saliency), 472

more features have been investigated before finding the 473

best combination. Here, we provide the results for different 474

feature combinations. The features under test are minimum 475

and maximum directions (Umin and Umax), curvature-based 476

(Cmin, Cmax, Cmean, and Cgauss), Normal (Nor), Saliency 477

(S), Angle (An), and Laplacian of Gaussian (LoG). 478

Table 1 illustrates the main findings. Curvature-based and 479

angle features exhibit good correlation values. Adding the 480

Saliency improves the performance considerably. Adding 481

LoG leads to similar results. Unfortunately, adding the di-482

rections (Umin and Umax) and the normal (Nor) does not 483

improve the results. The dihedral angle is the angle between 484

normal vectors of two adjacent faces. Adding the information 485

of normal vectors does not provide added value since it is 486

already included in the dihedral angle. In the same vein, 487

Umin and Umax present only the directions of minimum and 488

maximum curvature which are already included in the curva-489

ture feature itself. Experimentally, adding these features leads 490

only to redundant information without any added value and 491

the training time increased remarkably. The best performance 492

is obtained using the four features (C + Ang + LoG + S). 493

Indeed, this leads us to the best correlation scores. Therefore, 494

we only rely on Cmin, Cmax, Cmean, Cgauss, Ang, LoG, 495

and S to construct the features matrix. 496

B. COMPARISON WITH THE STATE-OF-THE-ART

497

Here, a comparative analysis is conducted. The proposed 498

method is compared to the state-of-the-art methods including 499

full reference, reduced reference and no reference methods: 500

• Full reference methods: HD [19], RMS [18], MSDM2 501

[20], TPDM [25], Chetouani [26]. 502

(8)

TABLE 1. Correlation coefficients rs(%) and rp(%) using different feature combinations on the general-purpose database.

Features Armadillo Dyno Venus Rocker All models

rs rp rs rp rs rp rs rp rs rp C 75.6 78.1 82.1 81.8 88.1 86.0 74.7 73.1 76.5 74.5 C + An 86.3 85.9 88.0 88.6 90.7 91.2 86.9 87.8 83.7 82.9 C + An+ S 90.9 89.1 90.1 88.3 92.7 89.8 87.1 86.0 86.7 85.7 C + An+ LoG 93.8 92.7 88.9 90.5 91.3 91.0 90.7 88.8 87.9 88.6 C + An+ LoG + S 92.4 93.2 90.5 91.0 94.2 93.6 89.5 89.2 90.3 89.8 C + An+ LoG + S + U 90.5 91.4 92.5 92.3 91.4 91.2 88.9 88.4 88.5 87.3

C + An+ LoG + S + U + Nor 91.2 90.1 89.1 88.9 92.8 93.1 88.1 87.0 87.9 86.3

The correlation coefficients values rsand rpon the LIRIS 508

masking, LIRIS/EPFL General-purpose, UWB compression 509

and the IEETA simplification databases are listed in Tables. 2, 510

3, 4, 5, respectively. 511

We note that, to compute the correlation coefficients for a 512

given object, the statistical dependence is computed between 513

predicted classes of all its distorted versions and the corre-514

sponding ground truth classes. To compute the correlation 515

coefficients for the whole database, the statistical dependence 516

is computed between classes of all objects in the repository 517

and their ground truth classes. 518

• The General-purpose database (see Table. 2 ) is the 519

largest MVQ database so far. It contains the highest 520

number of distorted meshes than the other databases 521

(i.e., 84 distorted meshes and a variety of distortion 522

types). On this database, the proposed method shows 523

good performance and provides good correlation coeffi-524

cients (rs= 89.3% and rp= 88.6%). 525

• On the LIRIS masking database (see Table 3), the pro-526

posed method exhibit high Spearman and Pearson corre-527

lation coefficients on the whole corpus (rs= 91.7% and 528

rp = 90.9%). It outperforms the NR methods (BMQI, 529

NRCNN1, and NR-GRNN) and the most effective FR 530

and RR methods. 531

• On the UWB compression database (see Table 4 ), the 532

proposed method is the most efficient in terms of rs 533

score among the NR methods rs = 90.5%. Besides, 534

it provides competitive scores by outperforming many 535

effective FR and RR methods. 536

• On the IEETA simplification database (see Table 5 ), 537

the proposed method provides competitive correlation 538

coefficients (rs = 89.9% and rp = 89.4%). The 539

perceptual methods MSDM2, TPDM, and FMPD are 540

also shown to be effective in this database. 541

The classical measures, HD and RMS, are widely used to 542

estimate the perceived visual quality. These methods are 543

easy to use since a simple geometric distance permits to 544

predict the quality score. However, they generally fail to 545

assess the perceived quality score. It is the reason why their 546

scores are not high enough on all test databases. Another 547

drawback is that they require the same connectivity on the 548

compared meshes. Including perceptual features increase 549

the performances considerably. For instance, MSDM2 and 550

TPDM incorporate perceptual information (mesh curvature). 551

Thus, a better prediction is achieved than the geometric 552

measures as reflected by the correlation coefficient scores. 553

FMPD includes a roughness measure which is an essential 554

feature in mesh processing. The scores provided by this 555

method are very competitive. Indeed, they correctly estimate 556

the perceived quality. 557

Based on these results, the proposed method shows high 558

performance on all subjectively-rated MVQ databases, as 559

proven by its high scores on the individual models and the 560

whole repositories. 561

The method CNNs-CMP indeed provides the highest 562

scores. However, its main drawback is that it uses 2D images 563

rendered from 3D meshes to feed a deep CNN. Moreover, 564

the performance of this method is obtained using three pre-565

trained CNNs, which is time-consuming. Indeed, we use in 566

this work the 3D mesh directly without rendering to feed a 567

GCN. To the best of our knowledge, this is the first method 568

that uses a graph network to evaluate the perceived visual 569

quality. The obtained scores are very promising (exceed or 570

close to 90%), our method outperforms many full reference 571

and reduced reference methods including BMQI, TPDM, 572

3DWPM, and other methods. In addition, It is noteworthy 573

that we used a shallow network to test if graph represen-574

tations are useful for such an application. The proposed 575

network provides competitive results with the least possible 576

resources. 577

C. CROSS DATASET EVALUATION

578

In this section, we investigate the generalization ability of the 579

proposed method to predict the quality. To do so, we perform 580

a cross-database evaluation by training our network on the 581

General-purpose database and using the other databases for 582

the test. We choose this database for the training process be-583

cause it contains the highest number of distorted models and 584

a wide variety of distortion types. Table. 6 shows the evalu-585

ation results. It presents the correlation coefficients of each 586

3D object in the three tested databases (i.e., LIRIS masking, 587

UWB compression, and IEETA simplification) and the scores 588

for the whole repositories. The cross-dataset evaluation is 589

performed to test the generalization ability of a method. In 590

which a network trained on one corpus would be evaluated 591

on a range of out-of-dataset corpora. Instead of evaluating the 592

performance solely based on one dataset or multiple datasets 593

(9)

TABLE 2. Correlation coefficients rs(%) and rp(%) of different objective methods on the LIRIS/EPFL general-purpose database.

Type Method Armadillo Dyno Venus Rocker All models

rs rp rs rp rs rp rs rp rs rp Full Reference HD [19] 69.5 30.2 30.9 22.6 1.6 0.8 18.1 5.5 13.8 1.3 RMS [18] 62.7 32.3 0.3 0.0 90.1 77.3 7.3 3.0 26.8 7.9 MSDM2 [20] 81.6 85.3 85.9 85.7 89.3 87.5 89.6 87.2 80.4 81.4 TPDM [25] 84.5 78.8 92.2 89.0 90.6 91.0 92.2 91.4 89.6 89.2 Chetouani [26] 75.7 86.1 90.6 90.0 94.9 95.5 91.4 92.1 88.1 90.9 Reduced Reference 3DWPM1 [56] 65.8 35.7 62.7 35.7 71.6 46.6 87.5 53.2 69.3 38.4 3DWPM2 [56] 74.1 43.1 52.4 19.9 34.8 16.4 37.8 29.9 49.0 24.6 FMPD [28] 75.4 83.2 89.6 88.9 87.5 83.9 88.8 84.7 81.9 83.5 No-Reference NR-SVR [52] 76.8 91.5 78.6 84.1 85.7 88.6 86.2 86.6 81.5 87.8 NR-GRNN [51] 87.1 97.3 91.2 94.1 86.3 85.0 78.6 74.8 86.2 88.7 NR-CNN1 [49] 87.2 84.3 86.4 86.2 92.2 85.6 91.3 85.2 83.6 82.7 BMQI [57] 20.1 - 83.5 - 88.9 - 92.7 - 78.1 -BMQA-GSES [37] 98.7 80 99.2 80.4 98.8 80.1 99.5 99.9 90.5 87.9 CNN-BMQA [62] 89.8 91.4 91.6 92.2 94.6 93.8 91.9 93.9 90.0 92.0 CNNs-CMP [63] 95.8 95.6 93.6 92.9 93.4 91.3 94.5 95.2 92.6 91.3 MVQ-GCN (proposed) 91.8 92.5 87.7 84.5 93.7 91.9 89.6 88.4 89.3 88.6

TABLE 3. Correlation coefficients rs(%) and rp(%) of different objective methods on the LIRIS masking database.

Type Method Armadillo Lion Bimba Dyno All models

rs rp rs rp rs rp rs rp rs rp Full Reference HD [19] 48.6 37.7 71.4 25.1 25.7 7.5 48.6 31.1 26.6 4.1 RMS [18] 65.6 44.6 71.4 23.8 71.4 21.8 71.4 50.3 48.8 17.0 MSDM2 [20] 81.1 88.6 93.5 94.3 96.8 100 95.6 100 87.3 89.6 TPDM [25] 91.4 88.6 88.4 82.9 97.1 100 71.1 100 88.6 90.0 Chetouani [26] 99.0 99.0 83.0 94.0 99.0 99.0 93.0 98.0 93.9 97.8 Reduced Reference 3DWPM1 [56] 58.0 41.8 20.0 9.7 20.0 8.4 66.7 45.3 29.4 10.2 3DWPM2 [56] 48.6 37.9 38.3 22.0 37.1 14.4 71.4 50.1 37.4 18.2 FMPD [28] 94.2 88.6 93.5 94.3 98.9 100 96.9 94.3 80.8 80.2 No-Reference NR-SVR [52] 89.5 84.7 100 96.3 94.2 93.6 94.4 89.7 90.4 91.2 NR-GRNN [51] 82.3 80.5 94.1 97.0 90.2 94.3 78.2 82.3 90.2 82.4 NR-CNN1 [49] 95.2 97.6 89.4 91.6 93.4 98.7 96.3 89.9 88.2 85.4 BMQI [57] 94.3 - 94.3 - 100 - 83.0 - 78.1 -CNN-BMQA [62] 92.4 91.7 92.2 93.1 97.9 97.3 93.4 92.6 91.4 90.8 CNNs-CMP [63] 96.2 95.5 93.1 92.4 92.5 92.8 94.2 94.0 93.2 92.8 MVQ-GCN (proposed) 92.6 91.2 93.5 89.9 88.6 89.8 89.3 88.7 91.7 90.9

model performance from a different point of view. 595

The results of the masking database in Table. 6 (rs= 90.8% 596

and rp = 91.2%) are close to the results in Table. 3 597

(rs = 91.7% and rp = 90.9%) and even better in terms of 598

rp. It is due that this database has the same distortion types 599

as the training dataset (general purpose). For the compression 600

and simplification databases that have unknown objects and 601

distortions, it is quite normal to get slightly lower scores 602

compared to Tables. 4 and 5. In terms of generalization 603

ability, the obtained results ensure a good performance. 604

VI. CONCLUSION

605

In this work, we rely on a shallow graph convolutional 606

network to accurately estimate distorted meshes perceived 607

visual quality. The graph consists of two convolutional lay-608

ers, a pooling layer and a classification layer. The network 609

is fed by a graph represented by an adjacency matrix and 610

a feature matrix containing three geometric (curvature, an-611

gles and Laplacian of Gaussian curvature) and a perceptual 612

feature (Saliency). The proposed method successfully pre-613

dicts distorted meshes visual quality, as proven by the high 614

correlations with human judgment. To study the impact of 615

the network architecture and improve the performances, a 616

(10)

TABLE 4. Correlation coefficients rs(%) and rp(%) of different objective methods on the UWB compression database.

Type Method Bunny James Jessy Nissan Helix All models

rs rp rs rp rs rp rs rp rs rp Full Reference HD [19] 34.1 52.2 -16.8 6.8 -23.6 12.5 14.4 23.6 45.1 46.4 10.6 28.3 RMS [18] 34.2 20.9 14.0 10.8 0.0 14.8 17.8 29.7 46.9 44.6 22.0 24.1 MSDM2 [20] 97.4 90.1 82.6 69.2 84.3 63.1 84.4 73.1 98.1 94.7 89.3 78.0 TPDM [25] 95.1 96.5 90.8 73.6 85.8 75.8 82.7 73.4 98.7 95.0 91.5 82.9 Reduced Reference 3DWPM1 [56] 94.7 93.4 77.3 72.3 87.2 89.5 63.6 59.3 98.0 95.2 84.1 81.9 3DWPM2 [56] 96.0 91.2 76.9 65.3 86.9 85.9 56.3 67.6 95.5 94.3 82.3 80.9 FMPD [28] 94.2 89.6 95.3 91.2 63.3 60.0 92.4 77.5 98.4 90.8 88.8 81.8 DAME [2] 96.8 93.4 95.7 93.4 84.4 70.5 93.9 75.3 96.6 95.2 93.5 85.6 No-Reference CNN-BMQA [62] 95.8 91.7 96.2 95.6 92.3 90.5 88.7 84.7 96.7 94.6 90.1 88.3 CNNs-CMP [63] 95.6 94.8 92.5 90.6 92.5 87.1 88.7 89.0 90.7 90.4 90.2 90.9 MVQ-GCN (proposed) 93.4 92.3 90.6 90.9 91.2 91.0 88.6 89.5 90.1 91.2 90.5 87.7

TABLE 5. Correlation coefficients rs(%) and rp(%) of different objective methods on the IEETA simplification database.

Type Method Bones Bunny Head Lung Strange All models

rs rp rs rp rs rp rs rp rs rp Full Reference HD [19] 92.0 94.3 37.8 39.5 72.8 88.6 80.6 88.6 52.3 37.1 50.5 49.4 RMS [18] 86.4 94.3 94.5 77.1 49.6 42.9 89.0 100 90.4 88.6 59.6 70.2 MSDM2 [20] 98.3 94.3 98.1 77.1 88.9 88.6 92.3 60.0 99.0 94.3 89.2 86.7 TPDM [25] 99.0 94.3 98.0 94.3 63.1 65.7 98.6 94.3 98.7 94.3 86.9 88.2 Reduced Reference FMPD [28] 96.0 88.6 98.0 94.3 70.4 65.7 95.5 88.6 96.0 65.7 89.3 87.2 No-Reference CNN-BMQA [62] 96.8 93.7 96.7 90.9 96.4 93.6 94.3 89.6 95.4 93.5 90.4 90.2 CNNs-CMP [63] 91.3 88.9 91.1 92.4 91.8 91.5 95.3 89.4 91.1 88.9 90.6 90.4 MVQ-GCN (proposed) 90.4 89.6 91.0 91.8 90.6 92.1 93.6 92.5 89.1 88.8 89.9 89.4

TABLE 6. Correlation coefficients rs(%) and rp(%) obtained by training on the General-purpose and testing on LIRIS masking, UWB compression and IEETA simplification.

Database Object Scores Database Object Scores Database Object Scores

rs rp rs rp rs rp

LIRIS Masking Armadillo 87.3 85.1 UWB Compression Bunny 84.9 91.0 IEETA Simplification Bones 84.2 81.6

Lion 90.6 87.1 James 88.1 85.3 Bunny 90.1 89.3

Bimba 92.3 93.5 Jessy 83.4 84.6 Head 86.7 80.1

Dyno 90.1 88.7 Nissan 92.7 90.4 Lung 83.6 84.5

Helix 86.5 85.3 Strange 86.1 84.8

All models 90.8 91.2 All models 87.2 86.3 All models 84.8 83.3

network architectures (especially more convolutional layers). 618

A weighted graph could be adopted by including perceptual 619

information to the nodes, such as visual saliency. 620

REFERENCES

621

[1] Cignoni, Paolo, Claudio Rocchini, and Roberto Scopigno. "Metro:

mea-622

suring error on simplified surfaces." In Computer graphics forum, vol. 17,

623

no. 2, pp. 167-174. Oxford, UK and Boston, USA: Blackwell Publishers,

624

1998.

625

[2] Váša, Libor, and Jan Rus. "Dihedral angle mesh error: a fast perception

626

correlated distortion measure for fixed connectivity triangle meshes." In

627

Computer Graphics Forum, vol. 31, no. 5, pp. 1715-1724. Oxford, UK:

628

Blackwell Publishing Ltd, 2012.

629

[3] Yang, Junfeng, Wei Zhang, Xiaolong Li, Ting Zhou, and Bo Ou. "Full

630

Reference Image Quality Assessment by Considering Intra-Block

Struc-631

ture and Inter-Block Texture." IEEE Access 8 (2020): 179702-179715.

632

[4] Meynet, Gabriel, Yana Nehmé, Julie Digne, and Guillaume Lavoué.

633

"PCQM: A full-reference quality metric for colored 3D point clouds." In

634

2020 Twelfth International Conference on Quality of Multimedia

Experi-635

ence (QoMEX), pp. 1-6. IEEE, 2020.

636

[5] Ding, Keyan, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. "Comparison

637

of full-reference image quality models for optimization of image

process-638

ing systems." International Journal of Computer Vision 129, no. 4 (2021):

639

1258-1281.

640

[6] Xu, Munan, Junming Chen, Haiqiang Wang, Shan Liu, Ge Li, and

641

Zhiqiang Bai. "C3DVQA: Full-reference video quality assessment with 3d

642

convolutional neural network." In ICASSP 2020-2020 IEEE International

643

Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.

644

4447-4451. IEEE, 2020.

645

[7] Varga, Domonkos. "Composition-preserving deep approach to

full-646

reference image quality assessment." Signal, Image and Video Processing

647

14, no. 6 (2020): 1265-1272.

648

[8] Agarla, Mirko, Luigi Celona, and Raimondo Schettini. "No-Reference

649

Quality Assessment of In-Capture Distorted Videos." Journal of Imaging

650

6, no. 8 (2020): 74.

651

[9] Zhang, Ziliang, Yuming Fang, Jiebin Yan, and Rengang Du.

"No-652

Reference Quality Assessment for Realistic Distorted Images by Color

653

Moment and Texture Features." In 2020 IEEE Conference on Multimedia

654

Information Processing and Retrieval (MIPR), pp. 342-347. IEEE, 2020.

655

[10] Liu, Yutao, and Xiu Li. "No-Reference Quality Assessment for

Contrast-656

Distorted Images." IEEE Access 8 (2020): 84105-84115.

657

[11] Xiaohan Yang, Fan Li, Hantao Liu,”Deep feature importance awareness

(11)

based no-reference image quality prediction”, Neurocomputing,Volume

659

401, 2020, pp 209-223

660

[12] Yang, Jiachen, Yang Zhao, Bin Jiang, Wen Lu, and Xinbo Gao.

"No-661

Reference Quality Evaluation of Stereoscopic Video Based on

Spatio-662

Temporal Texture." IEEE Transactions on Multimedia 22, no. 10 (2019):

663

2635-2644.

664

[13] Abbas, Naveed, Tanzila Saba, Siraj Khan, Zahid Mehmood, Amjad

665

Rehman, and Rubby Tabasum. "Reduced Reference Image Quality

Assess-666

ment Technique Based on DWT and Path Integral Local Binary Patterns."

667

Arabian Journal for Science and Engineering 45, no. 4 (2020): 3387-3401.

668

[14] Ma, Jian, Xinxin Zhao, and Youxin Xu. "Reduced-Reference Stereoscopic

669

Image Quality Assessment Based on Entropy of Gradient Primitives." In

670

2020 IEEE 5th International Conference on Signal and Image Processing

671

(ICSIP), pp. 206-209. IEEE, 2020.

672

[15] Viola, Irene, and Pablo Cesar. "A reduced reference metric for visual

673

quality evaluation of point cloud contents." IEEE Signal Processing Letters

674

27 (2020): 1660-1664.

675

[16] Xiongkuo Min, Ke Gu, Guangtao Zhai, Menghan Hu, Xiaokang

676

Yang,”Saliency-induced reduced-reference quality index for natural scene

677

and screen content images”,Signal Processing,Vol 145,2018,pp 127-136

678

[17] Lijuan Tang, Kezheng Sun, Luping Liu, Guangcheng Wang, Yutao Liu,

679

“A reduced-reference quality assessment metric for super-resolution

re-680

constructed images with information gain and texture similarity”, Signal

681

Processing: Image Communication,Volume 79,2019,pp 32-39,

682

[18] Paolo Cignoni, Claudio Rocchini, and Roberto Scopigno, “Metro:

Mea-683

suring error on simplified surfaces,” in Computer Graphics Forum.Wiley

684

Online Library, 1998, vol. 17, pp. 167– 174.

685

[19] Nicolas Aspert, Diego Santa-Cruz, and Touradj Ebrahimi, “Mesh:

Measur-686

ing errors between surfaces using the hausdorff distance,” in Multimedia

687

and Expo, 2002. ICME’02. Proceedings. 2002 IEEE International

Confer-688

ence on. IEEE, 2002, vol. 1, pp. 705–708.

689

[20] Guillaume Lavoué, “A multiscale metric for 3d mesh visual quality

assess-690

ment,” in Computer Graphics Forum. Wiley Online Library, 2011, vol. 30,

691

pp. 1427–1437.

692

[21] Ricardo Pastrana-Vidal, Jean Claude Gicquel and Jean-Louis Blin, and

693

Hocine Cherifi, "Predicting subjective video quality from separated spatial

694

and temporal assessment", Proceedings SPIE Human Vision and

Elec-695

tronic Imaging XI, 2006, Vol6057, 60570S

696

[22] Pastrana-Vidal, Ricardo R and Gicquel, Jean-Charles and Colomes,

697

Catherine and Cherifi, Hocine,”Frame dropping effects on user quality

698

perception”, Proc. 5th Int. WIAMIS,2004

699

[23] El Hassouni, Mohammed, Cherifi, Hocine and Aboutajdine, Driss,

"HOS-700

based image sequence noise removal", IEEE Transactions on Image

Pro-701

cessing,vol15, num 3, pp572–581,2006

702

[24] Chetouani, Aladine, and Leida Li. "On the use of a scanpath predictor and

703

convolutional neural network for blind image quality assessment." Signal

704

Processing: Image Communication 89 (2020): 115963.

705

[25] Fakhri Torkhani, Kai Wang, and Jean-Marc Chassery, “A curvature tensor

706

distance for mesh visual quality assessment,” Computer Vision and

Graph-707

ics, pp. 253–263, 2012.

708

[26] Chetouani, Aladine. "Three-dimensional mesh quality metric with

refer-709

ence based on a support vector regression model." Journal of Electronic

710

Imaging 27.4 (2018): 043048.

711

[27] Massimiliano Corsini, Elisa Drelie Gelasca, Touradj Ebrahimi, and Mauro

712

Barni, “Watermarked 3-d mesh quality assessment,” IEEE Transactions on

713

Multimedia, vol. 9, no. 2, pp. 247–256, 2007.

714

[28] Kai Wang, Fakhri Torkhani, and Annick Montanvert, “A fast

roughness-715

based approach to the assessment of 3D mesh visual quality,” Computers

716

& Graphics, vol. 36, no. 7, pp. 808–818, 2012.

717

[29] Cao, Keming, Yi Xu, and Pamela Cosman. "Visual Quality of Compressed

718

Mesh and Point Cloud Sequences." IEEE Access 8 (2020):

171203-719

171217.

720

[30] Zerman, Emin, Cagri Ozcinar, Pan Gao, and Aljosa Smolic. "Textured

721

mesh vs coloured point cloud: A subjective study for volumetric video

722

compression." In 2020 Twelfth International Conference on Quality of

723

Multimedia Experience (QoMEX), pp. 1-6. IEEE, 2020.

724

[31] Xiang Feng, Wanggen Wan, Richard Yi Da Xu, Stuart Perry, Pengfei Li,

725

Song Zhu,”A novel spatial pooling method for 3D mesh quality assessment

726

based on percentile weighting strategy”, Computers and Graphics,Volume

727

74,2018,pp 12-22

728

[32] Yildiz, Zeynep Cipiloglu, A. Cengiz Oztireli, and Tolga Capin. "A

ma-729

chine learning framework for full-reference 3D shape quality assessment."

730

The Visual Computer 36, no. 1 (2020): 127-139.

731

[33] Abouelaziz, Ilyass, Aladine Chetouani, Mohammed El Hassouni, and

732

Hocine Cherifi, "A blind mesh visual quality assessment method based on

733

convolutional neural network." Electronic Imaging 2018.18 (2018): 1-5.

734

[34] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi. "A

735

convolutional neural network framework for blind mesh visual quality

as-736

sessment." Image Processing (ICIP), 2017 IEEE International Conference

737

on. IEEE, 2017.

738

[35] Rogowitz, Bernice E., and Holly E. Rushmeier. "Are image quality metrics

739

adequate to evaluate the quality of geometric objects?." In Human Vision

740

and Electronic Imaging VI, vol. 4299, pp. 340-348. International Society

741

for Optics and Photonics, 2001.

742

[36] Wu, Zonghan, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi

743

Zhang, and S. Yu Philip. "A comprehensive survey on graph neural

744

networks." IEEE transactions on neural networks and learning systems

745

(2020).

746

[37] Lin, Yaoyao, Mei Yu, Ken Chen, Gangyi Jiang, Fen Chen, and Zongju

747

Peng. "Blind mesh assessment based on graph spectral entropy and spatial

748

features." Entropy 22, no. 2 (2020): 190.

749

[38] Cheung Gene, Magli Enrico, Tanaka Yuichi and Ng Michael K., "Graph

750

Spectral Image Processing," in Proceedings of the IEEE, vol. 106, no. 5,

751

pp. 907-930, May 2018.

752

[39] Rital Soufiane, Bretto Alain, Cherifi Hocine and Aboutajdine Driss “A

753

combinatorial edge detection algorithm on noisy images,International

754

Symposium on VIPromCom Video/Image Processing and Multimedia

755

Communications,2002 pp351-355,IEEE

756

[40] Rital Soufiane, Cherifi Hocine and Miguet Serge “Weighted

757

adaptive neighborhood hypergraph partitioning for image

758

segmentation”,International Conference on Pattern Recognition and

759

Image Analysi, 2005, PP 522–531, Springer, Berlin, Heidelberg

760

[41] Sanfeliu, Alberto, René Alquézar, J. Andrade, Joan Climent, Francesc

761

Serratosa, and J. Vergés. "Graph-based representations and techniques

762

for image processing and image analysis." Pattern recognition 35, no. 3

763

(2002): 639-650.

764

[42] Yu, Jun, Dacheng Tao, and Meng Wang. "Adaptive hypergraph learning

765

and its application in image classification." IEEE Transactions on Image

766

Processing 21, no. 7 (2012): 3262-3272.

767

[43] Wu, Zonghan, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi

768

Zhang, and S. Yu Philip. "A comprehensive survey on graph neural

769

networks." IEEE transactions on neural networks and learning systems

770

(2020).

771

[44] Lasfar, A., S. Mouline, Driss Aboutajdine, and Hocine Cherifi.

"Content-772

based retrieval in fractal coded image databases." In Proceedings 15th

773

International Conference on Pattern Recognition. ICPR-2000, vol. 1, pp.

774

1031-1034. IEEE, 2000.

775

[45] Dhillon, Inderjit S., Yuqiang Guan, and Brian Kulis. "Weighted graph cuts

776

without eigenvectors a multilevel approach." IEEE transactions on pattern

777

analysis and machine intelligence 29.11 (2007): 1944-1957.

778

[46] Lavoué, Guillaume, Elisa Drelie Gelasca, Florent Dupont, Atilla Baskurt,

779

and Touradj Ebrahimi. "Perceptually driven 3D distance metrics with

780

application to watermarking." In Applications of Digital Image Processing

781

XXIX, vol. 6312, p. 63120L. International Society for Optics and

Photon-782

ics, 2006.

783

[47] Lavoué, Guillaume, Mohamed Chaker Larabi, and Libor Váša. "On the

784

efficiency of image metrics for evaluating the visual quality of 3D models."

785

IEEE Transactions on Visualization and Computer Graphics 22, no. 8

786

(2015): 1987-1999.

787

[48] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning."

788

nature 521, no. 7553 (2015): 436.

789

[49] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi. "A

790

convolutional neural network framework for blind mesh visual quality

791

assessment." In 2017 IEEE International Conference on Image Processing

792

(ICIP), pp. 755-759. IEEE, 2017.

793

[50] Chung, Fan RK, and Fan Chung Graham. Spectral graph theory. No. 92.

794

American Mathematical Soc., 1997.

795

[51] Abouelaziz I, El Hassouni M, Cherifi H (2016) A curvature based method

796

for blind mesh visual quality assessment using a general regression neural

797

network. In: 2016 12th international conference on signal-image

technol-798

ogy & internet-based systems (SITIS). IEEE, pp 793–797.

799

[52] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi.

"No-800

reference 3d mesh quality assessment based on dihedral angles model

801

and support vector regression." In International Conference on Image and

802

Signal Processing, pp. 369-377. Springer, Cham, 2016.

803

[53] Zachi Karni and Craig Gotsman, "Spectral compression of mesh

geome-804

try," in Proceedings of the 27th annual conference on Computer graphics

(12)

and interactive techniques. ACMPress Addison- 604 Wesley Publishing

806

Co., 2000, pp. 279–286.

807

[54] Olga Sorkine, Daniel Cohen-Or, and Sivan Toledo, "High-pass

quantiza-808

tion for mesh encoding.," in Symposium on Geometry Processing, 2003,

809

vol. 42.

810

[55] Zhe Bian, Shi-Min Hu, and Ralph R Martin, "Evaluation for small visual

811

difference between conforming meshes on strain field," Journal of

Com-812

puter Science and Technology, vol. 24, no. 1, pp. 65–75, 2009.

813

[56] Massimiliano Corsini, Elisa Drelie Gelasca, Touradj Ebrahimi, and Mauro

814

Barni, "Watermarked 3 dmesh quality assessment," IEEE Transactions

815

onMultimedia, vol. 9, no. 2, pp. 247–256, 2007.

816

[57] Anass Nouri, Christophe Charrier, and Olivier Lézoray, "3d blindmesh

817

quality assessment index," Electronic Imaging, vol. 2017, no. 20, pp. 9–26,

818

2017.

819

[58] Guillaume Lavoué, Elisa Drelie Gelasca, Florent Dupont, Atilla Baskurt,

820

and Touradj Ebrahimi, "Perceptually driven 3d distance metrics with

821

application to watermarking," in SPIE Optics Photonics. International

822

Society for Optics and Photonics, 2006, pp. 63120L–63120L.

823

[59] Silva, Samuel, Beatriz Sousa Santos, Carlos Ferreira, and Joaquim

824

Madeira. "A perceptual data repository for polygonal meshes." In 2009

825

Second International Conference in Visualisation, pp. 207-212. IEEE,

826

2009.

827

[60] Hammond, David K., Pierre Vandergheynst, and Rémi Gribonval.

828

"Wavelets on graphs via spectral graph theory." Applied and

Computa-829

tional Harmonic Analysis 30, no. 2 (2011): 129-150.

830

[61] Lee, Chang Ha, Amitabh Varshney, and David W. Jacobs. "Mesh saliency."

831

In ACM SIGGRAPH 2005 Papers, pp. 659-666. 2005.

832

[62] Abouelaziz, Ilyass, Aladine Chetouani, Mohammed El Hassouni, Longin

833

Jan Latecki, and Hocine Cherifi. "3D visual saliency and convolutional

834

neural network for blind mesh quality assessment." Neural Computing and

835

Applications (2019): 1-15.

836

[63] Abouelaziz, Ilyass, Aladine Chetouani, Mohammed El Hassouni, Longin

837

Jan Latecki, and Hocine Cherifi. "No-reference mesh visual quality

assess-838

ment via ensemble of convolutional neural networks and compact

multi-839

linear pooling." Pattern Recognition 100 (2020): 107174.

840

[64] Wang, Zhou, and Alan C. Bovik. "Modern image quality assessment."

841

Synthesis Lectures on Image, Video, and Multimedia Processing 2.1

842

(2006): 1-156.

843

[65] Engeldrum, Peter G. "Psychometric scaling: avoiding the pitfalls and

844

hazards." PICS. 2001.

845

[66] Kim, Min-Su, Sébastien Valette, Ho-Youl Jung, and Rémy Prost.

"Wa-846

termarking of 3D irregular meshes based on wavelet multiresolution

847

analysis." In International Workshop on Digital Watermarking, pp.

313-848

324. Springer, Berlin, Heidelberg, 2005.

849

[67] Lmaati, Elmustapha Ait, Ahmed El Oirrak, Mohammed Najib Kaddioui,

850

Abdellah Ait Ouahman, and Mohammed Sadgal. "3D Model Retrieval

851

Based on 3D Discrete Cosine Transform." int. Arab J. Inf. Technol. 7, no.

852

3 (2010): 264-270.

853

[68] Caissard, Thomas, David Coeurjolly, Jacques-Olivier Lachaud, and

Tris-854

tan Roussillon. "Laplace–beltrami operator on digital surfaces." Journal of

855

Mathematical Imaging and Vision 61, no. 3 (2019): 359-379.

856

[69] Antkowiak, Jochen, T. D. F. Jamal Baina, France Vittorio Baroncini, Noel

857

Chateau, France FranceTelecom, Antonio Claudio França Pessoa, F. U.

858

B. Stephanie Colonnese, Italy Laura Contin, Jorge Caviedes, and France

859

Philips. "Final report from the video quality experts group on the validation

860

of objective models of video quality assessment march 2000." (2000).

861

[70] TRP, I. "Methods metrics and procedures for statistical evaluation

quali-862

fication and comparison of objective quality prediction models." (2012):

863

1401.

864

ILYASS ABOUELAZIZ received his Ph.D. in

865

computer vision from the Mohammed V

Univer-866

sity in Rabat - Morocco. His research is about 3D

867

mesh visual quality assessment and machine

learn-868

ing. He started his research in December 2014

869

in the Computer Science & Telecommunications

870

Laboratory Research "Laboratoire de recherche en

871

informatique et telecommunicatins (LRIT)". He

872

received his master’s degree in computer science

873

and telecommunications in 2014 at the same

Uni-874

versity. His research focuses on image and video processing, visual saliency,

875

machine learning, and mainly on mesh visual quality assessment.

876 877

ALADINE CHETOUANI received his master’s

878

degree in computer science from the University

879

Pierre and Marie Curie, France, in 2005, and his

880

Ph.D. degree in image processing from the

Uni-881

versity of Paris 13, France, in 2010. From 2010

882

to 2011, he was a postdoctoral researcher with the

883

L2TI Laboratory, Paris 13 University. He is

cur-884

rently an associate professor with the Laboratory

885

PRISME, Orleans, France. He also received the

886

Habilitation degree from Université d’Orléans in

887

2020. He serves as a reviewer for major conferences and journals in the field

888

of image analysis and pattern recognition. He co-edited different Special

889

Issues. He is co-author of more than 100 research publications. His present

890

interests are in image quality, perceptual analysis, visual attention, and image

891

processing for cultural heritage using deep learning models.