Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. Digital Object Identifier 10.1109/ACCESS.2017.DOI
Learning graph convolutional network
for blind mesh visual quality assessment
ILYASS ABOUELAZIZ1, ALADINE CHETOUANI2, MOHAMMED EL HASSOUNI3, HOCINE
CHERIFI4, LONGIN JAN LATECKI5,
1L@bisen, Yncrea Ouest, 20 rue Cuirassé Bretagne, 29200, Brest, France (e-mail: [email protected] ) 2Laboratoire PRISME, Université d’Orléans, Orleans, France (e-mail: [email protected] )
3FLSH, Mohammed V University in Rabat, Rabat, Morocco(e-mail: [email protected] ) 4LIB EA 7534, University of Burgundy, Dijon, France (e-mail: [email protected] )
5Department of Computer and Information Sciences, Temple University, Philadelphia, USA (e-mail: [email protected] )
Corresponding author: Ilyass Abouelaziz (e-mail: [email protected]).
ABSTRACT
This paper proposes a new method for blind mesh visual quality assessment (MVQA) based on a graph convolutional network. To this end, we address the node classification problem to predict the perceived visual quality. First, the 3D mesh is represented by two matrices: a graph adjacency matrix and a feature matrix. Both matrices are used as input to a shallow graph convolutional network. The network consists of two convolutional layers followed by a max-pooling layer to provide the final feature representation. With this structure, a softmax classifier predicts the quality score category without requiring the reference mesh. Experiments are conducted on four publicly available databases constructed explicitly for the mesh quality assessment task. We investigate several perceptual and visual features to select the most effective combination. Comparisons with state-of-the-art alternative methods show the effectiveness of the proposed framework.
INDEX TERMS Mesh visual quality assessment, graph convolutional networks, mesh graph representation, geometric attributes.
I. INTRODUCTION
Recently, the issue of perceptual 3D MVQA has become an essential field of study since 3D models are widely used in a diversity of applications. A 3D object can undergo several geometric transformations resulting in various distortion types that affect the object's perceived quality. The perceived visual quality can be estimated by human observers (subjective visual quality assessment) or by implementing automatic metrics that try to mimic an ideal human observer (objective visual quality assessment) [20]–[23]. One can classify objective methods according to the availability of the reference object. In Full-Reference (FR) methods, the original mesh is known [3]–[7]. In No-Reference (NR) or Blind methods, no information about the reference is available [8]–[12]. Finally, in Reduced-Reference (RR) methods, only part of the reference is available (i.e., features extracted from the reference) [13]–[17].
Most quality evaluation methods are characterized by the technique used to predict quality and by the type of features they rely on: geometric (angles, curvature), topological (mesh connectivity), spatial (vertices and edges), frequency (Wavelet Transform [66], Discrete Cosine Transform [67]), spectral (Laplace-Beltrami operator [68]), etc.
One can distinguish two main categories of methods for mesh quality assessment. The first category is based on logistic regression to estimate the quality scores. A plethora of such methods have been proposed in the literature and have shown good performance [18]–[20], [25], [27]–[30]. The second category relies on machine learning to predict quality [31], [32]. More precisely, quality score estimation is cast as a classification problem or, alternatively, as regression or regression by classification. In previous work, we were the first to investigate Convolutional Neural Networks (CNNs), one of the most recent and relevant deep learning tools, to estimate the perceived visual quality of 3D meshes [33].
A simple way to apply a CNN in our context is to render 2D images from multiple views of the 3D mesh. Then, each image is split into small patches, which are learned [24], 3D visual saliency is adopted in [62] to select the most relevant regions with high saliency level before using patches from these regions to feed the CNN. This method shows good performance and successfully predicts the perceived visual quality. However, the perceptual mechanisms applied to the images generated from the 3D data make the evaluation view-dependent. As demonstrated by Rogowitz and Rushmeier [35], the main problem of image-based metrics is that, in general, the perceived degradation of still images may not be adequate to evaluate the perceived degradation of the equivalent 3D model. Recently, a new blind method using a set of spatial and spectral features has been proposed by Lin et al. [37]. Inspired by Graph Signal Processing (GSP) theory, features are extracted from the Graph Fourier Transform (GFT) derived from the adjacency matrix and combined with the spatial features. The quality score is predicted by random forest regression. The method achieves good performance and competitive quality scores. Unfortunately, it relies only on spatial and spectral features and does not incorporate perceptual information, which is essential for assessing mesh visual quality. Moreover, several other transforms could be adopted instead of the GFT or, more directly, graph-based deep learning solutions can be a new lead in 3D MVQA.
In this context, we propose a model-based method using a shallow Graph Convolutional Network (GCN) to process the 3D model itself directly. GCNs have mainly been applied to node classification tasks, in which the convolved representation vector of a node is the only feature used to classify that node. For the mesh quality assessment task, the 3D model is represented by a graph, and the graph-based convolution vectors of the nodes can be used to predict the perceived visual quality. Although graph and hypergraph representations have been widely used in image and video processing [38]–[44], to the best of our knowledge this is the first work in which a GCN is used to estimate the quality of 3D meshes.
Our contributions are summarized as follows:
• Development of a model-based method to blindly assess the perceived visual quality of a distorted mesh.
• Use of a shallow GCN model that predicts the quality from a graph representation of the 3D mesh and handcrafted features.
• In-depth analysis of the performance contribution of the considered handcrafted features, including visual saliency features.
The remainder of this paper is organized as follows. In Section II, we introduce the required notions. Section III gives a detailed description of the proposed method using the GCN. In Section IV, we present the experimental setup. Results of a comparative evaluation are presented in Section V. Finally, we conclude and give some perspectives in Section VI.
II. BACKGROUND
In this section, we provide a brief introduction to the required background, including CNNs and essential notions of graph theory.
A. GRAPH
• Graphs: A graph $G(V, E)$ is defined by a set of vertices $V = \{v_1, v_2, \ldots, v_n\}$ and a set of edges $E \subseteq V \times V$. We denote by $n$ and $m$ the number of vertices and edges, respectively.
• Adjacency matrix: The adjacency matrix $A$ of size $n \times n$ representing a graph is defined as $A_{i,j} = 1$ if the vertices $v_i$ and $v_j$ are connected by an edge (i.e., adjacent vertices), and $A_{i,j} = 0$ otherwise. Each node and edge may have attribute values, which are considered as features of the graph. The term attribute value is used instead of label to make the distinction with the concept of labeling in graph theory. A walk is a sequence of nodes in a graph in which consecutive nodes are connected by an edge. A path is a walk with distinct nodes. We denote by $d(u, v)$ the distance between $u$ and $v$, that is, the length of the shortest path between $u$ and $v$. $N_1(v)$ is the 1-neighborhood of a node $v$, that is, all nodes that are adjacent to $v$.
• Graph Laplacian: The graph Laplacian operator $L$ is defined as $L = D - A$, where $A$ is the adjacency matrix and $D$ is the degree matrix with $D_{ii} = \sum_j A_{ij}$. For a graph arising from a regular mesh, the graph Laplacian corresponds to the standard stencil approximation of the continuous Laplacian. A normalized form of the Laplacian is used and is defined as follows:
$$L_{norm} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2} \quad (1)$$
It is noteworthy that the eigenvectors of $L$ and $L_{norm}$ are in general not the same since the two matrices are different.
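The definitions above can be illustrated with a minimal sketch in plain Python. The helper names (`degree_vector`, `laplacian`, `normalized_laplacian`) and the small example graph are illustrative assumptions, not part of the proposed method, and the sketch assumes an unweighted graph without isolated vertices so that $D^{-1/2}$ is well defined.

```python
import math

def degree_vector(A):
    # D_ii = sum_j A_ij, stored as a plain list
    return [sum(row) for row in A]

def laplacian(A):
    # L = D - A
    d = degree_vector(A)
    n = len(A)
    return [[(d[i] if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

def normalized_laplacian(A):
    # L_norm = I - D^{-1/2} A D^{-1/2}; assumes every degree is > 0
    d = degree_vector(A)
    n = len(A)
    inv_sqrt = [1.0 / math.sqrt(x) for x in d]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * A[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3 attached to 2
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
L = laplacian(A)
L_norm = normalized_laplacian(A)
```

Each row of `L` sums to zero, while the diagonal of `L_norm` is one, which is one simple way to see that the two matrices differ.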
B. FROM CONVOLUTIONAL NEURAL NETWORKS TO GRAPH CONVOLUTIONAL NETWORKS
Convolutional neural networks offer an efficient architecture to extract significant statistical patterns in large-scale and high-dimensional datasets. The ability of CNNs to learn local stationary structures and compose them to form multi-scale hierarchical patterns has led to breakthroughs in image, video, and sound recognition tasks [48]. Precisely, CNNs extract the local stationarity property of the input data or signals by revealing local features shared across the data domain. These similar features are identified with localized convolutional filters or kernels, which are learned from the data. Convolutional filters are shift- or translation-invariant filters. In other words, they are able to recognize identical features independently of their spatial locations. Localized kernels or compactly supported filters refer to filters that extract local features independently of the input data size, with a support size. The latter can be much smaller than
[Figure 1 panels: A) Learning data preparation (input data); B) Graph convolutional network (Convolution 1, Convolution 2, Max-pooling, Softmax; example output class: Excellent)]
FIGURE 1. The overall scheme of the proposed method.
many researchers. They have been successfully employed in various computer vision applications, allowing them to reach high performance. One of their main advantages over classical neural networks is that they adequately consider the spatial structure of the input data. Moreover, CNNs offer the critical property of weight sharing in the convolutional layers, which restricts the number of parameters to learn. In quality assessment, using CNNs has shown notable improvement in correlation with human judgment. Some datasets can naturally be modeled as scalar data defined on the vertices of graphs [36]. For example, computer networks, transportation (road, rail, airplane) networks, or social networks can be described by graphs, with the vertices corresponding to individual computers, cities, or people, respectively. These data can be structured with graphs, which are universal representations of heterogeneous pairwise relationships. Graphs can encode complex geometric structures and be studied with strong mathematical tools such as spectral graph theory [50].
A generalization of CNNs to graphs is not straightforward, as the convolution and pooling operators are only defined for regular grids. This makes the extension challenging, both theoretically and implementation-wise. The major bottleneck in generalizing CNNs to graphs is the definition of localized graph filters that are efficient to evaluate and learn.
III. PROPOSED METHOD
Fig. 1 presents the block diagram of the proposed method. The first module refers to the learning data preparation. It consists of converting the 3D mesh to a graph representation. Then, the adjacency matrix, which indicates whether pairs of nodes are connected or not, is computed. In parallel, handcrafted features are extracted to represent multiple perceptual aspects of the 3D mesh. Both the extracted features and the adjacency matrix are used as input data for a GCN to estimate the perceived quality based on the deformation level.
A. LEARNING DATA PREPARATION
The first step of the proposed method is the preparation of the learning data. It consists of a graph represented by an adjacency matrix $A$ of size $n \times n$ and an $n \times p$ feature matrix, where $p$ is the number of features per node.
1) Graph network representation and adjacency matrix
The main idea of the proposed framework is to use graphs to represent the 3D model (triangular mesh). This representation allows us to take advantage of graph properties to manipulate the mesh itself and conceive a model-based method to assess the visual quality blindly. A triangular mesh is a set of nodes connected by edges to form triangular faces. A graph is a data structure consisting of vertices and edges. Each vertex is linked by edges to other vertices, called its neighbors. There are different ways to represent a graph. In this work, it is characterized by an adjacency matrix, as discussed in Sec. II-A. Fig. 2 depicts the graph representation of a 3D mesh and its corresponding adjacency matrix.
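As an illustration of this conversion, the following minimal sketch builds the adjacency matrix of a triangular mesh from its face list. The function name `adjacency_from_faces` and the two-triangle example are hypothetical; the sketch only assumes that each face is given as a triple of vertex indices.

```python
def adjacency_from_faces(num_vertices, faces):
    # Each triangular face (a, b, c) contributes the edges a-b, b-c and c-a.
    A = [[0] * num_vertices for _ in range(num_vertices)]
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            A[u][v] = 1
            A[v][u] = 1  # the mesh graph is undirected
    return A

# Hypothetical example: two triangles sharing the edge (1, 2)
faces = [(0, 1, 2), (1, 3, 2)]
A = adjacency_from_faces(4, faces)

# Neighbors of each vertex, read from the rows of A
neighbors = {v: [u for u in range(4) if A[v][u]] for v in range(4)}
```

A dense matrix is used here only for readability; for meshes with many vertices a sparse representation would be the more natural choice.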
[Figure 2 panels: 3D object; graph network; adjacency matrix]
FIGURE 2. Graph representation and adjacency matrix from 3D object.
2) Handcrafted features
We use three geometric features (curvature, angles, and Laplacian of Gaussian curvature) and one perceptual feature (saliency).
• Curvature describes the deviation of the surface from being flat. In this work, we use the maximum curvature amplitude $C_{max}$ and the minimum curvature amplitude $C_{min}$. The mean curvature $C_{mean} = (C_{max} + C_{min})/2$ and the Gaussian curvature $C_{gauss} = C_{max} \cdot C_{min}$ are also computed. In the following, all the curvature features are considered as one feature labeled C.
• Laplacian of Gaussian curvature corresponds to the Laplacian operator applied to the Gaussian curvature field. This feature is labeled here as LoG.
• Angle is computed for each edge and corresponds to the angle between the normal vectors of its adjacent faces. It represents the structural aspect of the mesh. The computed angle values are averaged to obtain a scalar value for each vertex. This feature is labeled here as An.
• Saliency is a perceptual concept that describes the attention of the Human Visual System to some regions due to their specificities (curvature, orientation, and so on). In this work, we use the mesh visual saliency method proposed in [61]. This feature is labeled here as S.
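To make the angle feature concrete, the following sketch computes, for each interior edge, the angle between the normals of its two adjacent faces and averages these angles per vertex. Several points are assumptions made for illustration: the averaging is taken over the edges incident to each vertex, boundary edges are skipped, vertices without interior edges receive 0, faces are consistently oriented and non-degenerate, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def face_normal(p0, p1, p2):
    # Unit normal of a triangle via the cross product (p1 - p0) x (p2 - p0)
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def vertex_angle_feature(vertices, faces):
    normals = [face_normal(*(vertices[i] for i in f)) for f in faces]
    edge_faces = defaultdict(list)  # undirected edge -> indices of adjacent faces
    for fi, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_faces[(min(u, v), max(u, v))].append(fi)
    sums = [0.0] * len(vertices)
    counts = [0] * len(vertices)
    for (u, v), fs in edge_faces.items():
        if len(fs) != 2:
            continue  # boundary edge: no pair of adjacent faces
        dot = sum(x * y for x, y in zip(normals[fs[0]], normals[fs[1]]))
        angle = math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding
        for w in (u, v):
            sums[w] += angle
            counts[w] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Hypothetical example: two triangles folded by 90 degrees along the edge (1, 2)
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 1.0)]
faces = [(0, 1, 2), (1, 3, 2)]
An = vertex_angle_feature(vertices, faces)
```

In this example only the shared edge (1, 2) has two adjacent faces, so vertices 1 and 2 receive the fold angle and the other two vertices receive the default value.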
Fig. 3 shows the features described above. Mean and Gaussian curvature are depicted separately to make the difference between them more visible.
[Figure 3 panels: 3D object; curvature (C); Gaussian curvature (Cgauss); dihedral angle (An); Laplacian of Gaussian (LoG); saliency (S). Each feature gives a feature vector of size n; together they form a feature matrix of size f × n.]
FIGURE 3. Handcrafted features preparation.
B. GRAPH CONVOLUTIONAL NETWORK
In this section, we provide a detailed description of the GCN $f(X, A)$ implemented in the proposed method, where $A$ is an adjacency matrix and $X$ is a feature matrix.
A multi-layer GCN is used with the layer-wise propagation rule defined as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \quad (2)$$
with
$$\tilde{A} = A + I_N \quad (3)$$
and
$$\tilde{D}_{ii} = \sum_j \tilde{A}_{ij} \quad (4)$$
where $\sigma(\cdot)$ is the activation function, defined as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$, $\tilde{A}$ is the adjacency matrix of the undirected graph with added self-loops, $I_N$ is the identity matrix, and $W^{(l)}$ is a trainable weight matrix. $H^{(l)} \in \mathbb{R}^{N \times D}$ is the matrix of activations in the $l$-th layer, with $H^{(0)} = X$. This rule is motivated by an approximation presented in [60].
We note that the multiplication with $A$ means that, for every node, we sum up the feature vectors of all neighboring nodes but not of the node itself (unless there are self-loops in the graph). This is fixed by enforcing self-loops in the graph, i.e., by considering $\tilde{A}$ (adding the identity matrix to $A$).
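A minimal sketch of one propagation step of Eq. (2) is given below in plain Python. The helper names, the three-node path graph, and the weight values are illustrative assumptions; in practice the weights are learned and a numerical library would be used instead of nested lists.

```python
import math

def matmul(P, Q):
    # Plain-Python matrix product of two nested lists
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def renormalized_adjacency(A):
    # A_tilde = A + I (Eq. 3), D_tilde_ii = sum_j A_tilde_ij (Eq. 4),
    # result = D_tilde^{-1/2} A_tilde D_tilde^{-1/2}
    n = len(A)
    A_t = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_t]
    return [[A_t[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(A_hat, H, W):
    # H_next = ReLU(A_hat H W), Eq. (2)
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z) for z in row] for row in Z]

# Hypothetical example: path graph 0-1-2, one input feature, two output channels
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0], [2.0], [3.0]]
W = [[1.0, -1.0]]  # illustrative fixed weights
A_hat = renormalized_adjacency(A)
H1 = gcn_layer(A_hat, X, W)
```

Because of the added self-loops, each row of `A_hat` mixes the feature of a node with those of its neighbors; the second output channel is zeroed by the ReLU since its pre-activations are negative.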
1) Graph convolution
Let $G = \{V, E\}$ denote the graph representing the distorted mesh, where $V$ and $E$ are the sets of nodes and edges of $G$, respectively, with $v_i \in V$ and $(v_i, v_j) \in E$. The graph convolution is the process of filtering a signal $x \in \mathbb{R}^N$ with a filter $f_\theta$, defined as follows:
$$Y = f_\theta * x = \theta \left(I_N + D^{-1/2} A D^{-1/2}\right) x \quad (5)$$
Using a deep model, the operator $I_N + D^{-1/2} A D^{-1/2}$ may cause numerical instabilities. Thus, a normalization is introduced:
$$I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} \quad (6)$$
Finally, the generalized graph convolution is defined as follows:
$$Y = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta \quad (7)$$
where $X \in \mathbb{R}^{N \times P}$ is the feature matrix of size $N \times P$, with $N$ the number of nodes and $P$ the number of features for each node, $\Theta \in \mathbb{R}^{P \times F}$ is the matrix of filter parameters, and $Y \in \mathbb{R}^{N \times F}$ is the convolved matrix.
Convolution is a key process in a graph convolutional network, making it possible to highlight local receptive features. When applying convolution on graphs, two important steps are considered:
• Determine the neighboring nodes around each given graph node for the convolution process, to build locally connected subsets of the global graph. These subsets can be obtained using search strategies. We use breadth-first search to enlarge the neighborhood of nodes on the graph. Breadth-first search visits an entire level of child nodes before passing to the grandchildren nodes, then a whole level of grandchildren nodes before passing to the great-grandchildren nodes, and so on. Fig. 4(a) shows the graph representing the 3D object and Fig. 4(b) shows the neighborhood sets of each source node.
• Arrange the order of execution of the convolution process on each set. Indeed, it is crucial to sort the nodes in the subsets consistently, so that the same rules are used to convolve the elements and the convolution can better activate features for each node. To this end, the L2 similarity method is used: node features are used to compute the similarity between nodes, and the neighbor node with the highest similarity is processed first. For example, in the convolution order provided in Fig. 4(c), node 3 is the most similar to node 6, followed by node 9, and finally node 1 is the least similar. We can denote this by: $sim(3, 6) > sim(9, 6) > sim(1, 6)$. Thus, the order of execution in the first subset is 3-9-1. This process is repeated for each subset in the graph. Fig. 4(c) shows the final convolution order for each subset.
[Figure 4 panels: (a) 3D shape presented by a graph; (b) Subgraphs: sets of nodes neighborhood; (c) Convolution order]
FIGURE 4. Convolution on graph. (a) 3D object represented by a graph; (b) subsets searched by a breadth-first search; (c) convolution order of each subset.
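The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the graph is given as an adjacency list, "L2 similarity" is interpreted as a smaller Euclidean distance between feature vectors meaning a higher similarity, and the function names, node numbering, and feature values are hypothetical.

```python
import math
from collections import deque

def bfs_neighborhood(adj, source, max_depth):
    # Nodes within max_depth hops of source, in breadth-first order (source excluded)
    depth = {source: 0}
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == max_depth:
            continue  # do not expand beyond the requested level
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                order.append(v)
                queue.append(v)
    return order

def order_by_similarity(features, source, neighbors):
    # Sort neighbors by increasing L2 distance to the source (most similar first)
    def l2(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features[u], features[v])))
    return sorted(neighbors, key=lambda v: l2(source, v))

# Hypothetical graph and one-dimensional node features
adj = {6: [1, 3, 9], 1: [6], 3: [6], 9: [6, 4], 4: [9]}
features = {6: (0.5,), 3: (0.6,), 9: (0.8,), 1: (0.0,), 4: (1.0,)}

subset = bfs_neighborhood(adj, 6, 1)
conv_order = order_by_similarity(features, 6, subset)
```

With these illustrative features, the neighbors of node 6 are processed in the order 3, 9, 1, which mirrors the ordering discussed for the first subset.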
2) Graph coarsening and pooling
The pooling operation requires meaningful neighborhoods on graphs, where similar vertices are clustered together. Doing this for multiple layers is equivalent to a multi-scale clustering of the graph that preserves local geometric structures. Among the many existing clustering techniques, we are interested in multilevel clustering algorithms, where each level produces a coarser graph. In this work, we make use of the coarsening phase of the Graclus multilevel clustering algorithm [45]. It uses a greedy algorithm to compute successive coarser versions of a given graph.
At each coarsening level (Fig. 5(a)), the algorithm picks an unmarked vertex $v_i$ and matches it with one of its unmarked neighbors $v_j$, chosen to maximize the local normalized cut $A_{ij}(1/d_i + 1/d_j)$, where $A$ is the adjacency matrix and $d_i$ and $d_j$ are the weights corresponding to the vertices $v_i$ and $v_j$, respectively. The two matched vertices are then marked, and the coarsened weights are set to the sum of their weights. This operation is performed for all the nodes in the graph. Each coarsening level divides the number of nodes by approximately 2. After coarsening, the vertices of the original graph and of its coarsened versions are not arranged in any meaningful order. However, it is possible to rearrange the vertices so that the graph pooling operation becomes efficient. We note that, for a pooling of size 4, two coarsening levels of size 2 are needed. Fig. 5 illustrates an example of graph pooling.
After coarsening, each node has either two children, if it was matched at the finer level, or one, if it was not, i.e., the node was a singleton. From the coarsest to the finest level, fake nodes, i.e., disconnected nodes, are added to pair with the singletons such that each node has two children. Let us carry out a max-pooling of size 4 on the finest graph $G_0$, given as input with size $n_0 = 12$. Two coarsening levels of size 2 are needed: suppose Graclus gives $G_1$ of size $n_1 = 6$ and the coarsest graph $G_2$ of size $n_2 = 3$. Fake nodes (in red) are added to $G_1$ (1 node) and $G_0$ (4 nodes) to pair with the singletons (in yellow), such that each node has exactly two children.
[Figure 5 panels: (a) Coarsening of G0 into G1 and G2; (b) Pooling]
FIGURE 5. Graph coarsening and pooling.
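One coarsening level of the greedy matching described above can be sketched as follows. This is a simplified illustration and not the Graclus implementation: vertices are visited in index order, $d_i$ is taken as the sum of the weights in row $i$ of $A$, weights inside a merged pair are dropped from the coarsened matrix, and the function names are hypothetical.

```python
def greedy_matching(A):
    # Returns, for each vertex, the index of its cluster at the coarser level.
    n = len(A)
    d = [sum(row) for row in A]
    cluster = [None] * n
    next_id = 0
    for i in range(n):
        if cluster[i] is not None:
            continue  # already marked
        best, best_cut = None, 0.0
        for j in range(n):
            if j != i and cluster[j] is None and A[i][j] > 0:
                cut = A[i][j] * (1.0 / d[i] + 1.0 / d[j])  # local normalized cut
                if cut > best_cut:
                    best, best_cut = j, cut
        cluster[i] = next_id
        if best is not None:
            cluster[best] = next_id  # matched pair; otherwise i stays a singleton
        next_id += 1
    return cluster

def coarsen(A, cluster):
    # Coarsened weights: sum of the weights between the two merged groups
    k = max(cluster) + 1
    C = [[0] * k for _ in range(k)]
    for i in range(len(A)):
        for j in range(len(A)):
            if cluster[i] != cluster[j]:
                C[cluster[i]][cluster[j]] += A[i][j]
    return C

# Hypothetical examples: a path 0-1-2-3 and a star centered at vertex 0
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
path_clusters = greedy_matching(path)
star_clusters = greedy_matching(star)
path_coarse = coarsen(path, path_clusters)
```

The path is halved exactly, whereas the star leaves two singletons, which is the situation in which fake nodes are added before pooling.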
3) Node classification
In the following, we consider a two-layer GCN for node classification on a graph with a symmetric adjacency matrix $A$. We first calculate $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ in a pre-processing step. The forward model then takes the simple form:
$$Z = f(X, A) = \mathrm{softmax}\left(\hat{A}\, \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) \quad (8)$$
Here, $W^{(0)} \in \mathbb{R}^{P \times H}$ is an input-to-hidden weight matrix for a hidden layer with $H$ feature maps, and $W^{(1)} \in \mathbb{R}^{H \times F}$ is a hidden-to-output weight matrix. The softmax activation function, defined as:
$$\mathrm{softmax}(x_i) = \frac{1}{z} \exp(x_i) \quad (9)$$
with $z = \sum_i \exp(x_i)$, is applied row-wise. For multiclass classification, we then evaluate the cross-entropy error over all labeled examples:
$$\mathcal{L} = -\sum_{l \in Y_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf} \quad (10)$$
where $Y_L$ is the set of node indices that have labels, and $F$ is the number of filters. $Z_{lf}$ is the result of the model for each node $l$ and each filter $f$.
The neural network weights $W^{(0)}$ and $W^{(1)}$ are trained using gradient descent.
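The forward pass and the cross-entropy error can be sketched in a few lines of plain Python. The sketch assumes that the normalized matrix $\hat{A}$ has already been computed; the function names, the identity matrix used as a stand-in for $\hat{A}$, and the weight values are hypothetical and chosen only to make the result easy to check by hand.

```python
import math

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def softmax_rows(M):
    # Row-wise softmax; subtracting the row maximum only improves numerical stability
    out = []
    for row in M:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out

def forward(A_hat, X, W0, W1):
    # Two-layer model: softmax(A_hat ReLU(A_hat X W0) W1)
    H = relu(matmul(matmul(A_hat, X), W0))
    return softmax_rows(matmul(matmul(A_hat, H), W1))

def cross_entropy(Z, Y, labeled):
    # Negative sum over labeled nodes l and classes f of Y_lf * ln(Z_lf)
    return -sum(Y[l][f] * math.log(Z[l][f])
                for l in labeled for f in range(len(Z[0])))

# Hypothetical toy setting: 2 nodes, 1 feature, 1 hidden unit, 2 classes
A_hat = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for the normalized adjacency
X = [[1.0], [2.0]]
W0 = [[1.0]]
W1 = [[0.0, 0.0]]                   # zero weights give uniform class probabilities
Y = [[1.0, 0.0], [0.0, 1.0]]        # one-hot labels
Z = forward(A_hat, X, W0, W1)
loss = cross_entropy(Z, Y, labeled=[0])
```

With zero output weights every class receives probability 0.5, so the loss for one labeled node equals $\ln 2$, which gives a quick sanity check of the sign convention in the cross-entropy.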
C. TRAINING AND QUALITY SCORE ESTIMATION
To estimate the visual quality of a distorted mesh, we rely on a classification process. We consider five classes based on the subjective Mean Opinion Score (MOS) for each database (very bad, bad, medium, good, and excellent quality).
The training process follows leave-one-out cross-validation (LOOCV) according to the following process:
• We train a model using all the 3D objects in the repository except one object and its distorted versions.
• The excluded subset is then used for the test with the trained model.
• The process is repeated for each 3D object in the repository.
It is noteworthy that we generate one trained model for each excluded object. In addition, for the cross-dataset evaluation (Section V-C), we train one model using one database and the other databases are used for the test.
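The leave-one-out protocol at the object level can be sketched as follows. The sample representation (pairs of object name and mesh identifier) and the identifiers themselves are hypothetical; the sketch only shows how the folds are formed, not how the model is trained.

```python
def leave_one_object_out(samples):
    # samples: list of (object_name, mesh_id) pairs, one per distorted mesh.
    # Yields (held_out_object, training_samples, test_samples) for each object.
    objects = sorted({obj for obj, _ in samples})
    for held_out in objects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Hypothetical repository with three objects and their distorted versions
samples = [("Armadillo", "a0"), ("Armadillo", "a1"),
           ("Dyno", "d0"),
           ("Venus", "v0"), ("Venus", "v1")]
folds = list(leave_one_object_out(samples))
```

Grouping by object rather than by individual mesh keeps every distorted version of the held-out object out of the training set, which is the point of the protocol described above.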
The negative log-likelihood of the model described above is minimized over the training set using Stochastic Gradient Descent (SGD). The back-propagation algorithm is used to compute the gradients, and dropout is employed to avoid over-fitting. For the test phase, the excluded objects serve as a test set, and the trained network classifies the given distorted meshes according to their visual quality. Finally, comparisons with the subjective labels are performed using correlation measurements.
IV. EXPERIMENTAL SETUP
In the following, we describe the experimental protocol, including the evaluation criteria. We also report a brief description of the databases.
A. DATASETS
The goal of MVQA is to provide quality predictions correlated with human observers' opinions. Therefore, a dataset of distorted meshes graded by human observers is needed to evaluate the algorithms. As the geometric aspect of the meshes significantly influences the evaluation process, care must be taken when choosing the dataset. It must contain meshes that reflect adequate diversity in their content, and the generated distortions should reflect a broad range of mesh degradation. To meet these requirements, four datasets are used in the experiments. These databases, specially designed for the evaluation of quality metrics, are made of original and distorted meshes. Furthermore, the MOS values are provided for each dataset. Fig. 6 shows the reference objects of the four databases briefly described below.
FIGURE 6. The reference models from (a) the LIRIS masking database, (b) the general-purpose database, (c) the UWB compression database, and (d) the IEETA simplification database.
• LIRIS/EPFL General-Purpose database [46]: This database, created at the Ecole Polytechnique Federale de Lausanne, contains four reference meshes and 84 distorted models (88 models in total). Two types of distortion, smoothing and noise addition, are applied either locally or globally to the reference mesh.
• LIRIS Masking database [47]: This database, originally from the University of Lyon, consists of four reference meshes and 24 distorted models with local noise addition. The specific objective of this database is to test the ability of MVQA methods to capture the visual masking effect.
• UWB compression database [2]: This database includes five reference models and 63 distorted models. Twelve to thirteen distorted versions, obtained by a compression algorithm, are associated with each reference model.
• IEETA simplification database [59]: This database comprises 35 models: five reference meshes and six simplified models for each reference mesh. The simplified models are obtained using three simplification algorithms with two
Even though some reference objects are shared between databases, their distortions are completely different in type and level, which provides an important diversity for evaluating MVQA methods. For instance, two objects (Armadillo and Dyno) appear in both the LIRIS masking database (Fig. 6(a)) and the general-purpose database (Fig. 6(b)). In the first database, distortions are obtained by adding only local noise, whereas in the second database, distortions are obtained by smoothing and adding global noise with different levels.
B. VALIDATION PROTOCOL
The performance of an objective MVQA method is measured by the correlation between its scores and the MOS provided by human subjects. Two types of correlation coefficients are commonly used: the Pearson linear correlation coefficient ($r_p$), which measures the prediction accuracy, and the Spearman rank-order correlation coefficient ($r_s$), which measures the prediction monotonicity [64]. The Spearman rank-order correlation coefficient ($r_s$) depends only on the rank of the objective scores of the models. If the ranking of the objective scores is similar to the ranking of the MOSs, a high correlation value is obtained, regardless of the distance between the objective score and the corresponding MOS. For both criteria, a higher value indicates better prediction performance. These measures are defined as follows:
$$r_p = \frac{\sum_{i=1}^{n} (Q_{s_i} - \bar{Q}_s)(MOS_i - \overline{MOS})}{\sqrt{\sum_{i=1}^{n} (Q_{s_i} - \bar{Q}_s)^2}\, \sqrt{\sum_{i=1}^{n} (MOS_i - \overline{MOS})^2}} \quad (11)$$
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} \left(\mathrm{rank}(MOS_i) - \mathrm{rank}(Q_{s_i})\right)^2}{n(n^2 - 1)} \quad (12)$$
where $n$ denotes the number of distortions in a given database, $MOS_i$ are the mean opinion scores provided in the database, and $Q_{s_i}$ is the objective quality score obtained by a given method.
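Both coefficients can be computed with a few lines of plain Python, as sketched below. The function names and the score values are illustrative assumptions. The rank-difference form of the Spearman coefficient is exact only when there are no tied values, so the sketch assumes distinct scores; with tied values (for instance repeated class labels), averaged ranks and the Pearson correlation of the ranks would be needed instead.

```python
import math

def pearson(x, y):
    # Pearson linear correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den

def ranks(values):
    # Rank 1 for the smallest value; assumes all values are distinct
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Rank-difference form of the Spearman coefficient (no ties)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical subjective and objective scores for five distorted meshes
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [1.2, 1.9, 3.5, 3.1, 5.0]
r_p = pearson(scores, mos)
r_s = spearman(scores, mos)
```

In this example only two neighboring meshes are swapped in rank, so the Spearman coefficient stays high even though the objective scores are not on the same scale as the MOS.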
The relation between the scores obtained by the objective method and the mean opinion scores is non-linear, and the raw scores are not easy for users to interpret. It is highly recommended to introduce a psychometric fitting to handle this non-linearity. Several functions can be used to fit the objective scores, such as the cubic polynomial function [69] and the logistic function [70]. Most compared methods in this work used a cumulative Gaussian psychometric function, as in [25], [28]. We use the same function to guarantee a fair comparison. The cumulative Gaussian psychometric function [65] is defined by:
$$p(a, b, X) = \frac{1}{\sqrt{2\pi}} \int_{a + bX}^{\infty} \exp\left(-\frac{t^2}{2}\right) dt \quad (13)$$
where $X$ is the objective score, and $a$ and $b$ are two parameters to be determined. These parameters are estimated for each database from the objective values of the distorted versions and the corresponding MOS, using the Curve Fitting Toolbox in MATLAB.
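The psychometric function has a closed form in terms of the complementary error function, $p(a, b, X) = \frac{1}{2}\,\mathrm{erfc}\left((a + bX)/\sqrt{2}\right)$, which the following sketch uses. The fitting step is a deliberately simple least-squares grid search that stands in for the MATLAB toolbox; the function names, the parameter grids, the synthetic scores, and the assumption that the targets are scaled to [0, 1] are all illustrative.

```python
import math

def psychometric(a, b, x):
    # (1 / sqrt(2*pi)) * integral from a + b*x to infinity of exp(-t^2 / 2) dt
    return 0.5 * math.erfc((a + b * x) / math.sqrt(2.0))

def fit_grid(scores, targets, a_values, b_values):
    # Least-squares grid search over (a, b); a stand-in for a proper curve fit
    best = None
    for a in a_values:
        for b in b_values:
            err = sum((psychometric(a, b, x) - t) ** 2
                      for x, t in zip(scores, targets))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

# Synthetic check: generate targets with known parameters and recover them
scores = [0.0, 0.25, 0.5, 0.75, 1.0]
true_a, true_b = -1.0, 2.0
targets = [psychometric(true_a, true_b, x) for x in scores]
a_values = [-2.0 + 0.5 * k for k in range(5)]   # -2.0 ... 0.0
b_values = [0.5 * k for k in range(1, 7)]       # 0.5 ... 3.0
a_fit, b_fit = fit_grid(scores, targets, a_values, b_values)
```

A continuous optimizer would replace the grid search in practice; the grid is used here only so that the sketch stays within the standard library.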
V. EXPERIMENTAL RESULTS
Here, we first report the main results on the influence of the mesh features on the effectiveness of the objective quality prediction. Then, we present the results of a comparative evaluation with the state-of-the-art methods using four subjective databases. Finally, a cross-database assessment is performed to test the generalization ability of the proposed method.
A. INFLUENCE OF THE GEOMETRIC ATTRIBUTES
In this section, we evaluate the impact of the extracted handcrafted features on the performance. Several feature sets have been tested as input to the GCN. Although the final results rely on only three geometric features (curvature, including mean and Gaussian curvature, angles, and Laplacian of Gaussian) and one perceptual attribute (saliency), more features were investigated before finding the best combination. Here, we provide the results for different feature combinations. The features under test are the minimum and maximum curvature directions (Umin and Umax), curvature-based features (Cmin, Cmax, Cmean, and Cgauss), Normal (Nor), Saliency (S), Angle (An), and Laplacian of Gaussian (LoG).
Table 1 illustrates the main findings. Curvature-based and angle features together exhibit good correlation values ($r_s = 83.7\%$ and $r_p = 82.9\%$ on all models). Adding the saliency improves the performance ($r_s = 86.7\%$, $r_p = 85.7\%$), and adding LoG instead leads to similar results ($r_s = 87.9\%$, $r_p = 88.6\%$). Unfortunately, adding the directions (Umin and Umax) and the normal (Nor) does not improve the results. The dihedral angle is the angle between the normal vectors of two adjacent faces, so adding the normal vectors does not provide added value since this information is already included in the dihedral angle. In the same vein, Umin and Umax only give the directions of minimum and maximum curvature, which are already included in the curvature feature itself. Experimentally, adding these features leads only to redundant information without any added value, while the training time increases markedly. The best performance is obtained using the four features C + An + LoG + S ($r_s = 90.3\%$ and $r_p = 89.8\%$ on all models). Therefore, we only rely on Cmin, Cmax, Cmean, Cgauss, An, LoG, and S to construct the feature matrix.
B. COMPARISON WITH THE STATE-OF-THE-ART
Here, a comparative analysis is conducted. The proposed method is compared to state-of-the-art methods, including full-reference, reduced-reference, and no-reference methods:
• Full-reference methods: HD [19], RMS [18], MSDM2 [20], TPDM [25], Chetouani [26].
TABLE 1. Correlation coefficients rs (%) and rp (%) using different feature combinations on the general-purpose database.
Features                    Armadillo     Dyno          Venus         Rocker        All models
                            rs    rp      rs    rp      rs    rp      rs    rp      rs    rp
C                           75.6  78.1    82.1  81.8    88.1  86.0    74.7  73.1    76.5  74.5
C + An                      86.3  85.9    88.0  88.6    90.7  91.2    86.9  87.8    83.7  82.9
C + An + S                  90.9  89.1    90.1  88.3    92.7  89.8    87.1  86.0    86.7  85.7
C + An + LoG                93.8  92.7    88.9  90.5    91.3  91.0    90.7  88.8    87.9  88.6
C + An + LoG + S            92.4  93.2    90.5  91.0    94.2  93.6    89.5  89.2    90.3  89.8
C + An + LoG + S + U        90.5  91.4    92.5  92.3    91.4  91.2    88.9  88.4    88.5  87.3
C + An + LoG + S + U + Nor  91.2  90.1    89.1  88.9    92.8  93.1    88.1  87.0    87.9  86.3
The correlation coefficients rs and rp on the LIRIS/EPFL general-purpose,
LIRIS masking, UWB compression, and IEETA simplification databases are
listed in Tables 2, 3, 4, and 5, respectively.
Note that, to compute the correlation coefficients for a given object,
the statistical dependence is computed between the predicted classes of
all its distorted versions and the corresponding ground truth classes.
To compute the correlation coefficients for the whole database, the
statistical dependence is computed between the predicted classes of all
objects in the repository and their ground truth classes.
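A minimal sketch of this computation is given below, assuming that the predicted and ground-truth quality classes are encoded as integers. The example labels are invented for illustration; `scipy.stats.spearmanr` and `scipy.stats.pearsonr` offer equivalent library routines.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient (rp)."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman rank-order correlation (rs): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical quality classes for the distorted versions of one object.
ground_truth = [1, 2, 2, 3, 4, 5, 5, 3]
predicted = [1, 2, 3, 3, 4, 5, 4, 3]

print(f"rs = {100 * spearman(predicted, ground_truth):.1f}%")
print(f"rp = {100 * pearson(predicted, ground_truth):.1f}%")
```

For the whole-database scores, the same two functions would be applied to the concatenated predictions and labels of all objects.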
• The general-purpose database (see Table 2) is the largest MVQ
database so far. It contains more distorted meshes than the other
databases (84 distorted meshes with a variety of distortion types). On
this database, the proposed method provides good correlation
coefficients (rs = 89.3% and rp = 88.6%).
• On the LIRIS masking database (see Table 3), the proposed method
exhibits high Spearman and Pearson correlation coefficients on the
whole corpus (rs = 91.7% and rp = 90.9%). It outperforms the NR methods
BMQI, NR-CNN1, and NR-GRNN, as well as most of the FR and RR methods.
• On the UWB compression database (see Table 4), the proposed method
achieves the highest rs score among the NR methods (rs = 90.5%).
Besides, it provides competitive scores, outperforming many effective
FR and RR methods.
• On the IEETA simplification database (see Table 5), the proposed
method provides competitive correlation coefficients (rs = 89.9% and
rp = 89.4%). The perceptual methods MSDM2, TPDM, and FMPD are also
shown to be effective on this database.
The classical measures, HD and RMS, are widely used to estimate the
perceived visual quality. These methods are easy to use since a simple
geometric distance suffices to predict the quality score. However, they
generally fail to capture the perceived quality, which is why their
scores are not high enough on all test databases. Another drawback is
that they require the compared meshes to have the same connectivity.
Including perceptual features increases the performance considerably.
For instance, MSDM2 and TPDM incorporate perceptual information (mesh
curvature) and thus achieve a better prediction than the geometric
measures, as reflected by the correlation coefficients. FMPD includes a
roughness measure, which is an essential feature in mesh processing.
The scores provided by this method are very competitive, indicating
that it estimates the perceived quality well.
Based on these results, the proposed method shows high performance on
all subjectively rated MVQ databases, as shown by its high scores on the
individual models and on the whole repositories.
The CNNs-CMP method indeed provides the highest scores. However, its
main drawback is that it feeds a deep CNN with 2D images rendered from
the 3D meshes. Moreover, its performance is obtained using three
pre-trained CNNs, which is time-consuming. In contrast, this work feeds
the 3D mesh directly to a GCN, without rendering. To the best of our
knowledge, this is the first method that uses a graph network to
evaluate the perceived visual quality. The obtained scores are very
promising (exceeding or close to 90%), and our method outperforms many
existing methods, including BMQI, TPDM, and 3DWPM. In addition, it is
noteworthy that we used a shallow network to test whether graph
representations are useful for such an application. The proposed network
provides competitive results with minimal resources.
C. CROSS DATASET EVALUATION
In this section, we investigate the generalization ability of the
proposed method. To do so, we perform a cross-database evaluation by
training our network on the general-purpose database and testing it on
the other databases. We choose this database for training because it
contains the highest number of distorted models and a wide variety of
distortion types. Table 6 shows the evaluation results. It presents the
correlation coefficients of each 3D object in the three tested databases
(i.e., LIRIS masking, UWB compression, and IEETA simplification) and the
scores for the whole repositories. A cross-dataset evaluation tests the
generalization ability of a method: a network trained on one corpus is
evaluated on a range of out-of-dataset corpora. Instead of evaluating the
performance solely based on one dataset or multiple datasets
TABLE 2. Correlation coefficients rs (%) and rp (%) of different objective methods on the LIRIS/EPFL general-purpose database.
Type               Method              Armadillo     Dyno          Venus         Rocker        All models
                                       rs    rp      rs    rp      rs    rp      rs    rp      rs    rp
Full Reference     HD [19]             69.5  30.2    30.9  22.6    1.6   0.8     18.1  5.5     13.8  1.3
                   RMS [18]            62.7  32.3    0.3   0.0     90.1  77.3    7.3   3.0     26.8  7.9
                   MSDM2 [20]          81.6  85.3    85.9  85.7    89.3  87.5    89.6  87.2    80.4  81.4
                   TPDM [25]           84.5  78.8    92.2  89.0    90.6  91.0    92.2  91.4    89.6  89.2
                   Chetouani [26]      75.7  86.1    90.6  90.0    94.9  95.5    91.4  92.1    88.1  90.9
Reduced Reference  3DWPM1 [56]         65.8  35.7    62.7  35.7    71.6  46.6    87.5  53.2    69.3  38.4
                   3DWPM2 [56]         74.1  43.1    52.4  19.9    34.8  16.4    37.8  29.9    49.0  24.6
                   FMPD [28]           75.4  83.2    89.6  88.9    87.5  83.9    88.8  84.7    81.9  83.5
No-Reference       NR-SVR [52]         76.8  91.5    78.6  84.1    85.7  88.6    86.2  86.6    81.5  87.8
                   NR-GRNN [51]        87.1  97.3    91.2  94.1    86.3  85.0    78.6  74.8    86.2  88.7
                   NR-CNN1 [49]        87.2  84.3    86.4  86.2    92.2  85.6    91.3  85.2    83.6  82.7
                   BMQI [57]           20.1  -       83.5  -       88.9  -       92.7  -       78.1  -
                   BMQA-GSES [37]      98.7  80      99.2  80.4    98.8  80.1    99.5  99.9    90.5  87.9
                   CNN-BMQA [62]       89.8  91.4    91.6  92.2    94.6  93.8    91.9  93.9    90.0  92.0
                   CNNs-CMP [63]       95.8  95.6    93.6  92.9    93.4  91.3    94.5  95.2    92.6  91.3
                   MVQ-GCN (proposed)  91.8  92.5    87.7  84.5    93.7  91.9    89.6  88.4    89.3  88.6
TABLE 3. Correlation coefficients rs (%) and rp (%) of different objective methods on the LIRIS masking database.
Type               Method              Armadillo     Lion          Bimba         Dyno          All models
                                       rs    rp      rs    rp      rs    rp      rs    rp      rs    rp
Full Reference     HD [19]             48.6  37.7    71.4  25.1    25.7  7.5     48.6  31.1    26.6  4.1
                   RMS [18]            65.6  44.6    71.4  23.8    71.4  21.8    71.4  50.3    48.8  17.0
                   MSDM2 [20]          81.1  88.6    93.5  94.3    96.8  100     95.6  100     87.3  89.6
                   TPDM [25]           91.4  88.6    88.4  82.9    97.1  100     71.1  100     88.6  90.0
                   Chetouani [26]      99.0  99.0    83.0  94.0    99.0  99.0    93.0  98.0    93.9  97.8
Reduced Reference  3DWPM1 [56]         58.0  41.8    20.0  9.7     20.0  8.4     66.7  45.3    29.4  10.2
                   3DWPM2 [56]         48.6  37.9    38.3  22.0    37.1  14.4    71.4  50.1    37.4  18.2
                   FMPD [28]           94.2  88.6    93.5  94.3    98.9  100     96.9  94.3    80.8  80.2
No-Reference       NR-SVR [52]         89.5  84.7    100   96.3    94.2  93.6    94.4  89.7    90.4  91.2
                   NR-GRNN [51]        82.3  80.5    94.1  97.0    90.2  94.3    78.2  82.3    90.2  82.4
                   NR-CNN1 [49]        95.2  97.6    89.4  91.6    93.4  98.7    96.3  89.9    88.2  85.4
                   BMQI [57]           94.3  -       94.3  -       100   -       83.0  -       78.1  -
                   CNN-BMQA [62]       92.4  91.7    92.2  93.1    97.9  97.3    93.4  92.6    91.4  90.8
                   CNNs-CMP [63]       96.2  95.5    93.1  92.4    92.5  92.8    94.2  94.0    93.2  92.8
                   MVQ-GCN (proposed)  92.6  91.2    93.5  89.9    88.6  89.8    89.3  88.7    91.7  90.9
model performance from a different point of view.
The results on the masking database in Table 6 (rs = 90.8% and
rp = 91.2%) are close to the results in Table 3 (rs = 91.7% and
rp = 90.9%), and even better in terms of rp. This is because this
database has the same distortion types as the training dataset
(general-purpose). For the compression and simplification databases,
which contain unseen objects and distortions, slightly lower scores than
in Tables 4 and 5 are to be expected. In terms of generalization
ability, the obtained results indicate good performance.
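The cross-database protocol described above can be sketched as follows. This is only an illustration of the train-on-one, test-on-the-others loop: the nearest-centroid classifier stands in for the GCN, the databases are synthetic, and all function names are hypothetical.

```python
import numpy as np


def cross_database_eval(fit, predict, score, databases, train_name):
    """Train on one database, then score the model on each remaining database."""
    X_train, y_train = databases[train_name]
    model = fit(X_train, y_train)
    return {name: score(predict(model, X), y)
            for name, (X, y) in databases.items()
            if name != train_name}


# Stand-in model (assumption): a nearest-centroid classifier replaces the GCN.
def fit_centroids(X, y):
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_centroids(model, X):
    classes, centroids = model
    # Distance from every sample to every class centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def pearson(predicted, truth):
    return float(np.corrcoef(predicted, truth)[0, 1])


def make_database(rng, n_per_class, shift):
    """Synthetic descriptors: class k is centred at k, plus a per-database shift."""
    y = np.repeat(np.arange(1, 6), n_per_class)
    X = y[:, None] + shift + 0.3 * rng.normal(size=(len(y), 4))
    return X, y


rng = np.random.default_rng(0)
databases = {
    "general-purpose": make_database(rng, 20, 0.0),
    "masking": make_database(rng, 6, 0.0),
    "compression": make_database(rng, 6, 0.2),
    "simplification": make_database(rng, 6, 0.2),
}

results = cross_database_eval(fit_centroids, predict_centroids, pearson,
                              databases, train_name="general-purpose")
for name, r in results.items():
    print(f"{name}: rp = {100 * r:.1f}%")
```

The `shift` parameter mimics, in a crude way, test databases whose statistics differ from those of the training database; no test sample is ever used for fitting.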
VI. CONCLUSION
In this work, we rely on a shallow graph convolutional network to
accurately estimate the perceived visual quality of distorted meshes.
The network consists of two convolutional layers, a pooling layer, and a
classification layer. It is fed by a graph represented by an adjacency
matrix and a feature matrix containing three geometric features
(curvature, angles, and Laplacian of Gaussian curvature) and one
perceptual feature (saliency). The proposed method successfully predicts
the visual quality of distorted meshes, as shown by the high
correlations with human judgment. To study the impact of the network
architecture and improve the performances, a
TABLE 4. Correlation coefficients rs (%) and rp (%) of different objective methods on the UWB compression database.
Type               Method              Bunny         James         Jessy         Nissan        Helix         All models
                                       rs    rp      rs    rp      rs    rp      rs    rp      rs    rp      rs    rp
Full Reference     HD [19]             34.1  52.2    -16.8 6.8     -23.6 12.5    14.4  23.6    45.1  46.4    10.6  28.3
                   RMS [18]            34.2  20.9    14.0  10.8    0.0   14.8    17.8  29.7    46.9  44.6    22.0  24.1
                   MSDM2 [20]          97.4  90.1    82.6  69.2    84.3  63.1    84.4  73.1    98.1  94.7    89.3  78.0
                   TPDM [25]           95.1  96.5    90.8  73.6    85.8  75.8    82.7  73.4    98.7  95.0    91.5  82.9
Reduced Reference  3DWPM1 [56]         94.7  93.4    77.3  72.3    87.2  89.5    63.6  59.3    98.0  95.2    84.1  81.9
                   3DWPM2 [56]         96.0  91.2    76.9  65.3    86.9  85.9    56.3  67.6    95.5  94.3    82.3  80.9
                   FMPD [28]           94.2  89.6    95.3  91.2    63.3  60.0    92.4  77.5    98.4  90.8    88.8  81.8
                   DAME [2]            96.8  93.4    95.7  93.4    84.4  70.5    93.9  75.3    96.6  95.2    93.5  85.6
No-Reference       CNN-BMQA [62]       95.8  91.7    96.2  95.6    92.3  90.5    88.7  84.7    96.7  94.6    90.1  88.3
                   CNNs-CMP [63]       95.6  94.8    92.5  90.6    92.5  87.1    88.7  89.0    90.7  90.4    90.2  90.9
                   MVQ-GCN (proposed)  93.4  92.3    90.6  90.9    91.2  91.0    88.6  89.5    90.1  91.2    90.5  87.7
TABLE 5. Correlation coefficients rs (%) and rp (%) of different objective methods on the IEETA simplification database.
Type               Method              Bones         Bunny         Head          Lung          Strange       All models
                                       rs    rp      rs    rp      rs    rp      rs    rp      rs    rp      rs    rp
Full Reference     HD [19]             92.0  94.3    37.8  39.5    72.8  88.6    80.6  88.6    52.3  37.1    50.5  49.4
                   RMS [18]            86.4  94.3    94.5  77.1    49.6  42.9    89.0  100     90.4  88.6    59.6  70.2
                   MSDM2 [20]          98.3  94.3    98.1  77.1    88.9  88.6    92.3  60.0    99.0  94.3    89.2  86.7
                   TPDM [25]           99.0  94.3    98.0  94.3    63.1  65.7    98.6  94.3    98.7  94.3    86.9  88.2
Reduced Reference  FMPD [28]           96.0  88.6    98.0  94.3    70.4  65.7    95.5  88.6    96.0  65.7    89.3  87.2
No-Reference       CNN-BMQA [62]       96.8  93.7    96.7  90.9    96.4  93.6    94.3  89.6    95.4  93.5    90.4  90.2
                   CNNs-CMP [63]       91.3  88.9    91.1  92.4    91.8  91.5    95.3  89.4    91.1  88.9    90.6  90.4
                   MVQ-GCN (proposed)  90.4  89.6    91.0  91.8    90.6  92.1    93.6  92.5    89.1  88.8    89.9  89.4
TABLE 6. Correlation coefficients rs (%) and rp (%) obtained by training on the general-purpose database and testing on the LIRIS masking, UWB compression, and IEETA simplification databases.
Database              Object      rs    rp
LIRIS masking         Armadillo   87.3  85.1
                      Lion        90.6  87.1
                      Bimba       92.3  93.5
                      Dyno        90.1  88.7
                      All models  90.8  91.2
UWB compression       Bunny       84.9  91.0
                      James       88.1  85.3
                      Jessy       83.4  84.6
                      Nissan      92.7  90.4
                      Helix       86.5  85.3
                      All models  87.2  86.3
IEETA simplification  Bones       84.2  81.6
                      Bunny       90.1  89.3
                      Head        86.7  80.1
                      Lung        83.6  84.5
                      Strange     86.1  84.8
                      All models  84.8  83.3
network architectures (especially more convolutional layers). A weighted
graph could also be adopted by adding perceptual information, such as
visual saliency, to the nodes.
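For readers who want a concrete picture of the architecture summarized in this conclusion, a minimal forward-pass sketch is given below. It assumes the widely used symmetric normalization with self-loops for the graph convolution and ReLU activations; the layer widths, the number of quality classes, and the random weights are placeholders and are not taken from this work.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    e = np.exp(z - z.max())  # subtract the maximum for numerical stability
    return e / e.sum()


def gcn_forward(A, X, W1, W2, W_out):
    """Two graph convolutions, max-pooling over vertices, softmax classifier."""
    A_norm = normalize_adjacency(A)
    H1 = np.maximum(A_norm @ X @ W1, 0.0)   # first graph convolution + ReLU
    H2 = np.maximum(A_norm @ H1 @ W2, 0.0)  # second graph convolution + ReLU
    pooled = H2.max(axis=0)                 # one descriptor for the whole mesh
    return softmax(pooled @ W_out)          # probabilities over quality classes


# Toy mesh graph: 4 vertices forming two triangles that share an edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 7))      # 7 attributes per vertex (placeholder values)
W1 = rng.normal(size=(7, 16))    # hidden widths are arbitrary choices
W2 = rng.normal(size=(16, 8))
W_out = rng.normal(size=(8, 5))  # 5 quality classes (assumed)

probs = gcn_forward(A, X, W1, W2, W_out)
print(probs.shape, probs.sum())
```

Because the pooling takes a maximum over all vertices, the output does not depend on the order in which the vertices are listed, and meshes with different vertex counts map to descriptors of the same size.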
REFERENCES
[1] Cignoni, Paolo, Claudio Rocchini, and Roberto Scopigno. "Metro: measuring error on simplified surfaces." In Computer Graphics Forum, vol. 17, no. 2, pp. 167-174. Oxford, UK and Boston, USA: Blackwell Publishers, 1998.
[2] Váša, Libor, and Jan Rus. "Dihedral angle mesh error: a fast perception correlated distortion measure for fixed connectivity triangle meshes." In Computer Graphics Forum, vol. 31, no. 5, pp. 1715-1724. Oxford, UK: Blackwell Publishing Ltd, 2012.
[3] Yang, Junfeng, Wei Zhang, Xiaolong Li, Ting Zhou, and Bo Ou. "Full Reference Image Quality Assessment by Considering Intra-Block Structure and Inter-Block Texture." IEEE Access 8 (2020): 179702-179715.
[4] Meynet, Gabriel, Yana Nehmé, Julie Digne, and Guillaume Lavoué. "PCQM: A full-reference quality metric for colored 3D point clouds." In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1-6. IEEE, 2020.
[5] Ding, Keyan, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. "Comparison of full-reference image quality models for optimization of image processing systems." International Journal of Computer Vision 129, no. 4 (2021): 1258-1281.
[6] Xu, Munan, Junming Chen, Haiqiang Wang, Shan Liu, Ge Li, and Zhiqiang Bai. "C3DVQA: Full-reference video quality assessment with 3d convolutional neural network." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4447-4451. IEEE, 2020.
[7] Varga, Domonkos. "Composition-preserving deep approach to full-reference image quality assessment." Signal, Image and Video Processing 14, no. 6 (2020): 1265-1272.
[8] Agarla, Mirko, Luigi Celona, and Raimondo Schettini. "No-Reference Quality Assessment of In-Capture Distorted Videos." Journal of Imaging 6, no. 8 (2020): 74.
[9] Zhang, Ziliang, Yuming Fang, Jiebin Yan, and Rengang Du. "No-Reference Quality Assessment for Realistic Distorted Images by Color Moment and Texture Features." In 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 342-347. IEEE, 2020.
[10] Liu, Yutao, and Xiu Li. "No-Reference Quality Assessment for Contrast-Distorted Images." IEEE Access 8 (2020): 84105-84115.
[11] Xiaohan Yang, Fan Li, and Hantao Liu, "Deep feature importance awareness based no-reference image quality prediction," Neurocomputing, vol. 401, pp. 209-223, 2020.
[12] Yang, Jiachen, Yang Zhao, Bin Jiang, Wen Lu, and Xinbo Gao. "No-Reference Quality Evaluation of Stereoscopic Video Based on Spatio-Temporal Texture." IEEE Transactions on Multimedia 22, no. 10 (2019): 2635-2644.
[13] Abbas, Naveed, Tanzila Saba, Siraj Khan, Zahid Mehmood, Amjad Rehman, and Rubby Tabasum. "Reduced Reference Image Quality Assessment Technique Based on DWT and Path Integral Local Binary Patterns." Arabian Journal for Science and Engineering 45, no. 4 (2020): 3387-3401.
[14] Ma, Jian, Xinxin Zhao, and Youxin Xu. "Reduced-Reference Stereoscopic Image Quality Assessment Based on Entropy of Gradient Primitives." In 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP), pp. 206-209. IEEE, 2020.
[15] Viola, Irene, and Pablo Cesar. "A reduced reference metric for visual quality evaluation of point cloud contents." IEEE Signal Processing Letters 27 (2020): 1660-1664.
[16] Xiongkuo Min, Ke Gu, Guangtao Zhai, Menghan Hu, and Xiaokang Yang, "Saliency-induced reduced-reference quality index for natural scene and screen content images," Signal Processing, vol. 145, pp. 127-136, 2018.
[17] Lijuan Tang, Kezheng Sun, Luping Liu, Guangcheng Wang, and Yutao Liu, "A reduced-reference quality assessment metric for super-resolution reconstructed images with information gain and texture similarity," Signal Processing: Image Communication, vol. 79, pp. 32-39, 2019.
[18] Paolo Cignoni, Claudio Rocchini, and Roberto Scopigno, "Metro: Measuring error on simplified surfaces," in Computer Graphics Forum. Wiley Online Library, 1998, vol. 17, pp. 167–174.
[19] Nicolas Aspert, Diego Santa-Cruz, and Touradj Ebrahimi, "Mesh: Measuring errors between surfaces using the Hausdorff distance," in Multimedia and Expo, 2002. ICME'02. Proceedings. 2002 IEEE International Conference on. IEEE, 2002, vol. 1, pp. 705–708.
[20] Guillaume Lavoué, "A multiscale metric for 3D mesh visual quality assessment," in Computer Graphics Forum. Wiley Online Library, 2011, vol. 30, pp. 1427–1437.
[21] Ricardo Pastrana-Vidal, Jean Claude Gicquel, Jean-Louis Blin, and Hocine Cherifi, "Predicting subjective video quality from separated spatial and temporal assessment," in Proceedings SPIE Human Vision and Electronic Imaging XI, 2006, vol. 6057, 60570S.
[22] Ricardo R. Pastrana-Vidal, Jean-Charles Gicquel, Catherine Colomes, and Hocine Cherifi, "Frame dropping effects on user quality perception," in Proc. 5th Int. WIAMIS, 2004.
[23] Mohammed El Hassouni, Hocine Cherifi, and Driss Aboutajdine, "HOS-based image sequence noise removal," IEEE Transactions on Image Processing, vol. 15, no. 3, pp. 572–581, 2006.
[24] Chetouani, Aladine, and Leida Li. "On the use of a scanpath predictor and convolutional neural network for blind image quality assessment." Signal Processing: Image Communication 89 (2020): 115963.
[25] Fakhri Torkhani, Kai Wang, and Jean-Marc Chassery, "A curvature tensor distance for mesh visual quality assessment," Computer Vision and Graphics, pp. 253–263, 2012.
[26] Chetouani, Aladine. "Three-dimensional mesh quality metric with reference based on a support vector regression model." Journal of Electronic Imaging 27.4 (2018): 043048.
[27] Massimiliano Corsini, Elisa Drelie Gelasca, Touradj Ebrahimi, and Mauro Barni, "Watermarked 3-d mesh quality assessment," IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 247–256, 2007.
[28] Kai Wang, Fakhri Torkhani, and Annick Montanvert, "A fast roughness-based approach to the assessment of 3D mesh visual quality," Computers & Graphics, vol. 36, no. 7, pp. 808–818, 2012.
[29] Cao, Keming, Yi Xu, and Pamela Cosman. "Visual Quality of Compressed Mesh and Point Cloud Sequences." IEEE Access 8 (2020): 171203-171217.
[30] Zerman, Emin, Cagri Ozcinar, Pan Gao, and Aljosa Smolic. "Textured mesh vs coloured point cloud: A subjective study for volumetric video compression." In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1-6. IEEE, 2020.
[31] Xiang Feng, Wanggen Wan, Richard Yi Da Xu, Stuart Perry, Pengfei Li, and Song Zhu, "A novel spatial pooling method for 3D mesh quality assessment based on percentile weighting strategy," Computers and Graphics, vol. 74, pp. 12-22, 2018.
[32] Yildiz, Zeynep Cipiloglu, A. Cengiz Oztireli, and Tolga Capin. "A machine learning framework for full-reference 3D shape quality assessment." The Visual Computer 36, no. 1 (2020): 127-139.
[33] Abouelaziz, Ilyass, Aladine Chetouani, Mohammed El Hassouni, and Hocine Cherifi. "A blind mesh visual quality assessment method based on convolutional neural network." Electronic Imaging 2018.18 (2018): 1-5.
[34] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi. "A convolutional neural network framework for blind mesh visual quality assessment." Image Processing (ICIP), 2017 IEEE International Conference on. IEEE, 2017.
[35] Rogowitz, Bernice E., and Holly E. Rushmeier. "Are image quality metrics adequate to evaluate the quality of geometric objects?" In Human Vision and Electronic Imaging VI, vol. 4299, pp. 340-348. International Society for Optics and Photonics, 2001.
[36] Wu, Zonghan, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S. Yu Philip. "A comprehensive survey on graph neural networks." IEEE Transactions on Neural Networks and Learning Systems (2020).
[37] Lin, Yaoyao, Mei Yu, Ken Chen, Gangyi Jiang, Fen Chen, and Zongju Peng. "Blind mesh assessment based on graph spectral entropy and spatial features." Entropy 22, no. 2 (2020): 190.
[38] Gene Cheung, Enrico Magli, Yuichi Tanaka, and Michael K. Ng, "Graph Spectral Image Processing," in Proceedings of the IEEE, vol. 106, no. 5, pp. 907-930, May 2018.
[39] Soufiane Rital, Alain Bretto, Hocine Cherifi, and Driss Aboutajdine, "A combinatorial edge detection algorithm on noisy images," in International Symposium on VIPromCom Video/Image Processing and Multimedia Communications, 2002, pp. 351-355, IEEE.
[40] Soufiane Rital, Hocine Cherifi, and Serge Miguet, "Weighted adaptive neighborhood hypergraph partitioning for image segmentation," in International Conference on Pattern Recognition and Image Analysis, 2005, pp. 522–531, Springer, Berlin, Heidelberg.
[41] Sanfeliu, Alberto, René Alquézar, J. Andrade, Joan Climent, Francesc Serratosa, and J. Vergés. "Graph-based representations and techniques for image processing and image analysis." Pattern Recognition 35, no. 3 (2002): 639-650.
[42] Yu, Jun, Dacheng Tao, and Meng Wang. "Adaptive hypergraph learning and its application in image classification." IEEE Transactions on Image Processing 21, no. 7 (2012): 3262-3272.
[43] Wu, Zonghan, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S. Yu Philip. "A comprehensive survey on graph neural networks." IEEE Transactions on Neural Networks and Learning Systems (2020).
[44] Lasfar, A., S. Mouline, Driss Aboutajdine, and Hocine Cherifi. "Content-based retrieval in fractal coded image databases." In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, vol. 1, pp. 1031-1034. IEEE, 2000.
[45] Dhillon, Inderjit S., Yuqiang Guan, and Brian Kulis. "Weighted graph cuts without eigenvectors a multilevel approach." IEEE Transactions on Pattern Analysis and Machine Intelligence 29.11 (2007): 1944-1957.
[46] Lavoué, Guillaume, Elisa Drelie Gelasca, Florent Dupont, Atilla Baskurt, and Touradj Ebrahimi. "Perceptually driven 3D distance metrics with application to watermarking." In Applications of Digital Image Processing XXIX, vol. 6312, p. 63120L. International Society for Optics and Photonics, 2006.
[47] Lavoué, Guillaume, Mohamed Chaker Larabi, and Libor Váša. "On the efficiency of image metrics for evaluating the visual quality of 3D models." IEEE Transactions on Visualization and Computer Graphics 22, no. 8 (2015): 1987-1999.
[48] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521, no. 7553 (2015): 436.
[49] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi. "A convolutional neural network framework for blind mesh visual quality assessment." In 2017 IEEE International Conference on Image Processing (ICIP), pp. 755-759. IEEE, 2017.
[50] Chung, Fan RK, and Fan Chung Graham. Spectral graph theory. No. 92. American Mathematical Soc., 1997.
[51] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi. "A curvature based method for blind mesh visual quality assessment using a general regression neural network." In 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 793-797. IEEE, 2016.
[52] Abouelaziz, Ilyass, Mohammed El Hassouni, and Hocine Cherifi. "No-reference 3d mesh quality assessment based on dihedral angles model and support vector regression." In International Conference on Image and Signal Processing, pp. 369-377. Springer, Cham, 2016.
[53] Zachi Karni and Craig Gotsman, "Spectral compression of mesh geometry," in Proceedings of the 27th annual conference on Computer graphics and interactive techniques. ACM Press/Addison-Wesley Publishing Co., 2000, pp. 279–286.
[54] Olga Sorkine, Daniel Cohen-Or, and Sivan Toledo, "High-pass quantization for mesh encoding," in Symposium on Geometry Processing, 2003, vol. 42.
[55] Zhe Bian, Shi-Min Hu, and Ralph R Martin, "Evaluation for small visual difference between conforming meshes on strain field," Journal of Computer Science and Technology, vol. 24, no. 1, pp. 65–75, 2009.
[56] Massimiliano Corsini, Elisa Drelie Gelasca, Touradj Ebrahimi, and Mauro Barni, "Watermarked 3-d mesh quality assessment," IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 247–256, 2007.
[57] Anass Nouri, Christophe Charrier, and Olivier Lézoray, "3D blind mesh quality assessment index," Electronic Imaging, vol. 2017, no. 20, pp. 9–26, 2017.
[58] Guillaume Lavoué, Elisa Drelie Gelasca, Florent Dupont, Atilla Baskurt, and Touradj Ebrahimi, "Perceptually driven 3d distance metrics with application to watermarking," in SPIE Optics Photonics. International Society for Optics and Photonics, 2006, pp. 63120L–63120L.
[59] Silva, Samuel, Beatriz Sousa Santos, Carlos Ferreira, and Joaquim Madeira. "A perceptual data repository for polygonal meshes." In 2009 Second International Conference in Visualisation, pp. 207-212. IEEE, 2009.
[60] Hammond, David K., Pierre Vandergheynst, and Rémi Gribonval. "Wavelets on graphs via spectral graph theory." Applied and Computational Harmonic Analysis 30, no. 2 (2011): 129-150.
[61] Lee, Chang Ha, Amitabh Varshney, and David W. Jacobs. "Mesh saliency." In ACM SIGGRAPH 2005 Papers, pp. 659-666. 2005.
[62] Abouelaziz, Ilyass, Aladine Chetouani, Mohammed El Hassouni, Longin Jan Latecki, and Hocine Cherifi. "3D visual saliency and convolutional neural network for blind mesh quality assessment." Neural Computing and Applications (2019): 1-15.
[63] Abouelaziz, Ilyass, Aladine Chetouani, Mohammed El Hassouni, Longin Jan Latecki, and Hocine Cherifi. "No-reference mesh visual quality assessment via ensemble of convolutional neural networks and compact multi-linear pooling." Pattern Recognition 100 (2020): 107174.
[64] Wang, Zhou, and Alan C. Bovik. "Modern image quality assessment." Synthesis Lectures on Image, Video, and Multimedia Processing 2.1 (2006): 1-156.
[65] Engeldrum, Peter G. "Psychometric scaling: avoiding the pitfalls and hazards." PICS. 2001.
[66] Kim, Min-Su, Sébastien Valette, Ho-Youl Jung, and Rémy Prost. "Watermarking of 3D irregular meshes based on wavelet multiresolution analysis." In International Workshop on Digital Watermarking, pp. 313-324. Springer, Berlin, Heidelberg, 2005.
[67] Lmaati, Elmustapha Ait, Ahmed El Oirrak, Mohammed Najib Kaddioui, Abdellah Ait Ouahman, and Mohammed Sadgal. "3D Model Retrieval Based on 3D Discrete Cosine Transform." Int. Arab J. Inf. Technol. 7, no. 3 (2010): 264-270.
[68] Caissard, Thomas, David Coeurjolly, Jacques-Olivier Lachaud, and Tristan Roussillon. "Laplace–Beltrami operator on digital surfaces." Journal of Mathematical Imaging and Vision 61, no. 3 (2019): 359-379.
[69] Antkowiak, Jochen, T. D. F. Jamal Baina, France Vittorio Baroncini, Noel Chateau, France FranceTelecom, Antonio Claudio França Pessoa, F. U. B. Stephanie Colonnese, Italy Laura Contin, Jorge Caviedes, and France Philips. "Final report from the video quality experts group on the validation of objective models of video quality assessment march 2000." (2000).
[70] TRP, I. "Methods metrics and procedures for statistical evaluation qualification and comparison of objective quality prediction models." (2012): 1401.
ILYASS ABOUELAZIZ received his Ph.D. in computer vision from the
Mohammed V University in Rabat, Morocco. His research is about 3D mesh
visual quality assessment and machine learning. He started his research
in December 2014 in the Computer Science & Telecommunications Research
Laboratory, "Laboratoire de recherche en informatique et
telecommunications (LRIT)". He received his master's degree in computer
science and telecommunications in 2014 at the same university. His
research focuses on image and video processing, visual saliency, machine
learning, and mainly on mesh visual quality assessment.
ALADINE CHETOUANI received his master's degree in computer science from
the University Pierre and Marie Curie, France, in 2005, and his Ph.D.
degree in image processing from the University of Paris 13, France, in
2010. From 2010 to 2011, he was a postdoctoral researcher with the L2TI
Laboratory, Paris 13 University. He is currently an associate professor
with the Laboratory PRISME, Orleans, France. He also received the
Habilitation degree from Université d'Orléans in 2020. He serves as a
reviewer for major conferences and journals in the field of image
analysis and pattern recognition. He co-edited different Special Issues.
He is co-author of more than 100 research publications. His present
interests are in image quality, perceptual analysis, visual attention,
and image processing for cultural heritage using deep learning models.