Toward an interactive system for checking spatio-temporal data quality
Dounia Azzi, Christine Plumejeaud, Marlène Villanova-Oliver, Jérôme Gensel
Laboratory of Informatics of Grenoble (LIG), Grenoble, France
Overview
• Context
– Data quality
– Examples
• Qualestim
– Conception
– Validation
A spatio-temporal information cube
• Three dimensions
– Time: 1950 - 2050
– Thematic: social, economic, environmental, demographic
– Space: from local to world scales
UNEP
• Objective: checking data quality
Geographical information HETEROGENEITY through the spatial, temporal, and thematic dimensions
• Metadata describe data: the identification, the provider, the lineage, the quality, …
• ISO 19115 defines a standard for describing geographic information that can be adapted to statistical data (profile)
• Poor quality report
• Produce reports on data quality: HOW?
• Outlier detection: find the values that do not look like those in their neighborhood
• Complete metadata
• Allow visualization of data and metadata in interactive tabs
[Figure: average evolution (high range)]
Tools for visualization and geo-computing
• Domain analysis
Criteria: Data Visualisation | Statistical Analysis | DB Connection | Metadata Visualisation
Sada [1997]: x x
GeoDa [1998]: x x
CrimStat [2004]: x x
QuantumGis [2002]: x
Grass [2010] [R programming]: x
• None of those allow visualization of data and metadata!
Outlier detection
• Outlier: An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs. [Grubbs, 69]
• Geostatistical methods
– Check abnormal values
– Spatial, temporal and thematic dimensions
– Uni/bi/multi-variate
• R [http://www.r-project.org/] is a well-known tool for statistics, with spatial packages
Methods (columns: Univariate Thematic | Univariate Spatial | Bivariate*/Multivariate Thematic | Bivariate*/Multivariate Spatial)
Standard Boxplot: x
Adjusted Boxplot: x
Bagplot: x
* Mahalanobis distance: x
Principal Components Analysis (PCA): x
Local Regression: x x
Multiple Linear Regression: x
Geographically Weighted Regression: x x
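To make the univariate thematic case concrete, here is a minimal sketch of the standard boxplot rule (Tukey fences), assuming the usual 1.5 × IQR threshold. The function name `boxplot_outliers`, the parameter `k`, and the sample values are illustrative assumptions, not part of Qualestim, which delegates such computations to R.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles of the sample; "inclusive" treats the data as the full population.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical indicator values for seven territorial units.
rates = [4.1, 4.3, 3.9, 4.0, 4.4, 4.2, 9.8]
outliers = boxplot_outliers(rates)
print(outliers)
```

A larger `k` makes the rule more tolerant; the adjusted boxplot listed above additionally corrects the fences for skewed distributions, which this sketch does not attempt.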
Example: Hawkins’s test
Exponential: $f(d) = \frac{\alpha^{2}}{2\pi} e^{-\alpha d}$
Tobler’s first law of geography: Everything is related to everything else, but near things are more related than distant things. [Tobler, 1970]
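The role of the exponential function can be illustrated with a small sketch that weights neighbors by distance, following Tobler's law, and compares each unit with the weighted mean of the others. It assumes the exponential function reads f(d) = α²/(2π) · e^(−αd); the function names, coordinates, and values are hypothetical, and this is a simplified neighborhood comparison rather than Hawkins's test itself.

```python
import math

def exponential_weight(d, alpha):
    """Assumed form of the slide's function: alpha^2 / (2*pi) * exp(-alpha * d)."""
    return alpha ** 2 / (2 * math.pi) * math.exp(-alpha * d)

def neighborhood_deviation(units, index, alpha):
    """Value of one unit minus the distance-weighted mean of all other units."""
    x0, y0, v0 = units[index]
    weighted_sum = 0.0
    weight_total = 0.0
    for i, (x, y, v) in enumerate(units):
        if i == index:
            continue
        w = exponential_weight(math.hypot(x - x0, y - y0), alpha)
        weighted_sum += w * v
        weight_total += w
    return v0 - weighted_sum / weight_total

# Hypothetical units as (x, y, value); the fourth one is unlike its close neighbors.
units = [(0, 0, 10.0), (1, 0, 11.0), (0, 1, 9.0), (1, 1, 30.0), (5, 5, 10.5)]
deviations = [neighborhood_deviation(units, i, alpha=1.0) for i in range(len(units))]
suspect = max(range(len(units)), key=lambda i: abs(deviations[i]))
print(suspect, deviations[suspect])
```

A larger `alpha` makes the weights decay faster, so only very close units influence the comparison; a formal test would also need a significance threshold for the deviation.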
Qualestim architecture
• SPATIO-TEMPORAL DATABASE
• R-stat embedded in JAVA (JRI)
– spatial analysis
– thematic analysis
– time series
• Outliers map
• Export metadata on quality with complete reports
• Expert can visually explore the data
ISO 19115 profile for metadata [Plumejeaud, 2010]
Metadata for: Dataset, Stock, Values
Java-R Interface
• Creates an R virtual machine in the JVM
• Allows us to put the data into R objects, such as a SpatialPolygonsDataFrame object
Model of a quality report
• (method + parameters)
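One way to picture a report that records the detection method together with its parameters is the following sketch. The class `QualityReport`, its fields, and the sample identifiers are hypothetical; they do not reproduce the Qualestim model or the ISO 19115 profile.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    """Hypothetical record of one outlier-detection run on a dataset."""
    dataset: str
    method: str
    parameters: dict = field(default_factory=dict)
    flagged_units: list = field(default_factory=list)

    def summary(self):
        # Sort parameters so the summary is stable regardless of insertion order.
        params = ", ".join(f"{k}={v}" for k, v in sorted(self.parameters.items()))
        return f"{self.dataset}: {self.method}({params}) flagged {len(self.flagged_units)} unit(s)"

# Illustrative values only.
report = QualityReport("unemployment_2005", "standard_boxplot", {"k": 1.5}, ["FR714"])
print(report.summary())
```

Keeping the method and its parameters in the report lets an expert rerun or revise the detection later, which matches the idea that reports can be viewed and modified.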
Qualestim
• Data and metadata visualization
• Outlier detection
Qualestim – Soon to come
• Editing of quality reports
Export
Spatio-temporal Database
Outcome
• Interactive viewer showing data together with metadata
• Quality reports are created using outlier detection methods, and can be viewed/modified
Future work
Short term
• Test the system with users
• Integrate pre-computed parameters and suggestions for parameters (connection with a knowledge database)
Long term
END