Technologies and Developments for Earth Observations Data Analysis and Visualisation
Uttam Kumar
Centre for Ecological Sciences, Indian Institute of Science, Bangalore - 560012.
10th January, 2013
Volumes of Data, Mining, Information, Pattern?
Agenda
• Some new Image Classification Techniques for handling coarse resolution data for LCLU applications
– The mixed pixel problem
– Hybrid Bayesian Classifier
• Development of Free and Open Source Software: GRDSS for Geospatial Applications
• Web Based Application for Geovisualisation
Some New Image Classification Techniques for Handling Coarse Resolution Data for LCLU Applications
– The mixed pixel problem
– Hybrid Bayesian Classifier
Time scale | Satellite / Source | Sensor | Spectral bands | Spatial resolution | Temporal resolution
1972 – 1999 | Landsat-1, 5 and 7 | MSS, TM, ETM+ | PAN, VIS, NIR, MIR, TIR | 15 m – 120 m (moderate spatial resolution) | 16-18 days (free)
1988 – 2010 | IRS-1C/1D, P6 | PAN, LISS-III | PAN, VIS-2, NIR-1 (low spectral resolution) | 5.8 m – 23.5 m (high to moderate spatial resolution) | 24 days (medium cost, moderate temporal resolution)
1999 – till date | IKONOS | OSA | PAN, VIS-3, NIR-1 | 1 m (PAN), 4 m (others) (high spatial resolution) | 1-3 days (costly)
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
1999 – till date | Terra, Aqua | MODIS | VIS, NIR, MIR, TIR: 36 bands (high spectral resolution) | 250 m – 1 km (low spatial resolution) | 1-2 days (free, high temporal resolution)
2002 | SRTM (Shuttle Radar Topography Mission) | --- | DEM-1 | 90 m | 1 time (free)
2002 | Radar - Hydro 1K Asia | --- | Precipitation, Slope, Aspect-1 | 1 km | 1 time (free)
Develop techniques for deriving information from coarse spatial resolution data (such as MODIS).
1.) What are the techniques to obtain class proportions from mixed pixels?
2.) What are the ways of identifying/extracting endmembers from the bands?
3.) How can mixed pixels be addressed when object reflectances are non-linear mixtures in nature?
4.) How can we address intra-class spectral variation, or endmember variability?
5.) How can we predict the spatial distribution of class abundances at sub-pixel resolution within a particular pixel, from the outputs of linear/non-linear mixture models?
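$$ \mathbf{y}(x,y) = \sum_{n=1}^{N} \alpha_n(x,y)\,\mathbf{e}_n + \boldsymbol{\eta}(x,y), \qquad \text{i.e.} \qquad \mathbf{y} = \mathbf{E}\boldsymbol{\alpha} + \boldsymbol{\eta} $$
where $\alpha_n(x,y)$ is a scalar value representing the fractional coverage of endmember vector $\mathbf{e}_n$ at pixel $\mathbf{y}(x,y)$, and $\boldsymbol{\eta}(x,y)$ is the model error.
Constraints:
1.) Abundance nonnegativity constraint: $\alpha_n \ge 0$ for all $1 \le n \le N$
2.) Abundance sum-to-one constraint: $\sum_{n=1}^{N} \alpha_n = 1$
This can be solved in two ways:
1. Ordinary least squares
2. Orthogonal subspace projection
[Figure: linear unmixing, panels (a), (b), (c)]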
Ordinary Least Squares
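• The conventional approach to extract the abundance values is to minimise $\|\mathbf{y} - \mathbf{E}\boldsymbol{\alpha}\|^2$. The Unconstrained Least Squares (ULS) estimate of the abundance is
$$ \hat{\boldsymbol{\alpha}}_{\mathrm{ULS}} = (\mathbf{E}^T\mathbf{E})^{-1}\mathbf{E}^T\mathbf{y} $$
Minimising $\|\mathbf{y} - \mathbf{E}\boldsymbol{\alpha}\|^2$ subject to the sum-to-one constraint $\mathbf{1}^T\boldsymbol{\alpha} = 1$ (with Lagrange multiplier $\lambda$) gives the Constrained Least Squares (CLS) estimate of the abundance as
$$ \hat{\boldsymbol{\alpha}}_{\mathrm{CLS}} = (\mathbf{E}^T\mathbf{E})^{-1}\left(\mathbf{E}^T\mathbf{y} + \tfrac{1}{2}\lambda\,\mathbf{1}\right), \qquad \lambda = \frac{2\left(1 - \mathbf{1}^T(\mathbf{E}^T\mathbf{E})^{-1}\mathbf{E}^T\mathbf{y}\right)}{\mathbf{1}^T(\mathbf{E}^T\mathbf{E})^{-1}\mathbf{1}} $$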
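A minimal NumPy sketch of the ULS and sum-to-one CLS estimators for a single pixel, assuming E is a bands × endmembers matrix with full column rank. The function names `uls` and `cls` and the example spectra are illustrative, not taken from the study. Only the sum-to-one constraint is imposed; nonnegativity is not enforced, so CLS abundances can still come out negative.

```python
import numpy as np

def uls(E, y):
    """Unconstrained least squares: alpha = (E^T E)^-1 E^T y."""
    return np.linalg.solve(E.T @ E, E.T @ y)

def cls(E, y):
    """Sum-to-one constrained least squares (Lagrange-multiplier form).

    Nonnegativity is NOT enforced, so individual abundances may be < 0.
    """
    G = np.linalg.inv(E.T @ E)            # (E^T E)^-1
    ones = np.ones(E.shape[1])            # the vector 1
    alpha_uls = G @ E.T @ y
    lam = 2.0 * (1.0 - ones @ alpha_uls) / (ones @ G @ ones)
    return G @ (E.T @ y + 0.5 * lam * ones)

# Example: 4 bands, 3 endmembers (made-up spectra for illustration)
E = np.array([[0.1, 0.5, 0.3],
              [0.2, 0.4, 0.6],
              [0.7, 0.3, 0.2],
              [0.4, 0.8, 0.1]])
alpha_true = np.array([0.2, 0.5, 0.3])
y_noisy = E @ alpha_true + np.array([0.01, -0.02, 0.015, 0.005])
print(uls(E, y_noisy), cls(E, y_noisy).sum())
```

With noise-free input both estimators return `alpha_true`; with the perturbed spectrum the CLS estimate still sums to one, whereas the ULS estimate in general does not.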
Orthogonal Subspace Projection
• The technique involves (i) finding an operator that eliminates undesired spectral signatures, and then (ii) choosing a vector operator that maximises the SNR of the residual spectral signature.
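• General linear unmixing equation:
$$ \mathbf{r} = \mathbf{M}\boldsymbol{\alpha} + \mathbf{n} $$
r = column vector of digital numbers
M = matrix representing the target spectral signatures
α = abundance fractions
n = model error
Separating M into the desired target signature d (abundance $\alpha_p$) and the undesired signatures U (abundances $\boldsymbol{\gamma}$):
$$ \mathbf{r} = \mathbf{d}\alpha_p + \mathbf{U}\boldsymbol{\gamma} + \mathbf{n} $$
To annihilate U, we apply an operator "P" on this (d, U) model, which results in a new signal detection model.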
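$$ \mathbf{P} = \mathbf{I} - \mathbf{U}\mathbf{U}^{\#}, \qquad \mathbf{U}^{\#} = (\mathbf{U}^T\mathbf{U})^{-1}\mathbf{U}^T $$
where $\mathbf{U}^{\#}$ is the pseudo-inverse of U.
On applying P on $\mathbf{r} = \mathbf{d}\alpha_p + \mathbf{U}\boldsymbol{\gamma} + \mathbf{n}$ we get
$$ \mathbf{P}\mathbf{r} = \mathbf{P}\mathbf{d}\alpha_p + \mathbf{P}\mathbf{U}\boldsymbol{\gamma} + \mathbf{P}\mathbf{n} $$
Since $\mathbf{P}\mathbf{U} = \mathbf{U} - \mathbf{U}(\mathbf{U}^T\mathbf{U})^{-1}\mathbf{U}^T\mathbf{U} = \mathbf{0}$, P operating on Uγ removes the contribution of U, and we get
$$ \mathbf{P}\mathbf{r} = \mathbf{P}\mathbf{d}\alpha_p + \mathbf{P}\mathbf{n} $$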
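On applying a linear filter specified by a weight vector $\mathbf{x}^T$ to the OSP model, the filter output is given by
$$ \mathbf{x}^T\mathbf{P}\mathbf{r} = \mathbf{x}^T\mathbf{P}\mathbf{d}\,\alpha_p + \mathbf{x}^T\mathbf{P}\mathbf{n} $$
We now need to maximise the signal-to-noise ratio (SNR) of the filter output. Assuming white noise, $E\{\mathbf{n}\mathbf{n}^T\} = \sigma^2\mathbf{I}$:
$$ \mathrm{SNR}(\mathbf{x}) = \frac{\alpha_p^2\,\mathbf{x}^T\mathbf{P}\mathbf{d}\mathbf{d}^T\mathbf{P}^T\mathbf{x}}{\mathbf{x}^T\mathbf{P}\,E\{\mathbf{n}\mathbf{n}^T\}\,\mathbf{P}^T\mathbf{x}} = \frac{\alpha_p^2}{\sigma^2}\,\frac{\mathbf{x}^T\mathbf{P}\mathbf{d}\mathbf{d}^T\mathbf{P}^T\mathbf{x}}{\mathbf{x}^T\mathbf{P}\mathbf{P}^T\mathbf{x}} $$
Maximisation of this is a generalised eigenvalue-eigenvector problem:
$$ \mathbf{P}\mathbf{d}\mathbf{d}^T\mathbf{P}^T\mathbf{x} = \tilde{\lambda}\,\mathbf{P}\mathbf{P}^T\mathbf{x}, \qquad \tilde{\lambda} = \lambda\,(\sigma^2/\alpha_p^2) $$
The eigenvector with the maximum eigenvalue is the solution of the problem, and it turns out to be d.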
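One of the eigenvalues is $\mathbf{d}^T\mathbf{P}\mathbf{d}$, and it turns out that the filter $\mathbf{x}^T$ which maximises the SNR is
$$ \mathbf{x}^T = k\,\mathbf{d}^T $$
Applying $\mathbf{d}^T\mathbf{P}$ on $\mathbf{P}\mathbf{r} = \mathbf{P}\mathbf{d}\alpha_p + \mathbf{P}\mathbf{n}$ (obtained by applying P on $\mathbf{r} = \mathbf{d}\alpha_p + \mathbf{U}\boldsymbol{\gamma} + \mathbf{n}$), and using $\mathbf{P}^T\mathbf{P} = \mathbf{P}$ (P is symmetric and idempotent):
$$ \mathbf{d}^T\mathbf{P}\mathbf{r} = \mathbf{d}^T\mathbf{P}\mathbf{d}\,\alpha_p + \mathbf{d}^T\mathbf{P}\mathbf{n} $$
Neglecting the noise term,
$$ \hat{\alpha}_p = \frac{\mathbf{d}^T\mathbf{P}\mathbf{r}}{\mathbf{d}^T\mathbf{P}\mathbf{d}} $$
where $\hat{\alpha}_p$ is the abundance estimate of the pth target material.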
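The OSP estimator can be sketched as follows, assuming M has full column rank so that $(\mathbf{U}^T\mathbf{U})^{-1}$ exists and $\mathbf{d}^T\mathbf{P}\mathbf{d} \neq 0$. Looping over the columns, taking each in turn as the target d and the rest as U, is one straightforward way to obtain all abundances; the helper names and example signatures are hypothetical.

```python
import numpy as np

def osp_abundance(d, U, r):
    """OSP abundance estimate of target d: (d^T P r) / (d^T P d), with P = I - U U#."""
    U_pinv = np.linalg.inv(U.T @ U) @ U.T      # U# = (U^T U)^-1 U^T
    P = np.eye(U.shape[0]) - U @ U_pinv        # projector that annihilates U (P U = 0)
    return (d @ P @ r) / (d @ P @ d)

def osp_unmix(M, r):
    """Treat each column of M in turn as the target d and the remaining columns as U."""
    return np.array([osp_abundance(M[:, p], np.delete(M, p, axis=1), r)
                     for p in range(M.shape[1])])

# Example: 5 bands, 3 signatures (made-up values for illustration)
M = np.array([[0.2, 0.5, 0.1],
              [0.4, 0.3, 0.7],
              [0.6, 0.2, 0.2],
              [0.3, 0.6, 0.4],
              [0.5, 0.1, 0.9]])
alpha = np.array([0.4, 0.35, 0.25])
r = M @ alpha                                  # noise-free pixel
print(osp_unmix(M, r))
```

In the noise-free case P removes Uγ exactly, so the estimates reproduce the true abundances; like ULS, OSP on its own enforces neither sum-to-one nor nonnegativity.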
False colour composites (FCC) of the study area from (a) IKONOS (PAN and MS fused), (b) IKONOS MS, (c) Landsat ETM+ and (d) MODIS.
Remote sensing data sets used for validating CLS and OSP algorithms
Data | Spectral bands | Spatial resolution | Dimension (pixels) | 2 classes | 3 classes | 4 classes
IKONOS PAN and MS fused | 4 | 1 m | 8000 x 8000 | vegetation, non-vegetation | urban, vegetation, water | urban, vegetation, water, open area
IKONOS | 4 | 4 m | 2000 x 2000 | vegetation, non-vegetation | urban, vegetation, water | ---
Landsat | 6 | 30 m resampled to 25 m | 320 x 320 | vegetation, non-vegetation | urban, vegetation, water | urban, vegetation, water, open area
MODIS | 7 | 250 m | 32 x 32 | vegetation, non-vegetation | urban, vegetation, water | urban, vegetation, water, open area
Unmixed outputs from CLS and OSP for 2 classes (vegetation, non-vegetation), and 3 classes (urban, vegetation and water) from IKONOS MS data.
Results
• Correlation and RMSE for IKONOS, Landsat ETM+ and MODIS images for 2, 3 and 4 classes.
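A sketch of how the correlation coefficient and RMSE between reference and estimated abundances of one class might be computed, assuming the reference proportions have already been derived from a high-resolution classified image on the same coarse pixel grid. Pearson correlation over all pixels is assumed; the function name is hypothetical.

```python
import numpy as np

def correlation_and_rmse(actual, estimated):
    """Pearson correlation and RMSE between reference and unmixed abundances of one class."""
    a = np.asarray(actual, dtype=float).ravel()
    e = np.asarray(estimated, dtype=float).ravel()
    r = np.corrcoef(a, e)[0, 1]
    rmse = np.sqrt(np.mean((a - e) ** 2))
    return r, rmse

# Example: a constant bias of 0.05 gives r = 1 but a non-zero RMSE
reference = np.array([[0.1, 0.4], [0.7, 0.2]])
unmixed = reference + 0.05
print(correlation_and_rmse(reference, unmixed))
```

The example shows why both numbers are reported: a constant bias leaves the correlation at 1, while the RMSE exposes the 0.05 offset.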
Endmember Selection
• The rationale behind the new method is that, given the spectral reflectance of a mixed pixel, if the proportions of all the endmembers ($\alpha_n$; n = 1 to N) in that pixel are known, then the spectral reflectance of each endmember that constitutes the mixed pixel can be approximated by inverting the LMM.
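Writing $y^m_{i,j}$ for the reflectance of pixel (i, j) in band m and $\alpha^n_{i,j}$ for the proportion of endmember n in that pixel, the LMM $\mathbf{y} = \mathbf{E}\boldsymbol{\alpha} + \boldsymbol{\eta}$ is inverted for the endmembers:
$$ \tilde{\mathbf{E}}^T = (\boldsymbol{\alpha}^T\boldsymbol{\alpha})^{-1}\boldsymbol{\alpha}^T\mathbf{y} $$
The endmember estimate for each band turns out to be the least-squares solution of one equation per pixel,
$$ y^m_{i,j} = \sum_{n=1}^{N} \alpha^n_{i,j}\, e^m_n, \qquad i = 1,\dots,r,\; j = 1,\dots,c $$
that is,
$$ \begin{bmatrix} \alpha^1_{1,1} & \alpha^2_{1,1} & \cdots & \alpha^N_{1,1} \\ \alpha^1_{1,2} & \alpha^2_{1,2} & \cdots & \alpha^N_{1,2} \\ \vdots & \vdots & & \vdots \\ \alpha^1_{r,c} & \alpha^2_{r,c} & \cdots & \alpha^N_{r,c} \end{bmatrix} \begin{bmatrix} e^m_1 \\ e^m_2 \\ \vdots \\ e^m_N \end{bmatrix} = \begin{bmatrix} y^m_{1,1} \\ y^m_{1,2} \\ \vdots \\ y^m_{r,c} \end{bmatrix}, \qquad \tilde{\mathbf{e}}^m = (\boldsymbol{\alpha}^T\boldsymbol{\alpha})^{-1}\boldsymbol{\alpha}^T\mathbf{y}^m $$
where $\boldsymbol{\alpha}$ is the (r·c) × N matrix of known abundances and $\mathbf{y}^m$ the vector of band-m reflectances. This is done for each band separately.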
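The per-band inversion can be sketched with NumPy's least-squares solver, assuming abundances are known for every coarse pixel (for example from an aggregated high-resolution classified map) and the abundance matrix has full column rank. Passing all bands as columns of the right-hand side solves each band independently, which is equivalent to the band-by-band procedure; the function name and synthetic data are illustrative only.

```python
import numpy as np

def estimate_endmembers(abundances, image):
    """Invert the LMM for the endmember spectra, given known per-pixel abundances.

    abundances: (rows, cols, N) class proportions
    image:      (rows, cols, B) coarse-resolution reflectance
    returns:    E with shape (B, N); column n is the estimated spectrum of endmember n
    """
    A = abundances.reshape(-1, abundances.shape[-1])   # the (r*c) x N alpha matrix
    Y = image.reshape(-1, image.shape[-1])             # one column per band
    # lstsq solves every column of Y independently, i.e. band by band
    E_t, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return E_t.T

rng = np.random.default_rng(0)
abund = rng.dirichlet(np.ones(3), size=(10, 10))      # synthetic proportions summing to one
E_true = rng.uniform(0.0, 1.0, size=(7, 3))           # 7 bands, 3 endmembers
img = abund @ E_true.T                                 # noise-free LMM, shape (10, 10, 7)
E_hat = estimate_endmembers(abund, img)
```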
To compare the performance of PBEE, three endmember identification methods were used:
1. a fully automatic endmember extraction technique – N-FINDR,
2. a supervised interactive technique – a combination of N-Dimensional Visualisation and Scatter Plot, and
3. a semi-automatic technique – K-Means clustering.
Scatter plots for various bands
Abundance maps for the 6 classes Row1 – original abundances obtained from LISS-III classified map resampled to MODIS image size, row2 – PBEE, row3 – N-FINDR, row4 – N-Dimensional Visualisation, row5 – K-Means clustering.
Endmember behaviour for the 6 classes (a) to (f) in 7 bands for the various techniques.
• From the correlation coefficient (CC) and RMSE, it is concluded that inversion of the LMM can provide a better estimate of the endmembers than the other automatic, supervised interactive and semi-automatic methods.
• Shortcoming
– abundances must be available per class from a high resolution classified image of the same time frame as the low spatial resolution image, with detailed ground information.
Non-linear Mixture Model
Kumar, U., S. Kumar Raja, Mukhopadhyay, C., and Ramachandra T. V., (2011), A Multi-layer Perceptron based Non-linear Mixture Model to estimate class abundance from mixed pixels, Proceedings of the 2011 IEEE Students’ Technology Symposium, Indian Institute of Technology, Kharagpur, India, 14-16 January, 2011, Abstract page Number – 31, Track 4 – Image and Multi-dimensional Signal Processing.
► The NLMM accounts for interactions among the ground cover materials (multiple reflections among the materials on the surface).
► It also accounts for topographic features (slope) of the ground surface.
$$ \mathbf{y} = f(\mathbf{E}, \boldsymbol{\alpha}) + \boldsymbol{\eta} $$
where f is an unknown non-linear function that defines the interaction between E and α.
Architecture of the MLP model.
Structural diagram of the MLP.
The activation rule used here for the hidden and output layer nodes is the logistic function
$$ f(x) = \frac{1}{1 + e^{-x}} $$
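A minimal sketch of an MLP with one hidden layer and logistic activations in the hidden and output layers, trained by back-propagation (full-batch gradient descent on squared error) to map pixel spectra to class abundances. The hidden-layer size, learning rate, number of epochs and the bilinear interaction term used to synthesise training pixels are all assumptions for illustration, not the configuration used in the cited paper. The outputs lie in (0, 1) but are not constrained to sum to one.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(X, T, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One hidden layer, logistic activations, full-batch gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        H = logistic(X @ W1 + b1)              # hidden layer
        O = logistic(H @ W2 + b2)              # output layer: one node per class
        err = O - T
        losses.append(float(np.mean(err ** 2)))
        dO = err * O * (1.0 - O)               # error times logistic derivative
        dH = (dO @ W2.T) * H * (1.0 - H)       # back-propagated to the hidden layer
        W2 -= lr * H.T @ dO / n
        b2 -= lr * dO.sum(axis=0) / n
        W1 -= lr * X.T @ dH / n
        b1 -= lr * dH.sum(axis=0) / n
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return logistic(logistic(X @ W1 + b1) @ W2 + b2)

# Synthetic pixels: 7 bands, 3 classes, with a made-up bilinear interaction term
rng = np.random.default_rng(1)
alpha = rng.dirichlet(np.ones(3), size=300)
E = rng.uniform(0.1, 0.9, size=(7, 3))
X = alpha @ E.T + 0.2 * (alpha[:, [0]] * alpha[:, [1]]) * E[:, 2]
params, losses = train_mlp(X, alpha)
```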
Simulated Data Set
A 200-band hyperspectral image generated from the spectral libraries of four different minerals: (a) band 1, (b) band 100, (c) band 200.
$$ \mathbf{y}(x,y) = \sum_{n=1}^{4} s_n(x,y)\,\mathbf{sig}_n, \qquad s_n(x,y) = \log\left(1 + \alpha_n(x,y)\right) $$
where $\mathbf{sig}_n$ is the signature corresponding to the nth mineral, $s_n(x,y)$ is the contribution of endmember $\mathbf{e}_n$, and $\alpha_n(x,y)$ is the fractional abundance of $\mathbf{e}_n$ in the pixel at (x, y).
Abundance details of four minerals obtained from the LMM and NLMM.
BDFs of simulated test data obtained from the LMM and NLMM.
(a) LISS-3 classified map resampled to 100 x 100 pixels.
(b) agriculture, (c) builtup / settlement, (d) forest, (e) plantation / orchard, (f) waste / barren land, (g) water bodies
Abundances of six categories from NLMM.
BDFs of MODIS test data from NLMM. (a) agriculture, (b) builtup / settlement, (c) forest, (d) plantation / orchard, (e) waste / barren land, (f) water bodies.
Correlation and RMSE between actual and predicted proportions (all correlations significant at p < 2.2e-16).
Classes | Correlation r, LMM | Correlation r, NLMM | RMSE, LMM | RMSE, NLMM
Agriculture | 0.6730 | 0.9110 | 0.0518 | 0.0271
Builtup / Settlement | 0.6390 | 0.9345 | 1.0519 | 0.0083
Forest | 0.7310 | 0.9411 | 0.0257 | 0.0062
Plantation / Orchard | 0.6990 | 0.9447 | 0.0280 | 0.0061
Waste / Barren land | 0.6599 | 0.9342 | 0.0431 | 0.0073
Water bodies | 0.7799 | 0.9855 | 0.0061 | 0.0016
Error distribution of MODIS abundance obtained from NLMM (X and Y axes are the two dimensions in feature space and Z axis is the absolute difference between real and estimated class proportion) for the six classes.
• Computer simulated data: overall RMSE of 0.0089±0.00215 with the LMM and 0.0030±0.0001 with the NLMM when compared to actual class proportions.
• Unmixed MODIS images: overall RMSE of 0.0191±0.022 with the NLMM, compared to 0.2005±0.41 with the LMM, indicating that the individual class abundances obtained from the NLMM are very close to what is present on the ground and observed in the high resolution classified image.
On which side of the pixel is the class situated?
Unmixed abundance map of builtup
Pixel Swapping Algorithm
Kumar, U., Mukhopadhyay, C., Kumar Raja S., and Ramachandra T. V., (2008), Soft classification based Sub-pixel allocation model, International Conference on Operations Research for a growing nation in conjunction with the 41st Annual Convention of Operational Research Society of India, Tirupati, AP, India, 15-17 December, 2008.
The pixel swapping algorithm can increase the resolution of the OSP output from 136 x 140 to 1360 x 1400 pixels.
The swapping algorithm:
1. requires some spatial correlation between pixels;
2. maximises the spatial autocorrelation between the sub-pixels of the image;
3. takes the abundance output and transforms it into a hard land cover (LC) class map defined at the sub-pixel scale.
Limitation: it only allows the mapping of hard binary LC classes (target, non-target).
Atkinson, P. M., 2005, Sub-pixel target mapping from soft-classified, remotely sensed imagery. Photogrammetric Engineering & Remote Sensing, 71(7), pp. 839–846.
Sub-pixel mapping of a linear feature and a circle:
(A) Test image-line (B) Abundance (C) Random allocation (D) After convergence (E) Test image-circle, (F) Abundance (G) Random allocation, (H) Converged map
The number of nearest neighbours was set to 3, and the non-linear parameter α of the exponential distance-decay model was also set to 3.
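A compact sketch of pixel swapping in the spirit of Atkinson (2005). Target sub-pixels are first allocated at random in proportion to each coarse pixel's abundance; then, within each coarse pixel, the least attractive target sub-pixel is swapped with the most attractive non-target sub-pixel whenever that increases attractiveness. Attractiveness uses the exponential distance-decay weight exp(−h/a) with a = 3 as in the text; interpreting "3 nearest neighbours" as a square window of radius 3 sub-pixels is an assumption, as are the function names and the zoom factor in the example.

```python
import numpy as np

def random_allocation(abundance, scale, rng):
    """Randomly place round(abundance * scale**2) target sub-pixels in every coarse pixel."""
    rows, cols = abundance.shape
    fine = np.zeros((rows * scale, cols * scale), dtype=int)
    for r in range(rows):
        for c in range(cols):
            k = int(round(abundance[r, c] * scale * scale))
            block = np.zeros(scale * scale, dtype=int)
            block[rng.choice(scale * scale, size=k, replace=False)] = 1
            fine[r*scale:(r+1)*scale, c*scale:(c+1)*scale] = block.reshape(scale, scale)
    return fine

def attractiveness(fine, radius, a):
    """A_i = sum_j exp(-h_ij / a) * z_j over sub-pixels j within `radius` of i (j != i)."""
    h, w = fine.shape
    padded = np.pad(fine, radius)                       # zeros outside the image
    att = np.zeros((h, w))
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue
            weight = np.exp(-np.hypot(dr, dc) / a)      # exponential distance-decay model
            att += weight * padded[radius+dr:radius+dr+h, radius+dc:radius+dc+w]
    return att

def pixel_swap(abundance, scale=4, radius=3, a=3.0, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    fine = random_allocation(abundance, scale, rng)
    rows, cols = abundance.shape
    for _ in range(max_iter):
        att = attractiveness(fine, radius, a)
        swaps = 0
        for r in range(rows):
            for c in range(cols):
                block = (slice(r*scale, (r+1)*scale), slice(c*scale, (c+1)*scale))
                z, A = fine[block], att[block]           # z is a view into `fine`
                if z.all() or not z.any():
                    continue                              # pure pixel: nothing to swap
                worst_target = np.unravel_index(np.argmin(np.where(z == 1, A, np.inf)), z.shape)
                best_other = np.unravel_index(np.argmax(np.where(z == 0, A, -np.inf)), z.shape)
                if A[best_other] > A[worst_target]:       # swap only if it raises attractiveness
                    z[worst_target], z[best_other] = 0, 1
                    swaps += 1
        if swaps == 0:
            break
    return fine

ab = np.array([[1.0, 0.5, 0.0]] * 3)    # toy abundance map: 3 x 3 coarse pixels
out = pixel_swap(ab)                     # 12 x 12 binary sub-pixel map
```

Because swaps only happen inside a coarse pixel, the class proportion of every coarse pixel is preserved exactly; only the arrangement of sub-pixels changes.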
PS on MODIS image
(A) Builtup pixels shown in black and non-built shown in white, (B) Sub-pixel map of builtup,
(C) Converged map of the builtup after applying PS algorithm.
Panels: LISS-III (25 m), MODIS abundance (250 m), PS MODIS (25 m), LISS-III classified (25 m).
• Sensitivity: 0.6 (proportion of actual positives that are correctly identified)
• Specificity: 0.69 (proportion of actual negatives that are correctly identified)
• PPV: 0.6 (proportion of predicted positives that are actually positive)
• NPV: 0.69 (proportion of predicted negatives that are actually negative)
• Against the ground truth, the accuracy was 76.6%
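These four agreement measures, plus overall accuracy, can be computed from a binary (target / non-target) map and a reference map as in this sketch; the function name and the toy arrays are hypothetical.

```python
import numpy as np

def binary_agreement(reference, predicted):
    """Sensitivity, specificity, PPV, NPV and overall accuracy for a target / non-target map."""
    ref = np.asarray(reference, dtype=bool).ravel()
    pred = np.asarray(predicted, dtype=bool).ravel()
    tp = np.sum(ref & pred)      # target in both maps
    tn = np.sum(~ref & ~pred)    # non-target in both maps
    fp = np.sum(~ref & pred)     # mapped as target, actually non-target
    fn = np.sum(ref & ~pred)     # target missed by the map
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / ref.size,
    }

scores = binary_agreement([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 0])
print(scores)
```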
Hybrid Bayesian Classifier
Kumar, U., Kumar Raja S., Mukhopadhyay, C., and Ramachandra T. V., (2011), Hybrid Bayesian Classifier for Improved Classification Accuracy. IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 3, pp. 473-476.
• In the HBC, the class prior probabilities are obtained by unmixing supplementary low spatial, high spectral resolution multispectral (MS) data; these priors are then assigned to every pixel of the high spatial, low spectral resolution MS data in Bayesian classification.
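A sketch of the idea behind the HBC, assuming Gaussian class-conditional densities (as in a conventional maximum-likelihood Bayesian classifier) whose means and covariances have already been estimated from training data. The only change from the conventional classifier is that the prior becomes per pixel: each fine pixel inherits the class proportions unmixed from the coarse pixel that covers it. Function names, the integer zoom factor and the nearest-neighbour upsampling are illustrative assumptions.

```python
import numpy as np

def gaussian_log_likelihood(X, mean, cov):
    """Log of the multivariate normal density for every row of X."""
    d = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)   # squared Mahalanobis distance
    return -0.5 * (maha + logdet + X.shape[1] * np.log(2.0 * np.pi))

def upsample_priors(coarse_abundance, factor):
    """Assign each coarse pixel's unmixed class proportions to all fine pixels it covers."""
    return np.repeat(np.repeat(coarse_abundance, factor, axis=0), factor, axis=1)

def bayes_classify(X, means, covs, priors):
    """MAP label per pixel. priors: (K,) global priors or (pixels, K) per-pixel priors."""
    loglik = np.column_stack([gaussian_log_likelihood(X, m, c) for m, c in zip(means, covs)])
    return np.argmax(loglik + np.log(np.clip(priors, 1e-12, None)), axis=1)

# Two classes with made-up statistics; every fine pixel lies half-way between the class means
means = [np.zeros(2), np.full(2, 2.0)]
covs = [np.eye(2), np.eye(2)]
coarse = np.array([[[0.9, 0.1], [0.1, 0.9]]])          # 1 x 2 coarse pixels, 2 classes
priors = upsample_priors(coarse, 2).reshape(-1, 2)      # 2 x 4 fine pixels, flattened
X = np.ones((8, 2))
labels = bayes_classify(X, means, covs, priors)
```

With equal likelihoods the per-pixel priors decide the label, which is exactly where the unmixed coarse data adds information that a single global prior cannot.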
Results
Accuracy assessment for IKONOS data (PA = producer's accuracy, UA = user's accuracy, %)
Class | Bayesian classifier PA | Bayesian classifier UA | HBC PA | HBC UA
Concrete roofs | 69.99 | 84.01 | 76.49 ↑ | 93.89 ↑
Asbestos roofs | 84.77 | 87.77 | 91.89 ↑ | 94.46 ↑
Vegetation | 94.21 | 87.55 | 87.24 ↓ | 89.13 ↑
Blue plastic roof | 84.33 | 81.17 | 97.00 ↑ | 85.60 ↑
Open area | 51.49 | 69.49 | 95.00 ↑ | 74.22 ↑
Average | 76.96 | 81.99 | 89.52 ↑ | 87.46 ↑

Accuracy assessment for LISS-III data
Class | Bayesian classifier PA* | Bayesian classifier UA* | HBC PA* | HBC UA*
Agriculture | 87.54 | 87.47 | 90.15 ↑ | 95.56 ↑
Builtup | 85.11 | 81.68 | 89.39 ↑ | 98.33 ↑
Forest | 85.71 | 88.73 | 92.61 ↑ | 96.36 ↑
Plantation | 84.44 | 91.73 | 95.95 ↑ | 91.03 ↓
Waste land | 88.03 | 90.37 | 98.67 ↑ | 89.66 ↓
Water bodies | 90.91 | 88.89 | 88.18 ↓ | 97.00 ↑
Average | 86.96 | 88.15 | 92.49 ↑ | 94.66 ↑
• Increase in overall accuracy, compared to the conventional Bayesian classifier, of
– 6% with IRS LISS-III MS and MODIS
– 9% with IKONOS MS and Landsat MS
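Producer's and user's accuracies in the tables follow the usual confusion-matrix definitions. A sketch, assuming rows hold the classified labels and columns the reference labels (the function name and counts are hypothetical):

```python
import numpy as np

def producer_user_accuracy(confusion):
    """Per-class producer's and user's accuracy (%) plus overall accuracy (%).

    Assumed layout: confusion[i, j] = pixels classified as class i whose reference class is j.
    """
    cm = np.asarray(confusion, dtype=float)
    correct = np.diag(cm)
    producers = 100.0 * correct / cm.sum(axis=0)   # correct / reference (column) totals
    users = 100.0 * correct / cm.sum(axis=1)       # correct / classified (row) totals
    overall = 100.0 * correct.sum() / cm.sum()
    return producers, users, overall

pa, ua, oa = producer_user_accuracy([[50, 5], [10, 35]])
```

The "Average" rows correspond to `pa.mean()` and `ua.mean()` over classes, which in general differ from the overall accuracy `oa`.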
Free and Open Source Tools for Geoinformatics
GRASS GIS
• GRASS (Geographic Resources Analysis Support
System) is a free GIS software used for
– geospatial data management and analysis,
– image processing,
– graphics/maps production,
– spatial modelling, and
– visualization.
• One of the world’s biggest open source projects
• Official project of the Open Source Geospatial Foundation (OSGeo)
First GRASS Mirror Site (Tier 1) in India at IISc
http://wgbis.ces.iisc.ernet.in/grass
GRASS Wiki:
GRDSS design and conceptual diagram.
GRDSS data flow.
Functionalities of GRDSS: A Quick Look
Applications:
• Web services
• User interface
• Platforms: Linux, handheld
• Raster map operations
• Vector map operations
• Image processing
• LiDAR
FOSS Kiosk
GIS Layers:
• Elevation
• LULC
• Place names
• Roads
• Energy
• Communication facilities
• Anganwadi centres
• Educational facilities
• Medical facilities
• General facilities
• Watershed boundaries
• Water flow structures
• Sacred groves
• Canals, rivers, ponds
• Streams
• Admin boundary
Visualisation front end:
• Ka-Map from maptools.org (works on Apache, UMN MapServer and PHP)
Current ongoing project: LCLUC studies of major metropolitan cities of India: a glimpse
Bangalore City
“We are waiting for the city to come to us…”
LCLUC in Bangalore
[Figure: land use / land cover of Bangalore in 1973, 1992, 1999, 2006 and 2010]
Types of urban outlying growth highlighted in boxes: (A) isolated growth, (B) linear branching (road/corridor), (C) clustered growth.
Diffusive growth
Urban growth map
(A) 1973 to 1992, (B) 1992 to 2000, (C) 2000 to 2006, (D) 2006 to 2010.
Analysis of Land Surface Temperature
Decreasing Lakes and Parks Urbanising Bangalore
Dividing Bangalore into directional zones: N, NE, E, SE, S, SW, W, NW.
[Figure: area (ha) versus year, 1970 to 2010, for each of the eight zones]
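One simple way to divide a city into eight directional zones is to assign every pixel to the 45° sector it falls in, as seen from a chosen city centre. The sketch assumes a north-up raster, sectors centred on each compass direction, and a pixel area given in hectares (0.09 ha for a 30 m pixel); the exact zoning used in the study may differ, and the function names are hypothetical.

```python
import numpy as np

ZONES = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def directional_zones(shape, centre):
    """Index (0-7, into ZONES) of the 45-degree sector each pixel falls in, seen from `centre`.

    Rows grow southwards and columns eastwards, as in a north-up image.
    The centre pixel itself is assigned to zone 0 ("E").
    """
    rows, cols = np.indices(shape)
    dx = cols - centre[1]                                 # positive towards east
    dy = centre[0] - rows                                 # positive towards north
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0        # 0 = east, counter-clockwise
    return (((angle + 22.5) // 45) % 8).astype(int)       # sectors centred on each direction

def area_per_zone(urban, zones, pixel_area_ha):
    """Built-up area (ha) in each zone, in ZONES order."""
    return np.bincount(zones[urban.astype(bool)], minlength=8) * pixel_area_ha

z = directional_zones((5, 5), (2, 2))
area = area_per_zone(np.ones((5, 5)), z, 0.09)
```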
Directional Analysis of Land Surface Temperature
Direction | Mean LST ± SD
N | 21.30 ± 2.39
NE | 22.15 ± 2.22
E | 21.01 ± 2.47
SE | 21.34 ± 2.30
S | 21.71 ± 2.07
SW | 22.19 ± 1.92
W | 22.97 ± 1.72
NW | 22.07 ± 2.25
Use of spatial metrics to:
• quantify the structure of the landscape
• quantify the spatial pattern and composition of features
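A few of the simpler spatial metrics (number of patches, largest patch and total built-up area) can be computed per zone from a binary built-up map by labelling connected patches. This sketch assumes the 8-neighbour rule for patch connectivity, a common default in landscape-metric software but an assumption here; clumpiness, aggregation and compactness indices are not shown, and the function names are hypothetical.

```python
import numpy as np
from collections import deque

def label_patches(binary):
    """8-connected component labelling of a binary (built-up / non-built-up) map."""
    b = np.asarray(binary, dtype=bool)
    labels = np.zeros(b.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(b)):
        if labels[start]:
            continue                                  # already part of a patch
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:                                  # breadth-first flood fill
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < b.shape[0] and 0 <= nc < b.shape[1]
                            and b[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = current
                        queue.append((nr, nc))
    return labels, current

def patch_metrics(binary, pixel_area=1.0):
    labels, n = label_patches(binary)
    sizes = np.bincount(labels.ravel())[1:] if n else np.array([0])
    return {"number_of_patches": n,
            "largest_patch_area": sizes.max() * pixel_area,
            "built_up_area": sizes.sum() * pixel_area}

grid = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 1],
                 [1, 0, 0, 0]])
metrics = patch_metrics(grid)
```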
Results of Spatial Metrics
[Figure: values per zone (N, NE, E, SE, S, SW, W, NW) for 1973, 1992, 2000, 2006 and 2010 of largest patch, built-up (total land area), number of patches, ratio of open space, clumpiness, aggregation index and compactness index of the largest patch]
The largest patches are in the N and E in 2010, with medium urban development in the W, SW and S. Urban growth is more prominent in the west, southwest and south directions.
Separate clusters of large urban patches have emerged in the north (Bengaluru International Airport) and east (International Tech Park Limited).
The urban area became more compact and moved towards a single large patch in 2010; open space decreased and urban density increased.
Urban dynamics through Cellular Automata (CA) based growth models
Growth model: CA
• CA is based on pixels, states, neighbourhood and transition rules.
[Figure: Pixel (initial state) → transition, influenced by external factors → Pixel (final state)]
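A single CA update can be sketched as below. The transition rule used here is purely hypothetical: a non-urban cell becomes urban when at least `threshold` of its eight (Moore) neighbours are urban and an external-factor suitability score reaches `min_suitability`. Operational growth models calibrate such rules and factors (for example distance to roads) from the historical maps.

```python
import numpy as np

def ca_step(urban, suitability, threshold=3, min_suitability=0.5):
    """One synchronous CA update with a Moore (3x3) neighbourhood (illustrative rule only)."""
    u = np.asarray(urban, dtype=int)
    h, w = u.shape
    padded = np.pad(u, 1)                                  # cells outside the map count as non-urban
    neighbours = sum(padded[1+dr:1+dr+h, 1+dc:1+dc+w]      # urban count among the 8 neighbours
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0))
    grow = (u == 0) & (neighbours >= threshold) & (suitability >= min_suitability)
    return np.where(grow, 1, u)                            # urban cells stay urban

state = np.zeros((5, 5), dtype=int)
state[1, 1] = state[1, 2] = state[2, 1] = 1                # initial state: three urban cells
suit = np.ones((5, 5))                                     # external factors: all suitable
next_state = ca_step(state, suit)
```

Iterating `ca_step` from an observed map, and comparing the result with a later map, is the usual way such rules would be calibrated and validated.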
• 584% increase in urban areas during 37 years (1973 to 2010).
• ~2-4 ºC increase in local LST.
• 74% decrease in vegetation cover and 66% decrease in water bodies.
[Figure: percent impervious surface, NDVI and temperature]
Spatial Thinking …
Thank you