TAN WEI LUN
A thesis submitted in fulfilment of the
requirements for the award of the degree of
Doctor of Philosophy (Mathematics)
Faculty of Science
Universiti Teknologi Malaysia
ACKNOWLEDGEMENT
I would like to express my deepest appreciation and gratitude to my main
supervisor, Assoc. Prof. Dr. Fadhilah Yusof, and to my co-supervisor, Prof Dr.
Zulkifli Yusop, for their continuous support and encouragement throughout my
studies. During my research process, they have given me extensive supervision,
criticism, advice and guidance. Without their encouragement and enthusiasm, this
thesis would not been completed.
Apart from both of my supervisors, I would also like to thank the Department
of Irrigation and Drainage, Malaysia, for providing the valuable rainfall data that
have been applied in my research work. My sincere appreciation is extended to the
fund and sponsorship from the MyBrain15 Scholarship; their financial support is
gratefully acknowledged.
In addition, I would like to express my deepest gratitude to Assoc. Prof. Dr.
Liew Ju Neng of the School of Environmental and Natural Resource Sciences, UKM,
for the valuable information. My gratitude also goes to all of my Professors, Senior
Lecturers and in general all the staff of the Department of Mathematical Sciences,
UTM, and Centre for Environmental Sustainability and Water Security (IPASA),
UTM, for their important guidance and valuable remarks.
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
CHAPTER
TITLE
PAGE
DECLARATION
ii
DEDICATION
iii
ACKNOWLEDGEMENT
iv
ABSTRACT
v
ABSTRAK
vi
TABLE OF CONTENTS
vii
LIST OF TABLES
xi
LIST OF FIGURES
xiv
LIST OF ABBREVIATIONS
xxi
LIST OF SYMBOLS
xxiii
LIST OF APPENDICES
xxv
1
INTRODUCTION
1
1.1
Introduction
1
1.2
Background of the Study
2
1.3
Statement of the Problem
3
1.4
Objectives of the Study
4
1.5
Scope of the Study
5
1.6
Significance of the Study
6
1.7
Structure of the Thesis
7
2
LITERATURE REVIEW
8
2.1
Introduction
8
2.2.1
The Markov Chain Approach
9
2.2.2
Alternating Renewal Process
12
2.3
The Rainfall Amount Process
14
2.3.1
Multi-State Markov Model
15
2.3.2
The Weather State Model Approach
17
2.4
Hidden Markov Model
19
2.5
Non-Homogeneous Hidden Markov Model
21
2.6
El Niño-Southern Oscillation (ENSO)
25
2.6.1
ENSO indices
27
3
METHODOLOGY
30
3.1
Introduction
30
3.2
Flow Chart
30
3.3
Study Site Description
33
3.3.1
Rainfall Data
33
3.3.2
Atmospheric Data
35
3.4
Stochastic Process
35
3.5
Markov Chain
37
3.5.1
Discrete Time Markov Chain
37
3.5.2
First Order Markov Chain
38
3.6
Basic of the Hidden Markov Model
40
3.7
Hidden Markov Model
41
3.7.1
Parameterization for the Rainfall Process
42
3.7.1.1
Bernoulli Distribution
42
3.7.1.2
Gamma Distribution
42
3.7.1.3
Exponential Distribution
43
3.7.2
Parameterization for the Hidden Process
44
3.8
Non-Homogeneous Hidden Markov Model
44
3.8.1
Parameterization for the Hidden Process
45
3.9
The Atmospheric Variables
46
3.9.1
Singular Value Decomposition
46
3.9.2
Climatological anomalies
47
3.9.4
Student’s t Test
49
3.10 Parameter Estimation
50
3.10.1
Likelihood
50
3.10.2
The Expectation-Maximization (EM)
Algorithm
53
3.10.3
The K-Means Algorithm
56
3.11 Model Selection
57
3.12 Viterbi Algorithm
58
3.13 Scaling
59
3.14 Simulation and Prediction
61
3.15 Model Assessment
62
3.15.1
Pearson Correlation Coefficient
62
3.15.2
Survival Function
63
3.15.3
Quantile-Quantile Plots
67
3.15.4
Root Mean Squared Error (RMSE)
67
4
HIDDEN MARKOV MODEL FOR RAINFALL
OCCURRENCE
69
4.1
Introduction
69
4.2
Descriptive Statistics of the Rainfall Data
69
4.3
Model Fitting
70
4.4
Estimated Parameters and Hidden States
73
4.5
Physical Interpretation on the Hidden States
83
4.6
Assessing the Performance of EM Algorithm and
K-means Algorithm
92
4.7
Summary
115
5
HIDDEN MARKOV MODEL FOR RAINFALL
AMOUNT
117
5.1
Introduction
117
5.2
Description of Atmospheric Data
102
5.3
Fitting the Hidden Markov Model to the Rainfall
5.4
Estimated Parameters and Hidden States
123
5.5
Assessing the Performance of HMM on Rainfall
Amount
134
5.6
Non-homogeneous Hidden Markov Model
138
5.7
Synoptic Conditions
143
5.8
Inter-annual and Inter-decadal Variability
150
5.9
Comparison between the Simulations from HMM
and NHMM.
157
5.10 Short-Term Prediction from HMM and NHMM
164
5.11 Summary
169
6
CONCLUSION AND RECOMMENDATION
172
6.1
Introduction
172
6.2
Conclusion
172
6.3
Recommendation for Future Research
174
REFERENCES
175
LIST OF TABLES
TABLE NO.
TITLE
PAGE
3.1
Geographical coordinates of the 40 selected rainfall
stations in Peninsular Malaysia from 1975-2008. C
Center; E East; NW Northwest; SW Southwest; W West
34
3.2
The examined atmospheric variables to input into the
NHMM from the NCEP reanalysis dataset.
35
3.3
The frequency of State 1 during the Northeast monsoon.
49
3.4
The dry spell distribution in Station 1 (Janda baik)
during the Northeast monsoon. Group 1 and Group 2
represent the observed and simulated data via the EM
algorithm, respectively.
64
3.5
Log-rank test statistic for Station 1 (Janda Baik) during
the Northeast monsoon
65
4.1
Summary of the descriptive statistics of rainfall data
during the Northeast monsoon and Southwest monsoon
season. Sn-Station; Total- Total rainfall amount (mm) for
the monsoon season; Mean-Daily mean (mm);
SD-Standard deviation (mm); n(w)-Number of wet days;
P(w)-Probability of wet days; Re-Regional.
72
4.2
Bayesian information criterion (BIC) for two to six states
HMMs during the Northeast monsoon and Southwest
monsoon. Log (L): log-likelihood
73
4.3
Estimated parameters from the EM algorithm and
K-means algorithm during the Northeast monsoon
74
4.4
Estimated parameter from the EM algorithm and K-mean
4.5
Rainfall statistics for the identified hidden states from the
EM algorithm and K-means algorithm during Northeast
monsoon
84
4.6
Rainfall statistics for the identified hidden states
generated using the EM algorithm and K-means
algorithm during the Southwest monsoon
88
4.7
The bias values on the observed inter-station correlation
during the Northeast and Southwest monsoon.
94
4.8
RMSE for the observation wet/dry spell versus EM and
K-means simulation wet/dry spell
107
4.9
The results of the log-rank test statistics on the survival
function curves. No represents no significant difference;
Yes represents a significant difference; NEM represents
the Northeast monsoon; SWM represents the Southwest
monsoon. Total represents the total number of “No”.
114
5.1
Skill scores of the summary variables. CC is the mean
absolute Pearson correlation among all the station
between summary variable and the observed rainfall
amount and CV the percentage of correlation explained
by the first summary variable
119
5.2
The Bayesian Information criterion (BIC) for two- to
ten-states HMMs and the four different amounts
distribution during the Northeast monsoon
121
5.3
The Bayesian Information criterion (BIC) for two- to
ten-states HMMs and the four different amounts
distribution during the Southwest monsoon
122
5.4
Estimated HMM parameters for rainfall amounts during
the Northeast monsoon
123
5.5
Estimated HMM parameters for rainfall amounts during
the Southwest monsoon
126
5.6
Rainfall statistics for the identified hidden state of HMM
during the Northeast monsoon
132
5.7
Rainfall statistics for the identified hidden state of HMM
during the Southwest monsoon
133
5.8
Comparison of the BIC for the models during Northeast
monsoon. S6 six hidden states model; n1 Precipitable
water; n2 Relative humidity 700hPa; n3 Specific
5.9
Comparison of the BIC for the models during Southwest
monsoon. S5 five hidden states model; m1 Zonal wind
700hPa; m2 Zonal wind 850hPa; m3 Meridional wind
200hPa; m4 Meridional wind 1000hPa.
139
5.10
Rainfall statistics for the identified hidden state of
NHMM during the Northeast monsoon
140
5.11
Rainfall statistics for the identified hidden state of
NHMM during the Southwest monsoon
142
5.12
Summary of the hidden state patterns during the
Northeast monsoon
146
5.13
Summary of the hidden state patterns during the
Southwest monsoon
150
5.14
The years used to create the SST anomaly composites
during the Northeast monsoon
152
5.15
The years used to create the SST anomaly composites
during the Southwest monsoon
154
LIST OF FIGURES
FIGURE NO.
TITLE
PAGE
2.1
Structure of the HMM
19
2.2
Structure of the NHMM
22
2.3
El Niño. The red color represents the warmer than
normal tropical Pacific sea surface temperature and the
blue color represents the cooler than normal tropical
Pacific sea surface temperature. Available from:
<http://www.esrl.noaa.gov/psd/>
26
2.4
La Niña. The red color represent the warmer than
normal tropical Pacific sea surface temperature and the
blue color represents the cooler than normal tropical
Pacific sea surface temperature. Available from:
<http://www.esrl.noaa.gov/psd/>
26
2.5
Location of the Darwin and Tahiti in which its sea level
pressures contribute to the Southern Oscillation Index.
Available from: <http://www.esrl.noaa.gov/psd/>
28
2.6
Location of the Niño regions for measuring sea surface
temperature. Available from:
<http://www.esrl.noaa.gov/psd/>
29
3.1
Flowchart summarizing the methodological framework
for this study.
32
3.2
The coordinate of the rainfall stations and rainfall
regions in Peninsular Malaysia. NW northwestern; SW
southwestern; E eastern; C central; W western.
37
3.3
Graphical representation of a wet or dry spell
occurrence
63
4.1
Error bar at various rainfall regions for each region in
Peninsular Malaysia (S1 refers to Station 1).
Note: ‘x’ indicates the mean and the line indicates
4.2
(i) and (ii) The estimated state sequence. (iii) and (iv)
The accumulated days in the state over 34 years during
the Northeast monsoon via the (a) EM algorithm and
(b) K-means algorithm.
85
4.3
HMM rainfall probabilities from a reddish (dry) to
bluish (wet) color and anomaly composites of 500hPa
omega vertical velocity (contour, Pa/s) with winds at
850hPa for each hidden state during the Northeast
monsoon from 1975 to 2008. Only statistically
significant wind vectors at 95% level are plotted.
88
4.4
(i) and (ii) The estimated state sequence. (iii) and (iv)
The accumulated days according to state over 34 years
during the Southwest monsoon generated from the (a)
EM algorithm and the (b) K-means algorithm
89
4.5
HMM rainfall probabilities from reddish (dry) to bluish
(wet) color and anomaly composites of 500hPa omega
vertical velocity (contour, Pa/s) with winds at 850hPa
for each hidden state during the Southwest monsoon
for the period of 1975–2008. Only wind vectors that
are statistically significant at 95% level are plotted.
92
4.6
The daily rainfall probability between 40 stations for
the observed data versus the simulated data. NE refers
to the Northeast monsoon and SW refers to Southwest
monsoon
93
4.7
The observed versus simulated Spearman’s
inter-station correlations. NE refers to the Northeast
monsoon and SW refers to the Southwest monsoon.
94
4.8
Comparison of the wet spell between the observed data
(blue), simulated data from the EM algorithm (green),
and simulated data from the K-means algorithm (red)
during the Northeast monsoon.
96
4.9
Comparison the dry spell between the observed (blue),
simulated from EM algorithm (green) and simulated
from K-means algorithm (red) during the Northeast
monsoon
97
4.10
The number of wet spells for a wet spell length of: (a) 1
day; (b) 2 days; (c) 3 days; (d) 4 days; (e) 5 days; (f) 6
days; (g) 7 days; and (h) 8 days during the Northeast
monsoon.
99
days; (g) 7 days; and (h) 8 days during the Northeast
monsoon.
100
4.12
Comparison of the wet spells between the observed
data (blue), simulated data from EM algorithm (green),
and simulated data from K-means algorithm (red)
during the Southwest monsoon.
102
4.13
Comparison of the dry spell between the observed data
(blue), data simulated from EM algorithm (green), and
data simulated from K-means algorithm (red) during
the Southwest monsoon.
103
4.14
The number of wet spells for a wet spell length of: (a) 1
day; (b) 2 days; (c) 3 days; (d) 4 days; (e) 5 days; (f) 6
days; (g) 7 days; and (h) 8 days during the Southwest
monsoon.
104
4.15
The number of dry spells for a wet spell length of: (a) 1
day; (b) 2 days; (c) 3 days; (d) 4 days; (e) 5 days; (f) 6
days; (g) 7 days; and (h) 8 days during the Southwest
monsoon.
105
4.16
Observed versus mean simulated wet duration
distribution of rainfall from the EM algorithm and
K-means algorithm at each station during the Northeast
monsoon.
109
4.17
Observed versus mean simulated dry duration
distribution of rainfall from the EM algorithm and
K-means algorithm at each station during the Northeast
monsoon.
110
4.18
Observed versus mean simulated wet duration
distribution of rainfall from the EM algorithm and
K-means algorithm at each station during the Southwest
monsoon.
112
4.19
Observed versus mean simulated dry duration
distribution of rainfall from the EM algorithm and
K-means algorithm at each station during the Southwest
monsoon.
113
5.1
Coordinates of the thirty-five gridded reanalysis data
(95
oE-110
oE, 0-10
oN)
118
5.2
BIC for non-zero rainfall models by single exponential,
mixture of two exponential, single gamma, and mixture
5.3
The BIC for non-zero rainfall models via single
exponential, a mixture of two exponentials,
single-Gamma, and a mixture of two Gamma distributions
during the Southwest monsoon.
122
5.4
HMM rainfall parameters (a)-(f) rainfall occurrence
probability from reddish (dry) to bluish (wet) and
(g)-(l) rainfall amount on days receiving larger than 0.3
mm rainfall during the Northeast monsoon. Note: The
size of the circles refers to the value of occurrence
probability and amount of rainfall.
130
5.5
HMM rainfall parameters (a)-(e) rainfall occurrence
probability from reddish (dry) to bluish (wet) and (f)-(j)
rainfall amount on days receiving larger than 0.3 mm
rainfall during the Southwest monsoon. Note: The size
of the circles refers to the value of occurrence
probability and amount of rainfall.
131
5.6
(a) The estimated state sequence (b) The accumulated
state over 34 years during the Northeast monsoon
132
5.7
(a) The estimated state sequence (b) The accumulated
state over 34 years during the Southwest monsoon.
134
5.8
Quantile-quantile plots of observed versus simulated
rainfall amounts for all the stations during the
Northeast monsoon.
136
5.9
Quantile-quantile plots of observed versus simulated
rainfall amounts for all the stations during the
Southwest monsoon.
137
5.10
BIC for models with different combinations of
atmospheric variables during the Northeast monsoon.
139
5.11
BIC for models with different combinations of
atmospheric variables during the Southwest monsoon.
140
5.12
(a) The estimated state sequence (b) The accumulated
days in particular states over the 34 year Northeast
monsoon rainfall data
141
5.13
(a) The estimated state sequence (b) The accumulated
days in particular states over 34 years of Southwest
monsoon rainfall data
142
5.14
The specific humidity of 850hPa for the mean of Nov–
Feb
144
variable for a specific humidity of 850hPa: (a) rainfall
occurrence probabilities from reddish (dry) to bluish;
(b) rainfall amount on days receiving larger than 0.3
mm rainfall, from light bluish (low amounts) to dark
bluish (high amounts); and (c) Mean of the specific
humidity of 850hPa during the Northeast monsoon.
Note: The size of the circles refers to the value of
rainfall probability and amounts.
146
5.16
The 700hPa zonal wind for the mean of May–Aug
148
5.17
NHMM rainfall parameters including the first summary
variable for the 700hPa zonal wind: (a) rainfall
occurrence probabilities from reddish (dry) to bluish;
(b) rainfall amount on days receiving larger than 0.3
mm rainfall from light bluish (low amounts) to dark
bluish (high amounts); and (c) Mean of the 850hPa
specific humidity during the Southwest monsoon. Note:
The size of the circles refers to the value of probability
and amounts.
149
5.18
(a) Inter-annual variability and (b) 9-year running mean
of state frequency and NINO 3.4 index (multiple of 10)
during the Northeast monsoon.
152
5.19
(a) Inter-annual variability and (b) 9-year running mean
of state frequency and NINO 3.4 index (multiple of 10)
during the Southwest monsoon
154
5.20
Composites of seasonal mean (November to February)
sea surface temperature anomalies defined as years in
the upper 15% of the inter-annual distribution of the
state frequency during the Northeast monsoon. The
number of seasons in each composite is given in
brackets. The shading represents 90% statistical
significant according to a two-tailed Student t-test. The
solid lines depict positive contours and the negative
contours are shown as dashed lines for which the
contour interval is 0.25.
155
5.21
Composites of seasonal mean (May to August) sea
surface temperature anomalies defined as years in the
upper 15% of the inter-annual distribution of the state
frequency during the Southwest monsoon. The number
of seasons in each composite is given in brackets. The
shading represents 90% statistical significant according
to a two-tailed Student t-test. The positive contours are
represented with solid lines and the negative contours
are shown as dashed lines for which the contour
5.22
Inter-annual variability of HMM and
NHMM-simulated during the Northeast monsoon, (a)
seasonally-averaged rainfall amount, (b) occurrence
frequency, and (c) rainfall intensity (averaged rainfall
amount on wet days). Plotted is the median of 100
HMM and 100 NHMM simulations averaged over all
40 stations.
158
5.23
Inter-annual variability of HMM and
NHMM-simulated during Southwest monsoon: (a)
seasonally-averaged rainfall amount; (b) occurrence frequency;
and (c) rainfall intensity (averaged rainfall amount on
wet days). Plotted is the median of 100 HMM and 100
NHMM simulations averaged over all 40 stations.
159
5.24
The boxplot of the entire range of 100 simulations
during the Northeast monsoon, (a) seasonally-averaged
rainfall amount, (b) occurrence frequency, (c) rainfall
intensity (averaged rainfall amount on wet days) for (i)
The HMM, and (ii) The NHMM. The line depicts the
observation data averaged over all 40 stations.
161
5.25
The boxplot of the entire range of 100 simulations
during the Southwest monsoon, (a)
seasonally-averaged rainfall amount, (b) occurrence frequency,
and (c) rainfall intensity (averaged rainfall amount on
wet days) for (i) The HMM and (ii) NHMM. The line
depicts the observation data averaged over all 40
stations.
163
5.26
Inter-annual variability of HMM and NHMM
prediction during the Northeast monsoon, (a)
seasonally-averaged rainfall amount, (b) occurrence
frequency, and (c) rainfall intensity (averaged rainfall
amount on wet days). Plotted is the median of 100
HMM and 100 NHMM predictions averaged over all
40 stations.
166
5.27
Inter-annual variability of HMM and NHMM
prediction during the Southwest monsoon, (a)
seasonally-averaged rainfall amount, (b) occurrence
frequency, (c) and rainfall intensity (averaged rainfall
amount on wet days). Plotted is the median of 100
HMM and 100 NHMM prediction averaged over all 40
stations.
167
amount; (ii) occurrence frequency; and (iii) rainfall
intensity (averaged rainfall amount on wet days). The
line depicts the observation data averaged over all 40
stations.
168
5.29
The boxplot of the entire range of 100 predictions
during the Southwest monsoon: (a) The HMM and (b)
The NHMM for: (i) seasonally-averaged rainfall
amount; (ii) occurrence frequency; and (iii) rainfall
intensity (averaged rainfall amount on wet days). The
line depicts the observation data averaged over all 40
LIST OF ABBREVIATIONS
AIC
-
Akaike’s Information Criterion
ARMA
-
Autoregressive Moving Average
BIC
-
Bayesian Information Criterion
C
-
Central
CART
-
Classification and Regression Trees
CDF
-
Cumulative distribution function
E
-
Eastern
EM
-
Expectation Maximization
ENSO
-
El-Niño Southern Oscillation
FO
-
Frequency Occurrence
GCM
-
General Circulation Model
HMM
-
Hidden Markov Model
MCMC
-
Markov Chain Monte Carlo
MMD
-
Malaysian Meteorology Department
MJJA
-
May-June-July-August
NCAR
-
National Center for Atmospheric Research
NCEP
-
National Centers for Environmental Prediction
NDJF
-
November-December-January-February
NHMM
-
Non-Homogeneous Hidden Markov Model
NOAA
-
National Oceanic and Atmospheric Administration
NW
-
Northwestern
PC
-
Principal Component
-
Quantile-Quantile
SD
-
Standard Deviation
SVD
-
Singular Value Decomposition
SW
-
Southwestern
TPM
-
Transition Probability Matrix
LIST OF SYMBOLS
-
Transition probability from i to j
-
The number of transition from i to j
M
-
The number of stations
N
-
The number of the hidden states
S
-
The hidden sequence
-
Initial Probability
G
-
The emission probability
( )