Spatial Exploratory Data Analysis for the Identification of Birth Defect Risk Factors
Jilei WU1*, Jinfeng WANG1, Gong CHEN2, Lihua PANG2, Xinming SONG2, Bin MENG1, Keli ZHANG3, Ting ZHANG4 and Xiaoying ZHENG2*
Address:
1 Institute of Geography Science and Nature Resource Research, CAS, Beijing, 100101, P. R. China
2 Institute of Population Research, Peking University, Beijing, 100871, P. R. China
3 Department of Resources and Environment, Peking Normal University, Beijing, 100875, P. R. China
4 Capital Institute of Pediatrics, Beijing, 100020, P. R. China
* Corresponding author
Abstract
Background
Birth defects, which have seriously affected population health throughout human history, refer to abnormal development of the embryo or fetus, with abnormalities of structure, function, or metabolism caused by hereditary and/or environmental factors that are very difficult to identify clearly. This study tries to identify the environmental risk factors that play a role in birth defects using spatial exploratory data analysis methods.
Methods
A spatial autocorrelation statistic, Moran's I coefficient, was used to detect the spatial association of birth defects. A newly developed spatial hotspot detector, the Getis G statistic, was used to detect hotspots of birth defects in space in order to explore the risk factors.
Results
Different types of birth defects show different spatial distributions. Neural tube birth defects show significant positive spatial autocorrelation and two typical clustering phenomena at different spatial scales, while other types of birth defects show no such significant spatial autocorrelation. The positive spatial autocorrelation indicates that some common environmental risk factors affect the birth defect occurrence ratio, while the different clustering phenomena reveal the different scales at which different risk factors operate.
Conclusion
Using spatial exploratory data analysis methods, significant positive spatial autocorrelation of neural tube birth defects and two typical spatial patterns of hotspots were detected. These findings provide clues for risk factor identification and birth defect intervention. The risk factors causing birth defects
Background
Birth defects refer to abnormal development of the embryo or fetus in the uterus, resulting in defects in some parts of the body before birth. Birth defects severely affect population health and add to the burden on society as a whole. Increasing attention has been paid to identifying the causes of birth defects and to measures for preventing them, and birth defect intervention has now become a major task in the public health field.
According to recent results of birth defect research, the probability of birth defects caused by genetic factors may be similar across regions. However, environmental risk factors, such as infection, environmental pollution, and toxicosis, account for the differences in birth defect occurrence ratios among regions. Such environmental exposures can be modified, which makes them practical targets for birth defect intervention. Some studies have now begun to identify and monitor these factors from different angles.
Research estimates the birth defect occurrence ratio in China at about 40~50, a high level by world standards. Shanxi province is one of the highest-risk regions in China and has the highest ratio of neural tube birth defects. To obtain a representative study region, we selected Heshun, a county of Shanxi, as the experimental region. This county lies in the Taihang Mountains and forms a relatively closed area. Most people in the county are farmers who seldom change their places of residence, and no migration has occurred historically. The population-based hereditary risk factors for birth defects should therefore be similar across this region, which is a very important precondition for studying the relationship between
Methods
Acquisition of birth defect data
There are 322 villages and one town in this county. Birth defect cases for the four years 1998~2001 were acquired from hospital records and village surveys. Because the family planning policy was strictly enforced, the number of planned births is known for each village every year, so the birth defect occurrence ratio can be computed from this number. However, as birth defects are low-probability events, we summed the birth defect cases over all four years and divided the sum by the average number of planned births per year to obtain the birth defect occurrence ratio. The villages and their populations are roughly evenly distributed across this area, but the town is a relatively densely populated place with more complex environmental factors. To simplify the relationship between the birth defect occurrence ratio and environmental factors, we removed the records occurring in the town.
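The pooled ratio described above can be sketched in a few lines of Python. The function name `occurrence_ratio` and the example counts are hypothetical. The sketch simply follows the text: it sums four years of cases and divides by the average annual number of planned births. Because the numerator covers four years and the denominator covers one year, the value lets villages be compared with one another but is not an annual rate.

```python
from statistics import mean


def occurrence_ratio(yearly_cases, yearly_planned_births):
    """Pooled birth defect occurrence ratio for one village.

    yearly_cases: defect cases in each study year (e.g. 1998-2001).
    yearly_planned_births: planned births in the same years.
    """
    if len(yearly_cases) != len(yearly_planned_births):
        raise ValueError("need one planned-birth count per study year")
    avg_births = mean(yearly_planned_births)
    if avg_births <= 0:
        raise ValueError("average planned births must be positive")
    # Total cases over all years divided by the *average* annual planned births.
    return sum(yearly_cases) / avg_births


# Hypothetical village: 2 cases in four years, 40 planned births per year on average.
print(occurrence_ratio([0, 1, 0, 1], [38, 42, 40, 40]))  # 0.05
```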
Different birth defects may be caused by different risk factors. By organ system, we divided birth defects into neural tube birth defects (NTBD) and other (non-neural tube) birth defects. NTBD include anencephaly, spina bifida, encephalocele, holoprosencephaly, hydrocephalus, etc. The birth defect occurrence ratio was calculated separately for NTBD and for other birth defects.
Spatial position and expression
The 322 villages were located in a Geographical Information System for spatial analysis (Fig. 1). Because no boundaries are defined for the villages, we drew a boundary for every village using a Voronoi diagram (Fig. 2). A Voronoi diagram, also called a Dirichlet tessellation, is composed of contiguous polygons. Each polygon contains all locations closer to its village than to any other village, so points on the shared border of neighboring Voronoi polygons are equidistant from the two villages. Centered on the location of each village, the Voronoi polygon gives the extent of the village. The Delaunay triangles connect villages whose polygons share a border and so reflect the proximity of the villages. They are used in the spatial autocorrelation analysis below.
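The following is a minimal sketch of the Delaunay neighbour weights. It assumes planar (projected) village coordinates and uses `scipy.spatial.Delaunay` as a substitute for the GIS software used in the study. The function `delaunay_weights` is hypothetical. It sets $w_{ij} = 1$ for every pair of villages joined by a triangle edge and leaves all other entries at 0.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_weights(coords):
    """Binary weight matrix: w[i, j] = 1 if villages i and j share a Delaunay edge."""
    pts = np.asarray(coords, dtype=float)
    tri = Delaunay(pts)
    n = len(pts)
    w = np.zeros((n, n), dtype=int)
    # Each simplex is a triangle given by three point indices; link every pair.
    for simplex in tri.simplices:
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = simplex[a], simplex[b]
                w[i, j] = w[j, i] = 1  # symmetric, diagonal stays 0
    return w


# Hypothetical villages: corners of a 1 km square plus its centre.
W = delaunay_weights([(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)])
print(W.sum(axis=1))  # [3 3 3 3 4]
```

In the example, the centre village is linked to all four corners. Opposite corners are not linked because their Voronoi polygons do not touch.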
Fig. 1 Villages in study region
Fig. 2 Voronoi polygons
Spatial statistics methods
$$ I = \frac{n\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(y_i-\bar{y})(y_j-\bar{y})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(y_i-\bar{y})^2} $$
Here, $n$ is the number of villages, $y_i$ is the birth defect occurrence ratio in village $i$, and $\bar{y}$ is the mean ratio over all villages. $w_{ij}$ is the element of the weight matrix between villages $i$ and $j$: when $i$ and $j$ are linked in the Delaunay triangulation, $w_{ij} = 1$, otherwise $w_{ij} = 0$. A score test is used for the hypothesis test, namely:
$$ Z(I) = \frac{I - E(I)}{\mathrm{std}(I)} $$
When $|Z| > 1.96$, such a value has a probability below 0.05 under the standard normal distribution, so the null hypothesis of no spatial autocorrelation is rejected. A value of $Z > 1.96$ indicates significant positive autocorrelation at the 95% confidence level. The hotspot detector, the Getis G* statistic, is expressed as follows [2]:
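$$ G_i^*(d) = \frac{\sum_{j=1}^{n} w_{ij}(d)\,y_j - W_i^*\,\bar{y}}{S\left\{\left[n S_{1i}^* - \left(W_i^*\right)^2\right]/(n-1)\right\}^{1/2}} $$
Here, $S$ is the standard deviation of the birth defect occurrence ratio. $w_{ij}(d) = 1$ when village $j$ lies within distance $d$ of village $i$ (including $j = i$), otherwise $w_{ij}(d) = 0$. $S_{1i}^* = \sum_j w_{ij}(d)^2$ and $W_i^* = \sum_j w_{ij}(d)$, and the two sums coincide for these binary weights. The higher the value of $G_i^*$, the more strongly high occurrence ratios cluster around village $i$ within the given distance $d$. Villages with high values become hotspots of the region.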
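The G* formula above can be sketched with NumPy, assuming planar coordinates (for example in km) and the binary distance-band weights just defined. The following choices are assumptions or our own additions:

- `getis_ord_gstar` is a hypothetical name.
- $S$ is taken as the population standard deviation, following the Getis and Ord formulation.
- Returning NaN where the denominator vanishes is our own choice.

```python
import numpy as np


def getis_ord_gstar(coords, y, d):
    """Standardized G_i*(d) for every village, using binary distance-band weights.

    w_ij(d) = 1 if village j lies within distance d of village i (j == i included).
    """
    pts = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Pairwise Euclidean distances between villages (n x n).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    w = (dist <= d).astype(float)  # diagonal is 1 because dist(i, i) = 0
    y_bar = y.mean()
    s = y.std()  # population standard deviation (ddof=0)
    W = w.sum(axis=1)  # W_i*
    S1 = (w ** 2).sum(axis=1)  # S_1i*, equal to W_i* for binary weights
    num = w @ y - W * y_bar
    den = s * np.sqrt((n * S1 - W ** 2) / (n - 1))
    # The denominator is 0 when all ratios are equal or a neighbourhood covers
    # every village; return NaN there instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


# Hypothetical villages 1 km apart on a line; the first two have higher ratios.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
g = getis_ord_gstar(coords, [0.1, 0.1, 0, 0, 0], d=1.0)
print(np.round(g, 3))
```

In this example, with $d = 1$ km, the first village scores 2.0 and the fourth scores -2.0, so the high-ratio end of the line stands out. Repeating the call over a range of $d$ values corresponds to the distance scan reported in the Discussion.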
Results
Spatial autocorrelation
Using the Moran's I statistic, the NTBD and other birth defect occurrence ratios were analyzed for their
Tab. 1 Moran's I statistics of NTBD and other birth defect occurrence ratios
Statistics item | Moran's I value | Score test (Z) | Normal distribution probability
Neural tube birth defects | 0.13871444 | 4.24939737 | 0.00004784
Non-neural tube birth defects | 0.03914535 | 1.19833834 | 0.19457338
Both the NTBD and other birth defect occurrence ratios show positive spatial autocorrelation. However, according to the score test for Moran's I, only the NTBD occurrence ratio has significant positive spatial autocorrelation in this study area (Z = 4.25 > 1.96). The occurrence ratio of other birth defects does not pass this test at the 95% confidence level (Z = 1.20).
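The Moran's I values and score test in Tab. 1 can be computed along the following lines. This is a sketch, not the authors' code. The paper does not say whether std(I) was derived under the normality or the randomization assumption. The function `morans_i_test` (a hypothetical name) therefore assumes normality, with $E(I) = -1/(n-1)$, and may not reproduce the probabilities in Tab. 1 exactly.

```python
import numpy as np


def morans_i_test(y, w):
    """Moran's I, its expectation E(I) and the Z score (normality assumption)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    z = y - y.mean()  # deviations from the mean ratio
    s0 = w.sum()
    i_stat = n * (z @ w @ z) / (s0 * (z @ z))
    e_i = -1.0 / (n - 1)
    # Variance of I under the normality assumption (Cliff and Ord).
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    e_i2 = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2)
    z_score = (i_stat - e_i) / np.sqrt(e_i2 - e_i**2)
    return i_stat, e_i, z_score


# Hypothetical chain of four villages (0-1-2-3) with two high and two low ratios.
w_chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
i_stat, e_i, z_score = morans_i_test([1, 1, 0, 0], w_chain)
print(i_stat, z_score)
```

For this chain the sketch gives $I = 1/3$ and $Z = \sqrt{3} \approx 1.73$. Even this perfectly clustered pattern falls short of 1.96, which illustrates why the test needs a reasonably large number of villages.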
Distance scale
The Getis G* statistic requires a threshold distance value, so typical distances in this study area were calculated. They are described in Table 2.
Tab. 2 Typical distance scales and their meanings
Statistics item | Distance value | Meaning
Nearest distance among remote villages | 6.165~9.309 km | Scope of farmers' socio-economic activities
Differentiation distance of soil types | 19.5~30 km | Scale of geological variation (lithology, etc.)
Hotspot detection
Two typical hotspot distributions of the NTBD occurrence ratio, calculated with the Getis G* statistic, are drawn in Fig. 3 and Fig. 4. To compare the detected hotspots and acquire effective clues about environmental risk factors, the distributions of birth defect occurrence ratios and lithology were also
Fig. 3 Hotspots detected at 6.84 km (grouped distribution)
Fig. 5 Distribution of birth defect occurrence ratios
Fig. 6 Lithology distribution (zonal type)
Discussion
proportion of birth defects among the causes of infant mortality gradually rose from one fourth to one third during the 1990s. Birth defect research is therefore becoming an important public health task nationwide. Many researchers have collected hospital-based monitoring data and analyzed them for information useful in birth defect intervention. These data have also been analyzed with the observed/expected (O/E) method to capture unusual changes in the birth defect occurrence ratio. For example, the International Clearinghouse for Birth Defects Monitoring Systems (ICBDMS) used the O/E method in its quarterly and annual reports [3].
The commonly used O/E method has some disadvantages for birth defect monitoring and intervention. Its main task is to establish a baseline birth defect occurrence ratio for a region based on the Poisson distribution. By monitoring the occurrence ratio against this baseline, sudden changes in environmental factors can then be detected with reference to the epidemic characteristics of birth defects. However, the O/E method has several limits:
- It is unclear how the baseline birth defect occurrence level should be established for different regions.
- When the monitoring period is limited and family planning is strictly enforced, the population may be too small for the O/E statistics, because birth defects are low-probability events.
- In some regions, long-standing environmental risk factors for birth defects cannot be detected within a limited monitoring period.
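The O/E idea, and the small-population problem just described, can be illustrated with a short sketch. It assumes that the baseline is a known rate per birth and that the observed count follows a Poisson distribution. `oe_check` and the numbers below are hypothetical and do not reproduce the ICBDMS procedure.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    # 1 - P(X <= observed - 1), summing the Poisson probability mass terms.
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - cdf


def oe_check(observed: int, births: int, baseline_rate: float):
    """Return the O/E ratio and the upper-tail Poisson probability."""
    expected = births * baseline_rate  # expected cases under the baseline
    return observed / expected, poisson_upper_tail(observed, expected)


ratio, p_value = oe_check(observed=4, births=200, baseline_rate=0.01)
print(f"O/E = {ratio:.2f}, P(X >= 4) = {p_value:.3f}")
```

With 200 births and a baseline of 0.01 per birth, four observed cases give O/E = 2.0, yet $P(X \ge 4) \approx 0.14$. Even a doubling of the expected count is therefore not significant in such a small population.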
Using spatial exploratory data analysis methods, this study effectively reduced the dependence on temporal birth defect records. By utilizing spatial location information, low-probability events such as birth defects can be analyzed effectively. No baseline birth defect occurrence level is required. Long-standing environmental risk factors for birth defects can also be reflected in the spatial pattern of population-based occurrence.
autocorrelation in the NTBD occurrence ratio, but none in that of other birth defects. According to mechanism analyses of birth defect types, NTBD share similar risk factors, whereas other birth defects have different risk factors. The positive spatial autocorrelation therefore indicates that common environmental risk factors for NTBD exist in this region. Other birth defects might be caused by randomly distributed risk factors or simply be random occurrences.
The Getis G* hotspot statistic was calculated for distances from 0.5 km to 45 km. Two typical hotspot clustering phenomena appear, at two scales (6.84 km and 22.8 km). As the distance increases, the distribution of hotspots changes from grouped clustering of villages to zonal clustering. The scales of these two typical clustering phenomena have practical meaning in terms of people's socio-economic activities and the geographical variation of lithology.

The distance scale of 6.165~9.309 km is the typical radius of residents' communication scope in the study area, within which their socio-economic activities usually take place. So if hereditary risk factors contribute to birth defects, hotspot clustering at this distance scale can give effective clues.

The distance scale of 19.5~30 km is the scale at which soil types change. In this study area, soil types and lithology types have a zonal distribution, and there are also coal mines underground. According to the general knowledge of local residents, areas with high birth defect occurrence are usually mining areas. The minerals or rocks may therefore contain elements that are harmful to human health and cause birth defects. As rocks weather into soil, soil variation can reflect changes in lithology. The zonal clustering of birth defect hotspots indicates that some risk elements related to soil and lithology exist in this region and take effect at the scale of soil type change.
Conclusions
analysis. The results showed that different types of birth defects have different spatial distribution characteristics. NTBD show significant positive autocorrelation at the 95% confidence level, while other birth defects show no such significant autocorrelation. Detecting hotspots of birth defect occurrence ratios at different scales revealed two types of clustering phenomena at two typical scales, which have socio-economic and geographical meaning. These results give effective clues for detecting the environmental risk factors for birth defects, especially NTBD. This spatial exploratory data analysis has demonstrated its efficiency in seeking clues to environmental exposures that pose birth defect risks. As next steps, geological samples such as soil samples should first be analyzed by chemical tests in the laboratory to identify environmental risk factors. Following this, an animal model of the risk exposure should be established and tested. Meanwhile, socio-economic activities, such as the scope of intermarriage, should be investigated within the project.
Authors’ contributions
This study was conceived and completed by Jilei Wu. Jinfeng Wang and Xiaoying Zheng supervised this study. Gong Chen, Lihua Pang, Bin Meng and Keli Zhang assisted with the study and with the analyses of socio-economic factors, soils and lithology. Xinming Song and Ting Zhang assisted with the medical analysis of birth defects.
Acknowledgements
This study is supported by grants from National “973” Program, JJ03000101
and 49871064 from National Nature Science Foundation of China, 2002AA135230 from National High
Academy of Sciences and “211” program of Peking University.
References
[1] Haining R.J., 1989, Spatial Data Analysis in the Social and Environmental Sciences. Cambridge University Press, Cambridge, U.K.
[2] Getis A., Ord J.K., 1992, The Analysis of Spatial Association by Use of Distance Statistics, Geographical Analysis, Vol. 24: 189-206.
[3] ICBDMS, 1998, International Clearinghouse for Birth Defects Monitoring Systems Annual Report 1998. Roma: ICBDMS.
[4] AN XL, FU SHL, Environmental Eugenics, Beijing Medical University and Chinese Union