structural types using multi-sensor remote sensing and machine learning techniques

ISPRS Journal of Photogrammetry and Remote Sensing, forthcoming

Christian Geiß, Patrick Aravena Pelizari, Mattia Marconcini, Wayan Sengara, Mark Edwards, Tobia Lakes and Hannes Taubenböck

Received 12th July 2013; revised 20th June 2014; accepted 30th July 2014

ABSTRACT

Detailed information about seismic building structural types (SBSTs) is crucial for accurate earthquake vulnerability and risk modeling, as it reflects the main load-bearing structures of buildings and, thus, their behavior under seismic load. However, for numerous urban areas in earthquake-prone regions this information is outdated, unavailable, or nonexistent. To this end, we present an approach to estimate SBSTs by combining scarce in situ observations, multi-sensor remote sensing data and machine learning techniques. In particular, we introduce a sequential procedure comprising five main steps: calculation of features from remote sensing data, feature selection, outlier detection, generation of synthetic samples, and supervised classification using both Support Vector Machines and Random Forests. Experimental results obtained for a representative study area, including large parts of the city of Padang (Indonesia), assess the capabilities of the presented approach and confirm its great potential for a reliable area-wide estimation of SBSTs and for effective earthquake loss modeling based on remote sensing, which should be explored further in future research.

1. INTRODUCTION

The increasing spatial concentration of exposed elements such as people, buildings, infrastructure or economic values in earthquake-prone regions induces seismic risk at an unprecedentedly high level. In particular, urban areas in developing countries are characterized by a large number of vulnerable buildings. At the same time, very dynamic urban growth is accompanied by the construction of unplanned, spontaneous and highly vulnerable settlements. Thus, local governments and stakeholders face the problem of continuously updating their knowledge of the current building stock and simultaneously assessing exposed buildings area-wide to efficiently establish and adjust preparedness measures (Sarabandi and Kiremidjian, 2007; Taubenböck et al., 2009a; Wieland et al., 2012). Especially for earthquake loss estimation (ELE) modeling, the gathering of building inventory and vulnerability information normally represents the most time-consuming and expensive aspect (Dunbar et al., 2003).

The exclusive application of conventional approaches, such as detailed in situ building-by-building analysis by structural engineers, is increasingly unable to cope with this situation. Instead, in the last few years remote sensing has proven its great potential to extract relevant features for pre-event vulnerability analysis of built-up structures for large areas (Geiß and Taubenböck, 2012). So far, different approaches have been presented in the literature. By means of characteristics extracted from remote sensing data, Taubenböck et al. (2009a) and Borzi et al. (2011) reconstruct and characterize the built environment and retrieve specific fragility functions for designated building types. Pittore and Wieland (2012) use remote sensing data for delineating and characterizing homogeneous built-up areas. The vulnerability of the building inventory is determined in combination with information from a ground-based omnidirectional imaging system. Similarly, Borfecchia et al. (2009) assess the vulnerability of buildings in a hybrid way, namely by combining in situ ground truth for selected buildings with information derived from remote sensing data. Supervised classification techniques are subsequently used to classify the remaining building inventory. Geiß et al. (2013) combine detailed in situ seismic vulnerability information with features describing the urban morphology derived from remote sensing data. Supervised regression and classification techniques are then applied to evaluate the suitability for an area-wide assessment. The aforementioned studies deploy very heterogeneous approaches with respect to the vulnerability levels or classes to be estimated. Taubenböck et al. (2009a), Borzi et al. (2011), and Borfecchia et al. (2009) use rather specific definitions, whereas Pittore and Wieland (2012) and Geiß et al. (2013) also incorporate more transferable, yet generalized, assessment schemes, such as the European Macroseismic Scale (EMS-98; Grünthal et al. 1998).

However, none of the cited studies focuses on the estimation of seismic building structural types (SBSTs). SBSTs characterize the main load-bearing structure of a building. This is the most influential factor for earthquake damage and, accordingly, it is generally the first property considered for categorizing a building. Further frequently considered parameters that may reflect the seismic performance comprise the number of storeys, the period of construction or the presence of structural irregularities (Coburn and Spence, 2002). For individual SBSTs, a function can be determined that relates the magnitude of the seismic hazard to the damage probability of the structures (Calvi et al., 2006). This enables the prediction of the probable damage distribution of the building inventory with respect to a certain level of seismic hazard (Douglas, 2007). Additionally, SBSTs can contribute to the assessment of the seismic vulnerability according to schemes such as the EMS-98.

In the pioneering work of Sarabandi and Kiremidjian (2007), information derived from remote sensing data is combined with ancillary (geo-)information to estimate SBSTs. In particular, they use very high resolution optical imagery to derive the building inventory and calculate features describing the height, extent, shape, and roof type characteristics of individual buildings. In addition, they use tax assessor data to compile information about occupancy and age. Subsequently, supervised classification techniques (Classification and Regression Trees (CART) and multinomial logistic regression) are deployed to estimate SBSTs. The approach proposed in this paper differs considerably from that work, both conceptually and methodologically. The plethora of sensor systems that provide useful and complementary information makes it possible to substitute ancillary (geo-)information and, thus, to rely fully on remote sensing to reconstruct and characterize the building inventory. Because of, for example, limited data availability, it may be crucial to gain independence from proprietary sources of information (e.g. tax assessor data). In addition, a complementary set of remote sensing data allows the building inventory to be characterized in an exhaustive manner and, for instance, spatial context information to be encoded in the classifier. This in turn opens a good opportunity to boost the predictive performance of learned models. An exhaustive characterization of the building inventory based on a comprehensive set of features also calls for classification approaches that are able to cope efficiently with high-dimensional data sets.

Moreover, SBSTs ground truth is very costly to obtain and at the same time is afflicted with uncertainties induced by an often challenging assignment process. This induces the general need for a more tailored approach, which is able to lower those uncertainties and can cope with the scarcity of in situ elaborated ground truth.

To address these considerations, the objective of this paper is to introduce an approach for estimating SBSTs area-wide based on scarce in situ ground truth and complementary multi-sensor remote sensing data by means of a sequential procedure of advanced machine learning techniques. More specifically, we exploit very high resolution multispectral imagery, multi-temporal medium resolution multispectral data, as well as height information from a normalized digital surface model (nDSM) to derive a comprehensive set of features characterizing the urban environment. Different feature selection techniques are then employed to reduce the dimensionality of the resulting dataset and identify the most relevant features. Outlier detection is applied to prune those objects from the data for which the available in situ information cannot be considered reliable. To tackle the scarcity of SBSTs ground-truth data, additional synthetic samples are generated. Finally, different SBSTs are estimated by means of advanced supervised classification techniques. In particular, both Support Vector Machines (SVM) (Vapnik, 1998; Schölkopf and Smola, 2002) and Random Forests (RF) (Breiman, 2001) are considered due to their capability of effectively handling complex remote sensing classification problems (Camps-Valls and Bruzzone, 2009; Gislason, 2006). Since spatially distributed estimation of SBSTs represents a critical input for ELE models, we illustrate the applicability of the presented approach within scenario-based loss estimations for the city of Padang, Indonesia.

The remainder of the paper is organized as follows. Section 2 characterizes the study site and the data used in this study. Section 3 describes the methods, and section 4 presents and discusses the results. Section 5 concludes the paper and gives an outlook.

2. STUDY SITE AND DATA

2.1. STUDY SITE: PADANG, INDONESIA

The presented study focuses on the city of Padang (Indonesia), which is situated in one of the most earthquake-prone regions worldwide. Padang is located on the island of Sumatra (mainly on the coast and to some extent below mean sea level) and is the capital city of the Sumatera Barat province. It is the third largest city on the island, with approximately one million inhabitants. The dynamic urban system of Padang is characterized by a high concentration of population, infrastructure and economic values. The city has supra-regional relevance, with an international airport and a railway connection, and plays an essential economic role for the coastal region and the mountainous hinterland.

The Sumatra subduction zone represents one of the most active plate tectonic margins in the world (Petersen et al., 2004). The Australian plate plunges beneath the Sunda block of the Eurasian plate with convergence rates between approximately 56 and 62 mm/yr (Chlieh et al., 2008) (Fig. IV-1a). The associated complex plate boundary setting leads to thrust earthquakes on the subduction fault, strike-slip earthquakes on the Sumatran fault, deeper earthquakes within the subducting lithosphere, and volcanic earthquakes (McCaffrey, 2009). Accordingly, the city is located in a region characterized by an extremely high probability of severe earthquakes, as well as secondary effects such as tsunamis (Chlieh et al., 2008; Taubenböck et al., 2009b). As an example, in the afternoon of 30th September 2009, Padang was hit by an earthquake with a moment magnitude of Mw = 7.6 (Taubenböck et al., 2013). Overall, the earthquake affected an area with a population of 1.2 million and caused 1,195 fatalities. A total of 144,000 buildings collapsed or were significantly damaged. In Padang, 383 people died and 431 were seriously injured, primarily due to collapsing buildings (EERI, 2009; BNPB, 2009). Despite the size of the event, the Sunda megathrust was not ruptured and the stress on the Mentawai segment, which has accumulated over 200 years, has not been significantly reduced. The megathrust strain-energy budget remains at a high level and the threat of a great and tsunamigenic earthquake with a magnitude Mw > 8.5 on the Mentawai patch is unabated (McCloskey et al., 2010).

Fig. IV-1. Overview of the location of the study area and the acquired data. (a) location of Padang and tectonic setting; basic active structural elements of the obliquely convergent Sumatran plate boundary are shown; dashed lines parallel to the trench mark the 50, 100, and 200 km depths of the megathrust (Chlieh et al., 2008); (b) very high resolution multispectral IKONOS imagery; (c) multitemporal LANDSAT data; (d) nDSM with object heights; (e) derived geoinformation consisting of building footprints and building blocks; (f) in situ data regarding SBSTs, superimposed on (b).

2.2. REMOTE SENSING DATA

The remote sensing data used in our study were acquired and processed within the “Last-Mile” project (Taubenböck et al., 2009b) and comprise a multispectral IKONOS image, multitemporal LANDSAT data, as well as height information from an nDSM (all of them co-registered to UTM 47S projection and WGS-84 datum). The multispectral IKONOS image was acquired on 12th April 2005 and covers a spectral range between 0.445 and 0.853 µm, with a geometric resolution of 1 m for the panchromatic band and 4 m for the 4 multispectral bands. The data were pan-sharpened and atmospheric correction was performed using the ATCOR (Atmospheric and Topographic Correction) model (Richter, 1996; Taubenböck et al., 2009b) (Fig. IV-1b). Multitemporal LANDSAT data were acquired by the Thematic Mapper sensor and the Enhanced Thematic Mapper sensor on 25th July 1989 and 15th July 2000, respectively. Both images are characterized by 7 multispectral bands covering a spectral range between 0.45 and 2.35 µm at 30 m spatial resolution (Fig. IV-1c). Height information is derived by means of a digital surface model (DSM) and a digital terrain model (DTM), both derived from airborne radar data acquired by a pair of antennas and processed using SAR interferometry techniques (Li et al., 2004). The DTM was derived from measurements of the bare ground contained in the original radar data and by manual review and editing (Intermap, 2010). The data sets have a geometric resolution of 5 m and a height root-mean-square error (RMSE) of 1 m. To obtain relative height information for elevated objects, an nDSM is calculated by subtracting the height values of the DTM from the height values of the DSM (Fig. IV-1d). Interested readers can refer to Taubenböck et al. (2009b) for a more detailed description of data acquisition and preprocessing.
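The nDSM calculation described above is a cell-by-cell subtraction of two co-registered grids. The following minimal sketch illustrates it with plain Python lists; the function name, the toy height values, and the clamping of negative differences to zero are assumptions made for illustration and are not taken from the processing chain of the study.

```python
def normalized_dsm(dsm, dtm, clamp_negative=True):
    """Return nDSM = DSM - DTM, cell by cell, for two equally shaped grids.

    clamp_negative is an illustrative assumption: small negative differences
    (e.g. from noise in either model) are set to zero object height.
    """
    if len(dsm) != len(dtm) or any(len(a) != len(b) for a, b in zip(dsm, dtm)):
        raise ValueError("DSM and DTM must share the same grid shape")
    ndsm = []
    for surface_row, terrain_row in zip(dsm, dtm):
        row = []
        for surface, terrain in zip(surface_row, terrain_row):
            height = surface - terrain  # relative height of elevated objects
            if clamp_negative and height < 0:
                height = 0.0
            row.append(height)
        ndsm.append(row)
    return ndsm


# Hypothetical 2 x 2 grids with heights in metres
dsm = [[12.0, 18.5], [10.0, 9.5]]
dtm = [[10.0, 10.5], [10.0, 10.0]]
ndsm = normalized_dsm(dsm, dtm)
```

In practice the same operation would be applied to full raster arrays; the sketch only makes the arithmetic explicit.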

2.3. DERIVED GEO-INFORMATION AND IN SITU DATA

Basic geo-information had already been derived from the remote sensing data and was provided for this study. Within the “Last-Mile” project, 87,573 building footprints were digitized from the IKONOS imagery by means of a manual photointerpretation procedure. They represent the core of Padang’s building inventory. Additionally, building blocks had been derived from a closed-meshed road network (Taubenböck et al., 2008) (Fig. IV-1e). Both information layers serve as the basis for the calculation of features, which is explained in section 3.1.

About four weeks after the earthquake on 30th September 2009, a field survey took place in the affected region in the framework of the Australia-Indonesia Facility for Disaster Reduction (AIFDR), jointly led by the Institut Teknologi Bandung (ITB) and Geoscience Australia. The primary objective of the survey was to undertake a population-based inspection of buildings of all types and all damage levels. The results made it possible to infer knowledge regarding the vulnerability of a range of building types present in the surveyed region and representative of others in Indonesia (Sengara et al., 2010). Overall, 3896 buildings were surveyed and each of them was assigned a specific structural system, wall type, roofing type, floor type, number of storeys, usage, and the degree of damage suffered from the earthquake event. To conduct a vulnerability assessment and to derive fragility curves, the surveyed buildings were categorized according to SBSTs, which reflect a similar behavior under seismic load. In particular, the following classes were considered: “Confined masonry” (CM), “Reinforced concrete high” (RC high), “Reinforced concrete low” (RC low), “Steel frame” (SF), “Timber frame residential” (TF res), “Timber frame non-residential” (TF non-res), and “Unreinforced masonry” (URM) (Fig. IV-1f). Of all surveyed buildings, 2779 are located in the study area. The position of each building was recorded with a GPS device (Fig. IV-1b) and digital pictures were also taken (ibid.). Nevertheless, due to inaccuracies in the GPS positioning, only 561 buildings could be unambiguously assigned to their corresponding building footprint extracted from remote sensing imagery (Fig. IV-1e).

Unfortunately, only two samples remained for the structural type SF. As this class represents a relatively rare but striking SBST, the corresponding in situ ground truth was extended with 12 additional samples derived from another data set compiled in February/March 2008 during the “Last-Mile” project (Taubenböck et al., 2009b). The histogram depicting frequencies of different SBSTs of the final in situ data set is shown in Fig. IV-2. Descriptive statistical analyses were carried out to check whether the final in situ data set is consistent with all surveyed buildings and the results revealed a very good agreement.

Fig. IV-2. Frequency of labeled samples (overall: 573) according to different SBSTs of the final in situ data set.

3. METHODS

Based on the remote sensing and in situ data, we carry out a sequential procedure to estimate SBSTs. Fig. IV-3 gives a schematic overview, from the data sets used through the chronology of the procedure to the targeted SBST classification map. A set of features is derived from the remote sensing data at two different spatial levels, building and block level (section 3.1.). The hierarchical supervised classification approach is described in section 3.2. Outliers in the in situ data and the building inventory are identified first. To this end, a subset-based feature selection technique (section 3.2.1.) is used to create a suitable group of features for building robust one-class classification models based on the in situ data. The models are built by means of a one-class support vector machine (OC-SVM, section 3.2.2.) approach and are applied to both the in situ data and the building inventory. Subsequent to outlier identification, multiclass classification models are built in three consecutive steps. The remaining in situ samples are used to identify useful groups of features for building robust models by applying subset- and ranker-based feature selection techniques (section 3.2.1.). To tackle the scarcity of the in situ data and learn efficient discriminative classifiers, synthetic training samples are generated by means of an oversampling technique (section 3.2.3.). Based on the generated feature groups and oversampled training data, multiclass classification models are learned using SVM and RF (section 3.2.4.). Finally, the most accurate model is applied to the building inventory to estimate SBSTs in a spatially distributed manner.
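To illustrate what generating synthetic training samples can look like, the sketch below creates new feature vectors for one class by interpolating between a labeled sample and one of its nearest neighbours of the same class. This is an assumption made for illustration only: the overview above does not name the oversampling technique, and the function name, the parameter k, and the toy feature vectors are hypothetical.

```python
import random


def synthesize_samples(samples, n_new, k=2, seed=0):
    """Create n_new synthetic feature vectors for a single class.

    Each synthetic vector lies on the line segment between a randomly chosen
    sample and one of its k nearest same-class neighbours (illustrative
    interpolation-based oversampling, not necessarily the paper's technique).
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed for interpolation")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(samples))
        base = samples[base_idx]
        # Same-class candidates, sorted by squared Euclidean distance to base
        others = [s for i, s in enumerate(samples) if i != base_idx]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional feature vectors of a rare class
rare_class = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
new_samples = synthesize_samples(rare_class, n_new=5, k=2, seed=42)
```

Because every synthetic vector is a convex combination of two labeled samples, it stays inside the range spanned by the original class samples, which is one reason interpolation-based schemes are popular when ground truth is scarce.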

Fig. IV-3. Overview of the framework and processing steps followed in this study. Detailed explanations are given throughout the text in section 3.1 which describes the feature calculation, and section 3.2 which describes the hierarchical supervised classification approach.

3.1. CALCULATION OF FEATURES FROM REMOTE SENSING DATA

For a reliable estimation of SBSTs, numerous features have been extracted for both the aforementioned individual building footprints and building blocks (identified in the following by the subscripts B and S, respectively) (Fig. IV-1e and Fig. IV-3). Generally, the building footprints allow individual buildings to be characterized, whereas the building block layer characterizes the spatial setting in which the respective buildings are embedded. We chose to use building blocks derived from a street network rather than artificial spatial units, such as square objects. This allows us to reflect more naturally the urban morphology, which is constituted by distinct areas that are generally irregularly shaped. At the same time, the difficulty of having to determine the optimal kernel size a priori is avoided (Herold et al., 2003).

We use a set of features that was introduced and explained in detail in Geiß et al. (2013), where it was used to evaluate the potential of remote sensing to assess the seismic vulnerability levels of buildings (see Tab. IV-1a). In particular, the features relate to the two-dimensional extent of buildings as well as the description of their shape characteristics. In addition, first- and second-order statistical values were extracted from the available IKONOS imagery at building and block level. The former serve as a descriptor of roof surface material and arrangement, whereas the latter are intended to describe the composition of distinct urban structures. Mean and standard deviation values of the different image bands as well as band ratios, which are intended to emphasize spectral dissimilarities, were calculated. Additionally, rotation-invariant texture measures for the panchromatic and near-infrared band were computed using both the grey level co-occurrence matrix (GLCM) and the grey level difference vector (GLDV).
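As an illustration of a rotation-invariant GLCM texture measure, the sketch below computes the contrast of a symmetric, normalized co-occurrence matrix for four pixel offsets (0°, 45°, 90°, 135°) and averages the results. Averaging over directions is a common way to obtain rotation invariance and is assumed here; the text does not state which specific texture measures or implementation were used, and all function names are hypothetical.

```python
def glcm(image, dx, dy, levels):
    """Symmetric, normalized grey level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both orders to make the matrix symmetric
    total = sum(map(sum, counts))
    if total == 0:  # no valid pixel pairs, e.g. a single-pixel object
        return [[0.0] * levels for _ in range(levels)]
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over all grey level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def rotation_invariant_contrast(image, levels):
    """Average the contrast over the 0, 45, 90 and 135 degree offsets."""
    offsets = [(1, 0), (1, 1), (0, 1), (-1, 1)]
    values = [contrast(glcm(image, dx, dy, levels)) for dx, dy in offsets]
    return sum(values) / len(values)


# Hypothetical image patch quantized to two grey levels
checkerboard = [[0, 1], [1, 0]]
texture = rotation_invariant_contrast(checkerboard, levels=2)
```

For features at building or block level, the patch would be restricted to the pixels of the respective object; this sketch uses a rectangular patch for simplicity.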

Features explicitly aiming to describe the spatial context are calculated at block level and consist of the area of building blocks and the average size of the buildings located within them. Furthermore, spatial metrics such as proportion measures of land cover classes are computed. Based on an urban land cover map derived in Taubenböck et al. (2009b) (which exhibited an overall accuracy (OA) of 97%), the proportions of the land cover classes “buildings”, “sealed”, “grass/meadow”, “trees”, and “impervious surface”, which represents a combination of “buildings” and “sealed”, were calculated per block. Additionally, a semantic classification (“Structure Type S”), which is built on physical features that describe the urban morphology, is incorporated. The classification describes the socio-economic status of the population by distinguishing “slums”, “suburbs”, “low income areas”, “medium income areas”, and “high income areas”.
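The per-block land cover proportions can be sketched as follows. The class names follow the text, whereas the function name, the input layout (area per class within one block) and the example areas are hypothetical and only serve to make the computation concrete.

```python
LAND_COVER_CLASSES = ("buildings", "sealed", "grass/meadow", "trees")


def land_cover_proportions(class_areas):
    """Return the share of each land cover class within one building block.

    class_areas maps class names to the area they cover inside the block
    (hypothetical input layout). "impervious surface" is derived as the
    combination of "buildings" and "sealed".
    """
    total = sum(class_areas.values())
    if total <= 0:
        raise ValueError("block area must be positive")
    proportions = {
        name: class_areas.get(name, 0.0) / total for name in LAND_COVER_CLASSES
    }
    impervious = class_areas.get("buildings", 0.0) + class_areas.get("sealed", 0.0)
    proportions["impervious surface"] = impervious / total
    return proportions


# Hypothetical block with class areas in square metres
block = {"buildings": 400.0, "sealed": 200.0, "grass/meadow": 300.0, "trees": 100.0}
shares = land_cover_proportions(block)
```

Any additional classes present in the input contribute to the block total but are not reported, so the four listed proportions need not sum to one.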

Beyond that, the incorporation of height information allows the calculation of 3D features such as building floor number, floor space, ratio of diameter and height, ratio of width and height, as well as the average building height within a building block. The mean slope for each building block was calculated to describe topographic location characteristics. By analyzing two Landsat images from 1989 and 2000, the period of construction is approximately described based on a post-classification change detection procedure, which aims to map the urban extent at the respective time step. For a more comprehensive description of all the features listed in
