
Extracting Spatio-Temporal Patterns from Ocean Fishery Data Sets in the East China Sea Using Spatial Cluster Analysis


© 2003 by the American Fisheries Society


YUNYAN DU*, CHENGHU ZHOU, QUANQIN SHAO, FENZHEN SU, SHENG WANG
Institute of Geography, Chinese Academy of Sciences, Bldg. 917 Datun Road, Anwai, Beijing 100101, PR China

*Telephone: 086 010 64889681; Fax: 086 010 64889630; E-mail: [email protected]

Since the mid-1980s, the focus of geographic information systems (GISs) has evolved from geographic databases to spatial analysis with the development of new techniques and applications, and the demand for strong spatial analysis functions in GISs has increased dramatically (Guo 1997). Earlier spatial analysis functions focused mainly on GIS graphic techniques based on the theories of geometry and topography, such as overlay and buffer, and were poorly suited to knowledge-based GISs.

Two branches are forming in this field: the combination of GIS and spatial analysis, of which Openshaw and Goodchild are representative, and spatial data mining in GIS, of which Jiawei Han and Deren Li are representative. The former examines the integration of spatial analysis and GIS, as well as the potential advantages of developing such integration (Fischer 1994). The latter focuses mainly on the application of data mining in GIS spatial databases from a knowledge discovery perspective. As Koperski et al. (1996) note, “Methods for mining spatial data should be combined with advanced spatial database, as well as statistical analysis, spatial reasoning and expert system technology to create Intelligent GIS systems” (Fotheringham and Rogerson 1994).

Because of the complexity and uncertainty of geographic phenomena, purely quantitative methods run into problems in data capture, application, and the explanation of results; expert knowledge must be combined with geographic analysis. Traditional expert system methods based on symbolic inference are difficult to apply in practice because of a lack of knowledge about data capture and updating (Zhang 1999). In the artificial intelligence field, symbolic intelligence is giving way to computational intelligence. At the same time, from a data mining perspective, exploratory (statistical) and mathematical models of data in spatial analysis can all be considered sources of data mining. Thus, in this respect, the two research branches are similar.

The ocean, because of its large area and the limitations or difficulties of investigative methods, was not recognized as a special field of study until recently. Current numerical modeling methods are unsuitable for some ocean processes or phenomena whose mechanisms are not understood. Because the ultimate goal of data mining or knowledge discovery is to discover hidden patterns or trends in complex information sources (Deekshatulu et al. n.d.), data mining tools are suitable for discovering knowledge from an ocean data set. In this paper, the quantitative relationship between the ocean fishery and corresponding environmental factors is derived by combining data mining and GIS. Spatio-temporal patterns are extracted from a time series of fishery productivity statistics and corresponding environmental factors in the East China Sea (24–36˚ N, 118.5–130˚ E) from 1987 to 1998 using the spatial isodata cluster algorithm and GIS mapping techniques.

Cluster analysis is a branch of statistics that has been studied extensively for many years. The main advantage of this technique is that interesting structures or clusters can be found directly from data without any background knowledge (Chen 1998). Recently, with the formation of data mining concepts and structure, cluster analysis has also come to be considered a data mining tool. Spatial cluster analysis is one way to mine knowledge from data sets that have a spatial location. Just as traditional cluster methods take into account only attributes when analyzing geographic data, purely spatial clustering takes into account only location when calculating the distance between samples. Neither approach alone is adequate for practical applications (Guo 1997).

In this paper, we investigate the extraction of spatio-temporal patterns from the historical records of ocean fishery production, corresponding temperatures, and their gradients to obtain a thorough understanding of this relationship. Based on previous studies, we concluded that sea surface temperature (SST) data have a significant correlation with fishery productivity; their relationships vary steadily within certain ranges over time and area in the East China Sea. To mark the relationship in space automatically, a dynamic cluster algorithm is applied to the long time series of ocean fishery data. Dynamic cluster analysis uses distance as the measure of similarity between samples; after a criterion function for assessing the cluster results is chosen, initial classes are assigned, and the optimal solution is computed by iteration (Ren et al. 1998; Bian 2000). The ocean fishery data sets are organized in a grid structure: different attribute data form a stack of grids, the location data are implicit in the grids, and the isodata cluster algorithm is chosen.

The isodata cluster algorithm is an iterative process that assigns each candidate cell to a cluster by minimum Euclidean distance. The process starts with the software assigning one arbitrary mean for each cluster (the number of clusters is specified by the user). Each cell is then assigned to the closest of these means in the multidimensional attribute space. New means are then recalculated for each cluster from the attribute values of the cells that belong to the cluster after the first iteration. This process is repeated: each cell is assigned to the closest mean in multidimensional attribute space, and new means are calculated for each cluster based on the membership of cells from the previous iteration.
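The assignment and update steps described above can be sketched in plain Python. This covers only the migrating-means loop as the text describes it, not the GIS software's implementation; the function names, the random choice of starting means, and the stop-when-stable rule are assumptions made for illustration.

```python
import math
import random


def assign_to_nearest(cells, means):
    """Return, for each cell, the index of the closest mean (Euclidean distance)."""
    labels = []
    for cell in cells:
        distances = [math.dist(cell, mean) for mean in means]
        labels.append(distances.index(min(distances)))
    return labels


def recompute_means(cells, labels, means):
    """Average the attribute vectors of each cluster; keep the old mean if a cluster is empty."""
    new_means = []
    for k, old_mean in enumerate(means):
        members = [cell for cell, label in zip(cells, labels) if label == k]
        if not members:
            new_means.append(old_mean)
            continue
        new_means.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_means


def iterative_cluster(cells, n_classes, max_iterations=20, seed=0):
    """Assign cells to the nearest mean, update the means, and stop once labels are stable."""
    rng = random.Random(seed)
    means = [tuple(c) for c in rng.sample(cells, n_classes)]  # arbitrary starting means (assumption)
    labels = None
    for _ in range(max_iterations):
        new_labels = assign_to_nearest(cells, means)
        if new_labels == labels:
            break
        labels = new_labels
        means = recompute_means(cells, labels, means)
    return labels, means


# Hypothetical cells in a two-attribute space (for example, rescaled SST and APPN).
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, means = iterative_cluster(cells, n_classes=2)
print(labels)
```

Each cell here is a tuple of attribute values, so adding latitude and longitude as two extra tuple entries is enough to make location influence the distance.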

Materials and Methods

Study area

The study area is 118.5–130˚ E, 24–36˚ N, including part of the Yellow Sea and almost all of the East China Sea. It is bounded by mainland China to the west and the Korean Peninsula to the northeast, and it connects to the South China Sea in the south through the Taiwan Strait (Wang 1996). It forms an arc from Taiwan to the Kyushu Islands. The west sector of this arc is the continental shelf; the east sector is the slope and trough. Hangzhou Bay is the largest bay in this area.

Data preparation

The main focus of data preparation is on a multivariate analysis among fishery productivity, corresponding SSTs, and their gradients. Therefore, the original data must be preprocessed. SSTs were extracted from National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) images and rectified to 0.5˚ latitude and longitude intervals using survey data. Daily fishing catch statistics of four companies were used as fishery production data. To analyze the four representative companies, we calculated the number of samples for all companies together as well as separately by week (Figure 1). We also calculated the distribution of average productivity per net (APPN) of all companies together and separately by week (Figure 2). The Zhoushan, Shanghai, and Ningbo companies follow the overall trends in both sample numbers and production; all but the Zhoushan Company were representative. Thus, the APPN values of three companies were chosen to represent fishery density.

Because the time interval of the SSTs is one week and the space interval of fishery density is a fish cell, the fishery density should be calculated by week and the SSTs should be densified (resampled) to the fish cell resolution (10′ × 10′).
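One simple way to bring a 0.5˚ SST grid to the 10′ fish cell resolution is to copy each coarse cell into the 3 × 3 block of fine cells it covers (0.5˚ = 30′). The paper does not state which resampling method was used, so the nearest-neighbour replication below, the name `densify_grid`, and the sample values are assumptions for illustration only.

```python
def densify_grid(coarse, factor=3):
    """Replicate each coarse cell into a factor x factor block of fine cells."""
    fine = []
    for row in coarse:
        # Repeat every value `factor` times along the row ...
        fine_row = [value for value in row for _ in range(factor)]
        # ... and repeat the widened row `factor` times down the grid.
        fine.extend([list(fine_row) for _ in range(factor)])
    return fine


# Hypothetical 2 x 2 block of 0.5-degree SST cells, in 0.1 deg C units.
sst_half_degree = [[169, 140], [190, 150]]
sst_fish_cell = densify_grid(sst_half_degree)
print(len(sst_fish_cell), len(sst_fish_cell[0]))
```

An interpolating method (for example, bilinear) would give smoother gradients than replication; the choice matters because the SST gradient is itself one of the clustered layers.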

Figure 1. Sample numbers by week, 1987–1998.

Spatial cluster process

Because SSTs and their gradients are the main factors that affect the distribution of fishery density, SSTs, their gradients, and APPN values are all taken into account in the isodata cluster to determine the spatial center. Isodata cluster calculations are performed only in multivariate attribute space and are not based on any spatial characteristics. The latitude and longitude of the study area are therefore organized as grids and added to the stack for isodata cluster analysis. From this point of view, the method can be considered a form of spatial clustering.

Better results are obtained if all layers in the input stack have the same data range. Because the data ranges of APPN and SST differ vastly, the APPN values are transformed to the same range as the SSTs using grid algebra and the following equation:

$$Z = \frac{(X - \text{old min})\,(\text{new max} - \text{new min})}{\text{old max} - \text{old min}} + \text{new min} \qquad (1)$$

where Z is the output grid with the new APPN range; X is the APPN grid; old min and old max are the minimum and maximum values of the APPN grid, respectively; and new min and new max are the minimum and maximum values of the SST grid, respectively.
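As a minimal sketch of equation (1), the rescaling can be written for a flat list of cell values. The function name `rescale_to_range` and the sample numbers are illustrative assumptions; the paper applies the transformation with grid algebra in GIS software.

```python
def rescale_to_range(values, new_min, new_max):
    """Linearly map values from [old min, old max] onto [new_min, new_max] (equation 1)."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all input values are identical; the old range is zero")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(x - old_min) * scale + new_min for x in values]


# Hypothetical APPN values mapped onto a hypothetical SST range (0.1 deg C units).
appn_values = [10, 33, 587]
print(rescale_to_range(appn_values, new_min=69, new_max=190))
```

The smallest input always maps to `new_min` and the largest to `new_max`, so a single extreme APPN value compresses all the other cells toward the low end of the new range.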

Extracting spatial patterns from ocean fishery data sets

To determine the spatial distribution of APPN, corresponding SSTs, and their gradients from 1987 to 1998 as a whole, the weekly statistics of bottom-trawl (type 30) and purse-net (type 20) APPN values were extracted from the original records. Meanwhile, SSTs and their gradients for the same period and the same cells were also extracted and then aggregated by fish cell, as listed in Table 1; 2,139 total records were used. The “Fish cell” column gives the spatial location, so the APPN values, SSTs, and gradients can be exported to ASCII files and then converted to three ARC/INFO grid layers. In the same way, the fish cell locations can be converted to two ARC/INFO grids: the latitude and longitude layers of the fish cell. These five grid layers form two stacks, one for each type of fishery production. Using the algorithm described above, cluster results for the two stacks were obtained (Figures 3 and 4) with the following parameters: the number of classes is 6; the maximum number of iterations is 20; the minimum class size is two cells; and the sample interval is one cell. The mean and variance of every class are given in Tables 2 and 3. Figures 3 and 4 illustrate that different fishery production types have vastly different spatial patterns.
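The per-class statistics reported in Tables 2 and 3 can be illustrated with a short sketch that, given one list of values per layer and a class label per cell, returns the cell count, mean, and variance of each layer within each class. The function name and sample values are hypothetical, and the use of the population variance is an assumption, since the paper does not say which variance formula was used.

```python
from statistics import fmean, pvariance


def class_statistics(layers, labels):
    """Summarise each layer within each class.

    layers: dict mapping a layer name to a list of cell values
    labels: list with the class ID of every cell, in the same cell order
    """
    stats = {}
    for class_id in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == class_id]
        stats[class_id] = {"cells": len(members)}
        for name, values in layers.items():
            member_values = [values[i] for i in members]
            # (mean, population variance) of this layer inside the class
            stats[class_id][name] = (fmean(member_values), pvariance(member_values))
    return stats


# Hypothetical four-cell example; SST in 0.1 deg C units.
layers = {"SST": [100, 120, 200, 220], "APPN": [50, 50, 40, 60]}
labels = [1, 1, 2, 2]
for class_id, summary in class_statistics(layers, labels).items():
    print(class_id, summary)
```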

There are two main concentrated classes of type 20 in the study area. One is the Dasha and Shawai fish area, located between the Yellow Sea and the East China Sea. The other class contains two subareas: the Changjiangkou Zhoushan area and the Wentai Mindong area northeast of Taiwan Island. In Changjiangkou Zhoushan, the APPN is relatively high, the mean SST is 12–13˚C, and the SST gradient is about 3˚C, whereas in other areas, the APPN is relatively low, the mean SST is 26˚C, and the SST gradient is about 1.9˚C (Table 2).

There are two temperature fronts in the upper waters of the Dasha area: one, in the west, is formed by the Yellow Sea warm current and the Yellow Sea mixed water; the other, in the east, is formed by the littoral water of Korea and the Duima warm water mass. Therefore, the pelagic fishery resources in the Dasha fish area are high, and the SST gradients in this area are relatively high. The Taiwan Warm Current, one of the branches of the Kuroshio Current, plays a major role in the Wentai Mindong area, where it has almost the same intensity during winter and summer except for a deflection caused by the strong northerly monsoon in winter. As a result, this area is relatively stable, and the pelagic fishery resources are low.

Figure 2. APPN by week, 1987–1998. Panels: the APPN of all companies from 1987 to 1998; the APPN of each company from 1987 to 1998.

The cluster result of demersal fishery production (type 30) has more classes, and the spatial patterns are more complex (Figure 4), taking on a zonal and circular structure. The combined area of Changjiangkou Zhoushan and Dasha is located at the center of a circle, which then extends slowly outward into a zonal structure. On the whole, there are three main areas: the Yellow Sea warm water and Yellow Sea mixed water region, the Changjiangkou Zhoushan region, and the Kuroshio Current area (the area surrounded by lines in Figure 4). The direction of the zonal pattern is almost consistent with the main flow of the Kuroshio and its branches.

Meanwhile, the center of the circular pattern is the confluence of the littoral current of the Changjiangkou Zhoushan fish area, the northward Kuroshio branch (Taiwan Warm Current), and the southward cold current. The other patterns in Figure 4, apart from the three main areas, are transitional regions.

The cluster results indicate that the APPN is high in the areas of the Kuroshio Current. The main reason for this is that the formation of demersal fishing areas is strongly correlated with the ascending current, which lies to the left of the Kuroshio Current; the bottom drops steeply east of the 200-m isobath, which enhances the intensity of the ascending current. The Changjiangkou Zhoushan area lies in zones between the Changjiang diluted water, the Yellow Sea cold water mass, and the Taiwan Warm Current; the temperature front, which is the main factor in high stocks of fishery resources, forms easily there. Other zones have unstable water masses and large temperature gradients that are unsuitable for shoal concentrations. Such areas can be considered shoal transition areas.

Figure 3. Spatial pattern of type 20 in East China Sea.

Table 1. APPN, SSTs, and their gradients, 1987–1998.

Fish cell   Nets   Production (box)   APPN (box)   Type   SST (0.1ºC)   Gradient (0.1ºC)
9,511       9      305                33           30     169           6
9,512       3      30                 10           30     140           5
9,513       4      40                 10           30     190           7
9,516       5      90                 18           30     69            2
9,517       23     870                37           30     150           7
9,615       4      2,350              587          20     160           5
9,616       3      300                100          20     143           3
9,619       1      450                450          20     154           8
9,716       7      1,500              214          20     150           11
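In the rows of Table 1, the APPN column matches production divided by the number of nets with the fractional part dropped (for example, 305 / 9 ≈ 33.9 is listed as 33). The sketch below reproduces the column on that assumption; the truncation rule is inferred from the table rather than stated in the text, and `appn` is a hypothetical helper name.

```python
def appn(production, nets):
    """Average productivity per net, truncated to a whole number (inferred rule)."""
    if nets <= 0:
        raise ValueError("nets must be positive")
    return production // nets


# (nets, production, APPN) triples copied from Table 1.
table_1 = [
    (9, 305, 33), (3, 30, 10), (4, 40, 10), (5, 90, 18), (23, 870, 37),
    (4, 2350, 587), (3, 300, 100), (1, 450, 450), (7, 1500, 214),
]
mismatches = [row for row in table_1 if appn(row[1], row[0]) != row[2]]
print(mismatches)  # an empty list means every row is consistent with the rule
```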

Extracting spatial patterns by season

So far, we have analyzed the spatial distribution of fishery resources and temperature in the East China Sea from 1987 to 1998 in a general sense. To reveal how the spatial patterns change over time, further analysis (especially spatio-temporal analysis) is required. First, the time resolution for this process must be set. Because the number of samples from the three main fishing companies decreased steadily from 1990 to 1997, even though each company has some samples in each year, a year is not a good resolution for this analysis. Additionally, considering that the ocean environment varies much more distinctly by season than by year, season is chosen as the fundamental time resolution for studying the spatio-temporal relationship between ocean fishery resources and ocean temperature.

Figure 4. Spatial pattern of type 30 in East China Sea.

Table 2. Cluster results of type 20. SST and gradient values are given in 0.1ºC.

Class ID   No. of cells   Layer      Mean     Variance
1          308            SST        261.14   579.55
                          Gradient   18.84    217.63
                          APPN       64.55    118.50
2          460            SST        124.48   611.60
                          Gradient   28.96    162.6
                          APPN       70.77    466.75

Table 3. Cluster results of type 30. SST and gradient values are given in 0.1ºC.

Class ID   No. of cells   Layer      Mean     Variance
1          38             SST        108.97   332.56
                          Gradient   11.24    56.3
                          APPN       57.71    1,569.23
2          146            SST        145.67   69.08
                          Gradient   15.24    41.23
                          APPN       50.54    19.49
3          415            SST        173.75   41.76
                          Gradient   21.55    40.8
                          APPN       50.37    37.0
4          425            SST        191      26.8
                          Gradient   21.5     58.36
                          APPN       48.9     10.92
5          313            SST        211.3    32.9
                          Gradient   18.8     54.34
                          APPN       49.2     9.55
6          207            SST        232.55   132.2
                          Gradient   16.69    75.3
                          APPN       49.7     21.49

According to ocean research conventions, the four seasons are winter (December, January, and February), spring (March, April, and May), summer (June, July, and August), and fall (September, October, and November). Because of space limitations, only the fishery data with working type 30 (i.e., trawl net) are further analyzed by season in this paper.

After normalizing the fishery data and temperature data in each original data table using equation (1), the five variables of each season are analyzed using the isodata spatial clustering algorithm described above. The results show that the spatial patterns vary distinctly with seasonal shifts (Figure 5). Winter and spring share similar patterns (i.e., a belt extending northeast–southwest along the path of the Kuroshio to the Duima Strait and the Changjiangkou Zhoushan area). Patterns in summer and fall are also similar to each other, but they display circles that share a center: the mouth of the Yangtze River, which is affected by the diluted water of the Yangtze River. Such spatial patterns arise because of the Kuroshio Current in the East China Sea; its direction is relatively steady, but its flow velocity and volume change markedly by season (i.e., the Kuroshio is strong in winter and spring and weak in summer and fall [Wang 1996]). In winter and spring, the Duima branches of the Kuroshio, which gather several water masses, are vastly different from the main Kuroshio, whereas the Taiwan branches of the Kuroshio (which have almost the same extension in summer and winter) flow northward along the Zhejiang coastline except in winter because of strong monsoon winds (Chen 1991). In summer and fall, the diluted water of the Yangtze and Qiantangjiang rivers and the littoral current flow together toward the northeast; with increased flow velocity and southward changes due to the monsoon, the circles share as their center the mouth of the Yangtze River.

Figure 5. Spatial pattern of type 30 by season.

Further analysis of the statistics resulting from the cluster analysis (omitted from the tables) shows that areas of high fishery resource density appear to follow the dominant water masses: in winter and spring, they lie along the main path of the Kuroshio and stretch and grade away on its left side with varying intensity; in summer and fall, they circle around the mouth of the Yangtze River and stretch and grade outward, with an intensity that is high in summer and relatively low in fall.

The classes in each season cannot be compared with each other because they are not normalized across the four seasons. For the sake of comparison, we normalized the classes above based on the APPN variable. The total number of classes was set to 10, sorted in ascending order of APPN, and the isodata spatial clustering was recalculated (Figure 6). According to the results, quantitative rules for how specific classes shift with the seasons can be determined (analysis not included in this paper).

Conclusion and Discussion

We adopted the isodata spatial clustering algorithm to successfully extract the spatio-temporal patterns from a time series of ocean fishery data and corresponding SST and SST gradient data from 1987 to 1998. Traditional mathematical methods commonly used in geography research have many shortcomings. The most critical one is that the existing spatial and temporal topological relationships in a geospatial data set are lost after the data set is divided into independent samples, regardless of which method is adopted.

One distinct characteristic of ocean fishery data is that the spatial graphical data are relatively simple, and the attribute data imply spatial information. A raster data structure (one of the most commonly used in GIS) is selected to organize the variables during cluster analysis. To further demonstrate the spatial clustering characteristic, spatial location data organized in a raster data structure are also introduced at the same time. The clustering characteristic is revealed much more clearly when the spatial data layers are included. Thus, we conclude that the clustering method used in this paper is a flexible spatial clustering strategy and that it is suitable for the analysis of ocean fishery data. The use of a raster data structure overcomes the main shortcoming of traditional mathematical models (the lack of spatial information) when they are used in geospatial data analysis.

The clustering results presented in this paper reveal that the ocean fishery resource data and the corresponding ocean SST data in the East China Sea area show certain spatial patterns and that such patterns vary distinctly with seasonal change. These patterns are also related to the dominant water flows in this area, including the Kuroshio Current and its two main branches, the Yangtze River diluted water, and the Yellow Sea mixed water mass, which are strongly correlated with the environmental data. Unfortunately, the lack of fish species information in the original data set makes further analysis of fish biological properties impossible.


In general, using the isodata spatial clustering algorithm to analyze ocean fishery data quantitatively is a promising strategy for revealing the spatial distribution pattern of ocean fishery resources and its causes. Determining how to describe shifts in the spatial pattern of ocean fishing grounds quantitatively is our next research goal.

Acknowledgments

This work was sponsored by grants from the national 863 program, High-Tech Marine Monitoring Theme of China.

References

Bian, Z. 2000. Pages 235–239 in Pattern recognizing. Qinghua University Publishing Company, Qinghua, China.

Chen, D. 1998. Pages 60–67 in Multi-data processing. Chemical Industry Press, Beijing, China.

Chen, G., editor. 1991. Pages 76–89 in Ocean fishery environment of China. Zhejiang Science and Technology Publishing House, Hangzhou, China.

Deekshatulu, B. L., R. Krishnan, and N. Jacob. n.d. Spatial analysis and modeling techniques: a review. http://pages.hotbot.com/edu.geoinformatics/f138.html.

Fischer, M. M. 1994. From conventional to knowledge-based geographic information systems. Computers, Environment and Urban Systems 18(4):232–242.

Fotheringham, S., and P. Rogerson, editors. 1994. Pages 1–10 in Spatial analysis and GIS. Taylor & Francis, London, UK.

Guo, R. 1997. Pages 1–10 in Spatial analysis. WuHan Technical University of Surveying and Mapping, Wuhan, China.

Koperski, K., J. Adhikary, and J. Han. 1996. Spatial data mining: progress and challenges. http://db.cs.sfu.ca.

Ren, R., et al. 1998. Pages 77–81 in Multianalysis: theory, technique, examples. Guo Fang Publishing Company, Beijing, China.

Wang, Y. 1996. Pages 110–130 in Marine geography of China. China Scientific Press, Beijing, China.

Zhang, J. 1999. Study on information entropy based Geo-DM/Kdd Models and their applications. Ph.D. thesis. Geography Institute of the Chinese Academy of Sciences, Beijing, China.

Figure 6. Spatial pattern of type 30 by season, after normalization.
