• No results found

Open cyberGIS software for geospatial research and education in the big data era

N/A
N/A
Protected

Academic year: 2021

Share "Open cyberGIS software for geospatial research and education in the big data era"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Available online atwww.sciencedirect.com

ScienceDirect

SoftwareX 5 (2016) 1–5

www.elsevier.com/locate/softx

Open cyberGIS software for geospatial research and education in the

big data era

Shaowen Wang

a,b,c,d,e,f,∗

, Yan Liu

a,b,c,f

, Anand Padmanabhan

a,b,c,f

aCyberGIS Center for Advanced Digital and Spatial Studies, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA bCyberInfrastructure and Geospatial Information Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA cDepartment of Geography and Geographic Information Science, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA

dDepartment of Urban and Regional Planning, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA eGraduate School of Library and Information Science, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA

fNational Center for Supercomputing Applications, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA Received 16 March 2015; received in revised form 3 July 2015; accepted 28 October 2015

Abstract

CyberGIS represents an interdisciplinary field combining advanced cyberinfrastructure, geographic information science and systems (GIS), spatial analysis and modeling, and a number of geospatial domains to improve research productivity and enable scientific breakthroughs. It has emerged as new-generation GIS that enable unprecedented advances in data-driven knowledge discovery, visualization and visual analytics, and collaborative problem solving and decision-making. This paper describes three open software strategies – open access, source, and integration – to serve various research and education purposes of diverse geospatial communities. These strategies have been implemented in a leading-edge cyberGIS software environment through three corresponding software modalities: CyberGIS Gateway, Toolkit, and Middleware, and achieved broad and significant impacts.

c

⃝2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/ by/4.0/).

Keywords:CyberGIS; Cyberinfrastructure; Geospatial big data

Code metadata

Current code version v0.6

Permanent link to code/repository used of this code version https://github.com/ElsevierSoftwareX/SOFTX-D-15-00005

Legal Code License NCSA open source license

Code versioning system used git

Software code languages, tools, and services used C, C++, Python, Bash; MPI, OpenMP, CUDA

Compilation requirements, operating environments & dependencies Compilers: GNU/Intel/Cray; OS: Linux (RedHat, Debian, Ubuntu, CentOS, SUSE); Dependencies: GDAL, GEOS, PROJ4, SPRNG, PySAL, OpenGeoDa, etc.

If available Link to developer documentation/manual https://github.com/cybergis/cybergis-toolkit

http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php/ct Support email for questions CyberGIS Helpdesk (help@cybergis.org)

Corresponding author at: CyberGIS Center for Advanced Digital and Spatial Studies, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA.

E-mail address:shaowen@illinois.edu(S. Wang).

http://dx.doi.org/10.1016/j.softx.2015.10.003

(2)

1. Motivation and significance

Geospatial data and related analytics have become ubiqui-tous as continued growth in geographic information science and technology enables scientific investigations and decision-making support in a plethora of science and engineering fields including for example ecology, environmental science and engineering, public health, geosciences, and social sciences [1,2]. Extensive computational capabilities are needed to man-age and analyze massive quantities of complex and heteroge-neous geospatial data collected across multiple scales and used for diverse applications by many geospatial communities [3]. However, conventional GIS approaches and associated software tools are primarily developed using sequential computing and cannot adequately resolve this increasing data intensity, com-plexity, and diversity of applications [4]. CyberGIS – defined as GIS based on advanced cyberinfrastructure (CI) – collectively harnessing heterogeneous CI resources (e.g., cloud, high-end, and high-throughput) has emerged as new-generation GIS for resolving geospatial big data challenges [5,6].

The rapid development of cyberGIS as an interdisciplinary field has been pushed by advanced digital technologies and pulled by a large number of scientific innovation and discovery challenges and opportunities that exist in numerous geospatial communities. CyberGIS has evolved as a complex ecosystem of hardware, infrastructure, software and services, and applications [7]. Open software is critical to effectively resolve the complexity of the ecosystem and support the diversity of geospatial communities. Our open cyberGIS software approach has three key strategies: open access, source, and integration enabled by three corresponding modalities: CyberGIS Gateway, Toolkit, and Middleware [8,5,9]. CyberGIS Gateway (referred to as Gateway hereafter) provides an online problem-solving environment for geospatial communities to access cyberGIS software and data capabilities based on CI. CyberGIS Toolkit maintains a suite of community-selected open source spatial analysis and modeling software that is scalable on high performance computing resources. GISolve Middleware bridges Gateway and Toolkit to manage the complexity of CI access. These three modalities form open software architecture (see Figure 1 in [8] and Figure 1 in [10]) to address open access, open API, open source, and CI-based integration and computation for cyberGIS software. This cyberGIS approach has already had significant impact in a number of domains (e.g., biosciences [11], coupled human–natural systems [12], econometrics [13], and public health [14,15]).

2. CyberGIS Gateway—open access

Gateway is the leading online geospatial problem-solving environment providing cyberGIS capabilities to serve various research and education purposes [8,5]. As a pioneer of sci-ence gateways [16], Gateway is built on the TeraGrid GI-Science Gateway approach to bridging advanced CI and GIS capabilities through friendly user interfaces based on rich-client web technologies [17]. Gateway capabilities are made available to users at two levels: service and application. As

an open access platform, Gateway represents a software-as-service approach that significantly reduces the complexity of accessing advanced CI and managing cyberGIS software. In general, advancing scientific software requires both soft-ware engineering and domain-specific scientific knowledge. In particular, cyberGIS software exhibits additional dimensions of complexity due to the integration with high-performance parallel and distributed computing resources and services, and diverse geospatial user communities. Each Gateway ser-vice is currently implemented as a RESTful web serser-vice (https://en.wikipedia.org/wiki/Representational state transfer).

The service-oriented approach alone is not sufficient for broad open access to cyberGIS capabilities. Using cyberGIS functions often needs highly interactive user interfaces because geospatial data and analytics require frequent user involvement in such tasks as data and study area selection, map projection, feature extraction, and map visualization. Therefore, Gateway is designed to provide a rich set of interactive user interface components for cyberGIS data and analytics by exploiting advances in web technologies such as HTML5 and geospatial visualization software. A Gateway application is a standalone web application within the Gateway online framework to interact with backend CI and services for a suite of geospatial data and analytical functions. Detailed discussion about Gateway application development can be found in [8,10]. This video:https://www.youtube.com/watch?v=hrJ cZkG-Xs&t=12

provides an illustrative example of a geoscience application while demonstrating how Gateway can interoperate with online data services.

CyberGIS Gateway has been advanced as an open access environment for a large number of users to perform compute-and data-intensive, compute-and collaborative geospatial problem solv-ing enabled by advanced CI. The development of Gateway software focuses on reusable cyberGIS user interface com-ponents and Gateway portal management. Reusable user in-terface components such as map panel, visualization and symbology, data layer ordering, and map-making functions are built in Gateway as JavaScript library for application develop-ment. A coding framework is established for scalable integra-tion of individual applicaintegra-tion codes and portal management.

3. CyberGIS Toolkit—open source

CyberGIS Toolkit integrates a set of loosely coupled scal-able geospatial software components for the following pur-poses [10]:

• Sustain the CyberGIS Toolkit as a reliable community soft-ware toolbox for scalable cyberGIS analytics through rigor-ous software building, testing, packaging, and deployment based on open source software practice;

• Capture spatial characteristics of software elements to achieve optimal computational performance, scalability, and portability in various CI environments; and

• Engage computational and data scientists to advance scalable geospatial computing.

(3)

Table 1

Software components in CyberGIS Toolkit.

Name Description Scalable computing Deployment Scalability (cores)

PABM Scalable agent-based modeling MPI+MPI IO XSEDE 16,384

Parallel PySAL Scalable PySAL functions Multi-core XSEDE 32+

PGAP Parallel Genetic Algorithm Library MPI XSEDE Blue Waters 262,144

pRasterBlaster Map reprojection MPI+MPI IO XSEDE 1024+

SPREG Spatial regression High throughput Phantom Cloud On demand

TauDEM Hydrological information analysis MPI+MPI IO XSEDE 1024

Viewshed Visibility analysis GPU ROGER Single GPU

WRF Multi-scale weather modeling MPI XSEDE 4096

Note: XSEDE (the Extreme Science and Engineering Discovery Environment, http://xsede.org), Blue Waters (http://bluewaters.ncsa.illinois.edu), ROGER (https://wiki.ncsa.illinois.edu/display/ROGER) are CI programs and facilities supported in part by the US National Science Foundation. Phantom Cloud is an on-demand and scalable cloud resource hosted at the Argonne National Laboratory.

Table 1lists a suite of representative software components that have been integrated in the current release or have open access in Gateway and are planned to release in CyberGIS Toolkit. These components are based on research codes developed in various cyberGIS-related community projects. Each component follows the CyberGIS Toolkit integration process based on community needs and code readiness levels such as code quality and scalability, and whether associated work has been published. All of the CyberGIS Toolkit components are open source.

CyberGIS Toolkit has a continuous integration process es-tablished to streamline the integration of an identified geospa-tial code through rigorous open source software engineering. If a code needs to be refined for evaluation from the open source community, code developers are provided with cloud-based development virtual machines customized for software library support for this particular code. Developers adopt appro-priate desktop-level software testing tools (e.g., Python Note-book/iPython) for component-level testing. As the code is ready for integration, two levels of testing are applied: porta-bility and scalaporta-bility. The portaporta-bility test is conducted with different combinations of operating systems, architecture, and software library versions through CI resources for software building and testing (e.g. https://www.batlab.org/). The scal-ability test requires high performance computing expertise and computational performance profiling to identify potential performance bottlenecks on computing, memory, input/output (IO), and network. This test is critical to enhance the scalability of a code to both high performance computing resources and problem sizes. Oftentimes, such tests and computational inten-sity evaluation lead to improved scalable algorithms and novel computational techniques [18,19]. As the scalability of a Cy-berGIS Toolkit component is improved through this process, a major benefit for users is that they are able to solve geospatial problems through resolving big data, which would not be fea-sible using conventional GIS approaches. This is because for most of software components in Toolkit, computation repre-sents a major bottleneck when data volume is significant.

In addition to the software components (Table 1), CyberGIS Toolkit also manages software dependencies collectively. Current components rely on the following three types of open source software libraries/APIs:

• Language and system: Python modules (e.g., Numpy and Scipy), Lustre file system tools (for MPI IO), MPI, OpenMP, CUDA, and SPRNG (parallel random number generator);

• Geospatial software: GEOS, Proj4, GDAL, PySAL, Shapely, Geoserver, and PostGIS; and

• Performance profiling: PAPI, IPM, and Darshan.

CyberGIS Toolkit can be downloaded and deployed on computational resources configured with parallel computing capabilities. Toolkit deployment includes both programming libraries and applications that can be directly used by end users.

4. GISolve middleware—open integration

The open integration strategy is designed to provide interoperable access to cutting-edge CI resources and establish spatially intelligent programming capabilities to help application developers directly benefit from accessing advanced CI capabilities. To achieve this, geospatial software experts, who are trained to program using geospatial tools and services but may not be well equipped to directly work with advanced CI and geospatial big data, need to be enabled to bridge the technical chasm and hence, the open integration strategy is focused on this gap. Specifically, the open integration strategy manages the complexity of CI access, while providing them with spatially aware APIs.

The GISolve middleware fulfills this role enabling open integration of advanced computing and information infrastructure with geographic information system capabili-ties for computationally intensive and collaborative geospa-tial problem solving via a suite of open service APIs (available at:http://sandbox.cigi.illinois.edu/home/doc/gosapi/ GISolveOpenServiceAPI.html). In particular, the GISolve Open Service API defines a set of REST Web service inter-faces for authentication, application integration, and CI-based geospatial computation; plays an important role in the integra-tion of applicaintegra-tions and the management of the compute and data requirements; and is a key enabler for the open access provided by the CyberGIS Gateway. Furthermore, GISolve middleware is spatially aware and models the computational intensity of spatial analysis and modeling to represent compu-tational requirements based on CI [4].

Key GISolve web service interfaces for enabling open in-tegration include: (1) application integration API provides a

(4)

mechanism to integrate application software into the cyberGIS software environment and allows application software devel-opers and contributors to define customized user/service inter-faces, deploy their applications on CI resources and publish these services into the cyberGIS software environment; (2) se-curity API provides a token-based authentication and autho-rization framework for integrating services, web applications (e.g. CyberGIS Gateway), and CI resources; and (3) compu-tation API manages the complexity of computation, data and visualization resources to provide on-demand and flexible mechanisms to access CI resources needed to support cyberGIS analytics.

5. Impact

The three modalities of the open cyberGIS software approach provide a comprehensive open software solution to advancing cyberGIS and related domain sciences, high-performance spatial analysis and modeling algorithm devel-opment, and scalable computational methods. For example, CyberGIS Gateway has been used across the globe for var-ious cutting-edge research and education purposes (Fig. 1). CyberGIS Gateway can also be widely accessed by general public to gain understanding about advanced CI-enabled sci-entific problem solving through customizable and friendly user interfaces. The holistic cyberGIS software approach has helped improve the software capabilities of the USGS National Map program, which also has significant and broad societal im-pacts. As a software contributor, USGS published a scalable map reprojection software as an open source software after the CI-based pRasterBlaster integration significantly improved the scalability of this software on thousands of processors. On the other hand, as a cyberGIS community user, they found and are using TauDEM, another Toolkit component, to accelerate the National Hydrography Dataset (NHD) research. This balanced and holistic open software approach has potential to be appli-cable to other inter- and multi-disciplinary communities facing geospatial big data challenges. As most of the Toolkit compo-nents have gone through software engineering and scalability testing on advanced CI resources, Toolkit components them-selves and the experience of developing scalable software on CI have the potential to be directly adopted or adapted for in-dustrial use.

6. Conclusions

Over the past several years, cyberGIS has grown rapidly in an organic and distributed fashion into a complex ecosys-tem of online interfaces, software tools and services, which enable a number of spatial analysis, modeling and simula-tion applicasimula-tions by employing the capabilities of advanced CI. This growth has presented and crystallized unique require-ments that are not fully met by the traditional open source software model. In this paper, we have laid out our open soft-ware approach to innovating and sustaining an open cyberGIS software ecosystem. Specifically, to address the requirements

Fig. 1. Spatial distribution of CyberGIS Gateway users across the globe.

posed by various geospatial communities with varied levels of computational expertise and technological skills, we estab-lished an open-access, open-source and open-integration soft-ware approach with three distinct but interrelated modalities. Within the NSF CyberGIS software project these three modal-ities are represented respectively by (1) CyberGIS Gateway, focused on broad communities without sophisticated techni-cal knowledge; (2) CyberGIS Toolkit, focused on cyberGIS ex-perts with deep technical knowledge of both GIS and CI; and (3) GISolve Middleware, that bridges the gap between Gate-way and Toolkit and focuses on providing GIS developers with friendly service interfaces for supporting CI-enabled applica-tion integraapplica-tion and execuapplica-tion. Together, they form a cutting-edge geospatial big data and compute platform for cyberGIS communities.

From CI software perspective, the Gateway modality is similar to successful software platform approaches de-veloped in other science domains, such as the NanoHub (http://nanoHUB.org, Madhavan et al. [20]) in nanotechnol-ogy and the generalized HubZero approach (http://hubzero.org, McLennan and Kennell [21]). A key difference from such similar approaches is that cyberGIS software focuses on provisioning geospatial capabilities and exploiting geospatial characteristics in big data and compute. We believe this ex-panded approach to openness employed by the cyberGIS soft-ware ecosystem represents a powerful enhancement to open source that has undoubtedly benefited the cyberGIS community and likely holds potential for adoption in other science commu-nities.

Acknowledgments

This paper and associated materials are based in part upon work supported by the National Science Foundation (NSF) under grant numbers: 0846655, 1047916, 1354329, and 1443080. This work used the NSF Extreme Science and Engineering Discovery Environment (XSEDE). NSF supports XSEDE under grant number 1053575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

(5)

References

[1] Wang S, Armstrong MP. A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Comput 2003;29(10):1481–504.

[2] Wright DJ, Wang S. The emergence of spatial cyberinfrastructure. Proc Natl Acad Sci 2011;108(14):5488–91.

[3] Wang S, Hu H, Lin T, Liu Y, Padmanabhan A, Soltani K. CyberGIS for data-intensive knowledge discovery. ACM SIGSPATIAL Newslett 2014; 6(2):26–33.

[4] Wang S, Armstrong MP. A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 2009; 23(2):169–93.

[5] Wang S. A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Amer Geograph 2010;100(3): 535–57.

[6] Wang S, Wilkins-Diehr NR, Nyerges TL. CyberGIS—toward synergistic advancement of cyberinfrastructure and GIScience: A workshop summary. J Spat Inform Sci 2012;4:125–48.

[7] Wang S. CyberGIS: Blueprint for integrated and scalable geospatial software ecosystems. Int J Geogr Inf Sci 2013;27(11):2119–21. [8] Liu YY, Padmanabhan A, Wang S. CyberGIS gateway for enabling

data-rich geospatial research and education. Concurr Comput.: Pract Exper 2015;27(2):395–407.

[9] Wang S, Armstrong MP, Ni J, Liu Y. GISolve: A grid-based problem solving environment for computationally intensive geographic information analysis. In: Proceedings of the 14th international symposium on high performance distributed computing (HPDC-14)—challenges of large applications in distributed environments (CLADE) workshop. IEEE Press; 2005. p. 3–12.

[10] Wang S, Anselin L, Bhaduri B, Crosby C, Goodchild MF, Liu Y, Nyerges TL. CyberGIS software: A synthetic review and integration roadmap. Int J Geogr Inf Sci 2013;27(11):2122–45.

[11] Wang S, Zhu X-G. Coupling cyberinfrastructure and geographic information systems to empower ecological and environmental research. BioScience 2008;58(2):94–5.

[12] Tang W, Wang S, Bennett DA, Liu Y. Agent-based modeling within a cyberinfrastructure environment: a service-oriented computing approach. Int J Geogr Inf Sci 2011;25(9):1323–46.

[13] Anselin L, Rey SJ. Spatial econometrics in an age of CyberGIScience. Int J Geogr Inf Sci 2012;26(12):2211–26.

[14] Padmanabhan A, Wang S, Cao G, Hwang M, Zhang Z, Gao Y, Soltani K, Liu YY. FluMapper: a cyberGIS application for interactive analysis of massive location-based social media. Concurr Comput.: Pract Exper 2014; 26(13):2253–65.

[15] Shi X, Wang S. Computational and data sciences for Health-GIS. Ann. GIS 2015;21(2):111–8.

[16] Lawrence KA, Wilkins-Diehr N, Wernert JA, Pierce M, Zentner M, Marru S. Who cares about science gateways? A large-scale survey of community use and needs. In: Proceedings of the 9th gateway computing environments workshop. (GCE’14), Piscataway (NJ, USA): IEEE Press; 2014. p. 1–4.http://dx.doi.org/10.1109/GCE.2014.11.

[17] Wang S, Liu Y. TeraGrid GIScience Gateway: Bridging cyberinfrastruc-ture and GIScience. Int J Geogr Inf Sci 2009;23(5):631–56.

[18] Finn MP, Liu Y, Mattli MD, Guan Q, Yamamoto KH, Shook E, Behzad B. pRasterBlaster: High-performance small-scale raster map projection transformation using the extreme science and engineering discovery Environment. In: The XXII international society for photogrammetry & remote sensing congress, Melbourne, Australia, August 25–September 1, 2012, 2012.

[19] Fan Y, Liu YY, Wang S, Tarboton D, Yildirim A, Wilkins-Diehr N. Accelerating TauDEM as a scalable hydrological terrain analysis service on XSEDE. In: Proceedings of the 2014 annual conference on extreme science and engineering discovery environment. (XSEDE’14), Atlanta (GA): ACM Press; 2014. July 13–18, p. 5:1–5:2.

[20] Madhavan K, Zentner L, Farnsworth V, Shivarajapura S, Zentner M, Denny N, Klimeck G. nanoHUB.org: Cloud-based services for nanoscale modeling, simulation, and education. Nanotechnol Rev 2013;2(1): 107–17.http://dx.doi.org/10.1515/ntrev-2012-0043.

[21] McLennan M, Kennell R. HUBzero: A platform for dissemination and collaboration in computational science and engineering. Comput Sci Eng 2010;12(2):48–52.

References

Related documents

Before beginning training, participants underwent measurements of skeletal muscle oxygen consumption using near-infrared spectroscopy (NIRS) at rest (resting muscle ˙VO 2 )

- Ak bude bielizeň následne sušená v sušičke bielizne, zvoľte otáčky odstredenia podľa návodu výrobcu sušičky.. Downloaded From

Actively promote wholesale and retail electric competition. States that belong to transmission 

• Fossil Succession: Any rock containing a particular fossil species was deposited some time between the origin (first appearance) and the extinction (last appearance) of

An alternative to shorten this process is presented with Phased Plan Review (PPR), which is “the process that engages the Facilities Development Division (FDD), at its

EDP excessive deficit procedure EGs Employment Guidelines EMI European Monetary Institute EMU Economic and Monetary Union EPC Economic Policy Committee EPL

Statistics and inference are critical in machine learning, and it is also nice to understand how information theory fits into this picture.. I haven't included an information

The objectives of this study include: to assess the impact of various types of land use along Sungai Langat; describes hydrological change and water quality variation along this