Challenges of Spatial Data Infrastructure

5. POLICY IMPLEMENTATION

5.2. Challenges of Spatial Data Infrastructure

“The challenge for spatial professionals is to provide megacity managers, both political and professional, with appropriate ‘actionable intelligence’ that is up-to-date, citywide and in a timely manner to support more proactive decision making that encourages more effective sustainable development.” (FIG, 2010)

5.2.1. Human Error

Spatial data is, regardless of accuracy and precision, still data. It can therefore still be manipulated incorrectly and is subject to bias and inadequate methods. The more powerful the tool, the more easily a small human error can become a major error in the end product. There are two sides to the issue of accuracy in digital spatial metrics. On one side, an excess of detailed information can bog down the computation of spatial metrics through sheer volume and cause relevant information to be lost. On the other, human intervention in data processing can reduce such problems, but at the cost of possibly introducing large errors that stem from small misinterpretations of the data.

Langford, Gergel, Dietterich, and Cohen (2006) studied errors in landscape pattern analyses and found there is “potential for large errors in nearly every landscape pattern analysis ever published.”

High-precision technology is marketed as a tool for reducing human error, which it may successfully accomplish most of the time. The abstraction of data, however, remains in the hands of the researcher and depends on the qualification and intuition of that individual to properly direct the data processing in order to produce a true and useful product. Accidental human errors, like adding an extra zero or forgetting a decimal point, can compound as the data is processed and generate a large error. Issues of intuition or bias are well documented in scientific research and apply to the analysis of spatial data as well.

It does not appear that the human influence, both literal and metaphysical, on the accumulation of knowledge will decrease with the growth of digital tools. Quite the contrary, the role of researchers may be more integral to the process of generating information than ever, since they are equipped with ever more powerful tools to test their hypotheses. The essence of spatial data is the productive abstraction of large amounts of information into useful and influential knowledge, or “actionable intelligence”. Without the conscious analysis of continually growing data, the use of remote sensing and digital cartography is no more useful than recording a person’s entire life only to store the tapes in a basement.

5.2.2. Scale in Spatial Metrics

A major feature of spatial metrics methods is the ability to isolate information spatially, such as by location, distance, and relationships to other features. As in mathematical regression analysis, spatial analysis requires a careful selection of data. While in regressions the variables must be selected to avoid multicollinearity or hidden omitted-variable bias², similar issues can arise from the selection of the space being analyzed.³ In addition to the selection of data, scale is an extremely significant factor in the results of spatial analyses.

2 Omitted-variable bias (OVB) occurs when a regression analysis is created leaving out one or more important variables. The absence of that variable may cause the model to overestimate or underestimate the importance of other variables.

3 Multicollinearity, or collinearity, is a situation in which two or more predictor variables in a multiple regression are highly correlated, meaning one variable can predict the other.

The scale of the target region studied affects the averaged results of the analysis. For example, the assumption that New York City offers less green space for its citizens than Raleigh, North Carolina, may be true if the measure is the amount of official parkland per person within city limits. Raleigh offers over 1,200 square feet of park per resident while New York City offers about 250. If the figure is the percentage of the city reserved for park area, then New York City is far ahead of Raleigh, reserving nearly 20% of its area for green space, while Raleigh reserves barely 3%. The omitted variable in the two comparisons is density. New York City is twelve times denser than Raleigh, which allows it to be a far more efficient and compact city that offers green amenities to a greater number of people. There are barely any private lawns to maintain in Manhattan, so almost all of the green space in the city is concentrated in public parks. The counter-argument is that at the neighborhood scale, a child in the Four Acres neighborhood of Raleigh surely has more access to green space within a five-minute walk than a child on the Lower East Side of New York.
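The reversal in the Raleigh and New York City comparison can be sketched in a few lines. This is a minimal illustration that uses only the rounded figures quoted above; the `CityParks` class and the `rank_by` helper are hypothetical names introduced for the example, not part of any cited method.

```python
from dataclasses import dataclass


@dataclass
class CityParks:
    """Rounded park figures as quoted in the text (not a verified dataset)."""
    name: str
    park_sqft_per_resident: float  # official parkland per person
    park_share_of_area: float      # fraction of city area reserved for parks


cities = [
    CityParks("Raleigh", 1200.0, 0.03),
    CityParks("New York City", 250.0, 0.20),
]


def rank_by(cities, metric):
    """Return city names ordered from highest to lowest on the chosen metric."""
    ordered = sorted(cities, key=lambda c: getattr(c, metric), reverse=True)
    return [c.name for c in ordered]


# The same two cities swap places depending on which measure is chosen.
print("Per resident:", rank_by(cities, "park_sqft_per_resident"))
print("Share of area:", rank_by(cities, "park_share_of_area"))
```

Neither ranking is wrong. Each answers a different question, which is why the choice of measure and scale should be stated alongside any comparison.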

Caruso et al. (2015) make the case that broad environmental sustainability goals lead to “over-focusing” on density and spatial efficiency while ignoring smaller scales of spatial arrangements, such as the neighborhood, that could reconcile environmental and social goals.

5.2.3. Computational Capacity

A rather basic challenge of spatial data infrastructure is the computational capacity, both human and machine, needed to process all the data acquired. The processing crunch is well documented in remote sensing literature, and capacity has increased with time, although not quite as rapidly as data can be gathered.

Avelar and Tokarczyk (2014) put the threshold at large landscapes of 50,000 acres or larger as the point at which the computational resources for geometric operations become strained.

This is a relative threshold, since the resolution of images can be altered in order to meet the processing power of the computer or the capacity of the lens or laser in the sensor.
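A rough sketch shows why the threshold is relative. For a fixed area, the number of raster cells grows with the inverse square of the cell size, so halving the cell size quadruples the data volume. The cell sizes below are illustrative assumptions and do not come from Avelar and Tokarczyk (2014); `pixel_count` is a hypothetical helper.

```python
SQ_METERS_PER_ACRE = 4046.8564224  # international acre, exact by definition


def pixel_count(area_acres, cell_size_m):
    """Approximate number of square raster cells needed to cover an area."""
    area_m2 = area_acres * SQ_METERS_PER_ACRE
    return area_m2 / (cell_size_m ** 2)


LANDSCAPE_ACRES = 50_000  # threshold area quoted in the text

# Illustrative cell sizes in meters (assumed, not taken from the source).
for cell_size in (30.0, 10.0, 1.0, 0.5):
    cells = pixel_count(LANDSCAPE_ACRES, cell_size)
    print(f"{cell_size:>5} m cells: {cells:,.0f}")
```

The same 50,000-acre landscape can therefore be a light or a heavy workload depending entirely on the resolution chosen.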

Moore’s Law⁴ assumes a doubling of technological capacity every two years. Although Moore’s theory was based on the growth of circuits in computer processors, the rate of growth of imaging resolution is analogous. Consumer-level digital cameras became popular in the early 2000s, when cameras were able to record images at approximately 307,000 pixels (0.3 megapixels). Today, the highest-resolution consumer-level camera offers 50,600,000 pixels (50.6 megapixels), roughly 165 times the resolution of early digital cameras and in keeping with a doubling approximately every two years.
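The doubling claim can be checked with a short calculation. The pixel counts are the ones quoted above; the elapsed time of 15 years is an assumption made only for this sketch, since the text gives "the early 2000s" and "today" rather than exact dates.

```python
import math

EARLY_PIXELS = 307_000       # early consumer camera, as quoted in the text
RECENT_PIXELS = 50_600_000   # highest-resolution consumer camera, as quoted
ASSUMED_YEARS = 15           # assumption: elapsed time is not stated exactly


def doublings(old, new):
    """Number of times `old` must double to reach `new`."""
    return math.log2(new / old)


def implied_doubling_period(old, new, years):
    """Average number of years per doubling over the given time span."""
    return years / doublings(old, new)


ratio = RECENT_PIXELS / EARLY_PIXELS
n_doublings = doublings(EARLY_PIXELS, RECENT_PIXELS)
period = implied_doubling_period(EARLY_PIXELS, RECENT_PIXELS, ASSUMED_YEARS)

print(f"Resolution ratio: {ratio:.1f}x")
print(f"Doublings: {n_doublings:.2f}")
print(f"Implied doubling period: {period:.2f} years")
```

Under that assumed time span, the growth works out to a little over seven doublings, or about one every two years. A different assumed span shifts the period proportionally.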

This brings about a critical issue of how much data to capture and how often to do so. If “high-resolution” images of a place were taken ten years ago at three megapixels, today those images would be inadequate for many of the processes that require images of 10, 20, or 30 megapixels.

Documenting something at the highest quality available at any given time is likely the best strategy to avoid rapid obsolescence. After all, no one would argue that black and white pictures of a century ago are of no use because they do not contain color.

4 Gordon E. Moore, the co-founder of Intel and Fairchild Semiconductor, wrote a 1965 paper that described a doubling every year in the number of components per integrated circuit, projecting that pattern for a decade. In 1975, he revised the forecast to a doubling every two years.

In terms of 3-D spatial capture, the processing power required to record a point cloud of a physical object is relatively low. It need only keep up with storing the coordinates and information of the points as they are shot by the laser. The processing power required to generate a mesh from the point clouds in order to visually render the scanned object is far higher. Each point must be located and connected to nearby points through an algorithm that suggests the most appropriate connections. The mesh must then be rendered in 3-D, with its resolution constantly increased or decreased in order to generate a fluid visualization of the object as the user moves it around the simulated space.

The human capacity to process spatial data is also tested. A LiDAR scanner may shoot millions of points at a soccer ball, but it cannot document what it cannot see, such as the other side of the ball. The solution is to shoot another several million points and merge the two point clouds to generate a spherical mesh. An informed user may see the soccer ball, realize that it is a sphere, and model that sphere with just the measured diameter of the ball. The user-generated spherical model will consist of one surface and a few kilobytes of information, while the spherical point cloud will be made of millions of points and several gigabytes. Common sense and proper training can save incalculable amounts of time and processing power as well as generate more useful end products.
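The soccer ball argument can be illustrated in code. Once a user decides the object is a sphere, a scan of only the visible side is enough to recover the whole shape as four numbers. The sketch below uses an algebraic least-squares sphere fit, a technique swapped in for illustration rather than one described in the source. The simulated scan, the ball dimensions, and all function names are assumptions.

```python
import math
import random


def solve_linear(m, v):
    """Solve m u = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(m[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][k] * u[k] for k in range(i + 1, n))
        u[i] = s / aug[i][i]
    return u


def fit_sphere(points):
    """Fit x^2 + y^2 + z^2 = 2ax + 2by + 2cz + d, with d = r^2 - a^2 - b^2 - c^2."""
    ata = [[0.0] * 4 for _ in range(4)]  # normal-equation matrix
    atf = [0.0] * 4                      # normal-equation right-hand side
    for x, y, z in points:
        row = (2.0 * x, 2.0 * y, 2.0 * z, 1.0)
        f = x * x + y * y + z * z
        for i in range(4):
            atf[i] += row[i] * f
            for j in range(4):
                ata[i][j] += row[i] * row[j]
    a, b, c, d = solve_linear(ata, atf)
    radius = math.sqrt(d + a * a + b * b + c * c)
    return (a, b, c), radius


def scan_visible_half(center, radius, n, seed=0):
    """Simulate n noise-free points on the hemisphere facing a scanner on the +x side."""
    rng = random.Random(seed)
    cx, cy, cz = center
    pts = []
    while len(pts) < n:
        dx, dy, dz = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0 or dx < 0.0:
            continue  # skip degenerate draws and the hidden far side
        pts.append((cx + radius * dx / norm,
                    cy + radius * dy / norm,
                    cz + radius * dz / norm))
    return pts


# Hypothetical ball: radius 0.11 m, resting near the origin.
half_scan = scan_visible_half((0.3, -0.2, 0.11), 0.11, 2000)
center, radius = fit_sphere(half_scan)
print("fitted center:", tuple(round(v, 4) for v in center))
print("fitted radius:", round(radius, 4))
```

The fitted model stores a center and a radius, while the scan it came from stores three coordinates for every point. A real scan would also contain measurement noise, which this sketch leaves out.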

5.2.4. Politics of Proof

In an era of polarized debate over the effects of climate change and human involvement, more spatial information may help sway some skeptics of the immediate environmental issues at hand with comprehensive and clear documentation of the physical world.

The political climate today provides no reassurance that empirical proof will convince policy-makers to pay attention to urban environmental issues rather than more politically advantageous issues, but information is a powerful tool to shape public opinion and ultimately influence policies.

Figure 42: Disputed Territory of Jammu and Kashmir

The region of Jammu and Kashmir is visualized differently depending on the country from which the end user accesses Google Maps. The Indian version considers the region part of India while the American version demarcates it as a region in dispute. Source: Google Maps USA, Google Maps India.

5.2.5. Visualization, Interpretation, Communication

An advantage of a functional spatial data infrastructure is the potential for the constant amassing and enhancement of information. The data is to be continually updated and improved through additional sources and connections so that information can be cross-referenced and checked for human or machine errors.

“No one debates that continual accumulation of knowledge is a key to future success to do this we must first determine what knowledge to pursue” (Linehan & Gross, 1998)