Robustness and Performance - Computational Geometry

3.4 Computational Geometry

3.4.2 Robustness and Performance

Precision. It is important to distinguish between precision and accuracy. Accuracy is the degree to which a map or digital database matches the real-world objects it describes. Accuracy is an issue of how to collect high-quality data. In GIS, one can consider horizontal and vertical accuracy to measure the accuracy of a geographic position. In contrast, precision is the level of exactness of the information digitally represented in a digital database or data type (for example a concrete floating-point or integer arithmetic implementation). Hence, data may be very precise, but still inaccurate. Since the development of GIS components such as a geometry is not directly concerned with the collection of data, the remainder of this section focuses on precision issues.

There are several options for precision, i.e. how and where information is stored. In GIS, such options are sometimes called precision models. The most common precision models are:

• Fixed precision. The value will be rounded according to a scale factor:

    point.x = round( point.x * scale ) / scale
    point.y = round( point.y * scale ) / scale

A scale greater than one means that the precision point is to the right of the decimal point, i.e. we have a high precision (more bits). A scale smaller than one means that the precision point is to the left of the decimal point, i.e. we have a smaller precision (fewer bits). For example, a scale of 100 keeps two decimal places, whereas a scale of 0.01 rounds to the nearest multiple of 100. Result values of computations will be rounded by the rules above and have the same number of digits as their input values. Raster models always conform to fixed precision, because they use a homogeneous unit scale.

• Floating-point precision. The values use the full precision provided by the corresponding floating-point data types based on the IEEE floating-point standard (see IEEE Floating-Point Standard in Appendix B). In Java, these are the elementary data types float and double. Computation results may have more digits than the input values.

• Exact precision. Exact arithmetic is the general term for arithmetic that yields exact results for the input data, without loss of precision through rounding. In general, this is achieved by integer and rational arithmetic. Exact geometric computation is discussed in detail below (see Exact geometric computation).
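The rounding rule of the fixed precision model can be sketched in a few lines of Java. This is a minimal sketch, not the implementation described in this work: the class name FixedPrecisionSketch and the method makePrecise are hypothetical, and Math.round is assumed as the rounding function.

```java
final class FixedPrecisionSketch {

    /** Snaps one ordinate to the grid defined by the scale factor. */
    static double makePrecise(double value, double scale) {
        // Math.round returns the closest long; dividing by the scale maps it back.
        return Math.round(value * scale) / scale;
    }

    public static void main(String[] args) {
        // scale = 100 keeps two decimal places
        System.out.println(makePrecise(12.3456, 100.0));
        // scale = 0.01 rounds to the nearest multiple of 100
        System.out.println(makePrecise(1234.56, 0.01));
    }
}
```

Note that the result is still stored in a double, so the model restricts the number of digits but does not by itself make later computations exact.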

Robustness of algorithms. Many current GIS use floating-point arithmetic (see Appendix B). Floating-point arithmetic is inherently imprecise, and its naive use can violate axioms of arithmetic [Sch98]. For instance, the evaluation of the term

3 · 0.4 = 1.2000000000000002

(evaluated with JDK 1.5) is a typical floating-point rounding error. However, floating-point computation is hardware-supported (see IEEE Floating-Point Standard, Appendix B) and hence very fast. A comprehensive treatment of floating-point arithmetic and its weak points was given by Goldberg [Gol91].
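The effect is easy to reproduce. The following sketch (class and method names are hypothetical) multiplies 0.4 by 3 in IEEE 754 double precision and also shows that addition is no longer associative, which is one of the arithmetic axioms that naive floating-point use can violate.

```java
final class RoundingErrorSketch {

    /** 3 * 0.4 evaluated in double precision. */
    static double tripled() {
        return 3 * 0.4;
    }

    /** Checks whether (a + b) + c equals a + (b + c) in double precision. */
    static boolean isAssociative(double a, double b, double c) {
        return (a + b) + c == a + (b + c);
    }

    public static void main(String[] args) {
        System.out.println(tripled());                    // prints 1.2000000000000002
        System.out.println(tripled() == 1.2);             // prints false
        System.out.println(isAssociative(0.1, 0.2, 0.3)); // prints false
    }
}
```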

Such rounding errors are responsible for robustness problems. Inexact results can be used in combinatorial computations and make the complete algorithm fail. Schirra states that "... implementation of an algorithm is considered to be robust if it produces the correct result for some perturbation of the input. If the perturbation is small an implementation is called stable ..." [Sch98]. Since this kind of stability refers to numerical computation, it is also called numerical stability. Schirra [Sch98] and [LY01] review robustness and precision issues in a broad and detailed manner.

A widespread approach to implementing algorithms in a stable way is the use of a threshold value epsilon (ε). This technique follows the rule of thumb "If something is close to zero it is zero". A trigger value ε is added to a numerical value to test whether this value is (almost) equal to another numerical value. The original intention is that the difference between two numerical values can be so small that, in practice, both values can be assumed to be equal. A big problem is the choice of ε. In practice it is a tiny constant whose value is found by trial and error until all current tests work for the input data and no errors occur. The technique of finding an ε value is called epsilon-tweaking [Sch98].

Because of rounding errors in floating-point arithmetic, the addition of an epsilon value is a popular and easy technique. Its big disadvantage is the risk that local tests succeed while the conclusions drawn from them are wrong in the general context. A naively implemented epsilon condition might result in errors, as the following basic example illustrates: the program decides that A = B and B = C, but in reality A ≠ C, so the program produces incorrect results. Figure 3.10 shows another example from practice: a test using a small ε value might conclude that adjacent line segments have the same direction and fail to notice that their directions differ slightly. Taken as a whole, however, the polyline is obviously not straight.

Figure 3.10: The polyline seems locally straight, but is not straight overall
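The A = B, B = C, A ≠ C situation can be made concrete with a small sketch. The tolerance value and all names below are hypothetical; the point is only that an epsilon comparison is not transitive.

```java
final class EpsilonSketch {

    // Hypothetical tolerance, chosen only for illustration.
    static final double EPSILON = 1e-9;

    /** Naive "almost equal" test based on a fixed tolerance. */
    static boolean almostEqual(double x, double y) {
        return Math.abs(x - y) <= EPSILON;
    }

    public static void main(String[] args) {
        double a = 0.0;
        double b = 0.75e-9;
        double c = 1.5e-9;
        System.out.println(almostEqual(a, b)); // true:  A "equals" B
        System.out.println(almostEqual(b, c)); // true:  B "equals" C
        System.out.println(almostEqual(a, c)); // false: yet A differs from C
    }
}
```

An algorithm that relies on equality being transitive, for example when merging nearly coincident vertices, can therefore reach contradictory decisions.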

Consequently, the use of epsilons can help in some situations, but may result in later robustness problems. Typical examples of robustness problems occur in the computation of the intersection point between two line segments, the Point-Line Orientation test and the Point-In-Ring test (see Geometric Predicates in section 6.2). Another problem which occurs in practice is the dimensional collapse: due to missing precision or rounding errors, the topological dimension of an operation result is lower than the expected dimension. For instance, if the precision is not high enough, the intersection between two very small regions may result in a single point or a line instead of a region (see [viv03b] for further explanation).

The following part of this section addresses possibilities to avoid rounding errors without using Epsilons.

Exact geometric computation. The field of exact geometric computation has turned into one of the biggest issues in CG research. Exact geometric computation means the algorithms make correct decisions for all input data, not only for some perturbation of it. It is an approach to secure the robustness of algorithms.

There are many approaches for the realization of exact computation:

• Integer arithmetic. Integer arithmetic is based on binary representation and binary arithmetic operations. It can result in an overflow, but not in rounding errors. The use of arbitrary-precision integers eliminates the overflow problem. Since integral input is usually bounded in size (as with the 32-bit elementary data type int in Java), some approaches use multiple-precision integers with a fixed precision chosen according to the binary size of the input data.

• Rational arithmetic. Exact rational arithmetic is the exact representation of a number by a numerator and a denominator (both in integer arithmetic). In general, divisions are typical reasons for rounding errors. Rational arithmetic avoids divisions, or rather postpones them: $\frac{a}{b} \div \frac{c}{d} = \frac{a \cdot d}{b \cdot c}$

• Homogeneous coordinates. Homogeneous coordinates can be used to represent the input data. Homogeneous coordinates use an additional ordinate, which can serve as a common denominator, and therefore avoid divisions.

• Symbolic and implicit representation. The result data is not directly computed, but only represented by its original input data. A numerical value that results from a complex computation can be represented by an expression tree which reflects the history of the computation of this number [MNU97]. An intersection point, for example, can be represented by the two line segments which intersect.
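To illustrate how rational arithmetic postpones divisions, the following minimal sketch builds a rational number type on java.math.BigInteger, so that neither overflow nor rounding can occur. The class Rational and its methods are hypothetical and are not part of the packages cited in this section.

```java
import java.math.BigInteger;

final class Rational {
    final BigInteger num;
    final BigInteger den; // kept strictly positive

    Rational(BigInteger num, BigInteger den) {
        if (den.signum() == 0) throw new ArithmeticException("zero denominator");
        if (den.signum() < 0) { num = num.negate(); den = den.negate(); }
        BigInteger g = num.gcd(den); // at least 1 because den is non-zero
        this.num = num.divide(g);
        this.den = den.divide(g);
    }

    static Rational of(long n, long d) {
        return new Rational(BigInteger.valueOf(n), BigInteger.valueOf(d));
    }

    Rational add(Rational o) {
        return new Rational(num.multiply(o.den).add(o.num.multiply(den)), den.multiply(o.den));
    }

    /** (a/b) / (c/d) = (a*d) / (b*c): only integer multiplications are performed. */
    Rational divide(Rational o) {
        return new Rational(num.multiply(o.den), den.multiply(o.num));
    }

    @Override
    public String toString() {
        return num + "/" + den;
    }

    public static void main(String[] args) {
        System.out.println(of(1, 3).divide(of(2, 7)));  // prints 7/6
        System.out.println(of(1, 10).add(of(2, 10)));   // prints 3/10
    }
}
```

The price of exactness is that numerators and denominators can grow with every operation, which is one reason why such data types are much slower than hardware floating-point arithmetic.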

Figure 3.11 shows the map overlay process suggested by Brinkmann and Hinrichs [BH98]. As the algorithm uses the exact input data, result data must be rounded to the internal geometry precision afterwards.

Figure 3.11: Exact geometric computation in map overlay [BH98]

Realizing Exact Integer Arithmetic. Exact integer arithmetic can be realized by, amongst others, two approaches [BH98]. The first implements an abstract data type which represents an arbitrary integer value. That data type must support exact computation by offering robust operators (addition, subtraction, multiplication, division, square root, etc.). The advantage of this approach is that any geometric operation can be based on this robust data type. Several software packages are based on this approach ([SVH89], [She97], [BBP95], [MNU97]). Its greatest disadvantage is the time consumed by each elementary operation. Michael Karasick reports on an experiment [KLN91] in which he replaced a floating-point arithmetic package by a rational-arithmetic package in a Delaunay triangulation implementation. The experiment showed that the implementation using rational arithmetic was about 10,000 times slower than the floating-point implementation. However, Güting presented an integer-arithmetic-based geometric domain called REALM [GS93] [RHG], which underlies the spatial data types and seems to be more efficient.

Often, not all operations must be implemented in a robust way and computed exactly. In fact, to construct complex robust algorithms, or algorithms which deliver the correct result in the majority of cases, only a few operations need to be implemented in a robust manner. This observation is the basis of the second approach. It considers certain geometric primitives whose implementation internally uses a technique that assures exact results. Thus, the exact computation is not implemented at the level of every elementary arithmetic operation (addition, multiplication, etc.). The package of basic algorithms of Fortune and van Wyk follows this technique [FW96] and shows an immense run-time advantage in comparison to the first approach.

Adaptive Evaluation. As mentioned above, geometric algorithms based on floating-point arithmetic compute correct results most of the time and fail only in special cases. Hence, substituting exact computations for all floating-point computations would result in a huge and unnecessary performance overhead. Adaptive evaluation tries to compute exact results only when needed. Stefan Schirra [Sch98] gives a detailed explanation and a broad overview of implementations of this technique, which is also called lazy evaluation.

A simple form of lazy evaluation is a floating-point filter, which has become a well-established approach in geometric computation. Floating-point filters calculate a tight error bound for an operation with exact input data and compare it with the floating-point result (for instance a line intersection point). If the error bound guarantees that the floating-point result is reliable, the result of the floating-point computation is used. Otherwise, the result is computed using exact integer arithmetic.

In [BH98], Brinkmann and Hinrichs implement a floating-point filter and show how to combine imprecise floating-point arithmetic with exact integer arithmetic to achieve exact computation of determinant signs. Implementation experiments showed that the pure integer arithmetic implementation is about 50 times slower. Thus, accepting the overhead of the error-bound computation is by far the better solution.
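The idea of a floating-point filter for a determinant sign can be sketched as follows. This is not the filter of [BH98]: the sketch assumes the error-bound coefficient (3 + 16u)·u with u = 2^-53 from Shewchuk's analysis of the 2D orientation determinant (valid in the absence of overflow and underflow), and it substitutes java.math.BigDecimal for exact integer arithmetic in the fallback. All names are hypothetical.

```java
import java.math.BigDecimal;

final class OrientationFilterSketch {

    // Unit roundoff of IEEE 754 double precision.
    private static final double U = 0x1.0p-53;
    // Assumed error-bound coefficient for the 2D orientation determinant.
    private static final double ERR_COEFF = (3.0 + 16.0 * U) * U;

    /** Returns +1 if c lies left of the directed line a->b, -1 if right, 0 if on it. */
    static int orientation(double ax, double ay, double bx, double by,
                           double cx, double cy) {
        double left = (ax - cx) * (by - cy);
        double right = (ay - cy) * (bx - cx);
        double det = left - right;
        // Filter: accept the floating-point sign only if it exceeds the error bound.
        double bound = ERR_COEFF * (Math.abs(left) + Math.abs(right));
        if (det > bound) return 1;
        if (det < -bound) return -1;
        return exactOrientation(ax, ay, bx, by, cx, cy);
    }

    /** Exact fallback: every double converts to BigDecimal without rounding. */
    static int exactOrientation(double ax, double ay, double bx, double by,
                                double cx, double cy) {
        BigDecimal axc = new BigDecimal(ax).subtract(new BigDecimal(cx));
        BigDecimal ayc = new BigDecimal(ay).subtract(new BigDecimal(cy));
        BigDecimal bxc = new BigDecimal(bx).subtract(new BigDecimal(cx));
        BigDecimal byc = new BigDecimal(by).subtract(new BigDecimal(cy));
        return axc.multiply(byc).subtract(ayc.multiply(bxc)).signum();
    }

    public static void main(String[] args) {
        System.out.println(orientation(0, 0, 1, 0, 0, 1)); // clear case, decided by the filter
        System.out.println(orientation(0, 0, 1, 1, 2, 2)); // collinear, decided exactly
    }
}
```

In the common, clearly non-degenerate case only a handful of floating-point operations are executed; the expensive exact path is taken only when the three points are collinear or nearly so.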

There are further approaches to exact computation, such as interval arithmetic, which is based on approximation and error bounds and defines an interval that contains the exact result. [Sch98] reviews this idea as well.
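A minimal sketch of interval arithmetic is given below. It assumes Java 8 or later (for Math.nextDown) and widens every computed bound outwards by one unit in the last place, which encloses the exact result as long as no overflow occurs. The class Interval and its methods are hypothetical.

```java
final class Interval {
    final double lo;
    final double hi;

    Interval(double lo, double hi) {
        this.lo = lo;
        this.hi = hi;
    }

    static Interval of(double x) {
        return new Interval(x, x);
    }

    Interval subtract(Interval o) {
        // Round the lower bound down and the upper bound up.
        return new Interval(Math.nextDown(lo - o.hi), Math.nextUp(hi - o.lo));
    }

    Interval multiply(Interval o) {
        double p1 = lo * o.lo, p2 = lo * o.hi, p3 = hi * o.lo, p4 = hi * o.hi;
        double min = Math.min(Math.min(p1, p2), Math.min(p3, p4));
        double max = Math.max(Math.max(p1, p2), Math.max(p3, p4));
        return new Interval(Math.nextDown(min), Math.nextUp(max));
    }

    boolean isCertainlyPositive() { return lo > 0.0; }

    boolean isCertainlyNegative() { return hi < 0.0; }

    /** Interval enclosing the orientation determinant of the points a, b, c. */
    static Interval orientationDeterminant(double ax, double ay, double bx, double by,
                                           double cx, double cy) {
        Interval left = of(ax).subtract(of(cx)).multiply(of(by).subtract(of(cy)));
        Interval right = of(ay).subtract(of(cy)).multiply(of(bx).subtract(of(cx)));
        return left.subtract(right);
    }

    public static void main(String[] args) {
        Interval d = orientationDeterminant(0, 0, 1, 0, 0, 1);
        System.out.println(d.isCertainlyPositive()); // sign is certified
        Interval e = orientationDeterminant(0, 0, 1, 1, 2, 2);
        // Neither test succeeds: the interval contains zero, an exact method is needed.
        System.out.println(e.isCertainlyPositive() || e.isCertainlyNegative());
    }
}
```

If the resulting interval does not contain zero, the sign of the determinant is certain; otherwise the decision has to be delegated to an exact method.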

As stated before, not all algorithms have to be implemented in a robust manner to achieve results that are sufficient in practice. In most cases, approximate results, which may be rounded floating-point values, are sufficient, so that floating-point algorithms deliver correct results and fail only occasionally. [Sch98] correctly states that this is due to the fact that almost all input data in geometric algorithms are reals given in floating-point or even integer arithmetic. The selection of the algorithms that should be implemented robustly depends on their later application. For example, it might not be noticeable when a single point of a convex hull is slightly displaced, but a test which determines whether a point lies on the left or right side of a line segment can result in extensive errors if it is not calculated correctly. Generally, the robust implementation of an algorithm is recommended if the procedure creates decisive input data for another algorithm.

Computational Performance. Algorithms can be implemented robustly, but at the cost of performance. The robust implementation of single basic operations results in a similar problem: their performance directly affects the performance of the algorithms using them. In general, robust algorithms perform worse than most non-robust algorithms. Hence, the trade-off between performance and precision (or even robustness) is an important issue. Another factor in computational performance is the data structure the algorithms work on. In general, topological structures have considerable performance advantages.

4 Implementation Aspects

A geometry data model is a complex data structure. The correct implementation of all semantic properties of geometric objects and of the relations between them requires a deep understanding of the data model. Though the data model is a core aspect of a geometry implementation, it is not the only one. Many GIS have shown weak points. This chapter discusses implementation-relevant aspects which address the problems that turned up in the development and use of previous GIS geometries (i.e. their data model) and explains how this implementation deals with them.

4.1 Programming language

One of the most basic and important choices is that of the programming language. It should not only be the language which best covers the needs of the actual problem, but should also take into account the language of the environment in which the geometry will work. A system with heterogeneous languages makes object translations from one language to another necessary. In distributed systems, those aspects are often not a problem, but they still cost performance and, in practice, a geometry will rarely be installed separately from the GIS layers above it.

By the nature of the problem, object-oriented (oo) technology is the most appropriate way to implement the data model. The data model specifies data types which represent real-world objects and follow a clear hierarchy with object inheritance. Object-oriented features like method overloading are ideal to define a general solution for higher objects (i.e. objects at the top of the hierarchy) and more specialised solutions for certain types at a lower position within the inheritance hierarchy.

Interfaces map the OGC Feature Geometry and are implemented by classes. An interface defines the capability of a data type by specifying the methods the object should provide. The interface structure represents the data model. Some object-oriented languages like C++ allow classes to inherit from two or more superclasses. This is not possible in Java (see illustration (d) in figure 4.1). However, Java overcomes this limitation by allowing multiple interface inheritance instead of multiple class inheritance (see illustration (b) in figure 4.1).

Figure 4.1: Inheritance in Java: Java supports multiple interface inheritance, but not multiple class inheritance
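A short sketch of multiple interface inheritance in Java follows. The interfaces and the class are hypothetical and deliberately not taken from GeoAPI or the OGC specification; they only show that one interface may extend several others, whereas a class may extend only a single superclass.

```java
/** Capability: the object has a measurable length. */
interface HasLength {
    double getLength();
}

/** Capability: the object has a start and an end position. */
interface HasEndPoints {
    double[] getStartPoint();
    double[] getEndPoint();
}

/** An interface may inherit from several interfaces at once. */
interface SegmentLike extends HasLength, HasEndPoints {
}

/** A class implements the combined interface but extends at most one class. */
final class SegmentSketch implements SegmentLike {
    private final double x1, y1, x2, y2;

    SegmentSketch(double x1, double y1, double x2, double y2) {
        this.x1 = x1;
        this.y1 = y1;
        this.x2 = x2;
        this.y2 = y2;
    }

    @Override
    public double getLength() {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    @Override
    public double[] getStartPoint() {
        return new double[] {x1, y1};
    }

    @Override
    public double[] getEndPoint() {
        return new double[] {x2, y2};
    }
}
```

A SegmentSketch instance can be used wherever a HasLength, a HasEndPoints or a SegmentLike is expected, which is how an interface hierarchy can mirror a data model that uses multiple inheritance.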

The implementation in this work was done in Java due to its advantages, especially in portability. C++ is a high-performance language, but its compiled code depends on the operating system on which it runs. Java source code is platform independent: it is compiled into bytecode, which the Java Virtual Machine (JVM) translates into machine-specific code. Platform independence is one of the main requirements on OGC technologies [OGC03]. Furthermore, Java has one of the largest internet communities. Many GIS, which often contain components based on OGC specifications, were implemented in this seminal oo language. The free open source tool Eclipse was used to organize and develop the implementation of this work. It is a powerful and widespread Java IDE (integrated development environment) which offers a comprehensive list of functions to support the developer.

Coding Conventions. Open source software is usually not programmed by a single person, but rather by a community of programmers from different places, often even from different countries. This implementation works as a basis for future work by the GIS community (for example GeoTools). Coding conventions have become a de facto standard. They are important for a number of reasons (from [Sun06]):

• 80% of the lifetime cost of a piece of software goes to maintenance.

• Hardly any software is maintained for its whole life by the original author.

• Code conventions improve the readability of the software, allowing engineers to understand new code more quickly and thoroughly.

• If one ships a source code as a product, one needs to make sure it is as well packaged and clean as any other product created.

The source code of this implementation conforms to the SUN coding conventions [Sun06] in order to make it uniform and understandable. The following naming conventions were defined by SUN:

| Type | Rules | Examples |
|---|---|---|
| Packages | Package names have to be written in lower case ASCII letters and should be one of the top level domain names, followed by names according to the organization's own internal naming convention. | com.sun.eng; org.geotools.geometry.iso |
| Classes & Interfaces | Class names should be nouns with the first letter of each internal word capitalized. The name should be simple and descriptive. | class Curve; class DimensionModel; |
| Methods | Method names should be verbs. The first letter in lowercase and the first letter of each internal word capitalized. | getLength(); isCycle(); computeIntersectionPoint(); |
| Variables | Variable names should start with a lower case letter; first letter of each internal word is capitalized. They should not start with underscore (_) or dollar sign ($) characters, even though both are allowed. The names should be short yet meaningful. One character variables should only be used for temporary variables, for example i, j, k, m, and n for integers; c, d, and e for characters. | int i; char c; GeometryGraph graph; List<Ring> surfaceBoundary; Ring externalBoundary; |
| Constants | Constant names should be all uppercase with words separated by underscores (_). | static final int MIN = 4; public static String WKT_POINT = "POINT"; |

Table 4.1: SUN Naming Conventions for Java

The GeoAPI interfaces, generated from the Abstract Specification, carry their names without the "GM_" prefix, e.g. Curve instead of GM_Curve, Ring instead of GM_Ring, or DirectPosition instead of GM_DirectPosition. To help users and developers easily distinguish between interfaces and their implementing classes, the class names end with "Impl", e.g. CurveImpl or RingImpl. Note that code adapted from the JTS was not checked for full conformance to the above naming conventions, since such code is not supposed to be modified by future developers in this project.
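The naming rules can be illustrated with a small sketch. Only the names Curve and CurveImpl are taken from the text; the members, the constant MIN_POINTS and the constructor are hypothetical and do not reproduce the GeoAPI interface or the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Interface name: a noun without the "GM_" prefix. */
interface Curve {
    double getLength(); // method name: a verb, first letter in lower case
}

/** Implementing class: the interface name followed by "Impl". */
final class CurveImpl implements Curve {

    // Constant name: all uppercase, words separated by underscores.
    static final int MIN_POINTS = 2;

    // Variable name: lower case first letter, internal words capitalized.
    private final List<double[]> controlPoints;

    CurveImpl(List<double[]> points) {
        if (points.size() < MIN_POINTS) {
            throw new IllegalArgumentException("a curve needs at least " + MIN_POINTS + " points");
        }
        this.controlPoints = new ArrayList<>(points);
    }

    @Override
    public double getLength() {
        double length = 0.0;
        // One-character names are used for temporary variables only.
        for (int i = 1; i < controlPoints.size(); i++) {
            double[] p = controlPoints.get(i - 1);
            double[] q = controlPoints.get(i);
            length += Math.hypot(q[0] - p[0], q[1] - p[1]);
        }
        return length;
    }
}
```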

In document DiplThesisJena07 (Page 37-47)