4.4 Document Profiles
4.4.3 Comparing Geographic Expressions
Many geo-taggers that extract geographic expressions from text documents assign only point geometry information to a location, regardless of the actual geographic extent of the location. However, as already mentioned in Section 4.3.3, containment information is also often associated with the extracted locations. We therefore exploit this containment information, and thus the granularity information of the locations, to compare two geographic expressions.
Similar to the structure of the previous section, in the following, we will present geographic relationships and the geographic mapping function as well as two algorithms to compare geographic expressions with each other.
Geographic Granularities
As was explained in Section 2.4.1 and exemplarily shown in Figure 2.2(c) (page 18), geographic expressions can be organized hierarchically similar to temporal expressions. Thus, every geographic expression can be associated with a specific granularity such as city or country. We assume the following geographic granularities G = {Gcity, Gstate, Gcountry}, e.g., “Leipzig” can be anchored in Gcity, “Saxony” can be
anchored in Gstate, and “Germany” can be anchored in Gcountry. Although many more geographic
granularities exist (e.g., address, suburb, county, etc.), we assume – for the sake of simplicity – only these three geographic granularities to explain how we compare two locations.
Geographic Disconnect and Containment Relationships
To compare two locations of the same granularity with each other, we introduce a geographic disconnect relationship. Formally, this disconnect relationship is defined as follows:
Definition 4.9. (Geographic Disconnect Relationship ∅G)
Using the geographic disconnect relationship ∅G, the relationship between two locations gi ∈ G′, gj ∈ G″, with G′ = G″, can be determined as gi ∅G gj, if gi ≠ gj.
Due to the hierarchical organization of geographic information, two locations of the same granularity are either equal or geographically disconnected. However, instead of comparing only locations of the same granularity with each other, it is necessary to also compare locations of different granularities since typically documents and thus geographic document profiles contain locations of several granularities. For that, we introduce a geographic containment relationship that is formally defined as follows:
Definition 4.10. (Geographic Containment Relationship ⊂G)
Given two locations gi ∈ G′ and gj ∈ G″, with G′ being more fine grained than G″. The geographic containment relationship ⊂G between gi and gj holds (gi ⊂G gj) if gi is contained in gj.
Note that one could also determine whether or not a containment relationship holds between two locations based on explicit region information (specified, e.g., in the form of a polygon). However, as already mentioned above, the hierarchical containment information is typically accessible using the gazetteer of a geo-tagger while explicit polygonal information is often not available. Thus, we rely on the containment information rather than explicit polygonal information about the locations.
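The two relationships of Definitions 4.9 and 4.10 can be illustrated with a minimal Python sketch. It assumes that each location carries the names of its containing locations, as a geo-tagger's gazetteer might provide them. The names `Granularity`, `Location`, `containers`, `disconnected`, and `contained_in` are hypothetical, and locations are identified by name only, which a real gazetteer would replace by unique identifiers.

```python
from dataclasses import dataclass
from enum import IntEnum


class Granularity(IntEnum):
    """Finer granularities get smaller values (city < state < country)."""
    CITY = 1
    STATE = 2
    COUNTRY = 3


@dataclass(frozen=True)
class Location:
    name: str
    granularity: Granularity
    # Hypothetical gazetteer field: names of all locations containing this one.
    containers: frozenset = frozenset()


def disconnected(gi: Location, gj: Location) -> bool:
    """Definition 4.9: two locations of the same granularity that are not equal."""
    if gi.granularity != gj.granularity:
        raise ValueError("disconnect is only defined for equal granularities")
    return gi.name != gj.name


def contained_in(gi: Location, gj: Location) -> bool:
    """Definition 4.10: gi is finer than gj and gj is one of gi's containers."""
    return gi.granularity < gj.granularity and gj.name in gi.containers


leipzig = Location("Leipzig", Granularity.CITY, frozenset({"Saxony", "Germany"}))
# The state of Heidelberg is not named in the text; it is added for illustration.
heidelberg = Location("Heidelberg", Granularity.CITY,
                      frozenset({"Baden-Württemberg", "Germany"}))
saxony = Location("Saxony", Granularity.STATE, frozenset({"Germany"}))
germany = Location("Germany", Granularity.COUNTRY)
```

With these example locations, `contained_in(leipzig, saxony)` holds while `contained_in(heidelberg, saxony)` does not, and no polygon information is needed for either check.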
Geographic Mapping Function
Now, locations of the same granularity can be compared to each other, and locations of different granularities can be checked for a containment relationship. However, two locations can be of different granularities without a containment relationship. Although in the case of a linear geographic hierarchy as in our example (G = {Gcity, Gstate, Gcountry}) there would either be a containment or a disconnect
relationship between any two locations, in the case of a non-linear hierarchy, there could also be partially overlapping locations. Thus, and since it is required for the “mapping to equality” procedure for locations that will be described below, we introduce a geographic mapping function that is defined as follows:
Definition 4.11. (Geographic Mapping Function αG)
The geographic mapping function αG(gi′) = gi″ maps the location gi′ ∈ G′ to the next coarser geographic granularity, so that gi″ ∈ G″, with G″ being the next coarser granularity of G′ in the geographic hierarchy.
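As a minimal sketch, the mapping function of Definition 4.11 can be read as a parent lookup in a gazetteer. The dictionary `PARENT` and the helper names `alpha_g` and `map_up` are assumptions made for illustration; the entries reuse the example locations “Leipzig”, “Saxony”, and “Germany”, plus the state of Heidelberg, which the text does not name.

```python
# Hypothetical gazetteer: each location points to its parent in the next
# coarser granularity (city -> state -> country). Countries have no entry.
PARENT = {
    "Leipzig": "Saxony",
    "Saxony": "Germany",
    "Heidelberg": "Baden-Württemberg",
    "Baden-Württemberg": "Germany",
}


def alpha_g(location: str) -> str:
    """Map a location to the next coarser granularity (Definition 4.11)."""
    try:
        return PARENT[location]
    except KeyError:
        # No parent recorded: the location is at the coarsest granularity
        # of this toy gazetteer (or unknown to it).
        raise ValueError(f"{location!r} cannot be mapped any further") from None


def map_up(location: str, steps: int) -> str:
    """Apply alpha_g repeatedly, i.e. alpha_g(alpha_g(...)) for `steps` steps."""
    for _ in range(steps):
        location = alpha_g(location)
    return location
```

Here `alpha_g("Leipzig")` returns `"Saxony"`, and `map_up("Leipzig", 2)` returns `"Germany"`, corresponding to the recursive application αG(αG(“Leipzig”)).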
Assuming the three example geographic granularities G = {Gcity, Gstate, Gcountry}, locations of the granularities city and state can be mapped to the state and country granularities, respectively, by applying the geographic mapping function. For example, αG(“Leipzig”) = “Saxony” and αG(“Saxony”) = “Germany”. Of course, like the temporal mapping function, the geographic mapping function can also be applied recursively, e.g., αG(αG(“Leipzig”)) = “Germany”. Since the same example locations will be used below when explaining the algorithms for comparing locations and determining their similarity, the hierarchy structure of the locations is depicted in Figure 4.3.
4 The Concept of Spatio-temporal Events
Algorithm 4.3 Procedure to compare two locations of any granularities making use of the geographic disconnect ∅G and containment ⊂G relationships, and the geographic mapping function αG.
1: procedure Compare_Locations(g1, g2)
2:     g1* = g1, g2* = g2    ▷ keep original values of g1 and g2
3:     if (g1.granularity < g2.granularity) and (g1 ⊂G g2) then
4:         return g1* contained in g2*
5:     else if (g2.granularity < g1.granularity) and (g2 ⊂G g1) then
6:         return g2* contained in g1*
7:     end if
8:     while (g1.granularity < g2.granularity) do
9:         g1 = αG(g1)
10:     end while
11:     while (g2.granularity < g1.granularity) do
12:         g2 = αG(g2)
13:     end while
14:     if (g1 ∅G g2) then
15:         return g1* disconnected from g2*
16:     else
17:         return g1* equals g2*
18:     end if
19: end procedure
Algorithm to Compare Locations
Similar to Algorithm 4.1 to compare chronons with each other, we describe in Algorithm 4.3 the procedure to compare two locations with each other independent of their granularities. In this procedure, the geographic disconnect relationship (Definition 4.9) and the geographic containment relationship (Definition 4.10) as well as the geographic mapping function (Definition 4.11) will be used.
In lines 3 to 7, the two locations g1 and g2 are checked for a containment relationship. If there is no containment relationship, both locations are mapped to the same granularity in lines 8 to 13.⁸ Then, in lines 14 to 18, the geographic relationship between g1 and g2 is determined as either equal or disconnected.
In contrast to chronons, which can be chronologically ordered, locations can only be distinguished to be either equal, disconnected or contained in each other. In addition, note that not all the geographic relations described in Section 2.4.1 are considered in Algorithm 4.3. The four relations of the region connection calculus 8 (RCC8) “tangential proper part”, “non-tangential proper part” and their inverses are all captured as containment relationships. Furthermore, while the algorithm’s “equal” relationship is identical to the RCC8 “equal” relationship, the RCC8 relations “disconnected” and “externally connected” are both considered as disconnected by the algorithm. Finally, the “partially overlapped” relationship is not considered because it can only occur if the geographic hierarchy is not linear or if arbitrary regions are compared to locations of specified granularities. Like the temporal overlap relation, the geographic overlap relation will also become relevant in Chapter 5 when information retrieval with temporal and geographic constraints is developed.
⁸ As for the corresponding algorithm for chronons, for the sake of simplicity, we assume that the geographic hierarchy is linear. Thus, the algorithm would have to be slightly modified if the hierarchy were more complex: (i) the containment relationship (lines 3 to 7) would be applied to linearly related locations only; (ii) instead of checking for the granularities of g1 and g2 to be identical (lines 8 to 13), one would map two non-linearly related locations up to their common governor granularity, resulting in “(close to) overlap” relationships, which were to be distinguished from the equal relationship. Again, we assume that any …

g1          g2          granularities        mappings       relation
Germany     Germany     Gcountry, Gcountry   –              g1 = g2
Germany     Spain       Gcountry, Gcountry   –              g1 ∅G g2
Germany     Saxony      Gcountry, Gstate     –              g2 ⊂G g1
Spain       Saxony      Gcountry, Gstate     αG(g2)         g1 ∅G g2
Germany     Heidelberg  Gcountry, Gcity      –              g2 ⊂G g1
Germany     Leipzig     Gcountry, Gcity      –              g2 ⊂G g1
Spain       Heidelberg  Gcountry, Gcity      αG(αG(g2))     g1 ∅G g2
Spain       Leipzig     Gcountry, Gcity      αG(αG(g2))     g1 ∅G g2
Saxony      Heidelberg  Gstate, Gcity        αG(g2)         g1 ∅G g2
Saxony      Leipzig     Gstate, Gcity        –              g2 ⊂G g1
Heidelberg  Leipzig     Gcity, Gcity         –              g1 ∅G g2
Table 4.4: Examples showing how to compare two locations with each other to determine their geographic relationship. If both locations are identical, there is an equal relationship between g1 and g2 without mappings, as exemplarily shown for g1 = g2 = Germany.

[Figure 4.3: Hierarchy structure of the example locations (Germany, Spain, Saxony, Heidelberg, Leipzig).]
In Table 4.4, we show some examples of how two locations can be compared by applying Algorithm 4.3. In Figure 4.3, the geographic hierarchy of the following five example locations is depicted: “Germany” ∈ Gcountry, “Spain” ∈ Gcountry, “Saxony” ∈ Gstate, “Heidelberg” ∈ Gcity, and “Leipzig” ∈ Gcity.
Comparing any two of the five locations with each other yields either an “equal”, a disconnect (∅G), or a containment (⊂G) relationship.
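To make the steps of Algorithm 4.3 concrete, the following Python sketch runs them on the five example locations. It assumes a linear hierarchy and a toy gazetteer in which every location stores its granularity and its parent. The names `GAZETTEER`, `level`, `alpha_g`, `contained_in`, and `compare_locations` are hypothetical, and the state of Heidelberg, which the text does not name, is added so that the city can be mapped upward.

```python
GRANULARITY = {"city": 1, "state": 2, "country": 3}  # finer = smaller value

# Hypothetical gazetteer: name -> (granularity, parent in next coarser level).
GAZETTEER = {
    "Germany": ("country", None),
    "Spain": ("country", None),
    "Saxony": ("state", "Germany"),
    "Baden-Württemberg": ("state", "Germany"),  # not named in the text
    "Heidelberg": ("city", "Baden-Württemberg"),
    "Leipzig": ("city", "Saxony"),
}


def level(g):
    """Numeric rank of the granularity of location g."""
    return GRANULARITY[GAZETTEER[g][0]]


def alpha_g(g):
    """Geographic mapping function: parent of g in the next coarser level."""
    return GAZETTEER[g][1]


def contained_in(gi, gj):
    """gi ⊂G gj: gi is finer than gj and maps upward onto gj."""
    if level(gi) >= level(gj):
        return False
    while level(gi) < level(gj):
        gi = alpha_g(gi)
    return gi == gj


def compare_locations(g1, g2):
    """Return the relationship between g1 and g2 as in Algorithm 4.3."""
    # Lines 3 to 7: containment check in both directions.
    if contained_in(g1, g2):
        return f"{g1} contained in {g2}"
    if contained_in(g2, g1):
        return f"{g2} contained in {g1}"
    # Lines 8 to 13: map the finer location up to the coarser granularity.
    m1, m2 = g1, g2
    while level(m1) < level(m2):
        m1 = alpha_g(m1)
    while level(m2) < level(m1):
        m2 = alpha_g(m2)
    # Lines 14 to 18: same granularity, so either disconnected or equal.
    if m1 != m2:
        return f"{g1} disconnected from {g2}"
    return f"{g1} equals {g2}"
```

For example, `compare_locations("Spain", "Saxony")` maps “Saxony” to “Germany” and then reports a disconnect, while `compare_locations("Saxony", "Leipzig")` returns the containment of “Leipzig” in “Saxony” without any mapping.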
Mapping Locations for Equality
In Algorithm 4.4, the “mapping locations for equality” procedure is shown. Like its temporal counterpart (Algorithm 4.2), it will become important in Chapter 6. Its structure is also quite similar, and the goal is analogous: to determine the similarity between two locations of any granularities. To decide how similar two locations are based on their hierarchical distance, the number of necessary mapping steps and the granularity at which the two locations match are determined. The fewer mappings are necessary and the finer the granularity at which the two locations match, the more similar the two locations are.
Algorithm 4.4 Procedure to map two locations of any granularities until they are equal. The procedure makes use of the geographic disconnect relationship ∅G and the geographic mapping function αG.
1: procedure Map_Locations_for_Equality(g1, g2)
2:     map1 = 0, map2 = 0    ▷ tracking the mapping steps of g1 and g2
3:     g1* = g1, g2* = g2    ▷ keep original values of g1 and g2
4:     while (g1.granularity < g2.granularity) do
5:         g1 = αG(g1)
6:         map1 = map1 + 1
7:     end while
8:     while (g2.granularity < g1.granularity) do
9:         g2 = αG(g2)
10:         map2 = map2 + 1
11:     end while
12:     while (g1 ∅G g2) do
13:         g1 = αG(g1)
14:         g2 = αG(g2)
15:         map1 = map1 + 1
16:         map2 = map2 + 1
17:     end while
18:     return g1* equals g2* after map1 and map2 mappings on granularity g1.granularity
19: end procedure
In Table 4.5, we show how our five example locations are compared, assuming the geographic granularities G = {Gcity, Gstate, Gcountry, Gglobal}, with Gglobal being the root granularity of the geographic hierarchy on which all locations become equal.
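The mapping-to-equality procedure can be sketched in Python in the same way. The sketch assumes a linear hierarchy with a single root location at the global granularity, here called “World” (a hypothetical name). `GAZETTEER`, `level`, `alpha_g`, and `map_locations_for_equality` are likewise illustrative names, and the state of Heidelberg is not named in the text. The function returns only the two mapping counts and the granularity of the match; how these are turned into a similarity value is not part of this sketch.

```python
GRANULARITY = {"city": 1, "state": 2, "country": 3, "global": 4}

# Hypothetical gazetteer: name -> (granularity, parent in next coarser level).
GAZETTEER = {
    "World": ("global", None),  # assumed single root location
    "Germany": ("country", "World"),
    "Spain": ("country", "World"),
    "Saxony": ("state", "Germany"),
    "Baden-Württemberg": ("state", "Germany"),  # not named in the text
    "Heidelberg": ("city", "Baden-Württemberg"),
    "Leipzig": ("city", "Saxony"),
}


def level(g):
    """Numeric rank of the granularity of location g."""
    return GRANULARITY[GAZETTEER[g][0]]


def alpha_g(g):
    """Geographic mapping function: parent of g in the next coarser level."""
    return GAZETTEER[g][1]


def map_locations_for_equality(g1, g2):
    """Map g1 and g2 upward until they are equal (Algorithm 4.4).

    Returns (map1, map2, granularity): the number of mapping steps applied
    to each location and the granularity at which they became equal.
    """
    map1 = map2 = 0
    # Bring both locations to the same granularity.
    while level(g1) < level(g2):
        g1 = alpha_g(g1)
        map1 += 1
    while level(g2) < level(g1):
        g2 = alpha_g(g2)
        map2 += 1
    # Map both upward in lockstep while they are disconnected.
    # The shared root guarantees termination.
    while g1 != g2:
        g1, g2 = alpha_g(g1), alpha_g(g2)
        map1 += 1
        map2 += 1
    return map1, map2, GAZETTEER[g1][0]
```

Under these assumptions, “Leipzig” and “Heidelberg” meet at the country granularity after two mappings each, whereas “Germany” and “Spain” only meet at the global granularity after one mapping each.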
Summary
In this section, we introduced two algorithms to compare two geographic locations with each other. While the first algorithm determines the geographic relationship between two locations, the second algorithm determines how similar two locations are. Both algorithms will become important when describing spatio-temporal and event-centric search and exploration tasks in Chapter 5 and Chapter 6, but also when comparing spatio-temporal events with each other (Section 4.4.5). However, before we explain how to compare spatio-temporal events with each other, we first explain how spatio-temporal events can be organized by introducing the concept of event document profiles.