
6.5 Experiments Georeferencing Informal Documents

6.5.2 Experiments after MediaEval Placing Task 2010

After MediaEval 2010, a set of experiments was performed to improve the Geographical Knowledge approach and to test the other approaches. The experiments performed with the Geographical Knowledge approach include the following improvements with respect to the preliminary experiments presented at MEPT2010: 1) filtering out weak geographical Named Entities (e.g. the toponym Porto Alegre contains the weak toponyms Porto and Alegre, which could be erroneously matched as toponyms); 2) improving the focus detection phase for cases in which several toponyms with the same state, country or continent appear in the metadata (in the experiments originally submitted to MEPT2010 the population sorting of these cases was not activated); 3) adding the Geonames Alternate Names file (with 2.9 million features).
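The weak Named Entity filter in improvement 1) can be illustrated with a longest-match gazetteer lookup. This is only a minimal sketch under assumptions: the function name, the toy gazetteer and the longest-match strategy are hypothetical and are not taken from the system described here, which may implement the filter differently.

```python
def match_toponyms(tokens, gazetteer):
    """Return gazetteer matches, preferring the longest match at each position.

    Assumption: weak toponyms are suppressed simply by never matching a token
    that is already covered by a longer toponym.
    """
    max_len = max(len(name.split()) for name in gazetteer)
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest candidate span first so "porto alegre" wins over "porto".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in gazetteer:
                found = (candidate, n)
                break
        if found:
            matches.append(found[0])
            i += found[1]  # skip the tokens covered by this match
        else:
            i += 1
    return matches


# Hypothetical toy gazetteer; a real one would be loaded from Geonames.
GAZETTEER = {"porto alegre", "porto", "alegre", "brazil"}
print(match_toponyms("Sunset in Porto Alegre Brazil".split(), GAZETTEER))
```

With this strategy the standalone toponym Porto is still matched when it appears on its own, but not when it is part of Porto Alegre.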

The experiments performed after MEPT2010 show that adding a weak NE filter and refining the focus detection improve the results of the GeoKB georeferencing approach. On the other hand, the addition of the Alternate Names data does not improve the results. With respect to the best results at the MediaEval 2010 official evaluation, the number of videos correctly georeferenced within a maximum distance of 100 km improves from 2,740 (accuracy of 0.5382) to 2,838 (accuracy of 0.5574).
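The accuracy figures used throughout this section count a video as correct when the predicted coordinates fall within a given distance of the true ones. The sketch below shows one way to compute such a measure; it assumes great-circle (haversine) distance and a mean Earth radius of 6371 km, neither of which is stated in the text, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at(predictions, gold, threshold_km):
    """Fraction of all gold videos predicted within threshold_km.

    predictions: dict video_id -> (lat, lon); videos without a prediction
    are simply absent and count as errors.
    gold: dict video_id -> (lat, lon).
    """
    hits = 0
    for video_id, true_coords in gold.items():
        pred = predictions.get(video_id)
        if pred is not None and haversine_km(*pred, *true_coords) <= threshold_km:
            hits += 1
    return hits / len(gold)


# Toy data: one video predicted about 56 km away, one video not predicted.
gold = {"v1": (0.0, 0.0), "v2": (10.0, 10.0)}
predictions = {"v1": (0.0, 0.5)}
print(accuracy_at(predictions, gold, 100))
print(accuracy_at(predictions, gold, 10))
```

Because the denominator is the whole test set, an approach that abstains on many videos can have high precision but low accuracy.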

After the improvement of the Geographical Knowledge Approach the georeferencing experiments done with this approach were the following:

1. Experiments to detect the relative importance of the metadata fields (title, description, and keywords) for the Flickr georeferencing task: the results show that the metadata field Keywords (tags) is the most important one, achieving an accuracy of 52% at 100 km (see the results of these experiments in Table 6.4). Including the Title and/or the Description fields improves on the results of the Keywords (tags) alone: Title and Keywords (53.7%), Keywords and Description (54.4%), and Title, Keywords and Description (the original TALP_2 configuration), which achieves 55.2% accuracy at 100 km. The use of only Title and Description achieves an accuracy of 21.7% at 100 km. This result is slightly better than that of J. Perea-Ortega et al. (2010), who used a Geo-NER for detecting Named Entities in Title and Description and achieved an accuracy of 21.3%. Although it is not clear how to compare the toponym disambiguation process of the two systems, this may indicate that toponym recognition with Gazetteer lookup using Geonames performs at the level of state-of-the-art NERC.

Table 6.4: Experiments with different metadata fields.

Accuracy (over 5091 videos)

Metadata fields    1km     5km     10km    50km    100km
Title (T)          0.034   0.076   0.090   0.104   0.109
Description (D)    0.023   0.075   0.098   0.131   0.132
Keywords (K)       0.106   0.331   0.412   0.503   0.520
T + K              0.103   0.336   0.421   0.517   0.537
T + D              0.051   0.135   0.168   0.210   0.217
K + D              0.108   0.343   0.429   0.526   0.544
T + K + D          0.105   0.345   0.433   0.532   0.552

2. Experiments to detect the performance and precision of the geographical disambiguation heuristics: these experiments show the performance and importance of the geographical knowledge and population heuristics, applied alone or in combination but prioritizing the geographical knowledge ones (see the configuration details of these experiments in Table 6.5). The results (see Table 6.6) show that the heuristics that apply geographical knowledge without population heuristics obtain the best precision, with 86.36% of correctly predicted videos out of the 2,215 predicted in the experiment EXP_4 and 83.21% of correctly predicted videos out of the 2,353 predicted in the experiment EXP_3. The difference between EXP_3 and EXP_4 is that the latter uses stopwords and the English dictionary to filter out ambiguous place names. On the other hand, the combination of geographical knowledge and population heuristics in experiments EXP_5 and EXP_6 obtained a precision of 58.36% (with 4,681 predicted videos) and 68.03% (with 4,173 predicted videos), respectively. In order to know the relative precision of each specific heuristic in the geographical knowledge set of heuristics, we computed the precision of each rule (applied in priority order) in the context of the experiment EXP_4 (see Table 6.7).

Table 6.5: Configuration of the georeferencing experiments after MEPT2010.

experiment   Geo. Heuristic         StopWords   Dictionary
EXP_1        population             no          no
EXP_2        population             yes         yes
EXP_3        knowledge              no          no
EXP_4        knowledge              yes         yes
EXP_5        knowledge+population   no          no
EXP_6        knowledge+population   yes         yes

Table 6.6: Results of the georeferencing experiments after MEPT2010.

100km (margin of error)

experiment   #predictedOK   #predicted   Accuracy   Precision
EXP_1        2,185          4,681        0.4291     0.4667
EXP_2        2,337          4,173        0.459      0.5600
EXP_3        1,958          2,353        0.3846     0.8321
EXP_4        1,919          2,215        0.3769     0.8636
EXP_5        2,732          4,681        0.5366     0.5836
EXP_6        2,839          4,173        0.5576     0.6803
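The two measures in Table 6.6 differ only in their denominator: accuracy divides the correct predictions by the whole test set (5091 videos, as stated in Table 6.4), while precision divides them by the number of videos for which a prediction was made. The following sketch recomputes both for two rows of the table; the helper names are hypothetical, and truncation to four decimals is an assumption that happens to reproduce the reported values for these rows.

```python
import math

TOTAL_VIDEOS = 5091  # test-set size given in Table 6.4


def truncate4(x):
    """Truncate (not round) to four decimals; an assumption about the table."""
    return math.floor(x * 10000) / 10000


def accuracy(predicted_ok, total=TOTAL_VIDEOS):
    """Correct predictions over all test videos."""
    return predicted_ok / total


def precision(predicted_ok, predicted):
    """Correct predictions over the videos that received a prediction."""
    return predicted_ok / predicted


# (#predictedOK, #predicted) copied from Table 6.6
counts = {"EXP_3": (1958, 2353), "EXP_6": (2839, 4173)}
for name, (ok, pred) in counts.items():
    print(name, truncate4(accuracy(ok)), truncate4(precision(ok, pred)))
```

EXP_3 abstains on more than half of the videos, which explains its high precision together with its low accuracy.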

Table 6.7: Relative performance in precision of each geographical knowledge heuristic, georeferencing up to 100 km in the experiment EXP_4.

Heuristic                  Measures
Feature (Superordinate)    #predictedOK   #predicted   Precision
H1_city/spot (state)       1,351          1,546        0.8738
H2_city/spot (country)     515            609          0.8456

6.5.2.1 Experiments with the IR Approach

The IR indexes were created with a metadata corpus of Flickr photos provided in MediaEval 2010 for development purposes. The corpus consists of 3,185,258 Flickr photos uniformly sampled from all parts of the world. The photos are georeferenced with geotags at 16 zoom accuracy levels. The accuracy level indicates the zoom level used by the user when placing the photo on the map (e.g., 6: region level, 12: city level, 16: street level). The metadata of the corpus consists of the following information: UserID, PhotoID, HTMLLinkToPhoto, GeoData (includes longitude, latitude, and zoom accuracy level), tags, date taken, and date uploaded.

From the metadata corpus of photos we filtered out some data: 1) if a user has several photo metadata entries with the same tagset, only one of them is kept; 2) metadata without tags is discarded. After these filtering steps a set of 1,723,090 photo metadata entries was obtained. Then, from the filtered corpus we selected four subsets depending on the zoom accuracy level: 1) level 16 (715,318 photos), 2) levels from 14 to 16 (1,140,031 photos), 3) levels from 12 to 16 (1,570,771 photos), 4) levels from 6 to 16 (1,723,090 photos). Moreover, for each unique coordinate pair in each subset, all the tagsets associated with that coordinate pair were joined, resulting in: 1) level 16 (511,222 coordinate pairs), 2) levels from 14 to 16 (756,916 coordinate pairs), 3) levels from 12 to 16 (965,904 coordinate pairs), 4) levels from 6 to 16 (1,026,993 coordinate pairs).

The metadata subsets were indexed using the coordinates as the document identifier and their associated tagsets as the document text. Indexing was performed without stemming and filtering out tokens that match a multilingual stopword list. The retrieval experiments used the metadata of the videos as queries to the IR system.
The following metadata fields were used for the query: Keywords (tags), Title and Description. The metadata fields Title and Description were lowercased for the query. The results in Table 6.8 show that BM25 achieves the best accuracy at 10 and 100 km, while the Hiemstra Language Model (HLM) IR algorithm achieves the best accuracy at 1 and 5 km; at 50 km HLM is marginally ahead of BM25 (0.6079 vs. 0.6065).
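The corpus preparation described above (de-duplication per user and tagset, removal of untagged photos, selection by zoom accuracy level, and merging of tagsets per coordinate pair) can be sketched as follows. This is an illustrative sketch only: the dictionary keys, the function name and the in-memory representation are assumptions, and stopword filtering is omitted because it would be handled by the IR engine at indexing time.

```python
from collections import defaultdict


def build_documents(photos, min_level=6, max_level=16):
    """Turn photo metadata into one pseudo-document per coordinate pair.

    photos: iterable of dicts with the hypothetical keys
            user_id, lat, lon, level (zoom accuracy), tags (list of str).
    Returns a dict mapping (lat, lon) -> joined tag text.
    """
    seen = set()
    docs = defaultdict(list)
    for photo in photos:
        tags = [t for t in photo["tags"] if t.strip()]
        if not tags:
            continue  # filter 2: drop metadata without tags
        key = (photo["user_id"], frozenset(tags))
        if key in seen:
            continue  # filter 1: keep one entry per (user, tagset)
        seen.add(key)
        if not (min_level <= photo["level"] <= max_level):
            continue  # subset selection by zoom accuracy level
        # Join all tagsets that share the same coordinate pair.
        docs[(photo["lat"], photo["lon"])].extend(tags)
    return {coords: " ".join(tags) for coords, tags in docs.items()}


photos = [
    {"user_id": "u1", "lat": 41.4, "lon": 2.17, "level": 16, "tags": ["barcelona", "gaudi"]},
    {"user_id": "u1", "lat": 41.4, "lon": 2.17, "level": 16, "tags": ["gaudi", "barcelona"]},
    {"user_id": "u2", "lat": 41.4, "lon": 2.17, "level": 14, "tags": ["sagrada", "familia"]},
    {"user_id": "u3", "lat": 48.9, "lon": 2.35, "level": 12, "tags": []},
]
print(build_documents(photos, min_level=14))
```

Each resulting entry corresponds to one indexed document, with the coordinate pair playing the role of the document identifier.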

6.5.2.2 Experiments with the IR Re-Ranking and GeoFusion Approaches

The GeoFusion approach is applied by combining the results of the Geographical Knowledge approach and the IR approach with Re-Ranking (see the results of this approach in Table 6.9). The results are combined in the following way: from the set of Geographical Knowledge-based experiments we selected the experiment with the best precision (EXP_4). The system takes the coordinates predicted by this experiment, and for the videos that are not predicted because no geographical rule matches, it takes the coordinates predicted by the Information Retrieval approaches with Re-Ranking. This means that 2,215 predictions were selected from EXP_4 and the rest (2,876 predictions) were selected from the IR with Re-Ranking approaches. The results of the IR Re-Ranking and the GeoFusion approaches (see Table 6.9) show that both approaches outperform the Geographical and the IR approaches and the baselines. Three baselines are presented in Table 6.9: 1) the best results obtained at MEPT2010 with the test set (Van Laere et al., 2010a), 2) the experiment with BM25 trained with a corpus with accuracy levels from 6 to 16, and 3) the Hiemstra LM trained with accuracy levels from 14 to 16. These last two baselines were the ones that obtained the best accuracy results compared to the other IR models and training corpora.
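The combination rule described above amounts to a fallback: use the high-precision Geographical Knowledge prediction when one exists, and otherwise use the IR Re-Ranking prediction. A minimal sketch, with hypothetical names and toy data, could look like this:

```python
def geofusion(video_ids, geokb_predictions, ir_predictions):
    """Combine two sets of predictions with GeoKB taking priority.

    geokb_predictions / ir_predictions: dict video_id -> (lat, lon).
    A video missing from geokb_predictions means no geographical rule matched.
    """
    fused = {}
    for video_id in video_ids:
        coords = geokb_predictions.get(video_id)
        if coords is None:
            # No geographical rule matched: fall back to the IR prediction.
            coords = ir_predictions.get(video_id)
        fused[video_id] = coords
    return fused


video_ids = ["v1", "v2", "v3"]
geokb = {"v1": (41.39, 2.17)}                      # high precision, low coverage
ir = {"v1": (40.0, 3.0), "v2": (48.85, 2.35), "v3": (51.5, -0.12)}
print(geofusion(video_ids, geokb, ir))
```

In the experiments reported here this rule takes 2,215 predictions from EXP_4 and the remaining 2,876 from the IR Re-Ranking output.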

152 Chapter 6. Textual Georeferencing Approaches

Table 6.8: Results of the georeferencing experiments with the Information Retrieval Approach.

Model    Accuracy
         1km      5km      10km     50km     100km

annotation accuracy=16
BM25     0.4236   0.5055   0.5395   0.5951   0.6091
TFIDF    0.4227   0.5028   0.5362   0.5912   0.6059
HLM      0.4309   0.5054   0.5356   0.5989   0.6130

annotation accuracy=14-16
BM25     0.4236   0.5063   0.5446   0.5990   0.6120
TFIDF    0.4227   0.5044   0.5417   0.5939   0.6065
HLM      0.4364   0.5124   0.5474   0.6079   0.6218

annotation accuracy=12-16
BM25     0.4203   0.5044   0.5515   0.6065   0.6216
TFIDF    0.4201   0.5040   0.5494   0.6028   0.6179
HLM      0.4350   0.5146   0.5515   0.6042   0.6201

annotation accuracy=6-16
BM25     0.4142   0.5016   0.5527   0.6063   0.6244
TFIDF    0.4136   0.5012   0.5505   0.6028   0.6201
HLM      0.4284   0.5107   0.5494   0.6049   0.6220

The experiments show that stopword lists and controlled dictionaries can help the disambiguation of place names and the focus detection. The experiments also show that geographical knowledge heuristics can achieve high precision in georeferencing: up to 86.36%. This is valuable for establishing high-confidence rules that could allow high-precision georeferencing in textual annotations and tags. The strategy that combines geographical knowledge and population heuristics for geographical focus detection achieves the best results among the experiments with the Geographical approach on the MEPT2010 data set. The Information Retrieval approaches outperformed the Geographical one, but the fusion of both achieves the best results at 50 and 100 km. The best results georeferencing up to 1, 5 and 10 km are achieved with the Information Retrieval Re-Ranking approach with the Hiemstra LM. The best accuracy up to 50 and 100 km is achieved with the fourth strategy: a fusion of the Information Retrieval Re-Ranking and Geographical Knowledge approaches. These strategies outperformed the best accuracy results reported by the state-of-the-art systems participating at MEPT2010. The best accuracy georeferencing up to a distance of 100 km is 68.53%, obtained with the GeoFusion approach with IR Re-Ranking at a clustering distance of 100 km. The approaches of Van Laere et al. (2010a) and Kelm et al. (2010) obtained 67.23% and 60.46% accuracy, respectively, with the same test set at MEPT2010.


Table 6.9: Results of the experiments with IR Re-Ranking and with GeoFusion on the MEPT2010 data (in bold the results that improve the MEPT2010 best results). Note that the IR Re-Ranking experiments are named with the following syntax: 1) first the name of the IR algorithm (e.g. BM25, HLM, or TFIDF), 2) then an '@' symbol, 3) finally the clustering threshold in km. The GeoFusion experiments follow the same syntax as the IR Re-Ranking ones plus the string "+GeoKB", indicating that the experiment combines predictions from the GeoKB and the IR Re-Ranking approaches. The "+GeoKB" string also indicates that the Geographical Knowledge-Based approach has been applied only with the H1, H2 and H3 heuristics.

Experiments                                Accuracy
                                           1km      5km      10km     50km     100km

Baselines
Best MEPT2010 (Van Laere et al., 2010a)    0.4329   0.5425   0.5879   0.6509   0.6723
BM25 (annotation accuracy 6-16)            0.4142   0.5016   0.5527   0.6063   0.6244
HLM (annotation accuracy 14-16)            0.4364   0.5124   0.5474   0.6079   0.6218

Experiments at different Re-Ranking distances
BM25@1km                                   0.4331   0.5134   0.5507   0.6057   0.6230
BM25@1km+GeoKB                             0.2598   0.4549   0.5246   0.6307   0.6552
HLM@1km                                    0.4535   0.5336   0.5690   0.6338   0.6491
HLM@1km+GeoKB                              0.2728   0.4670   0.5391   0.6491   0.6733
BM25@5km                                   0.3698   0.5266   0.5631   0.6216   0.6375
BM25@5km+GeoKB                             0.2427   0.4633   0.5332   0.6389   0.6643
HLM@5km                                    0.4030   0.5433   0.5739   0.6468   0.6595
HLM@5km+GeoKB                              0.2541   0.4761   0.5470   0.6590   0.6823
BM25@10km                                  0.3688   0.5055   0.5772   0.6256   0.6399
BM25@10km+GeoKB                            0.2429   0.4568   0.5389   0.6409   0.6660
HLM@10km                                   0.4030   0.5275   0.5894   0.6485   0.6611
HLM@10km+GeoKB                             0.2563   0.4704   0.5523   0.6611   0.6847
BM25@50km                                  0.3496   0.4680   0.5124   0.6340   0.6470
BM25@50km+GeoKB                            0.2304   0.4378   0.5148   0.6438   0.6682
HLM@50km                                   0.3834   0.4928   0.5427   0.6482   0.6599
HLM@50km+GeoKB                             0.2439   0.4553   0.5346   0.6590   0.6831
BM25@100km                                 0.3500   0.4635   0.5008   0.5957   0.6485
BM25@100km+GeoKB                           0.2309   0.4399   0.5116   0.6318   0.6702
HLM@100km                                  0.3838   0.4902   0.5358   0.6187   0.6609
HLM@100km+GeoKB                            0.2433   0.4539   0.5299   0.6464   0.6853
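The experiment names in Table 6.9 follow the syntax given in its caption (IR algorithm, '@', clustering threshold in km, optional "+GeoKB"). As a small aid for anyone post-processing these results, the names can be decoded with a regular expression; the function name and the returned tuple layout below are hypothetical.

```python
import re

# IR algorithm, '@', clustering threshold in km, optional "+GeoKB" suffix.
_NAME = re.compile(r"^(?P<model>BM25|HLM|TFIDF)@(?P<km>\d+)km\s*(?P<fusion>\+GeoKB)?$")


def parse_experiment(name):
    """Return (ir_model, threshold_km, uses_geofusion) for an experiment name."""
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a Re-Ranking experiment name: {name!r}")
    return match.group("model"), int(match.group("km")), match.group("fusion") is not None


print(parse_experiment("HLM@5km+GeoKB"))
print(parse_experiment("BM25@100km"))
```

The baseline rows of the table do not follow this syntax and are rejected with a ValueError.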
