3.6 HeidelTime’s Evaluation Results
3.6.7 Processing Time Performance
Although fast processing has not been a major goal during the development of HeidelTime, we still paid attention to this issue so that HeidelTime can be used in large-scale document processing scenarios.
In this section, we report on HeidelTime’s processing time performance by presenting time measurements for processing two types of corpora: (i) to show HeidelTime’s processing time performance with different language and domain settings, we process the gold standard corpora used for evaluating HeidelTime; (ii) as representatives of large document collections, we report time measurements for processing Wikipedia in two languages (English and Spanish). In both settings, HeidelTime’s UIMA version is used, so for each run we report the processing times of the whole workflow and of the HeidelTime analysis engine separately. In both cases, the time measurements are reported directly by the UIMA framework.
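The distinction between whole-workflow time and analysis-engine time can be pictured with the following minimal Python sketch, which times each stage of a sequential pipeline with `time.perf_counter`. It is an illustration under assumptions only: the stage names and the toy tokenizer and year tagger are hypothetical stand-ins, and in the experiments reported here the numbers come from the UIMA framework's own measurements rather than from code like this.

```python
import time
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function maps a document dict to a new dict.
Stage = Tuple[str, Callable[[dict], dict]]


def run_timed(docs: List[dict], stages: List[Stage]) -> Dict[str, float]:
    """Push every document through all stages, summing wall-clock time per stage."""
    per_stage = {name: 0.0 for name, _ in stages}
    start_total = time.perf_counter()
    for doc in docs:
        for name, fn in stages:
            t0 = time.perf_counter()
            doc = fn(doc)
            per_stage[name] += time.perf_counter() - t0
    # "total" plays the role of the whole-workflow time; it also covers loop overhead.
    per_stage["total"] = time.perf_counter() - start_total
    return per_stage


def tokenize(doc: dict) -> dict:
    """Hypothetical preprocessing stage: whitespace tokenization."""
    return {**doc, "tokens": doc["text"].split()}


def tag_years(doc: dict) -> dict:
    """Hypothetical tagging stage: marks four-digit tokens as year expressions."""
    years = [t for t in doc["tokens"] if t.isdigit() and len(t) == 4]
    return {**doc, "timexes": years}


corpus = [{"text": "The treaty was signed in 1648 and revised in 1713 ."}] * 1000
times = run_timed(corpus, [("preprocessing", tokenize), ("tagger", tag_years)])
share = times["tagger"] / times["total"]
print(f"workflow: {times['total']:.4f} s, tagger: {times['tagger']:.4f} s ({share:.0%})")
```

Because the stage timers run inside the total timer, the per-stage times always add up to at most the workflow time, which mirrors how the HeidelTime share of a workflow is reported later in this section.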
Processing Time Performance on Evaluation Corpora
Since the evaluation corpora come in different formats and languages, we process them using HeidelTime’s UIMA version in combination with the collection readers, analysis engines, and CAS consumers that are part of the UIMA HeidelTime kit. The workflows for all corpora are depicted in Figure 3.9. For instance, the English news-style corpora in TempEval-3 format are read by the TempEval-3 reader and processed by the TreeTagger wrapper analysis engine for linguistic preprocessing (sentence splitting, tokenization, and part-of-speech tagging). Then, HeidelTime is applied with its news-style strategy, and the TempEval-3 writer outputs the results in the format required by the TempEval-3 evaluation scripts.
Since all evaluation corpora are relatively small, we ran the workflows on a standard laptop (Intel dual-core P8700, 2.53 GHz, 4 GB RAM) without parallelization or any other kind of tuning.
Table 3.16 shows the time performance measurements for all evaluation corpora, ordered by language and number of tokens (as counted by our UIMA wrappers for linguistic preprocessing). The processing times of the whole workflows range from less than six seconds for the smallest corpus (TE-2 English, 4,849 tokens) to almost 520 seconds for the largest one (ACE TERN 2004, 370,964 tokens). Note that the workflow processing time is sensitive not only to the total number of tokens but also to the number of documents. Processing the Time4SMS corpus, which contains many short documents, is rather slow, whereas the WikiWars and WikiWarsDE corpora, which contain only a few but quite long documents, are processed much faster.
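To make the document-count effect concrete, the short Python snippet below normalizes three rows of Table 3.16 to seconds per document and seconds per 1,000 tokens. It only re-derives ratios from the published numbers; the helper name `rates` is ours.

```python
# (corpus, documents, tokens, workflow seconds) copied from Table 3.16
ROWS = [
    ("Time4SMS", 1000, 26054, 406.3),
    ("WikiWars", 22, 117169, 66.9),
    ("WikiWarsDE", 22, 94058, 47.3),
]


def rates(docs: int, tokens: int, seconds: float) -> tuple:
    """Return (seconds per document, seconds per 1,000 tokens)."""
    return seconds / docs, seconds / (tokens / 1000)


for name, docs, tokens, seconds in ROWS:
    per_doc, per_ktok = rates(docs, tokens, seconds)
    print(f"{name:<10} {per_doc:6.3f} s/document  {per_ktok:6.3f} s/1,000 tokens")
```

Time4SMS needs roughly 15.6 s per 1,000 tokens, compared with roughly 0.57 s for WikiWars and 0.50 s for WikiWarsDE, which is consistent with a sizeable fixed cost per document in the full workflow.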
[Figure: four columns (Corpora, Collection Readers, Analysis Engines, CAS Consumers); readers: TempEval-2, TempEval-3, and ACE TERN readers; analysis engines: TreeTagger, JvnTextPro, and Stanford POS Tagger for preprocessing (with differing token/sentence/POS settings), then HeidelTime with domain news, narratives, colloquial, or autonomic; consumers: TempEval-2, TempEval-3, and ACE TERN writers.]
Figure 3.9: UIMA workflows to measure time performance on evaluation corpora.

The processing time of the HeidelTime analysis engine is less sensitive to the number of documents and depends mostly on the total size of a corpus (its total number of tokens). Figure 3.10 depicts the processing times of the UIMA workflows and of the HeidelTime analysis engines. In both panels (note the log scale of the x-axes), the dotted line represents a linear regression based on the processing times for all corpora. For the HeidelTime analysis engine, the linear regression is a quite good estimate, with an asymptotic standard error of just 3.3%. In contrast, the linear regression is a less accurate estimate for the full UIMA workflows (asymptotic standard error of 16.5%).
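The fit quality can be explored with the sketch below, which fits a straight line to the (tokens, seconds) pairs of Table 3.16 by ordinary least squares and reports the standard error of the slope relative to the slope. This rests on assumptions the text does not settle: that the model has an intercept, that the quoted percentages refer to the slope parameter, and that "asymptotic standard error" (the term used, for example, in gnuplot's `fit` output) matches the usual OLS standard error for a linear model. The script is therefore not claimed to reproduce the 3.3% and 16.5% figures exactly.

```python
import math

# (tokens, workflow seconds, HeidelTime seconds) for all 23 rows of Table 3.16
TABLE_3_16 = [
    (4849, 5.7, 2.2), (7000, 11.2, 3.2), (16760, 26.2, 6.5), (26054, 406.3, 19.0),
    (36497, 45.4, 16.2), (63173, 101.3, 28.7), (66628, 104.4, 31.6), (117169, 66.9, 55.1),
    (325974, 385.8, 145.9), (370964, 518.0, 176.3), (5293, 16.4, 1.9), (28988, 67.9, 10.3),
    (80293, 244.9, 29.0), (133032, 430.5, 48.0), (12228, 16.3, 7.5), (13489, 23.7, 8.4),
    (44449, 76.5, 29.4), (61494, 115.4, 41.3), (9914, 10.3, 3.2), (58493, 53.2, 18.8),
    (94058, 47.3, 26.8), (11014, 8.3, 2.9), (17611, 52.4, 3.7),
]


def ols_relative_slope_error(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, se(b) / |b|)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return a, b, se_b / abs(b)


tokens = [row[0] for row in TABLE_3_16]
_, b_wf, rel_wf = ols_relative_slope_error(tokens, [row[1] for row in TABLE_3_16])
_, b_ht, rel_ht = ols_relative_slope_error(tokens, [row[2] for row in TABLE_3_16])
print(f"workflow:   {1000 * b_wf:.3f} s per 1,000 tokens, relative slope error {rel_wf:.1%}")
print(f"HeidelTime: {1000 * b_ht:.3f} s per 1,000 tokens, relative slope error {rel_ht:.1%}")
```

Whatever the exact setup of the original fit, the relative slope error comes out clearly smaller for the HeidelTime column than for the workflow column, mirroring the contrast described above.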
In addition to the document length sensitivity, represented by the WikiWars and Time4SMS corpora explicitly marked in Figure 3.10(a), the language also plays a more significant role for the processing time of the whole workflows than for that of the HeidelTime analysis engine. For example, processing Italian is slower than processing English. These differences are mainly due to performance differences in the linguistic preprocessing performed by our UIMA wrappers. In general, these wrappers could be improved to reduce the processing times of the whole workflows, since they currently require a lot of I/O for each preprocessing subtask (sentence splitting, tokenization, and part-of-speech tagging). Avoiding this I/O would make the preprocessing faster, since the preprocessing tasks themselves are quite fast. However, since performance plays only a minor role in our work, we leave this to future work.
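The language effect can be approximated from Table 3.16 by pooling, per language, the time spent outside the HeidelTime engine (workflow time minus HeidelTime time) and dividing by the number of tokens. The Python sketch below does this for the English and Italian rows; the function name is ours, and the difference covers everything except HeidelTime (reader, preprocessing wrappers, writer), so it bounds rather than isolates the preprocessing cost.

```python
from collections import defaultdict

# (language, tokens, workflow seconds, HeidelTime seconds): English and Italian rows of Table 3.16
ROWS = [
    ("English", 4849, 5.7, 2.2), ("English", 7000, 11.2, 3.2),
    ("English", 16760, 26.2, 6.5), ("English", 26054, 406.3, 19.0),
    ("English", 36497, 45.4, 16.2), ("English", 63173, 101.3, 28.7),
    ("English", 66628, 104.4, 31.6), ("English", 117169, 66.9, 55.1),
    ("English", 325974, 385.8, 145.9), ("English", 370964, 518.0, 176.3),
    ("Italian", 5293, 16.4, 1.9), ("Italian", 28988, 67.9, 10.3),
    ("Italian", 80293, 244.9, 29.0), ("Italian", 133032, 430.5, 48.0),
]


def overhead_per_ktok(rows):
    """Seconds spent outside the HeidelTime engine per 1,000 tokens, pooled per language."""
    extra = defaultdict(float)
    tokens = defaultdict(int)
    for lang, n_tokens, workflow_s, heideltime_s in rows:
        extra[lang] += workflow_s - heideltime_s
        tokens[lang] += n_tokens
    return {lang: extra[lang] / (tokens[lang] / 1000) for lang in extra}


overhead = overhead_per_ktok(ROWS)
for lang, secs in sorted(overhead.items()):
    print(f"{lang:<8} {secs:.2f} s per 1,000 tokens outside HeidelTime")
```

With these rows, Italian comes out at about 2.7 s per 1,000 tokens outside the engine versus about 1.1 s for English.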
Processing Time Performance on Wikipedia
Having presented processing times for the rather small evaluation corpora, we now report time performance measurements for processing the English and Spanish Wikipedia to demonstrate that HeidelTime can easily be used to process large document collections.
Except for the language settings, the workflows for both document collections are identical. Since we stored the text parts of all Wikipedia articles in a MongoDB (NoSQL) database, we used a simple MongoDB collection reader. For linguistic preprocessing, our UIMA TreeTagger wrapper is used with the respective language models. Then, HeidelTime is applied with its narrative normalization strategy before a CAS consumer counts the sentences, tokens, and temporal expressions extracted from the documents.
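The final step of this workflow, a CAS consumer that only counts annotations, can be sketched in Python as follows. This is not UIMA code: the `Annotation` class, the `CountingConsumer` name, and the type names `Sentence`, `Token`, and `Timex3` are assumptions chosen for illustration; a real consumer would iterate over the annotation index of each CAS instead of a Python list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a UIMA annotation: a type name and character offsets."""
    type_name: str
    begin: int
    end: int


class CountingConsumer:
    """Tallies selected annotation types over all documents it is given."""

    COUNTED = ("Sentence", "Token", "Timex3")  # assumed type names

    def __init__(self) -> None:
        self.documents = 0
        self.counts: Counter = Counter()

    def process(self, annotations: Iterable[Annotation]) -> None:
        """Called once per document after tagging has added its annotations."""
        self.documents += 1
        for ann in annotations:
            if ann.type_name in self.COUNTED:
                self.counts[ann.type_name] += 1


# "The war ended in 1945." with one sentence, six tokens, and one temporal expression
doc = [Annotation("Sentence", 0, 22)]
doc += [Annotation("Token", b, e) for b, e in [(0, 3), (4, 7), (8, 13), (14, 16), (17, 21), (21, 22)]]
doc += [Annotation("Timex3", 17, 21)]

consumer = CountingConsumer()
consumer.process(doc)
print(consumer.documents, dict(consumer.counts))
```

Keeping the consumer to simple counting means its cost is negligible next to preprocessing and tagging, so the measured times reflect the reader, the TreeTagger wrapper, and HeidelTime.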
corpus | language | domain | docs | tokens | workflow [s] | HeidelTime [s]
TE-2 test | English | news | 9 | 4,849 | 5.7 | 2.2
TE-3 platinum | English | news | 20 | 7,000 | 11.2 | 3.2
Time4SCI | English | autonomic | 50 | 16,760 | 26.2 | 6.5
Time4SMS | English | colloquial | 1,000 | 26,054 | 406.3 | 19.0
Aquaint TE3 | English | news | 73 | 36,497 | 45.4 | 16.2
TimeBank TE3 | English | news | 183 | 63,173 | 101.3 | 28.7
TimeBank 1.2 | English | news | 183 | 66,628 | 104.4 | 31.6
WikiWars | English | narratives | 22 | 117,169 | 66.9 | 55.1
ACE 2005 | English | news | 599 | 325,974 | 385.8 | 145.9
ACE TERN 2004 | English | news | 862 | 370,964 | 518.0 | 176.3
TE-2 test | Italian | news | 13 | 5,293 | 16.4 | 1.9
TE-2 training | Italian | news | 51 | 28,988 | 67.9 | 10.3
I-CAB test | Italian | news | 190 | 80,293 | 244.9 | 29.0
I-CAB train | Italian | news | 335 | 133,032 | 430.5 | 48.0
ACE train-50* | Arabic | news | 50 | 12,228 | 16.3 | 7.5
ACE train-50 | Arabic | news | 50 | 13,489 | 23.7 | 8.4
ACE test-150 | Arabic | news | 150 | 44,449 | 76.5 | 29.4
ACE train-203 | Arabic | news | 203 | 61,494 | 115.4 | 41.3
TE-3 test | Spanish | news | 35 | 9,914 | 10.3 | 3.2
TE-3 training | Spanish | news | 150 | 58,493 | 53.2 | 18.8
WikiWarsDE | German | narratives | 22 | 94,058 | 47.3 | 26.8
WikiWarsVN | Vietnamese | narratives | 15 | 11,014 | 8.3 | 2.9
TimeBank-FR | French | news | 108 | 17,611 | 52.4 | 3.7
Table 3.16: HeidelTime’s processing time performance on evaluation corpora.
In Table 3.17, we report some information about the Wikipedia dumps in addition to the time performance measures for the full UIMA workflows and the HeidelTime analysis engines. Note that, in contrast to the experiments on the evaluation corpora, we used an Intel quad-core i7-4770 (3.40 GHz, 16 GB RAM) and multi-threading, setting both the CAS pool size and the processing unit thread count parameters to 16. Processing the almost 4.5 million documents of the English Wikipedia, with more than 1,708 million tokens, took about 71 hours in total, of which about 23% was used by the HeidelTime analysis engine itself. The Spanish workflow was faster both per document and per 1,000 tokens. The HeidelTime analysis engine was also slightly faster on the Spanish Wikipedia than on the English Wikipedia, but the main difference between processing the English and the Spanish Wikipedia is due to faster linguistic preprocessing of the Spanish documents. This is also the reason why the HeidelTime analysis engine accounted for about 33% of the full processing time of the Spanish Wikipedia.
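The share and the per-document and per-1,000-token columns of Table 3.17 follow from the totals by simple division, which the Python sketch below spells out (the function name `derived` is ours). Since the hour totals in the table are themselves rounded, recomputed values can differ from the published ones in the last digit, for example about 0.0105 s instead of 0.011 s per Spanish document for the HeidelTime engine.

```python
SECONDS_PER_HOUR = 3600


def derived(docs: int, tokens: int, workflow_h: float, heideltime_h: float) -> dict:
    """Turn total hours into the share and per-unit figures of Table 3.17."""
    wf_s = workflow_h * SECONDS_PER_HOUR
    ht_s = heideltime_h * SECONDS_PER_HOUR
    ktok = tokens / 1000
    return {
        "heideltime_share": ht_s / wf_s,
        "wf_per_doc": wf_s / docs,
        "ht_per_doc": ht_s / docs,
        "wf_per_ktok": wf_s / ktok,
        "ht_per_ktok": ht_s / ktok,
    }


english = derived(4_457_716, 1_708_494_667, 70.6, 16.4)
spanish = derived(1_035_680, 409_180_706, 9.0, 3.02)
for name, d in (("English", english), ("Spanish", spanish)):
    print(name, {k: round(v, 4) for k, v in d.items()})
```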
Given that only a single machine was used in these performance measurements and that no manual effort is needed to run HeidelTime in parallel mode, these numbers demonstrate that HeidelTime can be used out of the box to process large-scale document collections.
[Figure: two panels plotting processing time [s] against tokens in corpus (in thousands, log scale), with data points grouped by language (English, German, Spanish, Italian, Arabic, Vietnamese, French); panel (a) additionally marks the Time4SMS and WikiWars corpora. (a) Full workflows. (b) HeidelTime only.]
Figure 3.10: HeidelTime’s processing time performance on all evaluation corpora.
language | documents | tokens | total time [h] (workflow / HeidelTime) | per document [s] (workflow / HeidelTime) | per 1,000 tokens [s] (workflow / HeidelTime)
English | 4,457,716 | 1,708,494,667 | 70.6 / 16.4 (23%) | 0.057 / 0.013 | 0.149 / 0.0345
Spanish | 1,035,680 | 409,180,706 | 9.0 / 3.02 (33%) | 0.031 / 0.011 | 0.079 / 0.0266
Table 3.17: HeidelTime’s processing time performance on English and Spanish Wikipedia.