SemEval 2013 Task 3: Spatial Role Labeling

(1)

SemEval-2013 Task 3: Spatial Role Labeling

Oleksandr Kolomiyets†, Parisa Kordjamshidi†, Steven Bethard‡andMarie-Francine Moens†

†_{KU Leuven, Celestijnenlaan 200A, Heverlee 3001, Belgium}

‡_{University of Colorado, Campus Box 594 Boulder, Colorado, USA}

Abstract

Many NLP applications require information about locations of objects referenced in text, or relations between them in space. For ex-ample, the phrasea book on the deskcontains information about the location of the object

book, astrajector, with respect to another ob-jectdesk, aslandmark. Spatial Role Label-ing (SpRL) is an evaluation task in the infor-mation extraction domain which sets a goal to automatically process text and identify ob-jects of spatial scenes and relations between them. This paper describes the task in Se-mantic Evaluations 2013, annotation schema, corpora, participants, methods and results ob-tained by the participants.

1 Introduction

Spatial Role Labeling at SemEval-2013 is the sec-ond iteration of the task, which was initially in-troduced at SemEval-2012 (Kordjamshidi et al., 2012a). The second iteration extends the previous work with an additional training corpus, which con-tains besides “static” spatial relations, annotated mo-tions. Motion detection is a novel task for annotating trajectors (objects, which are moving), landmarks (spatial context in which the motion is performed), motion indicators (lexical triggers which signals tra-jector’s motion), paths (a path along which the mo-tion is performed), direcmo-tions (absolute or relative directions of trajector’s motion) and distances (a distance as a product of motion). For annotating motions the existing annotation scheme has been adapted with additional markables which are, all to-gether, described below.

2 Spatial Annotation Schema

In this Section we describe the annotation format of spatial markables in text, and annotation guidelines for the annotators.

2.1 Spatial Annotation Format

Building upon the previous work, we used the no-tions of trajectors, landmarks and spatial indicators as introduced by Kordjamshidiet al. (2010). In ad-dition, we further expanded the set of spatial roles labels with motion indicators, paths, directions and distances to capture fine-grained spatial semantics of staticspatial relations (as the ones which do not in-volve motions), and to accommodatedynamic spa-tial relations (the ones which do involve motions).

2.1.1 Static Spatial Relations and their Roles

Static spatial relations are defined as relations be-tween still objects, whereas one object plays a cen-tral role in the spatial scene, which is called tra-jector, and the second one plays a secondary role, and it is calledlandmark. In language, a spatial re-lation between two objects is usually implemented by a preposition (in, on,at, etc.) or a prepositional phrase (on top of,inside of, etc.).

A static spatial relation is defined as a tuple that contains a trajector, a landmark and a spatial indica-tor. In the annotation schema, these annotations are defined as follows:

Trajector: Trajector is a spatial role label as-signed to a word or a phrase that denotes a central object of a spatial scene. For example:

• [T rajectora lake] in the forest

(2)

• [T rajectora flag] on top of the building

Landmark: Landmark is a spatial role label as-signed to a word or a phrase that denotes a secondary object of a spatial scene, to which a possible spatial relation (as between two objects in space) can be es-tablished. For example:

• a lake in[Landmarkthe forest]

• a flag on top of [Landmarkthe building]

Spatial Indicator: Spatial Indicator is a spatial role label assigned to a word or a phrase that sig-nals a spatial relation between objects (trajectors and landmarks) of a spatial scene. For example:

• a lake[Sp indicatorin] the forest

• a flag] [Sp indicator on top of] the building

Spatial Relation: Spatial Relation is a relation that holds between spatial markables in text as, e.g., between a trajector and a landmark and triggered by a spatial indicator. In spatial information theory the relations and properties are usually grouped into the domains of topological, directional, and distance re-lations and also shape (Stock, 1998). Three semantic classes for spatial relations were proposed:

• Region. This type refers to a region of space which is always defined in relation to a land-mark, e.g., the interior or exterior. For exam-ple:

a lake in the forest =⇒ hRegion, [Sp indicator

in], [T rajectora lake], [Landmarkthe forest]i

• Direction. This relation type denotes a direc-tion along the axes provided by the different frames of reference, in case the trajector of mo-tion is not characterized in terms of its relamo-tion to the region of a landmark. For example:

a flag on top of the building =⇒ hDirection, [Sp indicator on top of], [T rajector a flag],

[Landmark the building]i

• Distance. Type Distance states information about the spatial distance of the objects and could be a qualitative expression, such asclose, far or quantitative, such as12 km. For example: the kids are close to the blackboard =⇒ hDistance, [Distanceclose], [T rajectorthe kids],

[Landmark the blackboard]i

2.1.2 Dynamic Spatial Relations

In addition to static spatial relations and their roles, SpRL-2013 introduces new spatial roles to capture dynamic spatial relations which involve motions. Let us demonstrate this with the following example:

(1) In Brazil coming from the North-East I stepped into the small forest and followed down a dried creek.

The text above describes a motion, and the reader can identify a number of concepts which are pecu-liar for motions: there is an object whose location is changing, the motion is performed in a specific spatial context, with a specific direction, and with a number of locations related to the object’s motion.

There has been an enormous effort in formalizing and annotating motions in natural language. While annotating motions was out of scope for the previ-ous SpRL task and SpatialML (Mani et al., 2010), the most recent work on the Dynamic Interval Tem-poral Logic (DITL) (Pustejovsky and Moszkowicz, 2011) presents a framework for modeling motions as a change of state, which adapts linguistic back-ground considering path constructions and manner-of-motion constructions. On this basis the Spa-tiotemporal Markup Language (STML) has been in-troduced for annotating motions in natural language. In STML, a motion is treated as a change of location over time, while differentiating between a number of spatial configurations along the path. Being well-defined for the formal representations of motion and reasoning, in which representations either take ex-plicit reference to temporal frames or reify a spatial object for a path, all the previous work seems to be difficult to apply in practice when annotating mo-tions in natural language. It can be attributed to pos-sible vague descriptions of path in natural language when neither clear temporal event ordering, nor dis-tinction between the start, end or intermediate path point can be made.

(3)

Trajector: Trajector is a spatial role label as-signed to a word or a phrase which denotes an object which moves, starts, interrupts, resumes a motion, or is forcibly involved in a motion. For example:

• ... coming from the North-East [T rajector I]

stepped into ...

Motion Indicator: Motion indicator is a spatial role label assigned to a word or a phrase which sig-nals a motion of the trajector along a path. In Exam-ple (1), a number of motion indicators can be identi-fied:

• ... [M otion coming] from the North-East I

[M otion stepped into] ... and[M otion followed

down] ...

Path: Path is a spatial role label assigned to a word or phrase that denotes the path of the motion as the trajector is moving along, starting in, arriving in or traversing it. In SpRL-2013, as opposite to STML, the notion of path does not have the temporal dimen-sion, thus whenever the motion is performed along a path, for which either a start, an intermediate, an end path point, or an entire path can be identified in text, they are labeled as path. In Example (1), a number of path labels can be identified:

• ... coming[P athfrom the North-East]I stepped

into[P ath the small forest]and followed down

[P atha dried creek].

Landmark: The notion of path should not be con-fused with landmarks. For spatial annotations, land-mark has been introduced as a spatial role label for a secondary object of the spatial scene. Being of great importance for static spatial relations, in dy-namic spatial relations, landmarks are used to cap-ture a spatial context of a motion as for example:

• In [Landmark Brazil] coming from the

North-East ...

Distance: In contrast to the previous SpRL anno-tation standard, in which distances and directions have been uniformly treated as signals, in SpRL-2013 if the motion is performed for a certain dis-tance, and such a distance is mentioned in text, the corresponding textual span is labeled as distance.

Distance is a spatial role label assigned to a word or a phrase that denotes an absolute or relative dis-tance of motion, or the disdis-tance between a trajector and a landmark in case of a static spatial scene. For example:

• [Distance25 km]

• [Distanceabout 100 m]

• [Distancenot far away]

• [Distance25 min by car]

Direction: Additionally, if the motion is per-formed in a certain (absolute or relative) direction, and such a direction is mentioned in text, the corre-sponding textual span is annotated as direction. Di-rection is a spatial role label assigned to a word or a phrase that denotes an absolute or relative direc-tion of modirec-tion, or a spatial arrangement between a trajector and a landmark. For example:

• [Directionthe North-West]

• [Directionnorthwards]

• [Directionwest]

• [Directionthe left-hand side]

Spatial Relation: Similarly to static spatial rela-tions, dynamic spatial relations are annotated by re-lations that hold between a number of spatial roles. The major difference to static spatial relations is the mandatory motion indicator1. For example:

• In Brazil coming from the North-East I ... =⇒ hDirection, [Sp indicator In], [T rajector I],

[Landmark Brazil], [M otion coming],[P ath from

the North-East]i

• ... I stepped into the small forest and ... =⇒ hDirection, [T rajector I], [M otion stepped

into],[P aththe small forest]i

• ... I [...] and followed down a dried creek. =⇒ hDirection, [T rajector I], [M otion followed

down],[P atha dried creek]i

1_{All dynamic spatial relations were annotated with type}

(4)

Corpus Files Sent. TR LM SI MI Path Dir Dis Relation

IAPR TC-12 Training 1 600 716 661 670 - - - - 765

Evaluation 1 613 872 743 796 - - - - 940

Confluence Project

[image:4.612.73.541.60.132.2]

Training 95 1422 1701 1037 879 1039 945 223 307 2105 Evaluation 22 367 497 316 247 305 240 37 87 598

Table 1: Corpus statistics for SpRL-2013 with respect to annotated spatial roles (trajectors (TR), landmarks (LM), spatial indicators (SI), motion indicators (MI), paths (Path), directions (Dir) and distances (Dis)) and spatial relations.

3 Corpora

The data for the shared task comprises two different corpora.

3.1 IAPR TC-12 Image Benchmark Corpus

The first corpus is a subset of the IAPR TC-12 image benchmark corpus (Grubinger et al., 2006). It con-tains 613 text files that include 1213 sentences in to-tal, and represents an extension of the dataset previ-ously used in (Kordjamshidi et al., 2011). The orig-inal corpus was available free of charge and without copyright restrictions. The corpus contains images taken by tourists with descriptions in different lan-guages. The texts describe objects, and their abso-lute and relative positions in the image. This makes the corpus a rich resource for spatial information, however, the descriptions are not always limited to spatial information. Therefore, they are less domain-specific and contain free explanations about the im-ages. For training we released 600 sentences (about 50% of the corpus), and used remaining 613 sen-tences for evaluations.

3.2 Confluence Project Corpus

The second corpus comes from the Confluence project that targets the description of locations sit-uated at each of the latitude and longitude inte-ger degree intersection in the world. This corpus contains user-generated content produced by, some-times, non-native English speakers. We gathered the content by keeping the original orthography and for-mating. In addition, we stored the URLs of the scriptions and extracted the coordinates of the de-scribed confluence point, which might be interest-ing for further research. In total, the entire corpus contains 117 files with 1789 sentences (about 40,000 tokens). For training we released 95 annotated files with 1422 sentences, 2105 annotated relations in

to-tal. For evaluation we used 22 annotated files with 367 sentences. The statistics on both corpora are provided in Table 1.

3.3 Data Format

One important change to the data was made in SpRL-2013. In contrast to SpRL-2012, where spa-tial roles were annotated over “head words” whose indexes were part of unique identifiers, in SpRL-2013 we switched to span-based annotations. More-over, in order to provide a single data format for the task, we transformed SpRL-2012 data into span-based annotations, in course of which, we identified a number of annotation errors and made further im-provements for about 50 annotations.

For annotating the Confluence Project corpus we used a freely available annotation tool MAE created by Amber Stubbs (Stubbs, 2011). The resulting data format uses the same annotation tags as in SpRL-2012, but each role annotation refers to a character offset in the original text2. Spatial relations are com-posed of references to annotations by their unique identifiers. Similarly to SpRL-2012, we allowed annotators to provide non-consuming annotations, where entity mentions, for which spatial roles can be identified, are omitted in text but necessary for a spatial relation triggered by either a spatial indicator or a motion indicator. Two spatial roles are eligible for non-consuming annotations: trajectors and land-marks.

4 Tasks Descriptions

For the sake of consistency with SpRL-2012, in SpRL-2013 we proposed the following tasks:

2

Due to paper length constraints we omit the BNF specifica-tions for spatial roles and relaspecifica-tions. For further data format in-formation we refer the reader to the task description web page:

(5)

• Task A: Identification of markable spans for three typesof spatial annotations such as tra-jector, landmark and spatial indicator.

• Task B: Identification of tuples (triplets) that connect trajectors, landmarks and spatial indi-cators identified in Task A into spatial relations. That is, identification of spatial relations with three markables connected, andwithout se-mantic relation classification.

• Task C: Identification of markable spans forall spatial annotations such as trajector, landmark, spatial indicator, motion indicator, path, direc-tion and distance.

• Task D: Identification ofn-tuplesthat connect spatial markables identified in Task C into spa-tial relations. That is, identification of spaspa-tial relations with as many participating mark-ables as possible, and withoutsemantic rela-tion classificarela-tion.

• Task E:Semantic classificationof spatial rela-tions identified in Task D.

5 Evaluation Criteria and Metrics

System outputs were evaluated against the gold annotations, which had to conform to the role’s Backus-Naur form. For Tasks A and C, the system annotations are spatial roles: spans of text associated with spatial role types. A system annotation of a role is considered correct if it has a minimal overlap of one character with a gold annotation and matches the role type of the gold annotation. For Tasks B and D, the system annotations are spatial relation tuples (of length 3 in task B, of length 3 to 5 in Task D) of references to markable annotations. A system anno-tation of a spatial relation tuple is considered correct if it is of the same length as the gold annotation, and if each spatial role in the system tuple matches each role in the gold tuple. A spatial role estimated by a system is considered correct if it matches a gold ref-erence when having the same character offsets and markable types (strict evaluation settings). In ad-dition we introduced relaxed evaluation settings, in which a minimal overlap of one character between a system and a gold markable references is required for a positive match under condition that the roles

match. For Task E, the system annotations are spa-tial relation tuples of length 3 to 5, along with re-lation type labels. A system annotation of a spatial relation is considered correct if the spatial relation tuple is correct under the evaluation of Task D and the relation type of the system relation is the same as the relation type of the gold relation.

Systems were evaluated for each of the tasks in terms of precision (P), recall (R) andF1-score which are defined as follows:

P recision= tp

tp+f p (1)

Recall= tp

tp+f n (2)

wheretpis the number of true positives (the num-ber of instances that are correctly found),f pis the number of false positives (number of instances that are predicted by the system but not a true instance), andf nis the number of false negatives (missing re-sults).

F1= 2·

P recision·Recall

P recision+Recall (3)

6 System Description and Evaluation Results

UNITOR. The UNITOR-HMM-TK system ad-dressed Tasks A,B and C (Bastianelli et al., 2013).

(6)

Run Task Evaluation Label P R F1-score

UNITOR.Run1.1

Task A relaxed

TR 0.684 0.681 0.682 LM 0.741 0.835 0.785 SI 0.967 0.889 0.926

Task B relaxed Relation 0.551 0.391 0.458 strict Relation 0.431 0.306 0.358

UNITOR.Run1.2

Task A relaxed

TR 0.682 0.493 0.572 LM 0.801 0.560 0.659 SI 0.968 0.585 0.729

Task B relaxed Relation 0.551 0.391 0.458 strict Relation 0.431 0.306 0.358

UNITOR.Run2.1

Task A relaxed

TR 0.565 0.317 0.406 LM 0.661 0.476 0.554 SI 0.612 0.481 0.538

Task C relaxed

[image:6.612.121.498.58.349.2]

TR 0.565 0.317 0.406 LM 0.662 0.476 0.554 SI 0.609 0.479 0.536 MI 0.892 0.294 0.443 Path 0.775 0.295 0.427 Dir 0.312 0.229 0.264 Dis 0.946 0.331 0.490

Table 2: Results of UNITOR for SpRL-2013 tasks (Task A, B and C).

of Word Space – a Distributional Model of Lexi-cal Semantics derived from the unsupevised anal-ysis of an unlabeled large-scale corpus (Sahlgren, 2006). Similarly to the approaches demonstrated in SpRL-2012, the proposed approach first classi-fies spatial and motion indicators, then, using these outcomes further spatial roles are determined. For classifying indicators, the classifier makes use of lexical and grammatical features like lemmas, part-of-speech tags and lexical context representations. The remaining spatial roles are estimated by another classifier additionally employing the lemma of the indicator, distance and relative position to the cator, and the number of tokens composing the indi-cator as features.

In Task B, all roles found in a sentence for Task A are combined to generate candidate relations, which are verified by a Support Vector Machine (SVM) classifier. As the entire sentence is informative to determine the proper conjunction of all roles, a Smoothed Partial Tree Kernel (SPTK) within the classifier that enhances both syntactic and lexical in-formation of the examples was applied (Croce et al.,

2011). This is a convolution kernel that measures the similarity between syntactic structures, which are partially similar and whose nodes can be different, but are, nevertheless, semantically related. Each ex-ample is represented as a tree-structure which is di-rectly derived from the sentence dependency parse, and thus allows for avoiding manual feature engi-neering as in contrast to the work of Roberts and Harabagiu (2012). In the end, the similarity score between lexical nodes is measured by the Word Space model.

(7)

Although, not directly comparable to the results in SpRL-2012, one may observe some common trends. First, similarly to the previous findings, the perfor-mance for recognition of landmarks and spatial in-dicators (Task A) on the IAPR TC-12 Image bench-mark corpus is better than trajectors (F1-scores of

0.785, 0.926 and 0.682 respectively), and spatial in-dicators is the “easiest” spatial role to recognize (F1

-score of 0.926).

In contrast, spatial role labeling on the Confluence Project corpus performs worse than on the IAPR TC-12 Image benchmark corpus (withF1-scores of

0.406, 0.538 and 0.554 for trajectors, spatial indica-tors and landmarks respectively). Interestingly, the performance for landmarks is generally higher than for trajectors, which is in line with previous findings in SpRL-2012. The performance drop on the new corpus can be attributed to more complex text and descriptions, whereas multiple roles can be identi-fied for the same span (for example, a path which spans over trajectors, landmarks and spatial indica-tors). For the new spatial roles of motion indicators, paths, directions and distances, the performance lev-els are overall higher than for trajectors with an ex-ception of directions. Yet, the precision levels for new roles is much higher than the recall (0.892 vs. 0.294 for motion indicators, 0.775 vs. 0.295 for paths and 0.946 vs. 0.331 for distances). Directions turned out to be the most difficult role to classify (0.312, 0.229 and 0.264 for P, R andF1-score

re-spectively).

7 Conclusion

In this paper we described an evaluation task on Spa-tial Role Labeling in the context of Semantic Evalu-ations 2013. The task sets a goal to automatically process text and identify objects of spatial scenes and relations between them. Building largely upon the previous evaluation campaign, SpRL-2012, in SpRL-2013 we introduced additional spatial roles and relations for capturing motions in text. In ad-dition, a new annotated corpus for spatial roles (in-cluding annotated motions) was produced and re-leased to the participants. It comprises a set of 117 files with about 40,000 tokens in total.

With the registered number of 10 participants and the final number of submissions (only one) we can

conclude that spatial role labeling is an interesting task within the research community, however some-times underestimated in its complexity. Our further steps in promoting spatial role labeling will be a de-tailed description of the annotation scheme and an-notation guidelines, analysis of the corpora and ob-tained results.

Acknowledgments

The presented research was supporter by the PARIS project (IWT - SBO 110067), TERENCE (EU FP7– 257410) and MUSE (EU FP7–296703).

References

Emanuele Bastianelli, Danilo Croce, Roberto Basili, and Daniele Nardi. 2013. UNITOR-HMM-TK: Struc-tured Kernel-based learning for Spatial Role Labeling. InProceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Association for Computational Linguistics.

Danilo Croce, Alessandro Moschitti, and Roberto Basili. 2011. Structured Lexical Similarity via Convolution Kernels on Dependency Trees. InProceedings of the Conference on Empirical Methods in Natural Lan-guage Processing, pages 1034–1046. Association for Computational Linguistics.

Danilo Croce, Giuseppe Castellucci, and Emanuele Bas-tianelli. 2012. Structured Learning for Semantic Role Labeling.Intelligenza Artificiale, 6(2):163–176. Michael Grubinger, Paul Clough, Henning M¨uller, and

Thomas Deselaers. 2006. The IAPR TC-12 Bench-mark: A New Evaluation Resource for Visual Informa-tion Systems. InInternational Workshop OntoImage, pages 13–23.

Parisa Kordjamshidi, Marie-Francine Moens, and Mar-tijn van Otterlo. 2010. Spatial Role Labeling: Task Definition and Annotation Scheme. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), pages 413–420. Parisa Kordjamshidi, Martijn Van Otterlo, and Marie-Francine Moens. 2011. Spatial Role Labeling: To-wards Extraction of Spatial Relations from Natural Language. ACM Transactions on Speech and Lan-guage Processing (TSLP), 8(3):4.

Parisa Kordjamshidi, Steven Bethard, and Marie-Francine Moens. 2012a. Semeval-2012 Task 3: Spa-tial Role Labeling. In Proceedings of the Sixth In-ternational Workshop on Semantic Evaluation, pages 365–373. Association for Computational Linguistics. Parisa Kordjamshidi, Paolo Frasconi, Martijn Van

(8)

2012b. Relational Learning for Spatial Relation Ex-traction from Natural Language. In Inductive Logic Programming, pages 204–220. Springer.

Inderjeet Mani, Christy Doran, Dave Harris, Janet Hitze-man, Rob Quimby, Justin Richer, Ben Wellner, Scott Mardis, and Seamus Clancy. 2010. SpatialML: Anno-tation Scheme, Resources, and Evaluation. Language Resources and Evaluation, 44(3):263–280.

James Pustejovsky and Jessica L Moszkowicz. 2011. The Qualitative Spatial Dynamics of Motion in Lan-guage. Spatial Cognition & Computation, 11(1):15– 44.

Kirk Roberts and Sanda M Harabagiu. 2012. UTD-SpRL: A Joint Approach to Spatial Role Labeling. In

Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 419–424. Association for Computational Linguistics.

Magnus Sahlgren. 2006. The Word-space Model. Ph.D. thesis, Stockholm University.

Oliviero Stock. 1998. Spatial and Temporal Reasoning. Springer-Verlag New York Incorporated.

Amber Stubbs. 2011. MAE and MAI: Lightweight An-notation and Adjudication Tools. In Proceedings of the 5th Linguistic Annotation Workshop, LAW V ’11, pages 129–133, Stroudsburg, PA, USA. Association for Computational Linguistics.