2.2 A Novel Approach for Knowledge Base Construction with Conditional Ran-
2.2.3 CRF Model Design
This section describes the two CRF variants we developed to extract entities and relations. In both variants, we restrict ourselves to local dependencies between labels to keep
11 Note that the product over potential functions from Equation 2.4 can be written as a sum into the
[Figure 2.6, diagram labels. NER module: Documents; CRF Framework; Context Features, Orthographic Features, Word Shape Features, Ngram Features, Dictionary Features. Passed between the modules: Role Features. SRE module: CRF Framework; NER Features, Relational Features; Relational Graph.]
Figure 2.6: Cascaded CRF workflow for the combined task of NER and SRE. In the first module, an NER tagger is trained with the features shown above. The extracted role feature is then used, together with standard NER features (optional) and relational features, to train an SRE model.
inference tractable and cast the problem of identifying relations as a sequence labeling problem. Since the relations are encoded at the positions where the entities occur in the sequence, it is natural to restrict ourselves to linear-chain CRFs, which have been applied to NER with much success. As a consequence, we can make use of a method that has proven highly competitive for this task. Due to their discriminative nature, CRFs can easily incorporate arbitrary, non-local features of the input sequence (see Section 2.2.4), a property that is well suited to relation extraction.
The first variant, the one-step CRF, is suited to encyclopaedic-style articles (see Section 2.2.1); in return for this restriction, it performs entity recognition and relation extraction in a single step. The second variant, the cascaded CRF, consists of two CRF models: the first identifies entities, and the second identifies entities and their relations. The output of the first CRF model is provided as input to the second. Finally, we discuss how the type of text influences the model design.
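To make the tractability argument concrete, the following is a minimal sketch of Viterbi decoding, the standard exact inference procedure for linear-chain models. It is not taken from this work: the score tables `emission` and `transition` and the two labels are hypothetical stand-ins for the potentials a trained CRF would provide. Because each label depends only on its predecessor, decoding costs on the order of T·K² operations for T tokens and K labels.

```python
def viterbi(emission, transition, labels):
    """Return the highest-scoring label sequence of a linear-chain model.

    emission[t][y]       : score of label y at position t (hypothetical input)
    transition[(y0, y1)] : score of moving from y0 to y1 (0.0 if absent)
    """
    if not emission:
        return []
    # best[t][y] = best score of any label path that ends in y at position t
    best = [{y: emission[0].get(y, 0.0) for y in labels}]
    back = []  # back[t-1][y] = best predecessor of y at position t
    for t in range(1, len(emission)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transition.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + transition.get((prev, y), 0.0)
                + emission[t].get(y, 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for a three-token sequence (illustrative values only).
LABELS = ["O", "ENT"]
EMISSION = [
    {"O": 2.0, "ENT": 0.0},
    {"O": 0.0, "ENT": 1.0},
    {"O": 0.0, "ENT": 2.0},
]
print(viterbi(EMISSION, {}, LABELS))
```

Adding a strongly negative score for the transition from `O` to `ENT` changes the best path, which shows how label-to-label dependencies enter the decision.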
Cascaded CRF for NER+SRE
In this setting, we treat the problem with a classical pipeline approach, where the output of a classifier trained for one specific problem is used as input to the next classifier, which solves the subsequent problem. Instead of jointly learning the classifier pipeline with global inference, as is done in previous work for the task of noun phrase chunking [180], we restrict ourselves to cascaded training and rely on a simple but effective two-stage model. In the cascaded setting, two CRFs are trained: a CRF for NER and a second CRF for the combined task of NER+SRE. The trained CRF for NER is first applied to identify all entity mentions in the textual sequence. In addition to standard local features, the identified entity mentions are then used as additional input features to help solve the NER+SRE problem (Figure 2.6). The entities identified in the first step serve as soft constraints in the second model: there is no hard rule that the entity mentions after the second step have to be exactly the same as after the first NER step. However, this entity feature induces a very strong correspondence between the entities identified in the first step and the final predictions.
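The hand-over between the two stages can be sketched as follows. This is only an illustration under stated assumptions, not the feature set used in this work: the local features are a deliberately tiny, hypothetical selection, the feature name `role` merely mirrors the "Role Features" box in Figure 2.6, and the stage-one labels are written out by hand where a trained NER CRF would normally supply them.

```python
def local_features(tokens, i):
    """A tiny, illustrative set of local features for token i."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def stage_two_features(tokens, ner_labels):
    """Build the input of the NER+SRE model from stage-one predictions."""
    if len(tokens) != len(ner_labels):
        raise ValueError("need exactly one stage-one label per token")
    rows = []
    for i in range(len(tokens)):
        feats = local_features(tokens, i)
        # The stage-one prediction is just one more feature (a soft
        # constraint); the second model may still assign another label.
        feats["role"] = ner_labels[i]
        rows.append(feats)
    return rows


# Hypothetical sentence and hand-written stage-one NER output.
TOKENS = ["Siemens", "is", "based", "in", "Munich"]
NER_LABELS = ["ORG", "O", "O", "O", "LOC"]
ROWS = stage_two_features(TOKENS, NER_LABELS)
print(ROWS[0]["role"], ROWS[4]["role"])
```

Passing the first-stage output as a feature rather than as a fixed labeling is what keeps the constraint soft: the second model weighs it against all other evidence.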
One-step CRF for NER+SRE
Here we only consider text collections in which each text refers to a key entity, such as Wikipedia, Wikigenes, or GeneRIF phrases. The key entity is thus already given. All other entities in a text phrase, the so-called secondary entities, are assumed to stand in some relation to the key entity. In the cascaded setting described above, SRE is treated with a classical pipeline approach. The special nature of encyclopaedic-style document collections instead allows us to solve NER and SRE in one step. This is also reflected in the labeling, as a secondary entity’s label encodes the type of the entity plus the type of its relation with the key entity (see Section 2.2.1). Note that the key entity itself need not be explicitly mentioned in the text. To illustrate the assumption made in this setting, we give an example from the biomedical domain. GeneRIF sentences are, in style, the biomedical counterpart of Wikipedia articles: a GeneRIF describes the function of a gene/protein, the key entity, as a concise phrase. Consider the following GeneRIF sentence linked to the gene COX-2:
‘Expression in this gene is significantly more common in endometrial adenocarcinoma and ovarian serous cystadenocarcinoma, but not in cervical squamous carcinoma, compared with normal tissue.’
This sentence states three disease relations with COX-2 (the key entity), namely two altered expression relations (the expression of COX-2 relates to endometrial adenocarcinoma and ovarian serous cystadenocarcinoma) and one unrelated relation (cervical squamous carcinoma).
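The one-step labeling of this example can be sketched in code. The sketch rests on assumptions that are not taken from Section 2.2.1: the label format `TYPE|relation`, the separator, the entity type name `DISEASE`, and the tokenization of the sentence fragment are all hypothetical. It shows how a single label per token carries both the entity type and the relation to the key entity, so that facts can be read off the label sequence directly.

```python
SEP = "|"  # hypothetical separator between entity type and relation


def encode_label(entity_type, relation):
    """Combine entity type and relation into one sequence label."""
    return f"{entity_type}{SEP}{relation}"


def decode_label(label):
    """Split a combined label back into (entity_type, relation)."""
    entity_type, relation = label.split(SEP, 1)
    return entity_type, relation


def extract_relations(key_entity, tokens, labels, outside="O"):
    """Turn a labeled sequence into (key, relation, mention, type) facts.

    Consecutive tokens with the same non-outside label form one mention;
    two directly adjacent mentions with identical labels would be merged,
    which a B/I-style scheme would avoid.
    """
    facts = []
    start = None
    # The trailing sentinel flushes a mention that ends the sequence.
    for i, label in enumerate(list(labels) + [outside]):
        if start is not None and label != labels[start]:
            entity_type, relation = decode_label(labels[start])
            mention = " ".join(tokens[start:i])
            facts.append((key_entity, relation, mention, entity_type))
            start = None
        if start is None and label != outside:
            start = i
    return facts


ALTERED = encode_label("DISEASE", "altered_expression")
UNRELATED = encode_label("DISEASE", "unrelated")
TOKENS = ["more", "common", "in", "endometrial", "adenocarcinoma", "and",
          "ovarian", "serous", "cystadenocarcinoma", ",", "but", "not", "in",
          "cervical", "squamous", "carcinoma"]
LABELS = ["O", "O", "O", ALTERED, ALTERED, "O",
          ALTERED, ALTERED, ALTERED, "O", "O", "O", "O",
          UNRELATED, UNRELATED, UNRELATED]
FACTS = extract_relations("COX-2", TOKENS, LABELS)
for fact in FACTS:
    print(fact)
```

Note that the key entity COX-2 never appears among the tokens; it is supplied from outside, in line with the assumption that it is already given.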
Encyclopaedic-style Articles vs. General Free Text
When extracting facts from general free text, we train several CRF models in order to alleviate ambiguities. In particular, we train one model for each pair of entity classes in the conceptual schema for which at least one relation is defined. For example, the conceptual schema from Figure 2.2 results in two CRF models for SRE: one for the pair of entity classes PER and ORG, and one for the pair ORG and LOC. It is important to note that even if several relations hold between two entity classes, only a single model is trained for all relations between this pair. For the NER task, a single global model is trained.
In encyclopaedic-style article collections, we do not face such difficulties and train only one global SRE model, independent of the conceptual schema.
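The rule for how many SRE models are trained can be sketched as a grouping of schema relations. The sketch makes several assumptions: the relation names are hypothetical, since only the entity class pairs PER/ORG and ORG/LOC are given in the text, and each pair is treated as ordered, which the text does not specify.

```python
def sre_models(schema_relations, encyclopaedic=False):
    """Map each SRE model to the relations it has to cover.

    schema_relations: iterable of (subject_class, relation, object_class).
    """
    schema_relations = list(schema_relations)
    if encyclopaedic:
        # One global model, independent of the conceptual schema.
        return {"global": sorted({rel for _, rel, _ in schema_relations})}
    models = {}
    for subj, rel, obj in schema_relations:
        # All relations between the same pair of classes share one model.
        models.setdefault((subj, obj), []).append(rel)
    return models


# Hypothetical schema; only the class pairs are taken from the text.
SCHEMA = [
    ("PER", "works_for", "ORG"),
    ("PER", "founder_of", "ORG"),
    ("ORG", "located_in", "LOC"),
]
FREE_TEXT = sre_models(SCHEMA)
ENCYCLOPAEDIC = sre_models(SCHEMA, encyclopaedic=True)
print(len(FREE_TEXT), len(ENCYCLOPAEDIC))
```

With this schema, the free-text setting yields two models even though three relations are defined, while the encyclopaedic setting yields a single model.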