Edge weights and edge attributes - Stream Compatible and Context Sensitive Implicit Networks

5.3 Stream Compatible and Context Sensitive Implicit Networks

5.3.2 Edge weights and edge attributes

To assign weights and attributes to the edges, we distinguish between edges of the containment type E_c and edges of the proximity type E_p. Since the construction of the two

5 Dynamic Implicit Entity Networks

Containment edges

Edges of the containment type are binary relations at their core. Therefore, the resulting edges are essentially unweighted, although parallel edges may occur in rare cases such as multiple stop words occurring in the same sentence. To simplify the subsequent notation, we define the distance function δ : E_c → N for edges between an entity v and a sentence s as δ(v, s, i) := 0 if v ∈ s, and δ(v, s, i) := ∞ if v ∉ s. The distance between sentences and documents is defined analogously. Note that these distances work in the same way as the distances for the static implicit network.
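
The containment distance can be illustrated with a minimal sketch. The function name `containment_distance` and the representation of a sentence as a set of entity identifiers are assumptions made for this example and are not part of the model definition.

```python
import math
from typing import Set


def containment_distance(entity: str, sentence_entities: Set[str]) -> float:
    """Distance of a containment edge between an entity and a sentence.

    Returns 0 if the entity occurs in the sentence and infinity otherwise,
    mirroring the two cases of the definition of delta on E_c.
    The set-based sentence representation is an assumption of this sketch.
    """
    return 0.0 if entity in sentence_entities else math.inf


# Hypothetical sentence containing two entity mentions.
sentence = {"Berlin", "Angela Merkel"}
d_contained = containment_distance("Berlin", sentence)
d_missing = containment_distance("Paris", sentence)
```

Representing the infinite distance as `math.inf` keeps comparisons such as `d < threshold` well defined without special cases.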

Occurrence proximity edges

Edges of the proximity type are more complex due to more finely nuanced distances between entities and terms, and due to parallel edges caused by multiple cooccurrences. Since it is this entity cooccurrence information that encodes the relevant dynamic information that is necessary for later analyses, we want to preserve these multiple edges and enrich them with additional information prior to their aggregation (see Chapter 5.3.4). As edge attributes for occurrence proximity edges, we consider three fundamental concepts, namely (i) the publication time, (ii) the textual distance between the mentions of two entities, and (iii) the context of the mentions.

Publication time

For a stream of documents such as news articles, we can assume that a publication time or retrieval date is known for each document, which we use to derive an edge attribute.

Definition 5.1 (Publication time τ). Let τ : D → N be a function that maps each document d ∈ D to its publication time τ(d). Let d_i be the document that contains a cooccurrence instance i inducing an edge e = (v, w, i) between two nodes. Then τ(e) := τ(d_i) is an attribute that assigns the publication time of the corresponding document to edge e.

Thus, if we observe multiple parallel edges between two specific nodes, each edge is assigned a time stamp that denotes its occurrence time in the stream.
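
As an illustration of how the publication time propagates from documents to parallel edges, consider the following sketch. The class `ProximityEdge`, the function `edges_with_timestamps`, the document identifiers, and the integer timestamps are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ProximityEdge:
    """One parallel edge e = (v, w, i) with its publication time tau(e)."""
    v: str
    w: str
    instance: int   # index i of the cooccurrence instance
    timestamp: int  # tau(e), copied from the document containing instance i


def edges_with_timestamps(
    instances: Iterable[Tuple[str, str, int, str]],
    doc_time: Dict[str, int],
) -> List[ProximityEdge]:
    """Assign tau(e) := tau(d_i) to every cooccurrence instance.

    `instances` holds tuples (v, w, i, doc_id); `doc_time` maps each
    document identifier to its publication time. Both layouts are
    assumptions of this sketch.
    """
    return [
        ProximityEdge(v, w, i, doc_time[doc_id])
        for v, w, i, doc_id in instances
    ]


# Two documents published at different (made-up) times mention the same pair.
doc_time = {"d1": 1000, "d2": 2000}
instances = [("A", "B", 0, "d1"), ("A", "B", 1, "d2")]
edges = edges_with_timestamps(instances, doc_time)
```

In this example, the two parallel edges between the same pair of nodes carry different time stamps, one per document in which the cooccurrence was observed.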

Textual distance

The textual distance of proximity edges works similarly to the static case. Formally, by overloading the function ς, we can map each entity of an instance to the index of the sentence in which this entity occurs. Thus, let ς(v, i) denote the number of the sentence in which entity v occurs in instance i. For example, if entity v in instance i occurs in the first sentence of a document, then we have ς(v, i) = 1. In analogy to Chapter 3.2, the textual sentence distance δ : E_p → N of two entities can then be written as

δ(v, w, i) := |ς(v, i) − ς(w, i)|. (5.3)

For example, if entities v and w cooccur in a document such that v is contained in the first sentence, and w is contained in the fourth sentence, then δ(v, w, i) = 3. In particular, if two entities occur in the same sentence, their distance is 0. If v and w never occur together in the same document, we set δ(v, w) := ∞. To include the distance of entity cooccurrences in the graph, we assign to each edge e = (v, w, i) the corresponding distance δ(v, w, i) as an edge attribute δ(e).
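
The sentence distance of Equation (5.3) can be sketched as follows. The function name `sentence_distance` and the dictionary that stores the sentence number of each entity for one instance are assumptions of this example. Returning infinity when an entity is absent from the instance is used here as a stand-in for the case in which the two entities never cooccur.

```python
import math
from typing import Dict


def sentence_distance(sentence_index: Dict[str, int], v: str, w: str) -> float:
    """Textual sentence distance |varsigma(v, i) - varsigma(w, i)| for one instance.

    `sentence_index` maps each entity of the instance to the 1-based number
    of the sentence in which it occurs (an assumed representation).
    If either entity is missing, the distance is infinite.
    """
    if v not in sentence_index or w not in sentence_index:
        return math.inf
    return abs(sentence_index[v] - sentence_index[w])


# Entity v occurs in the first sentence, entity w in the fourth.
index = {"v": 1, "w": 4, "u": 4}
d_vw = sentence_distance(index, "v", "w")   # 3
d_wu = sentence_distance(index, "w", "u")   # 0, same sentence
d_vx = sentence_distance(index, "v", "x")   # inf, x does not occur
```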

Context embeddings

To conserve the context of joint entity mentions and model the context in which an edge originally occurred, we use a vector embedding of terms in the context window of two entities. Formally, an embedding is a function ε : T → R^k that maps a term to a point in a k-dimensional vector space (see Chapter 2.1). Without loss of generality, we assume that pre-trained embeddings are available for all terms (either trained on previous documents from the stream or on out-of-domain sources). To obtain the context of two entities in a cooccurrence instance, we have to consider the occurrences of terms around and between the entities or terms that correspond to the nodes of the edge. To this end, we first define a context window around a cooccurrence instance as a function of those entities.

Definition 5.2 (Context window win). Let e = (v, w, i) be an edge. Then we define win : E_p → 2^S as a function that maps an edge to a set of sentences. Specifically, let d_i be the document containing the edge-inducing instance i, then

win(v, w, i) := {s ∈ S | s ∈ d_i ∧ ς(v, i) ≤ ς(s) ≤ ς(w, i)}, (5.4)

where ς(v, i) ≤ ς(w, i) without loss of generality.

Thus, win(v, w, i) consists of the sentences containing v and w, and all sentences in between. Based on this context window, we can define the context of an edge as the normalized sum of all embeddings of the terms in the context window.
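
The context window of Definition 5.2 can be sketched as a slice over the sentences of a document. The function name `context_window` and the representation of a document as a list of tokenized sentences are assumptions of this example. Sentence numbers are 1-based, as in the definition of ς.

```python
from typing import List


def context_window(
    sentences: List[List[str]], pos_v: int, pos_w: int
) -> List[List[str]]:
    """Return the sentences containing v and w and all sentences in between.

    `sentences` is the document d_i as a list of tokenized sentences
    (an assumed representation); `pos_v` and `pos_w` are the 1-based
    sentence numbers varsigma(v, i) and varsigma(w, i).
    """
    # Sorting enforces varsigma(v, i) <= varsigma(w, i) without loss of generality.
    lo, hi = sorted((pos_v, pos_w))
    return sentences[lo - 1:hi]


# Hypothetical document with five one-token sentences.
document = [["s1"], ["s2"], ["s3"], ["s4"], ["s5"]]
window = context_window(document, 4, 2)  # sentences 2, 3, and 4
```

Sorting the two positions inside the function means that callers do not have to order the entities beforehand.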

Figure 5.1: Schematic view of edge extraction and aggregation in the dynamic implicit network model. From an input document, edges between entities v and w are extracted with the vector-valued context embedding κ, the cooccurrence distance δ and the document timestamp τ as attributes. If edges between the same entities re-occur in a similar context, these edges are then merged and the attributes are combined to obtain the aggregated attributes.

Definition 5.3 (Context embedding κ). Let e = (v, w, i) be an edge, and let win(v, w, i) be the corresponding context window. Then the context function κ : E_p → R^k is defined as

\[ \kappa(v, w, i) := \frac{1}{|\mathrm{win}(v, w, i)|} \sum_{s \in \mathrm{win}(v, w, i)} \sum_{t \in s} \varepsilon(t), \tag{5.5} \]

where |win(v, w, i)| denotes the number of terms in the context window.

To reduce noise, it is feasible in this step to remove stop words and restrict the sum to content words. For each edge e = (v, w, i), we store κ(e) as an attribute that can later be used to identify pairs of entities that appear in similar contexts. For a schematic overview of the model and the differences to the static approach, see Figure 5.1.
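
A minimal sketch of the context embedding of Equation (5.5) is given below, using plain Python lists as vectors. The function name `context_embedding`, the toy two-dimensional embeddings, and the stop word set are hypothetical. Two choices in the sketch are assumptions and are not fixed by the definition: terms without an embedding are skipped, and the sum is normalized by the number of terms that were actually summed after stop word removal.

```python
from typing import AbstractSet, Dict, List, Sequence


def context_embedding(
    window: Sequence[Sequence[str]],
    embeddings: Dict[str, Sequence[float]],
    k: int,
    stop_words: AbstractSet[str] = frozenset(),
) -> List[float]:
    """Normalized sum of the term embeddings in a context window.

    `window` is a sequence of tokenized sentences, `embeddings` maps a term
    to a k-dimensional vector. Stop words and terms without an embedding
    are ignored (assumptions of this sketch).
    """
    total = [0.0] * k
    n_terms = 0
    for sentence in window:
        for term in sentence:
            if term in stop_words or term not in embeddings:
                continue
            vector = embeddings[term]
            for j in range(k):
                total[j] += vector[j]
            n_terms += 1
    if n_terms == 0:
        # Empty context: return the zero vector instead of dividing by zero.
        return total
    return [x / n_terms for x in total]


# Toy embeddings with k = 2 (made-up values).
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "the": [5.0, 5.0]}
toy_window = [["a", "the"], ["b"]]
kappa_all = context_embedding(toy_window, toy_embeddings, k=2)
kappa_content = context_embedding(
    toy_window, toy_embeddings, k=2, stop_words={"the"}
)
```

In the toy example, removing the stop word changes the result from [2.0, 2.0] to [0.5, 0.5], which illustrates how a frequent but uninformative term can dominate the context vector when it is not filtered.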

In document Implicit Entity Networks: A Versatile Document Model (Page 135-138)