
Using the DRa annotation system we can capture the role that a passage of text plays in a document. More importantly, we also capture the relationship that the role of an annotated chunk of text imposes on the rest of the document or other chunks of text. This leads to an automatic generation of a visible dependency graph for the text (see Figure 4.6), where relations between parts of text are represented by visible arrows and graph nodes have specified (but not visible) mathematical rhetorical or structural roles [KMRW07a].

From the annotated narrative aspect of a document we receive a dependency graph (DG) between the chunks of text in the document (e.g., see the left hand side of Figure 4.11). Those dependencies play an important role in the mathematical knowledge representation. Thanks to those dependencies, the reader can find his own way while reading the original text without the need to understand all its subtleties. Moreover, we will show in Section 5.3 that these dependencies give the ability to structure the skeleton of a document in the formal language Mizar (see

4.5.1 The Definition of a DRa dependency graph

As mentioned above, the MathLang user annotates the document himself. When annotating the DRa, he explicitly annotates those passages of text that he finds useful and that play some mathematical rhetorical or structural role within the document. At the same time he annotates dependencies between boxes and passages of text. As described above, this yields a dependency graph. In this section we define the dependency graph mathematically.

G = (V, A, E) where A ⊆ V × (MR ∪ SR), E ⊆ V × Ld × V

V = {n | n = nodeId} – set of vertices

A = {a | a = (n, r) ∧ r ∈ MR ∪ SR ∧ MR ∩ SR = ∅} – set of vertex attributes

E = {e | e = (nsrc, α, nanch) ∧ nsrc, nanch ∈ V ∧ α ∈ Ld} – set of edges

where

Ld = {relatesTo, justifies, subpartOf, uses, inconsistentWith, exemplifies} – the set of allowed labels in a dependency graph

MR – the set of MathematicalRhetoricalRoles, cf. Table 4.5

SR – the set of StructuralRhetoricalRoles, cf. Table 4.5

nodeId – a unique name/identifier given by the user while wrapping the text with boxes

Figure 4.10: The formal definition of the dependency graph.

The DRa dependency graph is a directed graph with labelled edges. We represent it as a triple DG = (V, A, E) of sets, such that A ⊆ V × (MR ∪ SR) and E ⊆ V × Ld × V. The elements of V are the vertices (or nodes, or points) of the graph DG, the elements of A are attributes of nodes, and the elements of E are labelled edges of the graph.

Nodes of the graph correspond to the uniquely named boxes that the user wraps around chunks of text, for example the box annotated with the letter a (or b, etc.) on the right hand side of Figure 4.1. Within the definition of the DG, a node is denoted n ∈ V.

Attributes of nodes are expressed in terms of division elements and mathematical units. The elements of A are pairs a = (n, r), where r ∈ SR ∪ MR is either a structural (i.e., SR) or a mathematical (i.e., MR) rhetorical role played in the document by the wrapped chunk of text (annotated as node n). Note that the structural and mathematical rhetorical roles form two disjoint sets, SR ∩ MR = ∅, which are collections of instances of the classes StructuralRhetoricalRole and MathematicalRhetoricalRole respectively. An attribute is assigned to a node using the relation hasStructuralRhetoricalRole or hasMathematicalRhetoricalRole in the RDF triples. For example, we can state that the unique node a, on the right hand side of Figure 4.1, plays the mathematical role 'definition'. We annotate this fact with the following RDF triple: (a, hasMathematicalRhetoricalRole, definition).
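As a minimal sketch of how such role assignments could be held in code, the Python fragment below stores annotations as plain (subject, predicate, object) tuples and derives the attribute set A from them. The two predicate names and the triple for node a come from the text. The function name `attributes_from_triples`, node `b` and its role `section` are hypothetical and serve only as illustration.

```python
# Predicates named in the text for attaching a rhetorical role to a node.
ROLE_PREDICATES = {
    "hasMathematicalRhetoricalRole",
    "hasStructuralRhetoricalRole",
}


def attributes_from_triples(triples):
    """Return the set A of (node, role) pairs found in RDF-style triples."""
    return {(s, o) for (s, p, o) in triples if p in ROLE_PREDICATES}


triples = [
    ("a", "hasMathematicalRhetoricalRole", "definition"),  # example from the text
    ("b", "hasStructuralRhetoricalRole", "section"),       # hypothetical node and role
    ("b", "uses", "a"),                                    # a relation, not a role
]

A = attributes_from_triples(triples)
print(sorted(A))
```

Triples whose predicate is a relation (such as `uses`) are ignored here, since they contribute to the edge set E rather than to A.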

Edges of the DG represent relations between annotated chunks of text within the document. Each edge is an ordered triple, and the order of the nodes in the edge determines the direction of the relation. Formally, an edge e ∈ E ⊆ V × Ld × V is a triple e = (nsrc, α, nanch), where nsrc is the source node of the edge and nanch is its target (anchor) node. Each edge is labelled with one of the predefined DRa relations presented in Section 4.2.2.2 (see also the table in Figure 4.5), i.e. α ∈ Ld = {relatesTo, justifies, subpartOf, uses, inconsistentWith, exemplifies}. The label α is the middle component of the triple and corresponds to the middle component (the predicate) of the RDF triple.

The above described definition of the graph is presented in Figure 4.10.
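The definition can also be read as a small data structure with three consistency checks. The following Python sketch is one possible encoding, not the MathLang implementation: the class name `DependencyGraph`, the method `check`, and the contents of `MR` and `SR` are assumptions (the real role sets are those of Table 4.5, which is not reproduced here). Only the label set `LD` is taken verbatim from the definition.

```python
from dataclasses import dataclass, field

# Edge labels Ld, as listed in the definition.
LD = {"relatesTo", "justifies", "subpartOf", "uses",
      "inconsistentWith", "exemplifies"}

# Assumed stand-ins for the role sets of Table 4.5 (not the full table).
MR = {"definition", "theorem", "proof"}
SR = {"section", "chapter"}


@dataclass
class DependencyGraph:
    V: set = field(default_factory=set)  # node identifiers
    A: set = field(default_factory=set)  # (node, role) pairs
    E: set = field(default_factory=set)  # (source, label, anchor) triples

    def check(self):
        """Return a list of violations of the constraints on A and E."""
        problems = []
        if MR & SR:
            problems.append("MR and SR are not disjoint")
        for node, role in sorted(self.A):
            if node not in self.V:
                problems.append(f"attribute on unknown node {node!r}")
            if role not in MR | SR:
                problems.append(f"unknown role {role!r}")
        for src, label, anchor in sorted(self.E):
            for node in (src, anchor):
                if node not in self.V:
                    problems.append(f"edge uses unknown node {node!r}")
            if label not in LD:
                problems.append(f"unknown edge label {label!r}")
        return problems


good = DependencyGraph(
    V={"a", "b"},
    A={("a", "definition"), ("b", "proof")},
    E={("b", "uses", "a")},
)
bad = DependencyGraph(V={"a"}, E={("a", "dependsOn", "c")})

print(good.check())
print(bad.check())
```

The first graph satisfies all constraints, while the second is reported twice: once for the label `dependsOn`, which is not in Ld, and once for the node `c`, which is not in V.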

4.5.2 The automatically extracted dependency graph of a document

The MathLang DRa aspect is annotated by the user himself. The accuracy of this annotation is left to the user’s understanding of the text. However, we believe that at the DRa level of the computerisation of the mathematical documents we can capture mistakes in the annotation of parts of the text.

As mentioned in Section 4.3, which describes the DRa annotation process, the author performs the annotation with as little effort as possible. He simply has to wrap chunks of text with boxes, name those boxes uniquely, specify rhetorical roles (either structural or mathematical), and finally provide a number of relations between the annotated boxes. This extends the former MathLang computerised version of the document with vital information about the dependencies between passages of text within the document.

At this level we can extract the annotated rhetorical structure of the document and present it in various formats. The extraction algorithm (see Section 6.3) provides one possible view of the document, namely the dependency graph (DG).

Figure 4.11 shows the DG for the proof of Pythagoras' theorem, which was presented and annotated in Section 4.3.

[Figure: on the left, the DG over the nodes A, E, F, G, B, H, I, C, D with edges labelled justifies, uses and subpartOf; on the right, the GoLP over the same nodes with edges labelled ≺.]

Figure 4.11: The DG and GoLP of the proof of the example of Pythagoras’ theorem by H. Barendregt.

On the left hand side we have the automatically generated presentation of the dependency graph constructed from the input of the mathematician in Section 4.3.1 for our example of Figure 4.8. The right hand side of the figure presents the GoLP automatically generated from the dependency graph.

To distinguish the DG from the GoLP easily, we use different node shapes: square nodes for the DG and circular nodes for the GoLP.

A document's dependency graph is a directed graph with labelled edges and attributes assigned to the vertices (see Figure 4.10). The vertices (resp. attributes, resp. edges) of such a graph are the names of boxes (resp. mathematical or structural rhetorical roles, resp. relations) specified by the user during the first (resp. second, resp. third) step of the annotation of the document described in Section 4.3.1.

The left hand side of Figure 4.11 presents the dependency graph of the example of Pythagoras' theorem by H. Barendregt (see Figure 4.6). This graph consists of (1) relations between parts of the text, which are represented by visible arrows, and (2) graph nodes, which have specified (but not visible) mathematical and/or structural rhetorical roles. Dependencies between the annotated chunks of text play an important role in mathematical knowledge representation. Thanks to those dependencies, the reader can find his own way while reading the text without the need to understand all its subtleties. Moreover, we will show in the next sections that these dependencies allow one to present other views on a document, and to structure the skeleton of a document in the formal language Mizar. Dependency graphs (and their views as in Figure 4.11) are extracted automatically from the mathematicians' input in Section 4.3.1.
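To illustrate how a reader, or a tool, might follow dependencies through the text, the sketch below collects every chunk that a given chunk depends on, directly or indirectly, by walking edges from source to anchor. It is an illustration under assumptions: the function name `reachable` is hypothetical, and the three edges are invented for the example and are not the edges of Figure 4.11.

```python
from collections import deque


def reachable(edges, start):
    """Nodes reachable from `start` by following edges from source to anchor."""
    successors = {}
    for src, _label, anchor in edges:
        successors.setdefault(src, set()).add(anchor)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical edges (source, label, anchor); not taken from Figure 4.11.
edges = {
    ("B", "justifies", "A"),
    ("C", "uses", "B"),
    ("D", "subpartOf", "C"),
}

print(sorted(reachable(edges, "C")))
print(sorted(reachable(edges, "A")))
```

A breadth-first walk is used so that the traversal terminates even if the annotated relations happen to contain a cycle.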