Automatic Guessing of Attribute Value Lists

In document Automation of Common Cause Analysis: monitoring improvement measure perfomance and predicting accidents (Page 109-114)

and if the values are the same. Two factors are equal if they are of the same type and if for each attribute value pair in one factor there is exactly one equal attribute value pair in the other factor.

For any deviation from total equality a dissimilarity score can be awarded, based on rules. All factor pairs which are over a set dissimilarity threshold are not considered similar enough to qualify as equal factors for the purpose of 9.2 (p. 91). Rulesets should take into account the number of cases to consider and the number of tolerable near hits and near misses.

11.2

Automatic Guessing of Attribute Value Lists

Automatic guessing of the attribute value lists, based on the natural lan- guage factor statements, is a function intended to help the author of a WBG to prepare a WBG for use in 9 (p. 85).

11.2.1 Guessing Actors, Actions, Omitted Actions, Objects

and Properties

Guessing the correct attributes is done by a multi-pass scoring algorithm. The algorithm relies on a dictionary which was derived from an English language thesaurus ??.

Pass 1 The first pass of the algorithm determines the predominant tense the factors of a graph are stated in. This helps in determining word types in cases where there are multiple possibilities. If the predominant tense is simple present tense for example the odds that a verb encountered in simple past form is an action are lowered. Instead it becomes more likely that a simple past verb form acts as an adjective.

In a sentence like ”The aircraft flies damaged” the knowledge that the predominant tense is simple present tense lets the heuristic rule in favour of ”damaged” as an adjective.

Pass 2 The second pass looks for signs of non-atomicity. The algorithm works on the assumption that factor texts are atomic (11.1.1). If there are indications that a factor text is non-atomic, then the user will be reminded to check the atomicity of the factor text statement. For example, phrases like ”following” or ”because of” are indications that the factor text is a con- junction of more than one atomic statement. While there may be instances where non-atomic statements are necessary, all causal relations should only be expressed through the semantics of the WBG.

Pass 3 The third pass establishes which words in a factor are composite words. They are marked as such and will be treated like singular words in later passes.

Some composite words can be inferred from the thesaurus used (). Fre- quently co-occurring words within a WBG will be treated as composite words.

Pass 4 The fourth pass simply finds out all the possibilities of word types that a word can have. For example the word ”pilots” could mean a plural noun, the pilots of an aircraft, or a simple present tense verb, the action of someone at steering an aircraft. The words or phrases are compared with an a dictionary of word forms. Each time a match is found the word form is registered.

For example the sentence ”Pilots fly aircraft” would be classified as fol- lows in pass 4:

pilots plural noun simple present verb fly singular noun simple present verb aircraft singular noun plural noun

Pass 5 The fifth pass establishes for each factor the likelihood it is an ACTOR, ACTION, OMITTED ACTION, OBJECT or PROPERTY. The fifth pass is the first which takes into consideration the order and context in which words appear in a factor text, e.g.:

• In an english sentence Actors come first, then Actions then Objects. • Adjectives and articles indicate Actors and Objects, while adverbs

indicate verbs.

• If a word is the only noun/verb/etc. in a factor text it must fill a specific role. The only noun must be the subject, the only verb must be the predicate.

• Statements with auxiliary verbs as only verbs indicate state descrip- tions.

• The phrase structure noun phrase - verb - adjective indicates state descriptions.

• An auxiliary verb followed by ”not” indicates omission phrased as omitted actions like in ”pilots do not take off”.

• ”No” before a noun indicates omissions by absence of an actor like in ”no pilot takes off”.

11.2. AUTOMATIC GUESSING OF ATTRIBUTE VALUE LISTS 111 The heuristics are helped by the explicitly set factor types each factor has.

If a word fits a heuristic it is awarded a score to indicate its probability to fit a specific role in a sentence.

Pass 6 Finally, based on the scores awarded, the attributes and values are put together. Subjects of sentences will be ACTOR values in actions and omissions and OBJECT values in state descriptions. Predicates of sentences will be ACTION value in actions and OMITTED ACTION values in omis- sions. Objects of sentences will be OBJECT values in actions and omissions. Adjectives of state description sentences will be PROPERTY values. The words are transformed to their base form to make them comparable with other attribute value lists form factors in other WBGs. ACTOR and OB- JECT values are singular nouns, ACTION and OMITTED ACTION values are infinitive verb forms without the leading ”to” and PROPERTY values are adjectives in their normal form.

Chapter 12

Why-Because Analysis

Software Toolkit Design

Choices

The toolkit’s reference manual can be found in B (147). How WBAs can be conducted using the toolkit is explained in the Why-Because Analysis introduction in A (139).

The toolkit aids in authoring Why-Because Analyses, storing and man- aging Why-Because Graphs and offers automatic common cause analysis. This chapter describes the architecture, selected festures and design choices of the toolkit.

12.1

Constituent Systems

The software is written mainly using the programming language Java [13]. Java is, with some exceptions, operating system independent. The toolkit uses the dot [46] rendering engine, which is part of the Graphviz [17] project. The toolkit depends on the presence of dot, which is not written in Java, but is available for all popular operating systems. The toolkit will use dot from a locally installed Graphviz instance. The toolkit interfaces with the dot rendering engine using Grappa[42], a Java library.

The toolkit also needs an SQL database if automatic common cause analysis is required. The user can choose which SQL database to use, but has to configure the SQL interface of the toolkit, which can be done by an XML file. The toolkit uses the Java library MyBatis[53], with which the chosen SQL database has to be compatible.

In document Automation of Common Cause Analysis: monitoring improvement measure perfomance and predicting accidents (Page 109-114)