

7.2 Future Work

7.2.1 Retaining additional knowledge from the crowd

A persistent outcome of the mediation process is the large amount of data that is discarded through pruning. The most common outcome of mediation is for a submitted concept model to be removed when it fails to gain the support needed to be either minority or majority adopted. In general, the mediation protocols are biased towards the exclusion of concepts rather than their inclusion. This is because a false positive (an invalid class that is majority adopted and therefore not subject to knowledge engineer scrutiny) would undermine the entire approach by formalising erroneous concepts and producing flawed ontologies.
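The three outcomes described above can be illustrated with a minimal sketch. The function name `mediate` and the threshold values are hypothetical and chosen only for illustration; they are not the thresholds used in the experiments. Setting the majority threshold high reflects the stated bias towards exclusion, since a wrongly majority-adopted class escapes knowledge engineer scrutiny.

```python
from enum import Enum


class Outcome(Enum):
    MAJORITY_ADOPTED = "majority adopted"   # formalised without expert review
    MINORITY_ADOPTED = "minority adopted"   # retained for manual resolution
    PRUNED = "pruned"                       # discarded


def mediate(votes_for, votes_total, majority=0.75, minority=0.40):
    """Classify a concept model by its share of crowd support.

    The default thresholds are illustrative assumptions, not values
    taken from the mediation protocols themselves.
    """
    if votes_total <= 0:
        # No evidence at all: err on the side of exclusion.
        return Outcome.PRUNED
    share = votes_for / votes_total
    if share >= majority:
        return Outcome.MAJORITY_ADOPTED
    if share >= minority:
        return Outcome.MINORITY_ADOPTED
    return Outcome.PRUNED
```

Under these assumed thresholds, a concept model supported by half of the respondents would be retained for manual resolution rather than formalised automatically.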

In its present form, the mediation process produces core components of the desired domain ontology consisting of the most easily agreed-upon concept models. For this reason, the mediation process is primarily useful for building the core module of an ontology. While the creation of comprehensive and robust domain ontologies is probably beyond the abilities of crowds at present, much work could be done towards obtaining greater coverage and identifying valid concepts before they are discarded. This is particularly relevant when additional processes are being used to reduce the number of mediation questions being produced. With the incorporation and development of additional processes that automate the exclusion of concept models (such as the use of WordNet demonstrated in Experiment 2), parallel development of tools that help recognise potentially valid concept models is required.

[Figure 7.3: Concept Granularity. Two competing concept models: Lion is-a Cat is-a Mammal, and Lion is-a Mammal.]

Developing a two-stage application of this approach, in which a core ontology is produced first and then extended using the same approach at a later point, may also be possible. The core ontology created in the first stage would perform a similar function in the second stage to that of the seed ontology in Experiment 3. In essence, the first-stage ontology would become the seed ontology of the second stage. While some thought would have to be invested into how exactly this might work, the adaptability of the protocols makes this a promising area of development.

7.2.2 Determining types of conflict

A potential problem occurs when two competing concept models exist, both of which are semantically correct, yet one is discarded in favour of the other. For example, in reference to Figure 7.3, we can see that two competing concept models (Lion is a Cat and Lion is a Mammal) could be marked as being in conflict, yet no conflict really exists. This is an issue of granularity, whereby the correct concept model depends on the level of detail that is intended to be represented. To some extent, because both competing concept models are correct, they are both likely to receive support, meaning that majority adoption thresholds would not be met and both concept models would therefore be retained for manual resolution (minority adoption). This is evident in the results of the either/or questions in Experiment 2 (see Table 5.8), where conflicts such as “is album a record or recording” fail to gain majority adoption. While identifying these cases and retaining them is useful, more work needs to be done on automating this process to reduce the burden on the knowledge engineer. To achieve this, the precise nature of what constitutes a conflict would need to be developed further, with additional protocols to determine how these conflicts are resolved and what information should be retained. By finding ways to do this, knowledge engineers would only be presented with conflicts that genuinely require expert input to resolve.
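One possible way of separating granularity differences from genuine conflicts is sketched below. It assumes that concept models are represented as (child, parent) pairs and that a set of already accepted is-a relations is available (for example, Cat is-a Mammal). The function names and the returned labels are hypothetical and are not part of the existing protocols.

```python
def ancestors(concept, is_a):
    """Return every concept reachable from `concept` via is-a edges."""
    parents = {}
    for child, parent in is_a:
        parents.setdefault(child, set()).add(parent)
    seen = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:   # guards against cycles in crowd input
                seen.add(parent)
                stack.append(parent)
    return seen


def classify_competition(model_a, model_b, known_is_a):
    """Label two competing (child, parent) concept models.

    'granularity' means one proposed parent is an ancestor of the other,
    so both models can be true at different levels of detail.
    """
    (child_a, parent_a), (child_b, parent_b) = model_a, model_b
    if child_a != child_b:
        return "unrelated"
    if parent_a == parent_b:
        return "identical"
    if (parent_b in ancestors(parent_a, known_is_a)
            or parent_a in ancestors(parent_b, known_is_a)):
        return "granularity"
    return "conflict"


known = {("Cat", "Mammal")}
print(classify_competition(("Lion", "Cat"), ("Lion", "Mammal"), known))
```

Under this assumption, only pairs labelled as a conflict would need to be passed to the knowledge engineer, while granularity pairs could both be retained automatically.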

7.2.3 Overloading the is-a relation

In Experiment 3 a folder analogy was used to help task-unaware crowds classify concepts. While this was useful as a device to lead unaware crowds towards performing classification, it has led to a forced simplification of the relationships that can be defined. For example, users are forced to classify mereological (part-of) relationships in the same way as they would is-a relationships. So, if developing a motoring domain ontology, an engine could reasonably be placed inside the folder representing the automobile concept, yet engines are not a type of automobile. This is not something that is easy to remedy, but in geospatial domains it may be possible to incorporate some simple rules that would better determine the precise nature of inter-concept relationships. Using Region Connection Calculus (RCC) [91], for example, could provide a standard way to describe the relationships between concepts that would typically have a spatial profile. RCC defines the types of spatial relationships that two regions can have. Using RCC, one could, for instance, determine that a building is contained entirely within an area (such as a department building within a campus) and would therefore have a within relationship rather than a simplistic and erroneous is-a relationship.
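The building-within-campus example can be sketched using the eight base relations of RCC8 (DC, EC, PO, EQ, TPP, NTPP and the two inverses TPPi and NTPPi). As a simplifying assumption, each region is approximated here by an axis-aligned bounding box `(xmin, ymin, xmax, ymax)` with positive area; real footprints would require a proper geometry library. The function name `rcc8` and the coordinates are hypothetical.

```python
def rcc8(a, b):
    """Return the RCC8 relation of box `a` to box `b`.

    Boxes are (xmin, ymin, xmax, ymax) and are assumed to have positive area.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if (ax0, ay0, ax1, ay1) == (bx0, by0, bx1, by1):
        return "EQ"                      # identical regions
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "DC"                      # disconnected
    if ax1 <= bx0 or bx1 <= ax0 or ay1 <= by0 or by1 <= ay0:
        return "EC"                      # boundaries touch, interiors do not
    a_in_b = bx0 <= ax0 and ax1 <= bx1 and by0 <= ay0 and ay1 <= by1
    b_in_a = ax0 <= bx0 and bx1 <= ax1 and ay0 <= by0 and by1 <= ay1
    if a_in_b:
        strict = bx0 < ax0 and ax1 < bx1 and by0 < ay0 and ay1 < by1
        return "NTPP" if strict else "TPP"    # proper part of b
    if b_in_a:
        strict = ax0 < bx0 and bx1 < ax1 and ay0 < by0 and by1 < ay1
        return "NTPPi" if strict else "TPPi"  # b is a proper part of a
    return "PO"                          # partial overlap


campus = (0, 0, 100, 100)        # illustrative coordinates only
department = (10, 10, 20, 20)
relation = rcc8(department, campus)
# A proper-part relation suggests 'within' rather than is-a.
print(relation, relation in {"TPP", "NTPP"})
```

A rule of this kind could flag a crowd-submitted is-a relation between two spatial concepts as a candidate within relationship whenever one region is a proper part of the other.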

As with all crowdsourcing endeavours, the user experience is essential in obtaining ideal behaviour from the crowd. Further experimentation is required to determine what additional mechanisms could be incorporated into this approach to encourage an improvement in the quality of crowd input. Using more sophisticated behavioural prompting to promote useful crowd behaviour, in conjunction with guidance from a set of logical relationship rules, would be one way of addressing the overloading of the is-a relation.
