There are two contexts for Task 2, {lgv:Stadium} and {dbpediaowl:College}. Before the

study, we recognized that identifying errors is a much more difficult task for non-expert

users, and it is well known that people sometimes disagree on ontology models. Thus we

told the participants when they started Task 2 that the answers would be very subjective,

and that they did not need to provide a perfect and complete list of errors. They only

needed to record any classes that they thought were weird or suspicious.

We observed that most participants felt unconfident about some tags during this task,

and they tended to skip a few following tags after they saw some tag that made them

hesitant or frustrated. We also noticed that the native English speakers (5 participants

including 2 in the trial) were obviously quicker at reading the tags and thus usually quicker

at completing this task.

We examined all the answers, and checked if a participant had found any true errors.

Again, the errors can be very subjective and some can be very controversial, so we only

consider the ones that are definitely incorrect. For example, in the display for the stadium

class, there are classes of teams, which are mistakenly considered the same as their

stadiums. In the display for the college class, we find classes such as websites, companies,

and persons.

After examining all the answers, we find that 7 of 14 (50%) participants found some true

errors in Task 2.1, and 14 of 14 (100%) found some true errors in Task 2.2. We think one

reason why Task 2.2 was answered better is that there are more errors in the college

display and fewer false errors (the ones that seem to be an error but in fact are not) that

distract participants’ effort. We will analyze the false errors later in this section.

We also examined how well the participants completed Task 2.1 with both systems.²

Ten participants used TC and four used HL for Task 2.1. We find 4 of 10 (40%) succeeded

in finding some true errors using TC, and 3 of 4 (75%) succeeded using HL. Although the

sample size is small, the difference is consistent with comments submitted by the

participants, which explain the problems with using TC for this task.

“The problems is the errors may always occur in those tags that is quite small.

If I want to find those errors as many as I can, that is not a good experience.”

“When the words are smaller it hurts your eyes to search through a list that

is not indented in the same manner.”

“For task group 2, tag cloud based scheme is awkward. It is difficult to locate

a detailed stuff based on words flushing the whole screen.”

While TC makes larger tags more noticeable to users, it also makes the smaller tags

less likely to get noticed. Also because of the layout, users may fatigue more easily while

reading through the variously sized tags on a page. In contrast, the tags in HL are well

aligned, and users can quickly go through the part of the list that starts with some string

if they think these tags can be skimmed for the task. For example, there are a lot of tags

like “UniversityIn...”, and users can usually go through them more quickly.

² This analysis was not done for Task 2.2, as both systems resulted in successful task completion.
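The skimming behavior described above can be illustrated with a minimal sketch. It assumes that tags are plain strings and that a fixed-length prefix is enough to detect a run of similar tags. The function names, the prefix length, and the sample tag names are hypothetical and are not taken from either system.

```python
def group_by_prefix(tags, prefix_len=12):
    """Sort tags alphabetically and group them by a shared leading string.

    Assumption: a fixed-length, case-insensitive prefix approximates what a
    reader perceives as "tags that start with the same string".
    """
    groups = {}
    for tag in sorted(tags, key=str.lower):
        groups.setdefault(tag[:prefix_len].lower(), []).append(tag)
    return groups


def skimmable_runs(groups, min_size=2):
    """Return the prefixes shared by at least min_size tags."""
    return [prefix for prefix, members in groups.items() if len(members) >= min_size]


# Hypothetical tag names, loosely modeled on the "UniversityIn..." example.
sample_tags = ["Website", "UniversityInTexas", "Company", "UniversityInOhio"]
sample_groups = group_by_prefix(sample_tags)
for prefix in skimmable_runs(sample_groups):
    # In an aligned list, a reader can pass over the whole run at once.
    print(prefix, sample_groups[prefix])
```

In an aligned, alphabetically ordered list such runs are contiguous, which is one plausible reason they are easier to skip than the same tags scattered across a tag cloud.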

We do not report the time spent in this task for several reasons. First, the questions

in this task are open questions, and participants were allowed to stop at any time they

felt they had produced sufficient answers. During the study, we also found that when they

felt frustrated or tired, some participants would move on to the next task or step, while

some others would take a short break and then resume the task. We also noticed that a

participant was very likely to be affected if he/she found the other participant in the same

group had completed all the questions and submitted the form.

Now we discuss the false errors with examples from participants’ answers. The first

category consists of classes that are too abstract. For example, classes such as schema:Thing,

pos:SpatialThing, and foaf:Agent are very frequently recorded as errors. Usually experts

or users with experience of Semantic Web will find it very natural to have these abstract

classes in the high levels of the ontology hierarchy. However, from this study we find that

most non-expert users, including participants with computer science background, will

have difficulty understanding these classes without proper tutorials. The second category

consists of classes that are designed in unusual ways. For example, classes such as

freebase:common.topic, umbel:Attributes, and gml:Feature are also very frequently

chosen. Few participants understood what these classes mean, yet they also felt uncertain

about claiming that they are wrong. These classes can be very dependent on the domain

most users even including some experienced Semantic Web users. The third category

consists of classes that represent categories of related topics. For example, there are many

classes from yago that use the name of an athletic team as the local name of the class, such

as yago:SamsungLions and yago:ChicagoWolves. These classes were usually considered

errors when the participants did not realize that the team names represent their stadiums.

We wonder whether improving the interface could help reduce such misunderstandings.

We think representing the hierarchy of classes is useful for clarifying the meaning of

abstract classes. Although subsumption relationships were indicated in both systems, we

noticed that participants were more likely to understand the folder-like representation of

the hierarchy than the gray-colored tags that the tag cloud used to indicate super tags of

the context. Another idea is to develop an algorithm to discover those abstract classes, or

a syntax to denote them, and hide them from non-expert users.
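One way to picture the last idea is the following minimal sketch, which is not an algorithm from this study. It assumes that each class has at most one superclass, that "abstract" can be approximated by closeness to the root of the subsumption hierarchy, and that an explicit hide list can stand in for a syntax that denotes abstract classes. The `ex:` class names, the toy hierarchy, and the depth threshold are all hypothetical.

```python
def depth(cls, parent_of):
    """Count subsumption steps from cls up to a root class."""
    steps = 0
    seen = {cls}
    while cls in parent_of:
        cls = parent_of[cls]
        if cls in seen:  # guard against accidental cycles in the data
            break
        seen.add(cls)
        steps += 1
    return steps


def visible_classes(classes, parent_of, min_depth=2, always_hide=frozenset()):
    """Keep classes that are deep enough and not explicitly marked as hidden.

    Assumption: classes within min_depth steps of the root are treated as
    too abstract for non-expert users.
    """
    return [
        c for c in classes
        if c not in always_hide and depth(c, parent_of) >= min_depth
    ]


# Toy hierarchy (child -> parent); not an excerpt from any real ontology.
parent_of = {
    "ex:Agent": "ex:Thing",
    "ex:Place": "ex:Thing",
    "ex:SportsVenue": "ex:Place",
    "ex:Stadium": "ex:SportsVenue",
}
classes = ["ex:Thing", "ex:Agent", "ex:Place", "ex:SportsVenue", "ex:Stadium"]

print(visible_classes(classes, parent_of))
```

A depth threshold is only a rough proxy. A real ontology may use multiple inheritance or place domain-specific classes near the root, so an explicit marker such as the `always_hide` set may be needed alongside any automatic rule.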