There are two contexts for Task 2, {lgv:Stadium} and {dbpediaowl:College}. Before the
study, we recognized that identifying errors is a much more difficult task for
non-expert users, and it is well known that people sometimes disagree on ontology
models. Thus we told the participants when they started Task 2 that the answers
would be very subjective, and that they did not need to provide a perfect and complete
list of errors. They only needed to record any classes that they thought were weird or
suspicious.
We observed that most participants were unsure about some tags during this task,
and they tended to skip the next few tags after seeing a tag that made them
hesitant or frustrated. We also noticed that the native English speakers (5 participants
including 2 in the trial) were noticeably quicker at reading the tags and thus usually
quicker at completing this task.
We examined all the answers and checked whether each participant had found any true
errors. Again, errors can be very subjective and some are controversial, so we only
consider the ones that are definitely incorrect. For example, in the display for the
stadium class, there are classes of teams, which are mistakenly considered the same as
their stadiums. In the display for the college class, we find classes such as websites,
companies, and persons. After examining all the answers, we found that 7 of 14 (50%)
participants found some true errors in Task 2.1, and 14 of 14 (100%) found some true
errors in Task 2.2. We think one reason
to explain why Task 2.2 is answered better is that there are more errors in the college
errors (the ones that seem to be an error but in fact are not) that distract participants’
effort. We will analyze the false errors later in this section.
We also examined how well the participants completed Task 2.1 with both systems².
Ten participants used TC and four used HL for Task 2.1. We found that 4 of 10 (40%)
succeeded in finding some true errors using TC, and 3 of 4 (75%) succeeded using HL.
Although the sample size is small, we think the difference is plausible. We also
received some comments from the participants that explain the problems with using TC
for this task.
“The problems is the errors may always occur in those tags that is quite small.
If I want to find those errors as many as I can, that is not a good experience.”
“When the words are smaller it hurts your eyes to search through a list that
is not indented in the same manner.”
“For task group 2, tag cloud based scheme is awkward. It is difficult to locate
a detailed stuff based on words flushing the whole screen.”
While TC makes larger tags more noticeable to users, it also makes the smaller tags
less likely to be noticed. Also, because of the layout, users may fatigue more easily
while reading through the variously sized tags on a page. In contrast, the tags in HL
are well aligned, and users can quickly go through the part of the list that starts
with some string if they think these tags can be skimmed for the task. For example,
there are a lot of tags like “UniversityIn...” and users can usually go through the
list more quickly.
² This analysis was not done for Task 2.2, as both systems resulted in successful task completion.
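
As a quick arithmetic check on the Task 2.1 counts reported above, the following minimal Python sketch recomputes the per-system success rates and the pooled total. The dictionary layout and the function names are our own illustrative choices, not part of the study materials; only the counts come from the text.

```python
# Counts reported in the text for Task 2.1.
# Each value is (participants who found a true error, participants using the system).
task_2_1 = {"TC": (4, 10), "HL": (3, 4)}


def success_rate(found, total):
    """Return the percentage of participants who found at least one true error."""
    return 100.0 * found / total


def pooled(counts):
    """Sum the per-system counts into one (found, total) pair."""
    found = sum(f for f, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return found, total


per_system = {name: success_rate(f, t) for name, (f, t) in task_2_1.items()}
overall_found, overall_total = pooled(task_2_1)
overall_rate = success_rate(overall_found, overall_total)

for name, rate in per_system.items():
    print(f"{name}: {rate:.0f}%")
print(f"overall: {overall_found} of {overall_total} ({overall_rate:.0f}%)")
```

The pooled figure agrees with the 7 of 14 (50%) reported for Task 2.1 as a whole, so the per-system counts are consistent with the overall count.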
We do not report the time spent on this task for several reasons. First, the questions
in this task are open questions, and participants were allowed to stop at any time they
felt they had produced sufficient answers. Second, during the study we found that when
they felt frustrated or tired, some participants would move on to the next task or step,
while others would take a short break and then resume the task. Finally, we noticed that
a participant was very likely to be affected if he or she saw that the other participant
in the same group had completed all the questions and submitted the form.
Now we discuss the false errors with examples from participants’ answers. The first
category consists of classes that are too abstract. For example, classes such as
schema:Thing, pos:SpatialThing, and foaf:Agent are very frequently recorded as errors.
Experts or users with Semantic Web experience usually find it very natural to have these
abstract classes in the high levels of the ontology hierarchy. However, from this study
we find that most non-expert users, including participants with a computer science
background, have difficulty understanding these classes without proper tutorials. The
second category consists of classes that are designed in unusual ways. For example,
classes such as freebase:common.topic, umbel:Attributes, and gml:Feature are also
very frequently chosen. Few people understand what these classes mean, but they are also
hesitant to claim that these classes are wrong. These classes can be very dependent on the domain
most users, even including some experienced Semantic Web users. The third category
consists of classes that represent categories of related topics. For example, there are
many classes from yago that use the names of athletic teams as the local name of the
class, such as yago:SamsungLions and yago:ChicagoWolves. These classes are usually
considered errors if the participants did not realize that the team names represent
their stadiums.
We wonder whether improving the interface could help reduce such misunderstandings.
We think representing the hierarchy of classes is useful for clarifying the meaning of
abstract classes. Although subsumption relationships were indicated in both systems, we
noticed that participants were more likely to understand the folder-like representation
of the hierarchy than the gray-colored tags in the tag cloud, which were used to
indicate super tags of the context. Another idea is to develop an algorithm to discover,
or a syntax to denote, those abstract classes, and to hide them from non-expert users.
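
As a rough illustration of the last idea, the sketch below hides a hand-maintained set of abstract classes unless an expert mode is enabled. The set ABSTRACT_CLASSES, the function visible_tags, and the expert_mode flag are hypothetical names introduced here; the study did not implement such a filter, and a real system might discover the set automatically instead of listing it by hand.

```python
# Hypothetical set of abstract upper-level classes. The three entries are the
# examples named in the text; treating them as a fixed denylist is an assumption.
ABSTRACT_CLASSES = frozenset({"schema:Thing", "pos:SpatialThing", "foaf:Agent"})


def visible_tags(tags, expert_mode=False, abstract=ABSTRACT_CLASSES):
    """Return the tags to display, preserving their original order.

    Non-expert users do not see tags listed in `abstract`;
    expert users see every tag unchanged.
    """
    if expert_mode:
        return list(tags)
    return [tag for tag in tags if tag not in abstract]


# Example tag list built from class names that appear in the text.
tags = ["schema:Thing", "yago:SamsungLions", "foaf:Agent", "dbpediaowl:College"]
shown_to_novice = visible_tags(tags)
shown_to_expert = visible_tags(tags, expert_mode=True)
print(shown_to_novice)
```

A denylist only covers abstract classes that are known in advance. It does not address the second and third categories of false errors, which would need a different signal.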