Marcin Kuliński, Jerzy Grobelny, Politechnika Wrocławska
Users' preferences as a basis for designing a human-computer interface
Keywords: human-computer interaction, software engineering,
software usability, user interface, Conjoint Analysis.
This paper discusses the connection between quantitative metrics, such as the time to complete a task or the number of errors made, and the subjective overall quality of a given interface. An example experiment based on a usability questionnaire and task completion time recording is described. The use of Conjoint Analysis as a tool for decomposing subjective interface usability and estimating the importance of its respective attributes is shown. Preliminary results obtained by the authors are also presented and discussed.
User preferences as a basis for computer interface design.
Keywords: human-computer interaction, software engineering, software usability, user interface, Conjoint Analysis.
The paper discusses the relationship between quantitative software metrics, such as task completion time or the number of errors made while performing a task, and the subjectively and holistically perceived quality of the software's interface. An example experiment based on a usability questionnaire and on recording task completion time is described. The use of Conjoint Analysis for decomposing the subjectively perceived total utility into partial utilities, and for estimating the weights that users assign to them, is shown. Preliminary results obtained with the described method are also presented.
Quantitative software metrics, e.g. time to complete a task or number of errors made, are thought to be closely connected with the user's satisfaction formed while working with a specific piece of software. In fact, they are incorporated into ease of use, which is one of the dimensions of overall usability [1]. The ISO 9241-11 standard [2] defines three main criteria for describing the perceived quality of a software product, and the metrics mentioned above belong to one of them, named effectiveness (the remaining two are efficiency and satisfaction). Nevertheless, simply equating a piece of software that makes it possible to carry out a specified task quickly with one that is positively perceived by users may be a pitfall. It should be remembered that many aspects of software usability are rooted in human factors, which are not as straightforward and objective as one might think. The authors have tried to investigate how the time needed to complete a task and the perceived overall usability are linked in the case of a simple point-and-click interface.
Eleven students participated in this study. Three basic window layouts were chosen: horizontal, square, and vertical. Each layout exists in three versions that differ in icon size. Small icons are squares 8 millimeters tall, medium ones 12 millimeters, and big ones 16 millimeters (all sizes were measured on a 17" CRT monitor with the resolution set to 1024x768 pixels). The layouts with medium icons are presented in the picture below.
Combining every icon size with every layout gives 9 different interfaces. A single experiment consists of 10 consecutive queries, presented at the center of the screen. Every query points at a specific icon, which should be located and then selected as fast as possible. At this point the window with icons stays hidden. An example query is shown in the following picture.
Figure 2: An exemplary query.
After the user's confirmation the query window disappears and a window with icons appears, as shown below.
Figure 3: A window with icons appears after query confirmation.
The time between the confirmation and the selection of the specified icon is measured and stored. The window then hides and the next query is presented on the screen. During the examination every experiment is run twice, thus providing a sample of 20 measured times for each interface.
Just after the tasks described above were performed, a questionnaire was used to acquire subjective preferences. Interfaces were compared in random pairs using a 7-point, two-sided metric scale with a neutral point in the middle. Marking any square on the right means that the right interface is preferred over the left one, and vice versa. The more distant from the middle the marked square is, the stronger the preference. Here is a fragment of the questionnaire.
Figure 4: A fragment of the questionnaire used after the experiment.
At this point participants were assisted by a set of slides presented on the screen. Every slide contained one pair of interfaces being judged, and the full set consisted of all 36 possible pairs. A single example slide is presented below.
After the questionnaires were filled in, each column of every questionnaire was assigned a number (4, 3, 2, 1, 1/2, 1/3, and 1/4 respectively), which expresses the strength of preference when comparing the interface on the left with the one on the right. A normalized sum of partial preferences may be considered an overall usability rating of a specified interface against the background of the remaining ones, and is called total utility, in accordance with Conjoint Analysis methodology.
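The scoring step can be illustrated with a minimal Python sketch. The text does not spell out how the "normalized sum" is formed, so the rule used here is an assumption: the left interface of a pair receives the column weight w, the right one receives 1/w, and the sums are scaled so that all nine utilities add up to 1. The names `COLUMN_WEIGHTS`, `PAIRS` and `total_utilities` are hypothetical, as is the sample participant.

```python
from itertools import combinations, product

# Column weights from the text, leftmost to rightmost square of the scale.
COLUMN_WEIGHTS = [4, 3, 2, 1, 1 / 2, 1 / 3, 1 / 4]

LAYOUTS = ["horizontal", "square", "vertical"]
SIZES = ["small", "medium", "big"]
INTERFACES = [f"{layout}-{size}" for layout, size in product(LAYOUTS, SIZES)]
PAIRS = list(combinations(INTERFACES, 2))  # 9 interfaces give 36 unordered pairs


def total_utilities(answers):
    """Map each interface to its normalized preference sum.

    answers maps (left, right) to the index (0..6) of the marked column.
    Assumption: the left interface scores w, the right one scores 1/w.
    """
    sums = {name: 0.0 for name in INTERFACES}
    for (left, right), column in answers.items():
        weight = COLUMN_WEIGHTS[column]
        sums[left] += weight
        sums[right] += 1 / weight
    grand_total = sum(sums.values())
    return {name: value / grand_total for name, value in sums.items()}


# Hypothetical participant who marks the neutral middle square for every pair.
neutral = {pair: 3 for pair in PAIRS}
utilities = total_utilities(neutral)
print(len(PAIRS), round(utilities["square-medium"], 3))
```

With this normalization an indifferent participant gives every interface a utility of 1/9, so values above or below roughly 0.11 indicate interfaces that are preferred more or less than average.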
The significance of the differences in mean times was computed using Student's t test. Among the 9 tests of pairs with different layouts and the same icon size, only two indicated a significant difference: the horizontal window with small icons turned out to be considerably faster in use than the vertical window with small icons (p ≤ 0.009); similarly, the square window with big icons was faster than the vertical one (p ≤ 0.007). Among the 9 tests of pairs with different icon sizes and the same layout, more than half showed a relationship between icon size and the time needed to complete the task. More precisely, shorter times were obtained with bigger icons: the horizontal window with small icons was slower than the one with big icons (p ≤ 0.009), and the same held for square-small versus square-big (p ≤ 0.00008), square-medium versus square-big (p ≤ 0.002), vertical-small versus vertical-medium (p ≤ 0.003), and vertical-small versus vertical-big (p ≤ 0.003). Overall, there is weak evidence that windows with vertical layouts are operated more slowly than horizontal and square ones, and stronger evidence that using bigger icons cuts down the time to complete a task. The graph below illustrates these statistically significant differences within the specified pairs.
[Bar chart, mean time [sec] for each pair: horizontal-small 1.27 vs. vertical-small 1.39; square-big 1.07 vs. vertical-big 1.23; horizontal-small 1.27 vs. horizontal-big 1.12; square-small 1.36 vs. square-big 1.07; square-medium 1.25 vs. square-big 1.07; vertical-small 1.39 vs. vertical-medium 1.23; vertical-small 1.39 vs. vertical-big 1.23]
Figure 6: Statistically significant differences in execution time.
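As an illustration of the time comparison, the following minimal Python sketch computes the t statistic and the degrees of freedom for two samples of completion times. It assumes the independent two-sample form of Student's t test with pooled variance; the text does not say which variant was used. The function name `pooled_t` and the sample times are hypothetical. The p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Assumes equal variances in both groups (classic Student's t test).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical completion times in seconds, not data from the study.
square_big = [1.02, 1.10, 1.05, 1.11]
vertical_big = [1.20, 1.25, 1.19, 1.28]
t_value, df = pooled_t(square_big, vertical_big)
print(round(t_value, 2), df)
```

A negative t here means that the first sample has the shorter mean time.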
The Mann-Whitney U test was used to calculate the significance of the differences in mean subjective usability ratings obtained via the questionnaire. Two results indicate significant differences between groups of ratings with the same icon size and different layouts: the window with the vertical layout and medium icons turned out to be less preferred than the ones with the horizontal (p ≤ 0.02) and the square layout (p ≤ 0.04). Among groups with the same layout and different icon sizes, 5 tests show a significant difference in mean user ratings. The horizontal window with small icons was less preferred than the horizontal window with medium icons (p ≤ 0.001); the same result was found for small versus medium icons with the square layout (p ≤ 0.004) and with the vertical one (p ≤ 0.02). In contrast, between medium and big icons with the horizontal and vertical layouts, the bigger ones were perceived as less usable (p ≤ 0.003 and p ≤ 0.04 respectively). Generally speaking, there was a strong preference in favour of medium icons; additionally, square or horizontal layouts were more appreciated than the vertical layout, although in that case the results are not conclusive. The following graph shows the statistically significant differences in ratings within the pairs mentioned here.
[Bar chart, mean usability rating for each pair: horizontal-medium 0.17 vs. vertical-medium 0.12; square-medium 0.17 vs. vertical-medium 0.12; horizontal-small 0.08 vs. horizontal-medium 0.17; square-small 0.08 vs. square-medium 0.17; vertical-small 0.07 vs. vertical-medium 0.12; horizontal-medium 0.17 vs. horizontal-big 0.10; vertical-medium 0.12 vs. vertical-big 0.08]
Figure 7: Statistically significant differences in usability ratings.
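The rank-based comparison of ratings can be sketched in Python as follows. The block computes only the Mann-Whitney U statistic, using the pair-counting definition with ties counted as one half; turning U into a p-value requires tables or a statistics package. The function name `mann_whitney_u` and the ratings are hypothetical, not the participants' data.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two Mann-Whitney U statistics.

    U for sample_a counts the pairs (a, b) in which a exceeds b,
    with every tie contributing 0.5.
    """
    wins_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    wins_b = len(sample_a) * len(sample_b) - wins_a
    return min(wins_a, wins_b)


# Hypothetical total utilities given by four participants to two interfaces.
vertical_medium = [0.10, 0.12, 0.11, 0.13]
horizontal_medium = [0.16, 0.18, 0.15, 0.17]
print(mann_whitney_u(vertical_medium, horizontal_medium))
```

A U close to zero means the two groups of ratings barely overlap, which is the situation in which the test reports a significant difference.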
At the level of individual participants, the correlation between the time to complete the task and the subjective overall quality derived from the preference questionnaire varies greatly. Seven participants showed a negative correlation ranging from r=-0.18 to r=-0.69, while the 4 others had a positive correlation between r=0.08 and r=0.84. The latter means that some people actually preferred versions of the interface in which they needed more time to find, point at, and click the desired icon. According to these results, commonly used quantitative metrics, such as the time needed to complete a task, probably cannot be used alone to assess interface usability, at least for interfaces and tasks similar to those presented in this paper.
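A per-participant correlation of this kind can be sketched in Python as below. The sketch assumes that r is Pearson's product-moment coefficient computed over the nine interfaces, pairing each interface's mean time with its total utility; the text reports only the r values. The function name `pearson_r` and the numbers are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical participant: mean time [sec] and total utility per interface.
mean_times = [1.27, 1.20, 1.12, 1.36, 1.25, 1.07, 1.39, 1.23, 1.23]
utilities = [0.08, 0.17, 0.10, 0.08, 0.17, 0.12, 0.07, 0.12, 0.09]
print(round(pearson_r(mean_times, utilities), 2))
```

A negative r means the participant tended to prefer the faster interfaces, while a positive r means the slower ones were rated higher.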
Next, a linear regression model was constructed. The mean subjective usability rating became the dependent (Y) variable. Icon size, distance (measured from the center of the screen to the center of every examined window), and layout were selected as the independent (X) variables. Layout is a categorical attribute, so it requires special treatment in the statistical calculations. The square layout was chosen as the reference for the other layouts, and two artificial binary variables, Xh and Xv, were introduced to reflect the presence of the horizontal and the vertical icon arrangement respectively. With this technique the horizontal (Xh=1, Xv=0), square (Xh=0, Xv=0), or vertical (Xh=0, Xv=1) icon layout can be encoded in the model, which yields the relative influence of the horizontal and vertical arrangements as compared to the "neutral", square one. After the coefficients of every independent variable were estimated, the model looks as follows:
Y = 0.9 - 0.037 * X1 - 0.056 * X2 - 0.051 * Xh + 0.023 * Xv
where:
X1 - icon size,
X2 - distance.
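The dummy coding and the model above can be written down as a short Python sketch. The coefficients are the ones reported in the equation; the units of icon size and distance are not stated in the text, so the inputs are left unit-free, and the names `LAYOUT_DUMMIES` and `first_model_rating` are hypothetical.

```python
# Dummy coding of layout as (Xh, Xv); the square layout is the reference level.
LAYOUT_DUMMIES = {
    "horizontal": (1, 0),
    "square": (0, 0),
    "vertical": (0, 1),
}


def first_model_rating(icon_size, distance, layout):
    """Evaluate Y = 0.9 - 0.037*X1 - 0.056*X2 - 0.051*Xh + 0.023*Xv.

    The units of icon_size (X1) and distance (X2) are an open assumption.
    """
    xh, xv = LAYOUT_DUMMIES[layout]
    return 0.9 - 0.037 * icon_size - 0.056 * distance - 0.051 * xh + 0.023 * xv


# Effect of layout alone, relative to the square reference (same X1 and X2).
base = first_model_rating(1.0, 1.0, "square")
for layout in ("horizontal", "vertical"):
    print(layout, round(first_model_rating(1.0, 1.0, layout) - base, 3))
```

Because the square layout is coded as (0, 0), the coefficients of Xh and Xv can be read directly as shifts relative to it.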
Because of its low coefficient of determination (R²=0.48), this model cannot be accepted as a good one, and it needs to be tuned by incorporating another independent variable. The index of difficulty, originally introduced in 1954 by Fitts [3], was chosen. It binds together the size of a target and the amplitude of a movement in a way that reflects the quantity of information that must be processed to complete a movement task. The longer the amplitude or the smaller the target, the more difficult pointing at it becomes. An improved version of the model is presented below:
Y = 3.34 - 0.8 * X1 + 0.05 * X2 - 0.65 * X3 - 0.03 * Xh - 0.02 * Xv
where:
X1 - icon size,
X2 - distance,
X3 - index of difficulty.
In this case the coefficient of determination is higher (R²=0.98). According to Conjoint Analysis methodology, the individual estimated coefficients reflect the partial utilities of the respective attributes, represented in the model as independent variables. Under this interpretation, a bigger icon size or a higher index of difficulty lowers the perceived overall usability, while a longer distance raises it. The presence of either the horizontal or the vertical layout has an adverse effect on overall usability, though only a very marginal one.
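For illustration, the index of difficulty and the improved model can be sketched in Python. The sketch assumes Fitts' original 1954 formulation, ID = log2(2A/W), with A the movement amplitude and W the target width; the text does not state which formulation or which units were used, so the 100 mm amplitude and the function names are hypothetical.

```python
from math import log2

# Dummy coding of layout as (Xh, Xv); the square layout is the reference level.
LAYOUT_DUMMIES = {"horizontal": (1, 0), "square": (0, 0), "vertical": (0, 1)}


def index_of_difficulty(amplitude, target_width):
    """Fitts' 1954 index of difficulty in bits: log2(2A / W)."""
    return log2(2 * amplitude / target_width)


def improved_model_rating(icon_size, distance, difficulty, layout):
    """Evaluate the improved model with the coefficients reported in the text.

    The units of icon_size, distance and difficulty are an open assumption.
    """
    xh, xv = LAYOUT_DUMMIES[layout]
    return (3.34 - 0.8 * icon_size + 0.05 * distance
            - 0.65 * difficulty - 0.03 * xh - 0.02 * xv)


# Hypothetical 100 mm movement to icons of the three sizes used in the study.
for width_mm in (8, 12, 16):
    print(width_mm, round(index_of_difficulty(100, width_mm), 2))
```

At a fixed amplitude a wider target lowers the index of difficulty, so icon size enters the improved model both directly and through the index of difficulty.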
Putting all the results together, it becomes visible that participants preferred interfaces that were actually slower in use, namely those with smaller icons and a longer distance of movement. Regarding icon size, it should be pointed out that this relationship is probably nonlinear, as several results of the Mann-Whitney U tests suggest. The smallest icons may be less legible than the medium ones, so the latter were preferred. Similarly, the biggest icons may disturb the window scanning process (e.g. more eye fixations are needed to locate and recognize a specific icon), in this way degrading perceived usability. In the case of different layouts, a significant role of users' habits cannot be excluded. Users who usually work with typical office applications (e.g. MS Word or Excel, which have horizontal toolbars in their default interface configurations) may perceive a horizontal icon layout as more familiar and usable, and thus disfavor the vertical one, in contrast to users experienced in graphical applications (e.g. Adobe Photoshop or Corel Draw), which have a fair number of vertical toolbars. The participants involved in the presented experiment are considered to be members of the former group. Therefore, more detailed studies involving a larger sample of data are needed (they are in fact being performed at the time of writing this paper) to determine the exact character of the described phenomenon.
[1] Górski, J.: Inżynieria oprogramowania w projekcie informatycznym. MIKOM, Warszawa 1999
[2] ISO 9241-11: Ergonomic requirements for office work with visual display terminals (VDTs), Part 11: Guidance on usability. 1998
[3] Fitts, P. M.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, vol. 47, pp. 381-391. 1954