
Introduction

What is Usability?

Usability has been defined in a number of ways by various specialists in the area, and all of these definitions share common elements for describing usability and usability testing. At the University of Windsor, we define usability as a measure of the effectiveness, efficiency and ease of use of an application or website. For more information about usability, including a deeper definition, visit Jakob Nielsen’s website useit.com. (http://www.useit.com/alertbox/20030825.html)

Brief History of Usability Testing at Windsor

Over the past year, the Information Technology Services (ITS) department of the University of Windsor has tried a number of different methods to improve the usability of the university’s website. Through these varied practices, the methodology has evolved, and so has our understanding of why the university needs to take usability testing more seriously. A usable website has many benefits; from the university’s point of view, recruitment and retention are the biggest. Spending time thinking critically about the layout and design of the website could make it easier for a potential student to use and, ideally, encourage them to apply to the university.

The Web Group at IT Services has been working hard to build feasible user centered design (UCD) practices into its daily work. The Web Group’s first usability study took place in the summer of 2005 and relied heavily on statistical evidence, such as task times, to identify areas that needed improvement: it compared task times when users were allowed to use the website’s search functions with task times when they were strictly not allowed to search. The more recent usability studies of winter 2006 take a different approach and rely on subjective evidence, such as observed behavior, to show where and how the website should be improved. The move from the early quantitative studies to the current qualitative studies was driven by the practices and ideas of well-known experts in the field, such as Eric Schaffer.

Experts debate the merits of these two approaches to usability testing. This report outlines both types of study, their strengths and weaknesses, and how they fit with the UCD principles the department is working towards. To do this, we first need to understand what UCD involves and what goes into a usability test at the University of Windsor.

What is User Centered Design?

User centered design (UCD) is a methodology for designing and developing new products. The core idea behind UCD is to develop for the user and to involve the user in the development process. For example, at the University of Windsor, applying UCD methods when creating a new website means asking target audience¹ members for their opinions and feedback at various stages of development. UCD is growing in popularity; major companies such as IBM use UCD methods to produce better products. For more information about UCD, see IBM’s article on ease of use and user centered design (http://www.lieb.com/Readings/IBMUserCentered.pdf). The University of Windsor is in the process of incorporating UCD methods into every new project it starts. This takes more time and patience but should produce higher quality results. There are six general principles in the UCD process:

• Set the goals: what is the purpose of the site, and who is going to use it?

• Know the users: develop profiles of the typical users of the website.

• Know the enemy: what does the competition do, and how does it present its website?

• Think total user experience: picture how every decision made on the project will affect the user.

• Evaluate early and often: continually ask the target audience for feedback, from paper prototypes through to the final product.

• Continue to involve the user: users change over time, so continuing to test users on the site is key to keeping it a great site.

Because the goal is to accomplish all six of these steps in as little time as possible, the Web Group needs efficient and effective evaluation methods: methods that give the best results in the shortest amount of time.

Methods

Equipment

The two studies used different equipment. Both captured audio: the first used an analog tape recorder, while the second captured audio digitally through a webcam microphone. Perhaps the largest difference was the software each study used.

The more quantitative study (herein referred to as study A) used a program called I-SpyNow to collect the necessary data. The more qualitative study (herein referred to as study B) used a software system called Morae, created by TechSmith.

I-SpyNow

I-SpyNow is a program designed to monitor Internet activity. It records the time and location of every website visited in Internet Explorer on a computer with the software installed. It also records keystrokes, instant messenger conversations, and any application activity. The results are stored online in table form. During the usability study, after participants completed the tasks online, the moderator took this table and worked out the results from it. Figure 2.1 is a view of the I-SpyNow website showing part of the table. To develop the results, the moderator would (1) determine the starting location, (2) determine the ending location, and (3) calculate the time in between, the number of clicks, and the number of searches used. All the results were calculated by hand.
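The three hand calculations above can be sketched in code. This is a minimal sketch, not I-SpyNow’s export format: it assumes a hypothetical log in which each row holds a timestamp and a URL, that every page load after the starting page counts as one click, and that a search shows up as a URL containing the word "search". The `PageVisit` class and `summarize_task` function are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PageVisit:
    # Hypothetical row of an exported activity log: when a page was opened and its URL.
    visited_at: datetime
    url: str


def summarize_task(visits, start_url, end_url, search_marker="search"):
    """Summarize one task from a chronological list of page visits.

    Assumes the task starts at the first visit to start_url and ends at the
    first visit to end_url on or after that point. Each page load after the
    starting page counts as one click; a visit whose URL contains
    search_marker counts as one search. Raises StopIteration if either URL
    is missing from the log.
    """
    start = next(i for i, v in enumerate(visits) if v.url == start_url)
    end = next(i for i in range(start, len(visits)) if visits[i].url == end_url)
    window = visits[start:end + 1]
    return {
        "seconds": (window[-1].visited_at - window[0].visited_at).total_seconds(),
        "clicks": len(window) - 1,
        "searches": sum(1 for v in window if search_marker in v.url),
    }


if __name__ == "__main__":
    # Made-up log for one participant on one task (not data from the study).
    log = [
        PageVisit(datetime(2005, 7, 4, 10, 0, 0), "/"),
        PageVisit(datetime(2005, 7, 4, 10, 0, 12), "/search?q=music"),
        PageVisit(datetime(2005, 7, 4, 10, 0, 40), "/music"),
    ]
    print(summarize_task(log, "/", "/music"))
    # {'seconds': 40.0, 'clicks': 2, 'searches': 1}
```

Automating the arithmetic in this way would remove the hand-calculation step, although choosing the correct start and end pages for each task would still be the moderator’s decision.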

The software had a few positive aspects, including its relatively low cost and easy implementation. The licensing fee was small, which made it ideal for starting testing right away and let us learn what we were capable of without committing too much. However, it was very much a starter program, and many aspects of it could have been improved. These included compatibility with browsers other than Internet Explorer and the handling of data, among other minor details. Browser compatibility and data handling are both crucial to a good usability test and needed to be improved for the next study. The previous work term report goes into more detail about this software and how it was used.


Figure 2.1 – I-SpyNow Interface. This is the website for the I-SpyNow software.

Morae by TechSmith

The Morae software system is designed for usability testing of any application. It is a set of three programs designed to capture the user’s experience, evaluate that experience, and present it to others. The system captures audio, video, the screen, keystrokes and mouse clicks. It then lets the moderator select and present clips of the test or easily find the time taken to complete each task. Figure 2.2 shows how this system works. The volunteer sits at the test PC, computer (1). Any number of optional observers can watch volunteers as they perform the test. These observers can be located anywhere on campus and are shown as remote viewers (2) in Figure 2.2. They can watch, but they cannot interact with the volunteer. The usability specialist interacts with the volunteer and is typically located in the same room.

The usability specialist sits at the analyst PC (3), asks the volunteer to perform tasks, monitors the complete digital record being created as the volunteer works through the tasks, and interacts with the volunteer as required. A typical test takes between 30 and 45 minutes including the time required by the usability specialist to explain the process to the volunteer. Figure 2.3 shows the Morae Manager view; this is where the moderator would perform analysis on the videos.


Figure 2.2 – The Morae System. Demonstration of the three programs working together.

The Morae system was highly sophisticated and allowed testing to be better documented. Because audio and video were recorded, there was no need to rely solely on the moderator to get the facts right, and the transfer of knowledge and experience from participant to moderator was very smooth. While the Morae system was a vast improvement on I-SpyNow, it was not without its own faults: it was more frustrating to use for statistical purposes. For example, when counting mouse clicks, it counted not only the clicks that took users to a different page but also clicks made while highlighting information. This frustration is part of the reason the studies became more qualitative.
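One way around the click-counting problem described above is to post-process the click data and keep only clicks that changed the page. The sketch below is not a Morae feature or API; it assumes a hypothetical export in which each click carries a time and the page shown after the click, and the `Click` class and `navigation_clicks` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Click:
    # Hypothetical exported click record: seconds into the session and the
    # page that was showing after the click.
    time_s: float
    page_after: str


def navigation_clicks(start_page, clicks):
    """Count only the clicks that took the user to a different page.

    Clicks that leave the page unchanged (for example, selecting or
    highlighting text) are not counted.
    """
    count = 0
    current = start_page
    for click in sorted(clicks, key=lambda c: c.time_s):
        if click.page_after != current:
            count += 1
            current = click.page_after
    return count


if __name__ == "__main__":
    session = [
        Click(1.0, "/"),                # highlighting text on the home page
        Click(2.5, "/programs"),        # followed a link
        Click(3.0, "/programs"),        # highlighting again
        Click(4.2, "/programs/music"),  # followed a link
    ]
    print(len(session), navigation_clicks("/", session))  # 4 2
```

The raw count here is 4, while only 2 clicks actually moved the participant to a new page. The raw count corresponds to the inflated figure the report describes.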


Figure 2.3 – Morae Manager. One of the three programs included in the Morae system.

Procedure

The procedures of each test are outlined here in general terms. For a more detailed explanation, please consult the previous work term report or contact [email protected]. Procedures common to both studies include task development, participant recruitment, and some of the recording methods. Task development was a team effort that produced specific questions reflecting the goals of the website. The most difficult part of task development is asking enough of the participant without leading them toward the answer. Participants were recruited by asking for volunteers in the various news sources on campus and by attending key events such as HeadStart and Fall Focus days. Volunteers were asked to give 20 to 45 minutes of their time to complete the test, and in study A they were given a $10 incentive for completing it. Before each test, the participant was briefed about the study and its goals. After each test, the data were processed into an appropriate form.

Each study used at least 15 participants, who represented a range of target audience characteristics.

Study A Procedures

Procedures specific to study A include the analog audio recorder and the I-SpyNow software mentioned earlier. In addition, some users, picked at random, were asked to complete the tasks without using the search functions. This allowed us to compare task times with and without search and showed how much people rely on search to find the right answer. The times, mouse clicks and searches were measured and then compared using analysis of variance (ANOVA). Recommendations to improve usability were made based on these results. Because most recommendations rested on statistical evidence, this study is easily classified as quantitative.
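The ANOVA comparison above asks whether the difference between group means is large relative to the variation within each group. The following minimal sketch computes the one-way ANOVA F statistic by hand. The task times are hypothetical and do not come from the study. To obtain a p-value, F is compared against the F distribution with (df_between, df_within) degrees of freedom. SciPy’s `scipy.stats.f_oneway` reports both values. With only two groups, such as search versus no search, this test is equivalent to a pooled-variance two-sample t-test, where F equals t squared.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Variation of each group's mean around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual times around their own group's mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Hypothetical task times in seconds for one question (not the study's data).
    with_search = [40, 50, 45]
    without_search = [90, 100, 95]
    print(one_way_anova(with_search, without_search))  # (150.0, 1, 4)
```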

Study B Procedures

In study B, a webcam was used together with the Morae software package to record digital audio and video. The participant was recorded while navigating the website, and the moderator watched from a separate computer, taking notes. These notes covered tone of voice and facial expression as well as the places where participants struggled with the website. Participants were asked to complete the tasks as best they could, and each task was scored out of 2 based on how they did. A score of 2 means the participant completed the task in a timely manner with little or no difficulty. A score of 1 means they eventually completed it with some difficulty or help. A score of 0 means they could not complete it. These subjective scores were recorded in a frequency chart and used together with the moderator’s notes to make recommendations and changes to the website being tested. Because this study relies more on the moderator’s opinions, and is therefore more open to bias, it is a qualitative study.
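Turning the 0, 1 and 2 scores into a frequency chart amounts to counting how many participants received each score on each task and converting those counts to percentages. The following minimal sketch assumes the scores are kept as one list per task. The task names and scores are placeholders, not study data.

```python
from collections import Counter

VALID_SCORES = (0, 1, 2)


def frequency_chart(scores_by_task):
    """Convert per-participant task scores into percentages for charting.

    scores_by_task maps a task name to the (non-empty) list of scores, each
    0, 1 or 2, that participants received on it. Returns, for each task,
    the percentage of participants who received each score.
    """
    chart = {}
    for task, scores in scores_by_task.items():
        if any(s not in VALID_SCORES for s in scores):
            raise ValueError(f"unexpected score for task {task!r}")
        counts = Counter(scores)
        chart[task] = {s: 100 * counts[s] / len(scores) for s in VALID_SCORES}
    return chart


if __name__ == "__main__":
    # Made-up scores from four participants (task names are placeholders).
    scores = {"Task A": [2, 2, 1, 0], "Task B": [2, 2, 2, 1]}
    print(frequency_chart(scores))
    # {'Task A': {0: 25.0, 1: 25.0, 2: 50.0}, 'Task B': {0: 0.0, 1: 25.0, 2: 75.0}}
```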

Results

Study A

The statistical evidence was compiled after the participants finished their tests. All task times, mouse clicks and searches were taken into consideration to create a number of charts, such as Figure 3.1, that show what was observed. Most of the evidence was presented in numerical form, followed by an interpretation. The times were easy to read, but it was hard to tell what counted as an acceptable time. From these measurements and interpretations, recommendations were formed with the aim of improving all task times. These recommendations ranged from vocabulary changes to structural changes to the website.

Task            Search allowed (s)    No search (s)
Overall         53.25                 61.57
Question 1      21.76                 21.85
Question 2      65.52                 92.56
Question 3      42.00                 63.00
Question 4 **   45.95                 94.29
Question 5      60.23                 51.60
Question 6      28.37                 46.29
Question 7      80.47                 63.50
Question 8 **   54.67                 94.50
Question 9      92.20                 59.71

Figure 3.1 – Time Comparison of Participants

This chart shows task times, in seconds, when search was allowed and when it was restricted. ** indicates a statistically significant difference between the two samples.

In the follow-up study, performed in early fall 2005, participants completed the tasks on the updated website using the same methods and equipment. This study was performed in the hope of seeing dramatic improvements in task times across all tasks. When the study finished and the times were calculated, the anticipated statistically significant difference in time did not appear on every task. However, the places where people found the information and the number of searches they used made it easy to see that the changes had affected the site positively: participants were finding the information in a more unified and appropriate place. This led the team to conclude that task time should not be the only indicator of success. A shorter set of recommendations followed the follow-up study, and these included modifying the methodology to reflect this finding.

Study B

Once all volunteers have completed the study, the data are tabulated into a frequency chart as described in the methods, similar to Figure 3.2, which shows success rates for all the tasks volunteers were asked to complete. These charts clearly illustrate which goals are achieved and which are not. Based on the data in the charts and a review of the participants’ tests, the moderator develops a set of recommendations.

These recommendations can be vague or very detailed, depending on what participants found during testing. For example, many participants pointed out that the wording “Academic Regulations” was confusing, so the follow-up recommendation was to change the title to “Academic Requirements”.

Once those changes have been completed, the process begins anew. Volunteers work through all the tasks in the goals list, and the data are tabulated once again.

Improvements should be evident, as Figure 3.3 shows. However, one iteration may not be enough: although considerably more participants achieved the defined goals, some tasks still need to be improved. The process is therefore repeated until the department is satisfied with the measured success.


Figure 3.2 – The Frequency Chart

[Chart: percent of participants achieving the goals (y-axis, 0% to 100%) for each task (x-axis); series labeled Find and Identify.]

Results from the first iteration of Study B. The tasks appear along the x-axis and the success rate along the y-axis.

Figure 3.3 – The Follow-Up Frequency Chart

[Chart: percent of participants achieving the goals (y-axis, 0% to 100%) for each task (x-axis); series labeled Find and Identify.]

The results from the second iteration of Study B show a considerable improvement in success rates for all the tasks that volunteers were asked to perform.


Recommendations

Both studies were effective and produced recommendations that can be shown to improve the website. The results are presented in chart form, which makes them easy to read for programmers and managers alike. In study A, however, the results require more interpretation to work out exactly what is wrong and how serious the problem is, whereas in study B it is easy to see what needs the most improvement. The two studies produced different results because they had different expectations. Managing expectations is an important part of usability testing, because the expectations determine how the website should be tested. The recommendations that follow cover quantitative studies, qualitative studies, and the future of usability testing at the University of Windsor. Each type of study has strengths, and each has weaknesses that can be improved upon.

Quantitative Studies

Quantitative studies have been set aside at the University of Windsor in the past few months because of the new software the group received. However, these studies still have their merits and can be used in the future. Certain types of study would benefit from quantitative measurements, including comparisons of our website with those of our competitors and evolutionary studies. A comparison study would take the University’s prospective student page and compare it with other universities’ pages to see where we can improve. An evolutionary study would take a participant pool of prospective students or users new to the website and ask them to test the website each week in different ways to examine the learning curve. Task times on similar items would be compared each week to see whether users are not only learning but also retaining the knowledge. An evolutionary study would require a lot of patience and a well-defined set of questions. The questions each week would have to be different so that they cannot simply be memorized, yet similar enough to be compared with the previous week’s. The new Morae software can be incorporated into these studies, as it tracks times easily and records any changes in the user’s attitude towards the website. Quantitative studies can measure the efficiency and effectiveness of the website, but on their own they will not give a complete set of recommendations for improving its usability.
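The week-to-week comparison in an evolutionary study could start with the average time on comparable tasks for each week and the change from the previous week. The following is a minimal sketch, assuming times are grouped by week. The numbers are hypothetical. A complete analysis would also need to check whether each change is larger than normal week-to-week variation.

```python
from statistics import mean


def weekly_trend(times_by_week):
    """Return the mean task time for each week and the change between weeks.

    times_by_week[i] holds the times, in seconds, recorded for comparable
    tasks in week i + 1. A negative change means participants were faster
    than the week before.
    """
    means = [mean(week) for week in times_by_week]
    changes = [later - earlier for earlier, later in zip(means, means[1:])]
    return means, changes


if __name__ == "__main__":
    # Hypothetical times for one comparable task over three weeks.
    means, changes = weekly_trend([[60, 80], [50, 60], [40, 50]])
    print(means, changes)  # [70, 55, 45] [-15, -10]
```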

Qualitative Studies

The qualitative studies produce a wealth of knowledge and recommendations for improving the usability of any website. By spending only minutes with almost any user, the moderator can see areas of the website that should be improved. These observations go into a report, and the charts are built on a three-point scale. While these recommendations can be very sound and should be used, all of the data rely on the moderator’s opinion.

The three-point scale is highly subjective and not easily transferable to another moderator.

This means that each study can have only one moderator, and that moderator’s judgment cannot change throughout the scoring phase. To improve this, the three-point scale needs to be clearly defined so that anyone could pick it up and apply it to their own study. This can be done in a number of ways, including expanding the scale to more points. Each step of the scale would have its own requirements. These can be objective, such as completing the task in fewer than three clicks, or subjective, such as completing the task with very little struggle or commentary. The qualitative studies can also borrow from the quantitative studies and use task times to determine how well the user completed a task and therefore what score to assign.
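A scale with explicit criteria could be written down as a small scoring function, so that any moderator who applies it gets the same score. The following sketch is hypothetical. It expands the scale to four points (0 to 3), takes the "fewer than three clicks" criterion from the text, and uses an assumed 60-second time limit. The actual levels and thresholds would be for the Web Group to decide.

```python
def score_task(completed, needed_help, clicks, seconds,
               max_clicks=3, max_seconds=60.0):
    """Score one task on a hypothetical four-point scale (0 to 3).

    0: task not completed
    1: completed, but only with help from the moderator
    2: completed unaided, but with max_clicks or more clicks or over max_seconds
    3: completed unaided in fewer than max_clicks clicks and within max_seconds
    The default thresholds are illustrative assumptions, not values from the study.
    """
    if not completed:
        return 0
    if needed_help:
        return 1
    if clicks < max_clicks and seconds <= max_seconds:
        return 3
    return 2


if __name__ == "__main__":
    print(score_task(completed=True, needed_help=False, clicks=2, seconds=45.0))  # 3
```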

Second, tasks should map easily onto goals: if a participant completes a task, it should be easy to see which goals they have completed as well.
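A simple way to make tasks translate into goals is to record, for each task, the website goals it tests, and then look up the goals behind each completed task. The sketch below assumes a goal counts as completed once any completed task maps to it. The task and goal names are placeholders, not the study’s actual task list.

```python
def goals_completed(task_goals, completed_tasks):
    """Return the goals covered by the tasks a participant completed.

    task_goals maps each task to the set of website goals it tests; a goal
    counts as completed when at least one completed task is mapped to it.
    """
    covered = set()
    for task in completed_tasks:
        covered |= task_goals[task]
    return covered


if __name__ == "__main__":
    # Placeholder tasks and goals, for illustration only.
    task_goals = {
        "find the audition requirements": {"admission information"},
        "find the required courses": {"program information", "admission information"},
        "find how to buy concert tickets": {"event information"},
    }
    print(sorted(goals_completed(task_goals, ["find the required courses"])))
    # ['admission information', 'program information']
```

If a goal should require several tasks, the lookup could be reversed so that each goal lists its required tasks and is marked complete only when all of them are done.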

The qualitative studies can and should be used in the future for all sorts of usability testing. They produce effective results and many recommendations. The qualitative studies can be used to easily test a number of websites in very little time.

Many of the university’s existing websites can be reviewed and improved using these methods, including many of the Faculty of Arts and Social Science websites.

Forward Thinking

In the coming months and years, many departments and faculties at the University of Windsor could benefit from usability testing of their websites. The first problem is order of precedence: which website should be tested first, and why? When choosing a website to test, it is important to have buy-in from those who will benefit. For example, if the Web Group decides to test the Music website, it is important to have the support of the School of Music. That support should help recruit participants and define goals for the website, from which a set of tasks can be developed. It is also important to keep the University’s goals in mind when picking which websites to test. The university will benefit greatly from testing aimed at prospective students and alumni, because improving the website to meet the needs of these groups will help increase enrollment and perhaps the reputation of the school. Once the websites are chosen and the target groups identified, the methodology can be improved again and again by combining methods from both quantitative and qualitative studies.

Conclusion

Over the past year, the Web Group at the University of Windsor has been learning about and evolving the way it performs usability testing. The group has experimented with a number of different methods and continues to change the way testing is done. The quantitative studies of last summer served their purpose, as did the qualitative studies of this past winter. Going forward, the Web Group must combine the two approaches in order to continue producing effective recommendations and results. This report has described the differences between the studies, the equipment used, and how the studies can be combined.


Appendix

Notes

1. Target Audience – A target audience member is any participant who fits the profile of the ideal user of the website or application. When defining the target audience, the moderator describes that user in as much detail as possible.

Bibliography

Chi, Tom and Kevin Cheng. OK/Cancel. http://www.ok-cancel.com

Krug, Steve. Don’t Make Me Think: A Common Sense Approach to Web Usability. Indianapolis: New Riders Press, 2000.

Nielsen, Jakob. Usability Engineering. San Francisco, Calif.: Morgan Kaufmann Publishers, 1993.

Nielsen, Jakob. Useit.com: Jakob Nielsen’s Website. http://www.useit.com

Pearrow, Mark. Web Site Usability Handbook. Rockland, Mass.: Charles River Media, 2000.

Poock, Michael C. and Dennis Lefond. “How College-Bound Prospects Perceive University Websites: Findings, implications, and turning browsers into applicants.” C & U Journal Summer 2001: 15-21.

Pula, Katie and Sylvia Smith. Usability Testing Methods: Usability Methodologies and Strategies. McGill, March 15, 2004.

Sauro, Jeff. Measuring Usability. http://www.measuringusability.com

Schaffer, Eric. Institutionalization of Usability: A Step-by-Step Guide. Boston, Mass.: Pearson Educational, 2004.

Spool, Jared, Tara Scanlon, and Carolyn Snyder. Web Site Usability: A Designer’s Guide. San Francisco: Morgan Kaufmann Publishers, 1999.