
Daniel Jackson

ENGL 5181
Dr. Greg Wickliff
May 7, 2014

A Moveable Test: Finding feature-appropriate methods for mobile user testing

In January 2014, mobile devices accounted for 55% of all Internet usage in the United States, and for the first time, Americans used mobile applications to access the Internet more often (47%) than browsers on a desktop PC (45%), according to the Pew Research Center's Internet & American Life Project. In all, 90% of Americans have cell phones and 63% use a cell phone to go online, Pew reported. Worldwide, 4.55 billion people have cell phones, and almost half of them (2.23 billion) are expected to go online using a mobile device this year. By 2017, eMarketer expects, 70% of the world's population will have a cell phone, a third will have smartphones and 40% will access the Internet from a mobile device. By then, they expect that 90% of all Internet users will do so via a mobile device.

Methods for testing usability and user experience on personal computers are largely settled, like the device itself, fixed on a desktop. User testing of mobile devices, on the other hand (and sometimes both hands), must be more dynamic, with methods that vary depending on the features being tested and the normal context in which they would be useful.

In my review of scientific and professional literature spanning almost a decade, including six peer-reviewed papers and a recent magazine article by a user-experience (UX) professional on the subject of mobile user testing, I found a steady growth and evolution of academic research struggling to keep pace with the rapid adoption and advancement of the technology itself.

By the time some of these papers are written, drafted, reviewed and finally published, the devices and applications studied are outdated. The main body of our very enlightening textbook for ENGL 5181, “Observing the User Experience,” published in 2012, assumes user testing on a conventional desktop. The book addresses mobile user testing in a short sidebar (300-302), but given the rapid growth in mobile technology, mobile research will soon dominate the discussion.

Even before the explosion of smartphones and mobile applications, HCI (human-computer interaction) designers and developers knew that traditional laboratory user testing would not be sufficient for mobile devices and apps. They had to find ways to adapt traditional methods to the field, where observation is more complicated and the environment of use is uncontrolled, possibly in motion.

After decades of user testing mobile devices and apps, there is no single, universally accepted, one-size-fits-all method for user testing all aspects of mobile technology. Researchers have improvised a variety of strategies and tools – some simple and some very sophisticated – to factor context of use into their data, using field tests, laboratory simulations and even lists of expert criteria that attempt to anticipate user concerns without costly user testing.

But the right method for user testing mobile technology almost always depends on what is being tested and what we hope to learn from the tests. For some applications, traditional laboratory testing might accurately reflect the user experience and detect all usability issues, but for a truly mobile experience, the user test cannot be confined to a desktop in a lab.

The following bibliographic essay is my best attempt to find, select and review scientific literature on the subject of user testing mobile technology. I sincerely hope that this sample is an accurate reflection of research in this area over the past decade, but I don't offer this paper as a complete overview. It's a brief review of my introduction to some of the research in this field.

1. Filling in the gaps  

User testing mobile applications presents several challenges related to the unique features of mobile devices, including “limited bandwidth, unavailability of wireless networks as well as changing context (environmental factors),” Zhang and Adipat wrote in their 2005 article, “Challenges, Methodologies, and Issues in the Usability Testing of Mobile Applications.” (293)

Traditional approaches to HCI research don’t always apply to mobile applications. Ideally, user testing of mobile apps would cover all possibilities in a mobile environment, but in practice, it’s difficult to predict every variable. Will users be sitting, standing or on the go while using the application? Will they be indoors or outdoors in direct sunlight? Will their environment be quiet and secluded or noisy and distracting? Will they have a strong Internet connection or not? (294)

A usability test is an evaluation of how well users use specific software, eliciting the users' feedback and measuring their performance to identify problems. (294) Zhang and Adipat summarized the challenges of user testing mobile applications:

Mobile context: All characteristics of the interaction among users, applications and the surrounding environment, like location, nearby people, objects or various other distractions.

Connectivity: The speed and reliability of the wireless network connection and its effects on data transfer speeds, download times and the quality of streaming media.

Small screen size: The physical constraints of mobile devices, especially the smaller screen size, render traditional webpages “aesthetically unpleasant, un-navigable and in the worst case, completely illegible.”

Different display resolutions: Mobile devices, especially pre-2005, had much lower display resolutions, often degrading the quality of multimedia presentations.

Limited processing capability and power: Mobile devices of the time had far less computational power, processing speed, graphic support and memory than desktop models.

Data entry methods: Typing and other input can be more difficult on smaller devices with smaller buttons, reducing the speed of input and increasing errors, especially when users aren’t sitting still. (295-296)

Lab tests and field studies both have pros and cons. Measuring specific usability attributes and interpreting results is easier in the lab because other variables can be controlled, but lab experiments often ignore the mobile context, like unreliable wireless connectivity. Comparisons of interface designs or data input mechanisms work well in the lab. By contrast, field studies better capture user behavior, attitudes and an authentic user experience in a real mobile environment, but the researcher has limited control of that environment, which often makes observation and data collection more difficult. In either case, real mobile devices should be used in user testing whenever possible. (298-301)

Observing users in the field and recording video and audio of a field test, especially a longer study, was and is more complicated. For this reason, data collection methods in the field might include voice-mail diaries, regular meetings, e-mail reports and daily questionnaires, in which the user is responsible for reporting the data, and thus for its quality.

Depending on the objectives of the test, lab experiments or field studies or a hybrid of the two might be the most appropriate methodology for measuring particular usability attributes like learnability, efficiency, memorability, user errors, user satisfaction, effectiveness, simplicity, comprehensibility, learning performance, user-perceived usefulness and system adaptability. (298-302)
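As a simple illustration of how a few of these attributes can be quantified, here is a minimal sketch that computes effectiveness, efficiency and error metrics from hypothetical session records. The field names and metric definitions are my assumptions for illustration, not a protocol from Zhang and Adipat.

```python
# Minimal sketch: computing a few usability metrics from test-session
# records. Field names and metric definitions are illustrative
# assumptions, not Zhang and Adipat's protocol.

sessions = [  # one record per participant per task (hypothetical data)
    {"participant": "P1", "task_seconds": 42.0, "errors": 1, "completed": True},
    {"participant": "P2", "task_seconds": 73.5, "errors": 4, "completed": False},
    {"participant": "P3", "task_seconds": 55.2, "errors": 0, "completed": True},
]

completed = [s for s in sessions if s["completed"]]

# Effectiveness: share of participants who finished the task.
effectiveness = len(completed) / len(sessions)

# Efficiency: mean completion time among those who finished.
mean_time = sum(s["task_seconds"] for s in completed) / len(completed)

# User errors: mean error count across all sessions.
mean_errors = sum(s["errors"] for s in sessions) / len(sessions)

print(f"Effectiveness: {effectiveness:.0%}")
print(f"Mean completion time: {mean_time:.1f} s")
print(f"Mean errors per session: {mean_errors:.1f}")
```

The same records could be collected in the lab or self-reported from the field; only the reliability of the data changes, which is exactly the trade-off the authors describe.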

 

2. A test of two applications 

Presenting the results of two case studies involving usability testing of mobile applications for Nokia, Kangas and Kinnunen claim that providing a “real usage context” is the most important aspect of testing in a user-centered design (UCD) process. “For mobile phones, this means users need to be able to touch the buttons and see software that feels like it is actually working,” Kangas and Kinnunen wrote in their 2005 article, “Applying User-Centered Design to Mobile Application Development.” (57)

These projects were the first UCD projects in the company, so they faced organizational limitations like short timelines, small budgets and sometimes resistance to an unfamiliar process.

Pilot implementation of the Genimap Navigator, a global positioning system, was underway before the user experience (UX) group joined the project. They held a quick design workshop and tested a paper prototype with co-workers. After the pilot product was developed, they field-tested it with 20 participants who provided feedback in diaries and interviews. At the same time, the UX group conducted a study of mobile messaging among professionals and teenagers, and one resulting concept was an integrated image or multimedia message editor later named ImagePlus. They tested a paper prototype and, later, two early versions of the product.

Paper prototypes are inexpensive and sufficient to verify usability before actual implementation of applications with simple interaction, but for more sophisticated interaction like map zooming or image manipulation, a more realistic prototype is better. (58) User testing of ImagePlus took place in a meeting room, which worked because user tasks weren't related to a specific user context, but traditional lab tests weren't very useful in tests of Genimap Navigator because simulating the mobile context was impossible. The diary method was useful, but future usability projects dependent on mobile context would benefit from some direct observation of users in the field. To do that, video cameras used in the lab would need to be replaced with something more portable, like camera phones, Kangas and Kinnunen suggested. (59)

Usability testing is often expensive, and for smaller companies or customers with small budgets, the UX group has used paper prototypes and interview sessions to test designs. (58)


3. An expert framework

User testing is indeed expensive and time consuming, which is probably why Heo, Ham, Park, Song and Yoon developed a checklist of criteria to assist experts in evaluating the usability of mobile technology and in predicting likely problems without testing actual users in actual use situations. In their 2009 article, “A framework for evaluating the usability of mobile phones based on multi-level, hierarchical model of usability factors,” the authors admit that their evaluation method has limitations, “particularly in terms of context of use,” but this analytical approach is also popular in the mobile phone industry because “it's quick and does not require actual users.” (264)

The analytical framework is based on the basic functions of a mobile phone of that time, in contrast to smartphones. The authors predict that the boundaries between different mobile devices will become ambiguous and disappear, but they claim the framework can be applied to any mobile device. (264)

Using 136 usability problems reported in earlier usability tests conducted at one mobile phone company in Korea, the authors created four checklists – one task-based evaluation and three task-independent evaluations of various user interface (UI) properties like operational efficiency, screen layout, the physical shape of the mobile phone, fonts and icons. With the checklists, experts can evaluate mobile phone properties and quantify the results, scoring devices on a Likert scale across five “usability indicators,” including visual support of task goals and support of efficient interaction. Then they compile those scores into an overall rating. Finally, usability issues are identified and addressed. (270-273)

Task-independent evaluations can be reasonably conducted based only on expert knowledge of and experience with user interfaces, without actually performing a task, the authors suggested. Because these usability evaluations require special knowledge and skills, they tested the framework with eight usability practitioners working for mobile phone companies instead of students. Participants used the framework to compare picture taking on a phone manufactured by their own company and on a competitor's phone. All said the framework could be a useful tool, and they shared opinions and ideas for improvement. Despite its limitations in terms of mobile context, the task-based checklist helped practitioners consider task contexts, “though it could not be complete.” (273-274)
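To make the scoring step concrete, here is a minimal sketch of how Likert-scale checklist scores from several experts might be compiled into an overall rating. The indicator names and weights are my assumptions for illustration; Heo et al. define their own indicators and weighting scheme.

```python
# Minimal sketch: compiling expert checklist scores into an overall
# usability rating. Indicator names and weights are illustrative
# assumptions, not the exact scheme from Heo et al. (2009).

# Each expert rates every indicator on a 1-5 Likert scale.
checklist_scores = {
    "visual_support_of_task_goals": [4, 5, 3],      # one score per expert
    "support_of_efficient_interaction": [3, 4, 4],
    "screen_layout": [5, 4, 4],
}

# Hypothetical weights reflecting each indicator's importance.
weights = {
    "visual_support_of_task_goals": 0.4,
    "support_of_efficient_interaction": 0.4,
    "screen_layout": 0.2,
}

def overall_rating(scores: dict, weights: dict) -> float:
    """Average each indicator across experts, then combine by weight."""
    total = 0.0
    for indicator, ratings in scores.items():
        mean = sum(ratings) / len(ratings)
        total += weights[indicator] * mean
    return total

print(f"Overall rating: {overall_rating(checklist_scores, weights):.2f} / 5")
```

Indicators with low mean scores would then be flagged as likely usability problems, mirroring the final step the authors describe.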

4. Logging user events

Also concerned with the high cost and limitations of laboratory testing, Ma, Yan, Chen, Zhang, Huang, Drury and Wang developed a way to automatically collect and analyze user events with a toolkit embedded into the code of Android mobile devices. They evaluated the toolkit with real users using an application on a real Android device, while simultaneously conducting a conventional usability test in the lab and then compared the data. (81)

In their 2012 article, “Design and Implementation of a Toolkit for Usability Testing of Mobile Apps,” the authors embedded code in Android devices that captures events like clicks, menu selections and transitions as users navigate an application. These events are automatically stored on the phone and then uploaded to a central server for analysis. The data is analyzed for usability with a “sequence comparison technique.” The developer maps expert sequences for tasks, consisting of baseline states with one action or step between each state. When actual users deviate from the expert sequence of steps, hit mistaken states or backtrack, that can be evidence of a usability problem. (85, 87-88)
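As a rough sketch of that idea, the example below walks a logged user event sequence against an expert baseline and flags off-path events and backtracking as possible usability problems. The event names and the simple linear comparison are assumptions for illustration; the toolkit's actual state-machine analysis is more elaborate.

```python
# Minimal sketch: flagging possible usability problems by comparing a
# user's logged event sequence to an expert baseline sequence.
# Event names are hypothetical; the real toolkit models tasks as state
# machines rather than flat event lists.

expert_sequence = ["open_app", "tap_menu", "select_settings", "toggle_wifi"]

user_sequence = ["open_app", "tap_menu", "tap_back",      # backtrack
                 "tap_menu", "select_profile",            # mistaken state
                 "tap_back", "tap_menu", "select_settings",
                 "toggle_wifi"]

def find_deviations(expert, user):
    """Walk the user log against the expert path, recording every
    event that is not the next expected expert step."""
    deviations = []
    expected = 0  # index of the next expert step we hope to see
    for i, event in enumerate(user):
        if expected < len(expert) and event == expert[expected]:
            expected += 1                  # user is on the expert path
        else:
            deviations.append((i, event))  # off-path: possible problem
    completed = expected == len(expert)
    return deviations, completed

deviations, completed = find_deviations(expert_sequence, user_sequence)
print(f"Task completed: {completed}")
for index, event in deviations:
    print(f"  off-path event at step {index}: {event}")
```

A cluster of off-path events around one screen, or repeated backtracking, would point an evaluator to where in the interface the problem lies, even if the log alone cannot explain why.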

In user tests conducted with a dozen undergraduate and graduate students using an Android application, the authors collected usability data with conventional laboratory methods and simultaneously compared it to a “quantified state-machine based analysis using the collected events.” The event-logging method effectively identified all critical usability issues and even found one major problem that observers overlooked in the lab test. Event logging also detected subtle actions that observers missed, like repeated clicking when the application was loading slowly. (89, 94-95)

But the event-logging method did not capture cosmetic issues and found fewer issues related to the Android device itself. Event logging can point to the location of a problem, but it can't explain the cause of the problem. This method lacks the information gleaned from listening to participants think out loud and from observing non-verbal signals like facial expressions. At the very least, the event-logging method could complement traditional lab tests and could be especially useful for collecting data in the field, where observation gets more complicated, the authors concluded. (94-95)

5. A long-term field study

Researchers used event logging to conduct a four-month, holistic study of smartphones in 2007 and 2008 that demonstrated the importance of long-term usability analysis in a real mobile context, because usage is highly mobile and also evolves over time. (1417)

The field study, conducted with 14 teenagers in Houston, Texas, compiled data over four months, using limited in-device event logging to protect the participants' privacy, along with focus group meetings every three weeks and interviews. Authors Rahmati and Zhong found that users assessed the usability and usefulness of features throughout the process and that their usage changed as those assessments evolved. Factors affecting their evolving usage included initial opinion, knowledge and skills, context dependence, boredom and personalization. Assessing features that require the user to learn new skills, or features that are valuable only in certain mobile contexts, can take time. The initial novelty of some features may wear off with prolonged use and become boring to the user. And over time, users will personalize the phone, its features and its settings. (1419, 1423)
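The privacy constraint suggests logging only which feature was used and when, never message content or contacts. Below is a minimal sketch of that idea, assuming hypothetical feature categories; it is not Rahmati and Zhong's actual instrumentation.

```python
# Minimal sketch: privacy-limited event logging that records which
# feature was used and when, but never message content, contacts or
# other personal data. Categories and fields are illustrative.

import json
import time

ALLOWED_FEATURES = {"camera", "word_processor", "instant_messaging",
                    "text_messaging", "games", "web_browser"}

log = []

def record_feature_use(feature: str) -> None:
    """Append a timestamped usage record, refusing anything that is
    not a whitelisted feature category."""
    if feature not in ALLOWED_FEATURES:
        raise ValueError(f"refusing to log non-whitelisted data: {feature}")
    log.append({"feature": feature, "timestamp": time.time()})

record_feature_use("camera")
record_feature_use("text_messaging")
print(json.dumps(log, indent=2))
```

Aggregating such records week by week is what lets a longitudinal study show usage evolving, as the examples below illustrate.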

For instance, it took participants more than a month to embrace the hardware keyboard on the phone, citing its accuracy. And participants rarely used a mobile word processor at the beginning of the study, but as their typing skills improved, they started using it to type homework when they didn't have access to a personal computer. Participants also preferred instant messaging at the beginning of the study, but as they became frustrated with inadequate wireless signals, they favored text messaging. They also loved playing games on the phone, but after a while they grew bored of those games and stopped playing. (1424)


Participants used the smartphones in “a highly mobile fashion even within the same area, and at different locations,” the study concluded. And the long-term study allowed researchers to interpret usage evolution. The mobile context and prolonged usage patterns are difficult to simulate in a lab. (1426)

6. An elaborate simulation

Some researchers have created dynamic simulations to replicate real-world contexts and mobility in the lab – asking participants to walk on a treadmill or projecting computer-based driving simulations to replicate distractions. (2)

In a comparison of field studies to lab experiments, Sun and May attempted to simulate in a lab the experience of using a sporting information application on a smartphone while watching a soccer match in a stadium. They projected prerecorded sports footage on the front wall of the lab and hung large posters of a crowd scene on both side walls. They also conducted a similar usability study at an actual soccer match in a real stadium, then analyzed and compared the data. (5)

In their 2013 article, “A Comparison of Field-Based and Lab-Based Experiments to Evaluate User Experience of Personalised Mobile Devices,” Sun and May concluded that they discovered about the same number of usability problems in both experiments, and the lab simulation even identified some of the same context-related issues, like the font being too small to read on a bright day in an open stadium. (7)

Lab tests identified more problems related to details of the interface design, like the color of icons, while field tests tended to surface problems of actual use, like the accuracy of the sports information presented in the application. The atmosphere of the sports stadium enhanced the user experience, and participants felt more relaxed and free to express themselves in the field study, but they were also more distracted by the sporting action. Lab experiments eliminated some of these distractions and variables, like weather and crowd noise, that complicate data collection in a real stadium. (7)

So, Sun and May concluded that well-designed lab experiments should be valid when the test objectives focus on the user interface and device-oriented usability issues, and such experiments are easier, faster and cheaper to conduct. But field studies capture a wider range of factors affecting mobile services, system functions, usage contexts and user experience. When open and relaxed communication is important, a field study could be more effective. (8)

7. Practical tips from a practitioner

In some of the preceding academic studies, data was collected as much as six years before the findings were published. So I reviewed one additional article, from UX Magazine, by practitioner Tania Lang, titled “Eight Lessons in Mobile Usability Testing.” Lang summarized eight lessons based on her professional experience user testing mobile applications, browsers and devices:

1. Paper mock-ups are a valuable tool to avoid costly fixes after the actual device has been developed. To simulate scrolling on a paper prototype, she uses a cardboard template of a smartphone with two slits at the top and bottom. A long strip of paper can be inserted through the slits and moved up and down to simulate scrolling.

2. Screen recorders are an unobtrusive way to capture screen activity during a usability test, but they can't record a user's hand gestures, facial expressions or verbal comments; might not work with other applications and software; and usually can't be used on a participant's own device.

3. Nearly all UX practitioners are using homemade mobile test sleds that allow them to mount a small camera over the mobile device and sometimes mount a second camera to capture facial expressions. The sled should be light and easy to hold, should accommodate different devices, easily switch orientation and remain stable during testing.

4. Cameras are getting cheaper, lighter and more compact with high definition resolution, and they’re often compatible with most systems and usability software.

5. Recording technology can be intrusive and should not be used in some cases. For instance, Lang said that was the case when she user tested a quit-smoking application with a new mother while she was breastfeeding a 5-month-old on her sofa.

6. Real mobile context may not be important to all testing. An application used mainly in the morning and evening hours at home, sitting idle on the sofa, might be tested on a sofa in the lab without missing any usability problems. However, in testing a journey planner for public transportation, real mobile context was critical, she said.

7. Mobile context can sometimes be simulated successfully in the lab. Lang said she has created a “home-like environment” in the lab. In another case, she stopped a test and asked users to imagine that their signal had suddenly dropped.

8. Using a participant's own mobile device can reveal issues that might be missed on an unfamiliar test device.

Mobile usability is no longer an area of expertise for only a few specialized UX professionals, Lang said. It is the new standard and “we need to start designing and testing everything for smartphones and tablets as well as computers,” she said. (6)

Conclusion

Traditional laboratory user testing can be adapted to mobile devices and applications, but to observe the true mobile experience in a real mobile context, devices must be tested with users in the field. Smartphones and tablets have larger screens with higher resolution, and far more computational power and memory, than cell phones had 10 years ago. Wireless networks are more reliable and Wi-Fi hotspots more plentiful, but mobile usage still involves many contextual variables that can't always be simulated in a lab.

Collecting data in the field can be complicated, difficult, expensive and time consuming. Fortunately, as usability tools and technology become more sophisticated, affordable, lightweight, portable and compatible, observing users and collecting data in the field is becoming easier. But sometimes it's not necessary – like when the study is analyzing interface design or the context of use isn't on the go. The appropriate methodology – whether it's applied in the field, the lab or both – depends on the user, the mobile application, the most likely context of use and the objective of the study. User testing continues to improve the usability of mobile technology. As the adoption of mobile devices rapidly grows worldwide, practitioners and researchers should further refine strategies to improve the quality of data collection, analysis and the user experience.

References

1. Goodman, Elizabeth; Kuniavsky, Mike; Moed, Andrea; Observing the User Experience: A Practitioner's Guide to User Research, 2012, pages 300-302.

2. O'Toole, James; “Mobile apps overtake PC Internet usage in US,” CNN Money, http://money.cnn.com/2014/02/28/technology/mobile/mobile-apps-internet/, Feb. 28, 2014.

3. “Mobile Technology Fact Sheet,” Pew Research Internet Project, January 2014, http://www.pewinternet.org/fact-sheets/mobile-technology-fact-sheet/.

4. “Smartphone Users Worldwide Will Total 1.75 Billion in 2014,” eMarketer, http://www.emarketer.com/Article/Smartphone-Users-Worldwide-Will-Total-175-Billion-2014/1010536, Jan. 16, 2014.

5. Zhang, Dongsong; Adipat, Boonlit; “Challenges, Methodologies, and Issues in the Usability Testing of Mobile Applications,” International Journal of Human-Computer Interaction, 18(3), 293-308, 2005.

6. Kangas, Eeva; Kinnunen, Timo; “Applying User-Centered Design to Mobile Application Development,” Communications of the ACM, Vol. 48, No. 7, July 2005, 55-59.

7. Heo, Jeongyun; Ham, Dong-Han; Park, Sanghyun; Song, Chiwon; Yoon, Wan Chul; “A framework for evaluating the usability of mobile phones based on multi-level, hierarchical model of usability factors,” Interacting with Computers, No. 21, 2009, 263-275.

8. Ma, Xiaoxiao; Yan, Bo; Chen, Guanling; Zhang, Chunhui; Huang, Ke; Drury, Jill; Wang, Linzhang; “Design and Implementation of a Toolkit for Usability Testing of Mobile Apps,” Mobile Netw Appl, 2013, 18:81-97.

9. Rahmati, Ahmad; Zhong, Lin; “Studying Smartphone Usage: Lessons from a Four-Month Field Study,” IEEE Transactions on Mobile Computing, Vol. 12, No. 7, July 2013, 1417-1427.

10. Sun, Xu; May, Andrew; “A Comparison of Field-Based and Lab-Based Experiments to Evaluate User Experience of Personalised Mobile Devices,” Advances in Human-Computer Interaction, Vol. 2013, Jan. 12, 2013, 1-10.

11. Lang, Tania; “Eight Lessons in Mobile Usability Testing,” UX Magazine, Article No. 998, https://uxmag.com/articles/eight-lessons-in-mobile-usability-testing, April 10, 2013.
