5.3 System Design and Materials
5.3.4 The interface
Chapter 5 A. Rossi
The storyboard³ was developed as a web interface using PHP, the MySQLi extension, HTML, CSS and jQuery. To give the participants recruited on Amazon Mechanical Turk (AMT) access to my web application, I deployed the storyboard on Amazon Web Services (AWS). I used Elastic Beanstalk to provision and manage the underlying infrastructure and stack components, and Amazon RDS to set up, operate and scale my relational database in the cloud. Human users recruited on AMT accessed my web interface hosted on AWS at the URL http://storyboarduh.eu-west-2.elasticbeanstalk.com/. Figure 5.7 shows the connections between the components of my system.
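As an illustration of how the application can locate its database in this setup: when an Amazon RDS instance is attached to an Elastic Beanstalk environment, Elastic Beanstalk passes the connection details to the application as the environment properties RDS_HOSTNAME, RDS_PORT, RDS_DB_NAME, RDS_USERNAME and RDS_PASSWORD. The sketch below reads them in Python. It assumes that the database was attached to the environment in this way, rather than configured separately, and that it runs the MySQL engine implied by the use of MySQLi. The storyboard itself was written in PHP, so this only illustrates the configuration pattern. The helper name `rds_config_from_env` and the keys of the returned dictionary are hypothetical.

```python
import os


def rds_config_from_env(env=None):
    """Build database connection settings from Elastic Beanstalk's RDS_* properties.

    Hypothetical helper: the returned keys are illustrative, not a library API.
    Assumes a MySQL engine (implied by MySQLi), hence the 3306 default port.
    """
    env = os.environ if env is None else env
    return {
        "host": env["RDS_HOSTNAME"],
        "port": int(env.get("RDS_PORT", "3306")),  # default MySQL port if unset
        "database": env["RDS_DB_NAME"],
        "user": env["RDS_USERNAME"],
        "password": env["RDS_PASSWORD"],
    }
```

A PHP application would read the same properties, for example via `$_SERVER['RDS_HOSTNAME']`, and pass them to the mysqli constructor. Keeping the credentials in environment properties rather than in the source code lets the same code run both locally and on AWS.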
5.4 Participants
Responses were collected from 200 participants (115 men, 85 women) aged 18 to 65 years (mean age 33.56, SD 9.67). Participants' countries of residence were: 60% USA; 34% India; 1.5% Venezuela; 1.5% Portugal; 0.5% UK; 0.5% Canada; 0.5% Germany; 0.5% Dominican Republic; 0.5% Sweden; 0.5% Nigeria. Participants were recruited through the crowdsourcing web service Amazon Mechanical Turk⁴. I chose this service because, over the last decade, online surveys, questionnaires and experiments have become standard research tools in both academia (Sheehan et al. 2016) and industry, thanks to web services such as SurveyMonkey and Amazon Mechanical Turk, which increase the efficiency and effectiveness of the data-gathering process (Buhrmester et al. 2011). Such services are not meant to replace live human-robot interactions, but they provide useful data in the early phases of a research project.
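The reported percentages can be converted back into head counts. Doing so shows that each 0.5% corresponds to a single respondent among the 200, so six of the ten countries are represented by one person each. This is only a small arithmetic check on the reported figures, not part of the original analysis. The names used are illustrative.

```python
# Country-of-residence shares reported above, as percentages of the 200 respondents
COUNTRY_SHARES = {
    "USA": 60.0, "India": 34.0, "Venezuela": 1.5, "Portugal": 1.5,
    "UK": 0.5, "Canada": 0.5, "Germany": 0.5,
    "Dominican Republic": 0.5, "Sweden": 0.5, "Nigeria": 0.5,
}
N_RESPONDENTS = 200


def counts_from_shares(shares, n):
    """Convert percentage shares of n respondents into whole head counts."""
    return {country: round(pct * n / 100) for country, pct in shares.items()}


counts = counts_from_shares(COUNTRY_SHARES, N_RESPONDENTS)
# Countries represented by exactly one respondent
single_respondent_countries = [c for c, k in counts.items() if k == 1]
```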
5.5 Results
I asked participants to rate how realistic each of the scenarios was on a seven-point rating scale ranging from 1 (disagree) to 7 (agree). Sixty-five percent of participants rated the scenarios as realistic (ratings > 4), 20% rated them as not realistic (ratings < 4) and 15% neither agreed nor disagreed (see Figure 5.8).
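The three groups above follow a simple binning rule on the 1 to 7 scale. Ratings above the midpoint 4 count as realistic, ratings below it count as not realistic, and the midpoint itself counts as neutral. The following is a minimal sketch. It assumes that each participant contributes one rating, since the text does not say how the ratings of the individual scenarios were aggregated. The function names and the example ratings are hypothetical.

```python
from collections import Counter


def realism_category(rating):
    """Map a rating on the 1-7 agreement scale to one of three groups."""
    if not 1 <= rating <= 7:
        raise ValueError("rating must lie between 1 and 7")
    if rating > 4:
        return "realistic"       # agree side of the scale
    if rating < 4:
        return "not realistic"   # disagree side of the scale
    return "neutral"             # midpoint: neither agree nor disagree


def category_shares(ratings):
    """Percentage of ratings in each group (assumes one rating per participant)."""
    counts = Counter(realism_category(r) for r in ratings)
    total = len(ratings)
    return {group: 100 * counts[group] / total
            for group in ("realistic", "not realistic", "neutral")}
```

For the hypothetical ratings `[7, 5, 6, 4, 2]`, `category_shares` returns 60% realistic, 20% not realistic and 20% neutral.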
I also asked participants four questions about the content of the scenarios to verify their level of engagement with the scenarios' narratives. On average, 79.75% of the answers to these questions were correct (maximum 92%, minimum 71.5%). However, for the question "Which secret did your robot Jace tell you?", 13% of the participants answered with the secret that they themselves had told the robot instead of the one Jace told them. I hypothesize that they misunderstood the question.
Figure 5.7: Overview of the interaction diagram. Participants access the web application through Amazon Mechanical Turk. The storyboard is served by a web server backed by a database.
I analysed the responses of 154 participants, excluding those who failed the engagement test, i.e. those who gave more than one wrong answer and were therefore identified as not paying sufficient attention to the study, which is to be expected in an online survey (Berinsky et al. 2011).
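The exclusion rule can be written as a filter over each participant's four answers: a participant is kept if at most one answer differs from the answer key. The sketch below also computes the per-question percentage of correct answers, the kind of figure summarised earlier as 79.75% on average. The data layout (a mapping from a participant identifier to a tuple of answers) and all names are hypothetical.

```python
def wrong_answers(answers, key):
    """Count answers that differ from the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(given != correct for given, correct in zip(answers, key))


def filter_engaged(responses, key, max_wrong=1):
    """Keep participants who gave at most `max_wrong` wrong answers."""
    return {pid: answers for pid, answers in responses.items()
            if wrong_answers(answers, key) <= max_wrong}


def per_question_accuracy(responses, key):
    """Percentage of participants answering each question correctly."""
    n = len(responses)
    return [100 * sum(answers[q] == key[q] for answers in responses.values()) / n
            for q in range(len(key))]
```

Setting `max_wrong=1` encodes the threshold stated in the text. Making it a parameter shows how a stricter or looser attention check would change the analysed sample.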
All participants were presented with the same final emergency scenario. The participants had to choose between the following options: 1) "I trust Jace to deal with it."; 2) "I do not trust Jace. I will deal with it"; 3) "I want to extinguish it with Jace"; 4) "I will both leave and call the brigade".
The options were carefully chosen as indicators that the participant, respectively, trusts the robot, does not trust the robot, trusts in solving the task collaboratively, or trusts neither herself nor the robot.
Figure 5.8: Responses of participants regarding the realism of the scenarios and their interaction with the robot.
Figure 5.9 shows the overall percentages of the choices made by the participants in the emergency scenario. The largest share of participants in C1 chose to deal with the emergency collaboratively, and a slightly smaller share chose to trust the robot (as described in Figure 5.1). A large majority of participants in C2 did not trust the robot to deal with the emergency. Similarly, most participants in C3 and C4 chose either to solve the task collaboratively or not to trust the robot. The majority of participants in C5 preferred to work in collaboration with the robot. In summary, participants chose not to trust the robot when it made severe errors, while they were more inclined to trust in teamwork when the robot made small errors. Moreover, comparing conditions C3 and C4, I note that although most participants chose either to solve the task collaboratively or not to trust the robot, the number of participants who chose to trust the robot increased in C4. Since C3 and C4 differ in whether the severe errors occur at the beginning (C3) or at the end (C4) of the interaction, I am inclined to think that participants were less willing to trust the robot when its severe errors occurred at the beginning of the interaction.
Figure 5.9: Responses of participants in the different conditions to the emergency scenario. Condition C1: 10 different tasks executed correctly; condition C2: 10 different tasks with 3 severe errors at the beginning and at the end of the interaction; condition C3: 10 different tasks with 3 severe errors at the beginning and 3 trivial errors at the end of the interaction; condition C4: 10 different tasks with 3 trivial errors at the beginning and 3 severe errors at the end of the interaction; and condition C5: 10 different tasks with 3 trivial errors at the beginning and at the end of the interaction.
I used a chi-square test to evaluate the association between the participants' choices in the emergency scenario and the experimental conditions. With 5 conditions and 4 choices, the test has (5 - 1)(4 - 1) = 12 degrees of freedom. The association was statistically significant (χ²(12) = 32.91, p = 0.001). The strength of the relationship between the emergency choice and the experimental conditions, measured by Cramér's V, is moderate (φc = 0.26, p = 0.001). To analyse the differences further, I used the adjusted standardised residuals (called Pearson residuals in Agresti (2002)). The adjusted residuals are the raw residuals (i.e. the differences between the observed and expected counts) divided by an estimate of their standard error.
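The three statistics used above can all be computed from a table of observed counts. The following minimal pure-Python sketch takes rows as conditions and columns as emergency choices. For each cell, the expected count is (row total × column total) / n. The chi-square statistic sums (O - E)²/E over the cells. Cramér's V is sqrt(χ² / (n · (min(r, c) - 1))). The adjusted residual divides O - E by sqrt(E · (1 - row total / n) · (1 - column total / n)). The function names are hypothetical, and this is not necessarily the software used in the study. Because the raw counts per cell are not reported here, any table passed to `chi_square_summary` would have to be hypothetical.

```python
import math


def cramers_v(chi2, n, r, c):
    """Cramér's V for an r x c table: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))


def chi_square_summary(table):
    """Chi-square statistic, degrees of freedom, Cramér's V and adjusted
    standardised residuals for an r x c table of observed counts.

    Assumes every row and column total is positive (no empty categories).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, c = len(row_totals), len(col_totals)

    chi2 = 0.0
    adjusted = []
    for i, row in enumerate(table):
        adjusted_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Estimated standard error of the raw residual (observed - expected)
            se = math.sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            adjusted_row.append((observed - expected) / se)
        adjusted.append(adjusted_row)

    dof = (r - 1) * (c - 1)
    return chi2, dof, cramers_v(chi2, n, r, c), adjusted
```

Applied to the reported values (χ² = 32.91, n = 154, a 5 × 4 table), `cramers_v(32.91, 154, 5, 4)` gives about 0.267, in line with the reported φc of 0.26.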
Table 5.1 shows a significant association between condition C2 and the participants' choice not to trust the robot (adjusted residual 2.7 > 1.96). This suggests that participants' trust is affected more severely when the robot makes errors with severe consequences. I did not find any significant association (p > 0.05) between the participants' gender and their choice to trust the robot to deal with the emergency, nor between the participants' age ranges and their emergency choices (p > 0.05). Therefore, I assume that these results can be generalised to the general population independently of gender and age. Finally, to test the association between participants' emergency choices and their country of residence, I used a chi-square test. Since most countries of residence were represented by only one individual, I applied the test only to participants from India and the USA. The association was not statistically significant (χ²(3) = 4.138, p > 0.05).
Table 5.1: The adjusted standardised residuals of the crosstabulation between the choices made by the participants in the emergency scenario and the different conditions presented to the participants. Asterisks mark residuals with an absolute value greater than 1.96.

Condition / Emergency choice | Do not trust the robot | Trust the robot | Teamwork with the robot | Trust neither the robot nor oneself
Flawless tasks (C1)          | -3.5*                  |  3.5*           |  1.4                    | -1.1
Big-Big errors (C2)          |  2.7*                  |  0.4            | -2.4*                   | -0.6
Big-Small errors (C3)        |  0.6                   | -1.6            | -0.5                    |  1.7
Small-Big errors (C4)        |  0.8                   | -1.2            | -0.4                    |  0.9
Small-Small errors (C5)      |  0.0                   | -1.3            |  1.6                    | -0.9
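Table 5.1 can be read as follows. In large samples, an adjusted residual behaves approximately like a standard normal variable under independence. At the 5% level, a cell is therefore flagged when its absolute value exceeds the two-sided critical value z ≈ 1.96. A positive value means that the choice was made more often than expected under independence, and a negative value means it was made less often. The sketch below encodes the values of Table 5.1 and recovers exactly the four starred cells. It uses `statistics.NormalDist` from the Python standard library. The constant and function names are illustrative.

```python
from statistics import NormalDist

# Adjusted standardised residuals from Table 5.1 (rows: conditions, columns: choices)
CHOICES = ("Do not trust the robot", "Trust the robot",
           "Teamwork with the robot", "Trust neither the robot nor oneself")
TABLE_5_1 = {
    "Flawless tasks (C1)": (-3.5, 3.5, 1.4, -1.1),
    "Big-Big errors (C2)": (2.7, 0.4, -2.4, -0.6),
    "Big-Small errors (C3)": (0.6, -1.6, -0.5, 1.7),
    "Small-Big errors (C4)": (0.8, -1.2, -0.4, 0.9),
    "Small-Small errors (C5)": (0.0, -1.3, 1.6, -0.9),
}


def critical_value(alpha=0.05):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def significant_cells(table, alpha=0.05):
    """(condition, choice, residual) for cells whose |residual| exceeds the critical value."""
    z = critical_value(alpha)
    return [(condition, choice, value)
            for condition, row in table.items()
            for choice, value in zip(CHOICES, row)
            if abs(value) > z]
```

The degrees of freedom of the India/USA test follow the usual (r - 1)(c - 1) rule: with two countries and four choices, (2 - 1)(4 - 1) = 3.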