To Game or not to Game – a pilot study on the use of gamification for team allocation in entrepreneurship

2. Background Context

We have earlier explained why the instructor-selected method was chosen over the two alternatives, random assignment and student self-selection. It is worth placing this choice in the context of this exploratory design. The instructor-selected method is atypical within Aarhus University and especially so within entrepreneurship education practice.

When the lead author contacted other teaching teams and conducted an informal survey within the university, he found that the main methods in use were random assignment and self-selection.

The lead author also observed that self-selection led to teams that were either all-male or all-female. Another course (not taught by the lead author) used “constrained” self-selection, i.e. a topic of interest was imposed as a rallying point for individuals interested in that topic or problem; there too, teams formed around similar interests were low in gender and professional-background diversity. Self-selection seems to be the most popular team assignment method in simulations (Decker, 1995).

It is easy, requiring no action on the teacher’s part except when remainders emerge (as discussed later), and it often leads to higher initial group cohesion, expediting group development (Mello, 1993; Strong & Anderson, 1990). Although not all self-selected teams are initially cohesive (some players may simply choose people sitting close by as teammates; see Norris & Niebuhr, 1980), early cohesion apparently gives a group a leg up in performance. Minimal instructor effort and positive team cohesion clearly make this the “easier” method from both practical and methodological perspectives. However, the distinct disadvantages of this method, such as selection of friends, prevalence of remainders and reduced diversity or higher homogeneity (also known from theory, as in Bacon et al., 2001), were readily observed as described above.

Thus, the instructor-selected method was tried as an alternative to achieve the diversity gains despite its higher practical and resource costs.

About Virtuoso

Virtuoso, the game used for team allocation in this study, is a product of GraviTalent (www.gravitalent.com). Virtuoso is in commercial use as a selection tool in the hiring process (details about the game are given in the Methods section below). By exploring cognitive and other psychological skills, this serious game helps assess how well an applicant would fit the job and the organizational culture. Recently, Virtuoso has also been used to examine the composition of existing work teams. After meeting the GraviTalent team at their demo stand at a gamification conference and trying out the game, the lead author came up with the idea of testing it for forming teams in his classroom, as an alternative to the instructor-selection method that he uses. This resulted in a collaboration between GraviTalent and the lead author at Aarhus University to explore this new use case.

3. Methods

Participants

The participants were 36 STEM students at Master's level from Aarhus University who were enrolled in an entrepreneurship course run by the lead author. All were Danish students except for one exchange student from Spain.

The participants were divided into two sets based on gender, academic discipline and JTI profiles, such that each set had a roughly equal number of each profile. This was done to avoid bias arising from imbalance between the sets. One set was allocated to the instructor-selection method (labelled as R-teams, where R=Rajiv, the instructor and lead author) and the other to the game-based method of team formation (labelled as G-teams, where G=GraviTalent, the firm behind the Virtuoso game). The detailed distribution of the two sets is available at http://dx.doi.org/10.6084/m9.figshare.1494666.

Instructor-selection method (for teams R1-R5)

In the present study we focused on the composition of the teams rather than on each individual alone. As the students participated in the same course, we presumed a similar level of knowledge and abilities. The focus of the team allocation was the measured skills and other characteristics. The overarching goal of the instructor-selected method that the lead author employs in his teaching practice is increased diversity on multiple fronts. The method achieves this mix by combining the team-formation literature on strong and weak ties (Ruef, 2002), gender mixes (Ruef, Aldrich and Carter, 2003) and cultural background (Davidsson, 2006) with Jung’s theory of personality types, the JTI (Budd, 1993), which is similar to the MBTI (Myers, 1962). While this is an interesting combination in itself and results in high-performing teams (as observed, albeit subjectively, from team engagement and overall performance) in a short 7-week course (spring semester, 2015), the mixed use of these methods requires both time and resources. The JTI workshop normally takes 3 hours, with an additional feedback session on a following day. To identify, to some extent, the students’ weak and strong ties, the students are also asked to submit a 2-page assignment on “Who they are” that combines information about their social and professional networks with their skills. Finally, this is coupled with half a day’s work of assigning mixed teams based on gender, cultural and professional diversity and ties. The result is an arguably well-balanced team.

The GraviTalent method (for teams G1-G4)

The Virtuoso Game

All participants played the Virtuoso game at the start of the course and were informed that this was one of many methods used in the course to form diverse teams. They received an invitation to the game, which they opened individually on their own computers at any time convenient to them. In this game, their goal was to build a structure to reach the target point positioned at the top of the screen (Figure 1). A training level was used to familiarize the participants with the procedures and with building structures.

After completion of this short training they had twenty minutes to find a solution for the measured challenge. The gameplays were recorded and analyzed.

Based on their gameplay, participants were scored along four dimensions. The first dimension (with the two endpoints intuitive and analytical) measured how likely one was to make extensive plans before acting and how much one preferred a tightly structured and precise solution. The second dimension (with the two endpoints experimenting and focused) showed how strongly one aimed solely at the goal and tried to avoid unnecessary and potentially wasteful attempts in the process.

The third dimension (with the two endpoints conventional and innovative) depended on how much one preferred original and unconventional solutions over safe and conventional ones.

The fourth dimension (with the two endpoints specialist and generalist) indicated if one’s profile had a few disproportionately dominant features or if it was more well-balanced.

Figure 1. Screenshot of the Virtuoso game

The participants in the game-based group were allocated into teams using the GraviTalent method. The main focus during team allocation was to create teams with maximum potential for effective work. Each team needed students towards the innovative and experimenting endpoints for creative input, and students towards the analytical end to deliberate over the different ideas. Each team also needed members closer to the focused end of the second dimension and closer to the conventional end of the third dimension to make sure that the project finishes on time, as well as members towards the generalist end of the fourth dimension, who communicate well with everybody and can fill in missing roles. The allocation process was based on the algorithmic solution of the stable roommates problem (Irving, 1984). All 16 participants in the GraviTalent group were measured along all four dimensions on an ordinal scale from 1 to 10 (10 being the analytical, focused, innovative and generalist endpoints, respectively). The final teams had similar mean values in all dimensions (see Table 1).
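The balance criterion described above (similar team means on every dimension) can be checked mechanically. The following is a minimal sketch of such a check only; it is not the GraviTalent procedure and not Irving's algorithm. The team labels, the scores and the function names are illustrative assumptions, not study data.

```python
from statistics import mean

# The "10" endpoint of each of the four dimensions, as named in the text.
DIMENSIONS = ("analytical", "focused", "innovative", "generalist")

# Hypothetical 1-10 scores for two illustrative teams of four (NOT study data).
teams = {
    "G1": [(8, 3, 7, 5), (2, 9, 4, 6), (6, 5, 9, 2), (4, 6, 3, 8)],
    "G2": [(7, 4, 8, 4), (3, 8, 3, 7), (5, 6, 8, 3), (5, 5, 4, 8)],
}


def team_means(members):
    """Return the mean score per dimension for one team."""
    return {dim: mean(scores) for dim, scores in zip(DIMENSIONS, zip(*members))}


def max_gap(all_teams):
    """Return, per dimension, the largest difference between any two team means."""
    means = [team_means(members) for members in all_teams.values()]
    return {
        dim: max(m[dim] for m in means) - min(m[dim] for m in means)
        for dim in DIMENSIONS
    }


gaps = max_gap(teams)
for dim, gap in gaps.items():
    print(f"{dim}: largest gap between team means = {gap}")
```

A small gap on every dimension is what "similar mean values" would look like in practice; how small is small enough is a judgement the sketch does not make.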

eLearning Papers • ISSN: 1887-1542 • www.openeducationeuropa.eu/en/elearning_papers n.º 43 • July 2015

Table 1. Descriptives of the four dimensions in the four teams.

Team performance measures

Individual

All participants in the course, although they work in teams and present team solutions at the end of the course, take a final individual oral examination that assesses their understanding of the entrepreneurial process they have experienced and their theoretical reflections on their individual and team performance. Grades are based on individual performance in this 20-minute examination, in which the student pitches the idea for 2 minutes and then defends it in a Q&A format before the instructor and an external examiner. Grades are assigned on the 7-point Danish grading scale (-3 to 12).

Team-based

All participants in this course have to present their team idea at the end of the course to an external 4-member jury panel composed of business and academic experts. The jury members are unaware of the students’ processes and backgrounds, and the students are informed about the jury panelists just 3 days before their final pitches. The students are expected to present their cumulative experience of the past 6 weeks of the course and the resulting “product/business idea” in a 3-minute pitch and a 7-minute business model explanation, followed by a 5-minute Q&A with the jury. The standardized jury criteria used to rate the teams can be accessed here: http://dx.doi.org/10.6084/m9.figshare.1494670. The business ideas always relate to the students’ backgrounds (in this case Science & Tech and IT). The students find a problem of their own within a defined arena (for example, food waste) and then, working in diverse teams, come up with a solution that they can build with their skill sets (for example, a nano-based food spoilage detector that sends a message to your phone).

Team assessment questionnaire

At the end of the semester, students completed a shortened version of the Team Assessment Questionnaire (Simmering & Wilcox, 1994). One or two items were selected from the sub-scales Team foundation, Team functioning, Team performance, Team skills, Team climate and atmosphere, and Team identity, and two items were added about the distribution of work within the team. All questions used a 5-point Likert scale. The survey instrument can be accessed here: http://dx.doi.org/10.6084/m9.figshare.1494671

4. Results

Workload & blind trial conditions

All students were informed that they would be grouped based on their individual background, psychometric profiles and interests. The students were grouped into teams by the second week of the course. They then took part in active team-work for a total of 5 weeks (meeting twice a week, with 4 hours per teaching session). Thus the total in-class interaction time before team formation was 16 hours, and the total in-class interaction time in teams (excluding the final jury presentation day) was 36 hours. Some out-of-class interaction was required, but it was not measured; teams were left to find time for tasks as they saw fit. It was observed that some teams worked more than others, while some made more effective use of their time and decided to concentrate most of their activities and meetings within class hours, owing to scheduling differences that resulted from their members belonging to two different disciplines (Engineering vs Basic Sciences).

The students were blind to the team allocation experiment, and most believed that the only tool used was the JTI and, to some extent, their discipline and gender mixes. They appeared to have forgotten about the game after the first couple of weeks: in their final report (a reflection assignment on the entire process they went through), none mentioned the GraviTalent game. This could be attributed to the fact that it was just a short 20-30 minute game session, done individually very early in the course, and thus had low impact at a conscious level. This suggests that the subjects were “blind” to the experiment at a conscious level and that the experiment was not biased by the participants’ knowledge of differences in team formation.

Grades

One student from the instructor-selection group did not receive a final grade due to a no-show at the exam and was therefore excluded from the analysis. The average grade was 10.31 (SD=2.63) in the GraviTalent group and 9.58 (SD=3.37) in the instructor-selection group. As the number of cases was low in both groups, a Mann-Whitney test was conducted. According to this test, the final grades of the two groups did not differ significantly (U=141.5, z=-0.38, p=0.7), as shown in Figure 2.
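For readers unfamiliar with the test, the sketch below shows how a Mann-Whitney U statistic and its normal-approximation z and p values can be computed. It is an illustration under stated assumptions, not the analysis script used in the study: the grades are made up (chosen from the Danish scale), the function names are hypothetical, and the approximation omits the tie and continuity corrections that statistical packages usually apply.

```python
from math import erf, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(a, b):
    """Return (U, z, two-sided p); normal approximation, no tie correction."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 1 + erf(z / sqrt(2))  # equals 2 * Phi(z); valid because z <= 0 here
    return u, z, p


# Hypothetical grades on the Danish scale (NOT the study data).
g_grades = [12, 10, 7, 12]
r_grades = [10, 7, 4, 2]
u, z, p = mann_whitney(g_grades, r_grades)
print(f"U={u}, z={z:.2f}, p={p:.3f}")
```

As a rough consistency check, and assuming group sizes of 16 and 19, putting the reported U=141.5 into the same uncorrected formula gives z of about -0.35, in line with the reported -0.38; the remaining gap is consistent with a tie correction, which this sketch leaves out.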

Figure 2. The final grades received by the members of the GraviTalent and instructor-selection teams. The error bars represent confidence intervals.

Jury ranking

The jury selected the three best-performing teams and gave one award for the best pitch. First and second place went to teams allocated by the GraviTalent method (G2 and G1, respectively). The best pitch award was won by an instructor-selection team (R3). The average team-survey responses, the team ranking derived from the survey responses and the jury ranking are shown in Table 2.

Table 2. Average data of the team-work survey instrument (excluding the responses to Qs. 2 and 9 due to their subjective nature) and comparison of the ranking from the students’ self-assessment of how well the team worked with the jury ranking.

Questionnaire

The questionnaire was completed by 29 students, of whom 13 belonged to the G-teams and 16 to the R-teams. At least two students responded from each team, and the overall response rate was 80.5%. To compare the two groups, a Mann-Whitney test was conducted, as the number of cases in both groups was low. The detailed analysed data can be accessed here: http://dx.doi.org/10.6084/m9.figshare.1494849. Significant differences between the two groups (G-teams vs. R-teams) were found in the responses to Q1, “Everyone on the team had a clear and vital role” (p=0.023), and Q8, “I was pleased to be in this team” (p=0.021). For the other questions, although the averages indicate some differences, these were not statistically significant. A more in-depth team analysis was conducted on the survey responses. The first surprising result is already documented in Table 2: ranking the teams by the mean of all their responses (excluding Qs. 2 & 9), i.e. by the students’ own assessment of their team-work, gives an order that correlates closely with the jury ranking.
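The agreement between two rankings can be expressed as a single number with Spearman's rank correlation. The paper does not report such a coefficient, so the following is only a sketch of how one might quantify the correspondence; the ranks are hypothetical and are not the values in Table 2, and the formula used assumes no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(rank_a)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five teams, 1 = best (NOT the values in Table 2).
survey_rank = [1, 2, 3, 4, 5]
jury_rank = [1, 3, 2, 4, 5]
rho = spearman_rho(survey_rank, jury_rank)
print(f"rho = {rho:.2f}")
```

A value near 1 means the two rankings order the teams almost identically, and a value near -1 means they are reversed. With only a handful of teams, any such coefficient should be read as descriptive rather than as a significance test.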

While the above results may suggest that the G-teams performed relatively better than the R-teams and also reported a good team-work experience, it is still too early to call this result definitive in any way. This becomes clearer when the data is analysed by contrasting the individual responses of the members within each team, as shown in Table 3. We focus on 3 of the G-teams that feature among the top 5 ranked teams to analyse further whether the differences seen between the two methods are meaningful.

Team G3 had only two respondents, which was too few for any meaningful pattern analysis, and was therefore not included in this analysis. Table 3 highlights the individual responses of the team members to four survey questions: Qs 2 & 9 on the right-hand side (green-to-cream color scale) and Qs 3 & 8 on the left-hand side (yellow-to-red color scale). Qs 2 & 9 were not included in the team-work average shown in Table 2 as these are “subjective type responses”. Qs 3 & 8 were selected to correlate in more objective terms with the parameters being measured in Qs 2 & 9.

Table 3. Individual responses of the G-team members to four survey questions – Q2 & 9 on the right hand side (green-to-cream color scale) and questions 3 & 8 on the left hand side (yellow-to-red color scale) correlating the subjective answers to the objective ones.

Green is positive for team work while cream is not. Similarly, red is considered negative for team-work while yellow is positive.

Team G2, which was ranked first by the external jury and also topped the team-work ranking (as shown in Table 2 earlier), shows coherence and agreement among all team members on shared work-load distribution and good team-work in general. Team G4, by contrast, shows slight disagreement from one team member on equal sharing of the work-load, even though there is a general consensus that the team worked well together. The sharpest insight, however, comes from team G1, where one team member clearly disagrees (highlighted in red, which indicates a negative outcome for team-work) with respect to work-load sharing, a view that is supported by two others and negated by one. The same person also does not agree that the team worked well together or that there was good team-work, despite the team performing well overall.

While statistical measures may be inclined to consider this an outlier, in terms of team-work analysis this is data to be explored further. A more detailed comparison in the same fashion as Table 3 (also covering the R-teams with 3 or more respondents) can be accessed as open-access data here: http://dx.doi.org/10.6084/m9.figshare.1494857.

5. Discussion

Teams of students, allocated by two different methods, worked together through a semester and created a business project.

The outcome of their work was measured in two ways: the students received a final grade and the teams were judged by a jury. Instructor-selected team formation has been the method of choice for the lead author, with historically good results observed both in self-reporting and in anonymized student evaluation reports. However, this method carries a significant resource cost and a risk of unconscious bias, which is why it is not widely used; in addition, random assignment is generally perceived as “fair” (Bacon et al., 2001). Exploring alternative methods of team formation thus created the opportunity to test a commercially available game, developed for soft-skill assessment in the hiring process, as a tool for team allocation in an educational setting.

The method of allocation itself (individuals play a short game, the game generates a profile, and the profiles are used to create well-balanced teams) can be viewed as inherently “fair”, or as comparable to the “fairness” of a “random” method.

This stems from two facts: a) the teacher is blind to the student profiles per se, and b) GraviTalent receives only very basic information about the student. In the instructor-selection method, the instructor collects a mini-CV, a network base and the students’ JTI profiles, coupled with subjective information based on initial interactions with the students. However, students can be sensitive about sharing some of this personal information, and even though the method is entirely optional, the lead author has seen some students raise concerns or decline to share information such as their JTI profiles (which they have the right to do). As the students were not aware of the Virtuoso game test – gauged

In document Applied Games and Gamification Drivers for Change (Page 38-43)