3. Research Methodology and Methods
3.12. Research Methods
3.12.4. Measuring Final Team Performance with the
This section presents the assessment instrument that was created to measure each team’s performance. The instrument builds on earlier research by Amabile (1982, 1983, 1996), who proposed a methodology for assessing creativity. Amabile argues that meaningful assessments of creativity should be based on subjective ratings from a panel of expert peers.
This approach is referred to as the Consensual Assessment Technique (CAT). In her research, Amabile focuses on conceptualising a tool that can be used to assess creativity in real-world settings rather than in experimental settings. The general idea of the CAT is that all assessments of real-world creativity are subjective (Amabile, 1982). The CAT therefore assumes that any relevant assessment of creative work should be based on the judgment of recognised expert peers from the domain in which the creative work originated (Baer & McKool, 2009). Taken together, several subjective expert opinions allow a consensual assessment of the creative work to be developed (Amabile, 1982). Baer and McKool (2009) note that each expert should judge the work independently of the other experts. While rating the creative work, the experts should rely on their expert sense, which is largely based on their individual experience. When explicit rating scales are provided, the experts should be asked to use the full scale to differentiate the levels of creative work across the artefacts they are judging. In this process, different experts will arrive at different conclusions.
Nonetheless, raters often show reasonable levels of inter-rater reliability (Baer & McKool, 2009), especially if the performed creative task is somewhat standardised (Kaufman et al., 2007) and if the jury consists of impartial objective raters (Petersen & Stevels, 2009).
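To illustrate how the inter-rater reliability of such panel ratings could be quantified, the following minimal sketch computes Cronbach's alpha with the raters treated as items. The choice of statistic, the function name and the example ratings are assumptions made for illustration only; they are not taken from this study or from the cited sources.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Estimate inter-rater consistency for a panel of judges.

    ratings: list of rows, one row per rated artefact (e.g. one team),
    each row holding one score per rater, in the same rater order.
    """
    if len(ratings) < 2:
        raise ValueError("at least two rated artefacts are required")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every artefact needs one score per rater")

    # Sample variance of each rater's scores across all artefacts.
    rater_variances = [
        variance([row[j] for row in ratings]) for j in range(n_raters)
    ]
    # Sample variance of the summed panel score per artefact.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("summed scores do not vary; alpha is undefined")

    return (n_raters / (n_raters - 1)) * (
        1 - sum(rater_variances) / total_variance
    )


# Hypothetical ratings: four artefacts, each scored by three raters.
example_ratings = [
    [7.5, 8.0, 7.0],
    [4.0, 5.5, 4.5],
    [9.0, 8.5, 9.5],
    [6.0, 6.5, 5.0],
]
alpha = cronbach_alpha(example_ratings)
```

Values of alpha close to 1 would indicate that the raters rank the artefacts in a largely consistent way, even though their individual scores differ.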
While framing the research design, the author also explored other potential approaches to assessing creative ability and personality, e.g. self-report inventories such as Gough’s Creative Personality Scale (Gough, 1979; Zampetakis, 2010). However, for the purpose of this study, the author chose to measure final team performance via the CAT, as this approach made it possible to rely on an external point of reference (i.e. experienced external evaluators) for the team performance assessment.
Because the analysed design thinking teams were embedded in real-world industry settings, where their abilities and performance are predominantly evaluated by external stakeholders such as clients or investors, a CAT approach was deemed the most appropriate way of meaningfully measuring their performance in action.
The team performance evaluation tool used in this research study was built on the CAT framework (see Appendix D). It consisted of a one-page assessment sheet, which was provided to several industry professionals at the final public events, where all project teams presented the outcomes of their innovation projects. Each team was given eight minutes to present its concept. After all presentations had concluded, each team gathered around a booth that it had previously set up. At each booth, additional information on the project was displayed and the team members made themselves available for follow-up discussions. Each team had previously been briefed about the exact procedure and the rating criteria of its final assessment.
In their verbal briefing as well as in the written instructions (see Appendix D), the industry professionals were advised to complete the assessment tool right after each presentation had finished. They were asked to assess all five assessment dimensions quickly and succinctly. They were also made aware that their assessment should be based on their intuition, experience and gut feeling. They were assured that their ratings would not influence the students’ grades and that they should therefore use the full range of the available scales for each rating dimension. Raters were also instructed not to interact with each other during the presentations.
The assessment consisted of the following five assessment dimensions:
(1) Desirability. Does the presented product or service address unmet/latent needs of the proposed target group(s)? Would customers buy this product?
(2) Viability. Do the key assumptions of the proposed business model and financial model make sense? Are they realistic?
(3) Feasibility. From a technology point of view, do you think that the product or service can be built by this team? (with/without external help)
(4) Selling & team. How well did the presenter(s) sell the concept to you? Do you think this team has what it takes to bring the product or service to market?
(5) Investment intent. Imagine you have 10,000 € in your pocket right now. You can put this money in a bank account to collect interest or invest (some of) it in the team. How much would you invest?
The first three dimensions of “desirability”, “viability” and “feasibility” were based on one of the more general definitions of potential outcomes of design thinking activities (see Section 2.2.2). These three categories were meant to assess the quality of the produced artefact, based on key principles of the underlying design thinking theory. The fourth dimension of “selling & team” was included to measure how well the team convinced the audience of its capability to successfully bring the proposed product or service to market (Kawasaki, 2015). The fifth category was built on research by Morwitz et al. (2007) as well as Kornish and Ulrich (2012), who have identified purchase intention as a reliable predictor of later sales.
Raters were provided with a continuous scale, ranging from low ( ) to high (☺), for each of the five dimensions (see Appendix D). To indicate their answer, the professionals were asked to mark the continuous scale at the point which reflected their answer. The marks on the continuous scales were later converted into numerical ratings between “0.0” and “10.0” for each category. This answer format was a deliberate choice over a more common Likert-scale format, as it prompted fast assessments based on each professional’s intuition (Baer & McKool, 2009).
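The conversion of a mark on the continuous scale into a numerical rating can be sketched as a simple linear mapping. The following is a minimal sketch, assuming that the position of each mark is measured in millimetres from the low end of the scale and that the low end corresponds to a rating of 0.0. The function names, the 100 mm scale length, the averaging across raters and the example values are hypothetical and are not taken from Appendix D.

```python
from statistics import mean


def mark_to_rating(mark_mm, scale_length_mm=100.0):
    """Map the position of a rater's mark to a rating from 0.0 to 10.0.

    mark_mm: distance of the mark from the low end of the scale.
    scale_length_mm: printed length of the scale (assumed value).
    """
    if scale_length_mm <= 0:
        raise ValueError("scale length must be positive")
    if not 0 <= mark_mm <= scale_length_mm:
        raise ValueError("mark lies outside the printed scale")
    # Linear mapping, rounded to one decimal place.
    return round(10.0 * mark_mm / scale_length_mm, 1)


def panel_rating(marks_mm, scale_length_mm=100.0):
    """Average the converted ratings of all raters for one dimension."""
    ratings = [mark_to_rating(m, scale_length_mm) for m in marks_mm]
    return round(mean(ratings), 2)


# Hypothetical marks from three raters on the "desirability" scale.
desirability = panel_rating([70.0, 80.0, 90.0])
```

Averaging is only one possible way of combining the individual ratings; it is used here to keep the sketch short.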
As Kaufman et al. (2007) point out, securing suitable expert judges is a time-consuming endeavour. For both performance assessments, minimum requirements for desirable industry expert raters were defined. Invitations to the public presentations were then sent out to selected individuals within the network of the SCE. For both assessments, at least seven industry professionals were involved in the CAT performance assessment process. These included experienced professionals from target industries, current or former venture capitalists, entrepreneurship professors, experienced design thinking practitioners as well as programme alumni now working in industry.