The Halpern Critical Thinking Assessment: Towards a Dutch Appraisal of Critical Thinking


Critical Thinking

Hannie de Bie & Pascal Wilhelm


Faculty of Behavioural Sciences, University of Twente, Enschede, The Netherlands

Abstract

When implementing critical thinking learning objectives in education, a valid and reliable instrument for assessing the level of critical thinking skills is needed. This study focuses on the psychometric properties of the Dutch version of the Halpern Critical Thinking Assessment (HCTA). A real-world outcomes inventory (RWO-NL) was developed to measure negative life events. The number of negative life events was hypothesized to be inversely related to critical thinking ability. The HCTA and RWO-NL were administered to university students in communication and psychology (N = 240). Reliability of the HCTA appeared sufficient (α = .75; λ2 = .77) and factor analysis indicated that the use of the constructed response and forced choice format each containing the five critical thinking subscales is an adequate method for assessing critical thinking ability. The total HCTA and weighted RWO-NL scores did not show a significant relationship, r = -.12, ns. Recommendations for improving the Dutch HCTA are discussed.

Keywords: Halpern Critical Thinking Assessment, validity, reliability


There are many examples of thoughtless actions that result from blindly accepting information or from carelessness. One example is the outcome of a study by Professor Diederik A. Stapel of Tilburg University, the Netherlands. The press release indicating that meat eaters are more selfish and less social was well received, especially by vegetarians. Many people believed it without asking critical questions or considering how the study was designed. Fortunately, some reporters questioned the study, thoroughly checked the methods and conclusions (Hupkens, 2011, September 14), and chose not to publish the research. Later it became known that Stapel’s data were fictitious (Tilburg University, 2011), and eventually all of Stapel’s publications were examined for scientific malpractice.

How important is the skill of critical thinking within society? Quite important, judging by the increasing focus in education on ‘21st century skills’ (Ananiadou & Claro, 2009). These skills, including critical thinking, communication, ICT literacy, social and/or cultural skills, creativity, collaboration, and problem solving skills (Voogt & Roblin, 2010), are not new in their entirety (Rotherham & Willingham, 2010). However, due to economic and societal change (from an industrial to a knowledge society) and ICT developments, success on the labour market often depends on having these skills. This is one of the reasons why these skills have to be taught, beginning early in education (Voogt & Roblin, 2010, 2012).

This study will focus on one of the 21st century skills, namely critical thinking. Despite many conflicting definitions (e.g. Black, 2012; Ennis, 1996; Facione, 1998; Halpern, 1998; Moseley et al., 2005; Sternberg, 1986), Butler (2012) concludes that most researchers agree that critical thinking “involves attempting to achieve a desired outcome by thinking rationally in a goal-oriented fashion” (p. 721). In accordance with this observation, the definition of Halpern (1998) will be used:

The term critical thinking refers to the use of those cognitive skills or strategies that increase the probability of a desirable outcome. […] Critical thinking is purposeful, reasoned, and goal-directed. It is the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions. Critical thinkers use these skills appropriately, without prompting, and usually with conscious intent in


If critical thinking becomes an educational objective, appropriate assessment is needed. Many critical thinking assessments have been studied thoroughly (e.g. Black, 2012; Hatcher, 2011; Ku, 2009; Spector, Schneider, Vance, & Hezlett, 2000; Stein & Haynes, 2011). This study focuses on the Halpern Critical Thinking Assessment (HCTA), which presents 25 everyday scenarios. These scenarios are derived from various domains in real life; issues like the lottery, the death penalty, and slimming programs are discussed. In each scenario, respondents first answer an open-ended question to give a first reaction in their own words, and then answer a forced choice question by choosing the best response from the possible answers. The HCTA is unique in combining these two formats in one assessment. Other advantages of the HCTA relative to other assessments are the use of a computerized grading system and the option of both offline and online administration. In the following, these differences and other characteristics of the HCTA will be discussed.

The HCTA consists of five categories to measure critical thinking skills: (a) verbal reasoning skills (e.g. the ability to detect and defend against ubiquitous or deceptive language usage), (b) argument analysis skills (e.g. the ability to assess the strength of an argument), (c) skills in thinking as hypothesis testing (e.g. the ability to reason scientifically and to determine whether or not the given information confirms the hypotheses), (d) skills in using likelihood and uncertainty (e.g. the ability to use a correct estimate of a probability), and (e) decision making and problem solving skills (e.g. the ability to define the problem, identify goals, and weigh both positive and negative findings) (Halpern, 2012).


development, and (b) the cognitive factor was measured by SAT scores, grade point average scores, and scores on the Watson-Glaser Critical Thinking Appraisal (WGCTA; Watson & Glaser, 1980). The Ennis-Weir Critical Thinking Essay Test (EWCTET; Ennis & Weir, 1985), which consists of a constructed response format, had a significant relationship with both factors (Taube, 1997). These components appear to be associated with different response formats, in that the constructed response format exposes more of the dispositional component than the forced choice format does (Ku, 2009). Both response formats are discussed below.

Constructed response relies on free recall; the respondents have to search their memory and select the knowledge needed to construct an answer (Bridgeman & Morgan, 1996; Butler, 2012; Ku, 2009). A constructed response format takes more cognitive effort, but also reveals more of the dispositional part of critical thinking because it gives the respondent the opportunity to display the appropriate skills. The motivational and intentional aspects of critical thinking become more apparent, because answering open-ended questions shows to what extent the respondent is willing and able to engage in critical thinking at the right moment (Ku, 2009). However, in contrast to the forced choice format, an assessment with a constructed response format is much more time consuming and there is concern about the subjectivity of the scoring. Critical thinking assessments that rely solely on free recall include, for instance, the EWCTET and the ICAT Critical Thinking Essay Examination (Sonoma State University, 1996).


Both response formats have their benefits and limitations, but the HCTA combines these two formats into one assessment (Halpern, 2012). The response formats combined should potentially give a more compatible assessment, since both formats are needed to measure the dispositional and cognitive components of critical thinking. There is a considerable amount of evidence indicating high reliability and validity of the HCTA (Halpern, 2012). Internal consistency (Cronbach’s Alpha) ranges from .85 to .97, and because the scoring method guides the grader in scoring the constructed responses, the inter-rater reliability is high (Halpern, 2012). Because items are based on constructs that are most frequently mentioned in descriptions of critical thinking, content validity is presumably high (Halpern, 2012). Four small studies mentioned in Halpern (2012) reported correlations between scores on the constructed response items and forced choice items of .39, .49, .42, and .51. This suggests a reasonable relationship between the two formats, and simultaneously gives evidence for the separability of free recall and recognition (Halpern, 2012).

The factor structure of the HCTA is evaluated in more detail in Halpern (2012) with a U.S. norm sample of 450 respondents collected by the test author. The author proposed three models and concluded that a two-factor model (comprising the constructed response and forced choice formats), with each factor containing the five categories of critical thinking skills, fits the data best. Hau et al. (2006, April) and Ku (2009) also evaluated the factor structure of the HCTA with a sample of respondents from the U.S. and Hong Kong and concluded that the same model represented the best fit. The constructed response and forced choice formats are related, but also have separate characteristics. This supports their combined use in assessing critical thinking (Halpern, 2012; Hau et al., 2006, April; Ku, 2009).

Regarding construct validity, positive correlations (r = .12 to .59) exist with level of education and with academic ability measures such as SAT scores and grade point average (Butler, 2012; Halpern, 2012; Ku, 2009), although the correlation with SAT scores appears higher (r = .50 to .58) (Hau et al., 2006, April; Marin & Halpern, 2011). In addition, scores on need for cognition and conscientiousness scales are moderately related to critical thinking skills (Clifford, Boufal, & Kurtz, 2004; Halpern, 2012; Ku, 2009; Ku & Ho, 2010; Spector et al., 2000).


compared to the U.S. norm sample (M[450] = 109.71, SD = 18.23). The relatively constant scores illustrate the quality of the scenarios in the HCTA. Few changes by native translators were needed to make the assessment more culturally fair, perhaps because cultural differences are taken into account when developing the scenarios (Halpern, 2012).

An interesting study on the external validity of the HCTA was conducted by Butler (2012). Instead of the more frequently studied relationship between academic performance and critical thinking, she studied the relationship between real-world outcomes and critical thinking level. The real-world outcomes were measured by an inventory (the Decision Outcomes Inventory, DOI) adapted from Bruine de Bruin, Parker, and Fischhoff (2007), which determines the frequency of negative life events over the past 6 months in numerous domains, such as finances, education, and relations. Butler (2012) hypothesized that respondents who score higher on the HCTA would report fewer negative life events than those who score lower on critical thinking. This hypothesis was based on the notion that critical thinkers transfer and exercise their critical thinking skills in multiple domains of life in order to be successful and to avoid negative life events caused by poor decision making. As Halpern (1998) states: “critical thinkers will have more desirable outcomes than ‘noncritical’ thinkers (where ‘desirable’ is defined by the individual, such as making good career choices or wise financial investments)” (p. 450). This prediction was confirmed by Butler (2012): HCTA and DOI scores of 131 respondents showed a modest relationship (r = -.38, p < .001) in the expected direction. In addition, it was assumed that the critical thinking assessment scores and real-world outcomes would yield similar results for three qualitatively different groups of respondents (community adults, state university students, and community college students). Butler (2012) reported no differences between the groups in the number of negative life events, and no differences among the groups in the relationship between the number of life events and the critical thinking assessment scores. The only difference found was the score on the critical thinking assessment; community college students scored significantly lower (M = 92.31, SD = 17.50) than state university students (M = 105.15, SD = 21.48) and community adults (M = 110.42, SD = 20.43).


evidence for the external validity. If there is a relationship between critical thinking and decision making, critical thinking instruction may also yield benefits beyond the classroom.

The HCTA was translated into Dutch and published by Schuhfried GmbH. However, little psychometric data is available about the Dutch edition. Therefore, this study first explored the internal structure and reliability of the Dutch HCTA. Second, this study attempted to replicate the findings of Butler (2012), whose primary objective was to determine whether HCTA scores are related to a real-world outcomes inventory of everyday life.

In line with Halpern (2012), Hau et al. (2006, April), and Ku (2009), Cronbach's Alpha of the total item set is hypothesized to be .85 or higher, and the reliability of the subscale items should resemble that of the U.S. norm sample (constructed response format α = .84, and forced choice format α = .79). Additionally, it is expected that the factor analysis will show two related but separable latent factors for the constructed response and forced choice formats, each containing the five critical thinking categories. Finally, in line with Butler (2012), it is predicted that HCTA scores and scores on a real-world outcomes inventory show a modest but significant negative correlation. This prediction implies that respondents who score higher on the HCTA will report fewer negative life events than those who score lower on critical thinking. To check for possible ambiguities in the HCTA encountered during administration, respondents indicated at the end of the test whether and where they experienced problems with the comprehensibility of the HCTA. The results of these observations are reported.

Method

Participants


of the concerning studies. The respondents were aged 18 - 32 (M = 20.53, SD = 2.07). Ethnic background was distributed as follows: 46% Dutch, 34% German, and 20% stated another ethnicity or refused to fill out this question.

Materials

Halpern Critical Thinking Assessment. The instrument used to measure critical thinking is the HCTA (Halpern, 2012). A few technical language errors were corrected before it was used, without affecting the meaning of the items (see Appendix A). Test Form S1, Version A, was used in this study. The HCTA Form S1 presents 25 everyday scenarios accompanied by questions in two response formats: first a constructed response (open-ended) and then a forced choice (e.g., multiple-choice, rating of alternatives, or ranking) (see Appendix B for the HCTA used in this study). Form S2 consists only of the forced choice questions and can be used as a short form. Version A and Version B are parallel versions of the HCTA. This enables a repeated measures design without possible memory bias for the items.


the principle of regression to the mean and the improbability of an extreme score followed by another extreme score (correct answer: c). The administration of the HCTA (Form S1: constructed response and recognition items, Version A) took 45 to 80 minutes.

The HCTA was provided within the Vienna Test System (VTS; Halpern, 2012), but in this study the HCTA and RWO-NL were administered online with Thesistools. With Thesistools, both tests could be administered online so that a large group of respondents could be reached. Scoring was derived from the VTS, which automatically scores the forced choice responses and guides the scoring of the constructed responses. Guided grading uses computerized prompting of the grader. For example, for the constructed response question about Ahmed's score at the end of the semester described above, the grading system displays the following question: “Did the respondent recognize the principle of regression to the mean or the improbability of an extreme score followed by another extreme score?” For each question, the grader determines whether the respondent’s answer (a) clearly indicated this, (b) less clearly indicated this, or (c) did not indicate this at all (Halpern, 2012). Scoring in a standardized way with computerized prompting reduces the concern about scoring bias. Halpern (2012) reported high inter-rater reliability (r = .83) for the constructed response, which indicates that scoring objectivity can be assumed.

RWO-NL. The computerized inventory used to assess real-world outcomes is the RWO-NL, adapted from Bruine de Bruin et al. (2007) and Butler (2012). The items were translated and adapted for Dutch respondents by altering language use within certain expressions, culturally unfamiliar or uncommon terms, and economically different items (Butler et al., 2012). The respondents indicate whether or not they have experienced a particular event in the past six months by selecting a check box with ‘yes’ or ‘no’. The inventory contains items from a wide variety of domains, such as finances, education, and relations. There are two types of items: items with sub-questions and items without sub-questions. The items with sub-questions always start with an opportunity (e.g. "Gone shopping for food or groceries") that made one or more negative events possible. These negative events were measured by the sub-questions (e.g. "Threw out food or groceries you had bought, because they went bad"). The inventory therefore takes into account whether a negative life event could actually have been experienced as a result of a previous decision. Nine items have no preceding occurrence (e.g. "Been in a public fight or screaming argument"). See Appendix C for the complete list of 50 presented items.

The total item set was created in three steps. First, 31 items and a total of 31 sub-questions were adopted from the original inventory published by Bruine de Bruin et al. (2007). Of these, 10 items and 12 sub-questions were modified for the Dutch population. Second, the adjustments made by Butler (2012) to make the items more applicable to university students were included, which resulted in nine additional items and 11 additional sub-questions. Of these, one item and four sub-questions were modified for the Dutch population. Finally, 10 items and 19 sub-questions were added to make the inventory more suitable and culturally appropriate for the Dutch population. For example, an original item contained the words "Used checks". This was altered to "Used a debit card", because checks are rarely used in the Netherlands. In this way, the item was made more suitable for the Dutch population without deviating too much from the original item. Additional items were created for events that nowadays play a greater role in the lives of Dutch students. For example: (a) "Had a mobile phone", with subsequent negative events: (b) "Lost a mobile phone" and (c) "Had to pay at least three times extra on your phone bills because you went over your call/text/data limit". These additional items all fit in the various domains used for the RWO. The administration of the RWO-NL took 5 to 15 minutes.


make sense. The opportunity, like driving a car, is necessary to experience a negative life event, like getting a speeding ticket. In these cases, it is assumed that the respondent has forgotten to answer the first question and therefore the nonresponse is changed to "yes". If the nonresponse was a negative life event, then the whole item is excluded from analyses by also excluding the opportunity that could make the negative life event possible from the proportion score.
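The nonresponse rules described above can be illustrated with a minimal Python sketch. It assumes an unweighted proportion score (negative events reported divided by negative events that were possible) and a hypothetical data layout in which each item is an (opportunity, event) pair with the values True, False, or None for nonresponse. The function name is illustrative, the weighting used for the wRWO-NL is not reproduced, and the handling of cases that the text does not mention is marked as an assumption in the comments.

```python
def proportion_score(items):
    """Unweighted proportion of negative events among those that were possible.

    items: list of (opportunity, event) answers, where True = 'yes',
    False = 'no', and None = item nonresponse (hypothetical layout).
    """
    possible = 0
    experienced = 0
    for opportunity, event in items:
        if event is None:
            # Nonresponse on the negative event: the whole item, including
            # its opportunity, is left out of the proportion score.
            continue
        if opportunity is None and event:
            # The event was reported, so the opportunity must have occurred;
            # the unanswered opportunity is recoded to 'yes'.
            opportunity = True
        if opportunity:
            # Assumption: opportunities answered 'no' or left unanswered
            # (without a reported event) do not count as possible events.
            possible += 1
            if event:
                experienced += 1
    return experienced / possible if possible else None


answers = [(True, True), (True, False), (None, True), (True, None), (False, False)]
score = proportion_score(answers)  # two events out of three possible ones
```

In this toy example the fourth item is dropped because the negative event was not answered, and the third item is kept because the unanswered opportunity is recoded to "yes".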

Procedure

All respondents who participated in the study were presented with an informed consent statement describing the purpose of the study and assuring confidentiality. All respondents received the same assessment and completed it through an online questionnaire. After completion of the test, the outcomes were checked and the respondents received a compensation of 1.5 out of 10 participation points. After the total study was finished, the respondents received a debriefing by email, which also included reading suggestions for students interested in learning more about critical thinking.

Results

The mean score on the HCTA (n = 240) was 108.23 (SD = 13.91) out of a maximum score of 194. The mean score on the weighted RWO-NL (wRWO-NL) was 0.14 (SD = 0.08). All measures met the criteria for univariate normality (skewness and kurtosis between -1 and 1). Different age groups were compared, starting with ≤ 19 years versus ≥ 20 years; the age boundary was then incremented twice, and finally the age groups of ≤ 26 years and ≥ 27 years were compared. As predicted, there were no differences in HCTA scores based on age or gender (all ps > .05). There were also no differences in HCTA scores between Dutch and German students, t(189) = 1.52, p = .13, or between Dutch students and the remaining students with other ethnicities, t(115) = 0.58, p = .57. It is therefore concluded that ethnicity had no effect on the HCTA scores, and that students with a native language other than Dutch were presumably able to understand the content of the items correctly.



Table 1

Summary of the Cronbach's Alpha (sub)scale scores of the HCTA

                                      Cronbach's Alpha                 Guttman's Lambda
(Sub)scale                            Constructed  Forced              Constructed  Forced
                                      response     choice    Total     response     choice    Total
Critical Thinking                     .61          .64       .75       .63          .67       .77
Verbal Reasoning                      .30          .19       .33       .33          .23       .37
Argument Analysis                     .24          .22       .38       .30          .26       .42
Thinking as Hypothesis Testing        .29          .38       .53       .32          .42       .55
Likelihood and Uncertainty            .31          .18       .39       .33          .20       .42
Decision Making and Problem Solving   .30          .43       .52       .34          .47       .54

Table 2

Model fit statistics of the measurement models in the HCTA sample

Model   χ2       df   p        CFI    RMSEA   ∆χ2      ∆df   p        ∆CFI
M1      32.050   29   .318     .990   .021    -        -     -        -
M2      38.556   30   .136     .972   .035    6.506    1     .011     0.018
M3      95.614   30   < .001   .782   .096    63.564   1     < .001   0.208

poor values for Cronbach's Alpha. A better reliability estimation like Guttman's Lambda is endorsed by Sijtsma (2009). This analysis results in a slightly better overall reliability value, λ2 = .77, and subscale values.
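To make the two reliability coefficients concrete, the following minimal Python sketch computes Cronbach's Alpha and Guttman's Lambda-2 from a respondents-by-items score matrix using the standard covariance-based formulas. The function names and the toy data are illustrative assumptions; this is not the software used in the study.

```python
from math import sqrt
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equally long score sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def reliability(scores):
    """Return (alpha, lambda2) for a list of respondents, each a list of k item scores."""
    items = list(zip(*scores))                      # one tuple of scores per item
    k = len(items)
    cov = [[covariance(a, b) for b in items] for a in items]
    total_var = sum(sum(row) for row in cov)        # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))     # sum of the item variances
    off_diag_sq = sum(cov[i][j] ** 2
                      for i in range(k) for j in range(k) if i != j)
    alpha = k / (k - 1) * (1 - item_var / total_var)
    lambda2 = (1 - item_var / total_var
               + sqrt(k / (k - 1) * off_diag_sq) / total_var)
    return alpha, lambda2


# Toy data (made up): five respondents, three items.
toy = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1], [2, 2, 3]]
alpha, lambda2 = reliability(toy)
```

Lambda-2 is never smaller than Alpha for the same data, which is consistent with the slightly higher Lambda values in Table 1.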



Figure 1. Standardized factor loadings of measurement model 1. CTF = latent factor critical thinking - free recall, CTR = latent factor critical thinking - recognition, VRF = sub-scale score verbal reasoning - free recall, AAF = sub-scale score argument analysis - free recall, HTF = sub-scale score hypotheses testing - free recall, LUF = sub-scale score likelihood and uncertainty - free recall, PSF = sub-scale score decision making and problem solving - free recall, VRR = sub-scale score verbal reasoning - recognition, AAR = sub-scale score argument analysis - recognition, HTR = sub-scale score hypotheses testing - recognition, LUR = sub-scale score likelihood and uncertainty - recognition, PSR = sub-scale score decision making and problem solving - recognition.

there is a relationship between the two formats, and simultaneously gives evidence for the separability of free recall and recognition. The only change in the second model (M2) and the third model (M3) compared to M1 is that the standardized correlation between the two latent factors was fixed to 1 and 0, respectively, to test the hypotheses that the two factors have indistinguishable or completely separate characteristics.

The calculations of the factor structure of the HCTA were carried out with IBM® SPSS® Amos™ 22 (Arbuckle, 2013). Maximum likelihood was used to estimate the model parameters, since the data were normally distributed. The following cut-off values were used to evaluate the goodness of fit of the models: non-significant χ2-test, CFI ≥ .95, and RMSEA < .05 (Jackson, Gillaspy Jr, & Purc-Stephenson, 2009; Marsh, Hau, & Wen, 2004). Also, to test whether model M2 and model M3 significantly differ from model M1, criteria of a significant ∆χ2



The model fit statistics of the three measurement models are summarized in Table 2. The statistics in Table 2 show a significantly better fit of model M1 than of model M2, ∆χ2(1, N = 240) = 6.51, p = .01. The latent factors "Critical Thinking - free recall" and "Critical Thinking - recognition" have a strong relationship (r = .785). However, the factors are separable, because M2 (in which the standardized latent correlation of the two factors was set to 1) gave a poorer fit than M1.
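The model comparison can be checked from the values in Table 2 with a minimal Python sketch of the χ2 difference test. The function name is an illustrative assumption, and the closed-form p-value used here holds only for a difference of one degree of freedom, which is the case for both M2 and M3 relative to M1.

```python
from math import erfc, sqrt


def chi_square_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test for two nested models differing by 1 df."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    if delta_df != 1 or delta_chi2 < 0:
        raise ValueError("sketch only covers a non-negative difference with 1 df")
    # For 1 df, the upper-tail chi-square probability equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(delta_chi2 / 2))
    return delta_chi2, delta_df, p_value


# Values taken from Table 2: M2 and M3 are each compared with M1.
m2_vs_m1 = chi_square_difference(38.556, 30, 32.050, 29)
m3_vs_m1 = chi_square_difference(95.614, 30, 32.050, 29)
```

Both restricted models fit significantly worse than M1 under this test, in line with the ∆χ2 and p columns of Table 2.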

The standardized factor loadings of the constructed response and forced choice sub-scales on their associated latent factors were all significant (all ps < .001). The correlated unique errors between the associated sub-scales turned out to be rather small and only reached significance for "Hypothesis Testing" and "Decision Making and Problem Solving". The lack of significance indicates a good fit of the model, since the correlations between the latent variables and the associated subscales should ideally explain all variance. The structural relations and standardized factor loadings are depicted in Figure 1.

The Halpern Critical Thinking Assessment in relation with the Real World Outcomes inventory

In contrast to what was expected, there was no significant relationship between the total HCTA score and the score on the RWO-NL, r = -.10, ns. The total HCTA score and the weighted score on the RWO-NL were not significantly related either, r = -.12, ns. Item nonresponse was not included in these calculations. Refusing to answer a question could be the result of having experienced the outcome but finding it too embarrassing to admit in the inventory. To find out how the relationship between the HCTA and RWO-NL changes when this consideration is taken into account, all unanswered questions were replaced by 'yes' (experiencing the outcome). Surprisingly, the relationship between the total HCTA score and the RWO-NL score then appeared to be significant, r = -.16, p = .01. The weighted score on the RWO-NL had a somewhat stronger relationship with the HCTA, r = -.17, p = .007. However, these correlations are small. This finding seems to support the hypothesis that a higher score on critical thinking is related to experiencing fewer negative events in daily life, but these results should be interpreted with caution because of the alternative treatment of item nonresponse. See Appendix C for the complete list of the 50 presented item sets and the corresponding response frequencies.
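Why correlations of this size fall on either side of the .05 level can be illustrated with a minimal Python sketch that converts a Pearson correlation into a t statistic. Two assumptions are made here and are not taken from the study: n = 240 is used for every correlation (the number of respondents after excluding nonresponse is not reported in this section), and the two-sided p-value is approximated with the normal distribution, which is close to the t distribution at this sample size.

```python
from math import erfc, sqrt


def correlation_test(r, n):
    """Return (t, approximate two-sided p) for a Pearson correlation r with n cases."""
    t = r * sqrt(n - 2) / sqrt(1 - r * r)
    # Normal approximation to the t distribution (assumption; adequate for large n).
    p_approx = erfc(abs(t) / sqrt(2))
    return t, p_approx


n_assumed = 240  # assumption: full sample size used for all correlations
t_weak, p_weak = correlation_test(-0.12, n_assumed)
t_recoded, p_recoded = correlation_test(-0.16, n_assumed)
```

Under these assumptions, a correlation of -.12 stays just above the .05 level, whereas -.16 falls below it, which matches the pattern reported above.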

Feedback of the respondents


syntax and the use of scientific language makes it difficult to comprehend the scenarios and questions of the HCTA. Also, seven respondents commented on the excessive length of the test; some said they spent three hours completing the assessment. In order to maintain concentration and motivation, the sentences should be kept short and easy to understand. Although few respondents used the opportunity to give feedback about specific items, the comments can still give an indication of where the bottlenecks are. Specific useful comments about ambiguities were made with regard to the following items of the HCTA: 5, 7, 8, 14, 19, and 21. Each item will be discussed below (see Appendix B, The Dutch HCTA, for the display of the items).

The following feedback of one respondent about item 5 summarizes the feedback of the other nine respondents. This respondent cites the following sentence derived from scenario 5: "After a year it was found that the average result of the at risk students was .2 higher than at risk student of the previous year." She comments: "This sentence is not clear. Is it about the average result, the number of current students that increased, or the students who have successfully completed the first year? Also, it was not clear to me what .2 exactly means. Is it that the grades increased by 0.2? Or did 20% of the students achieve the first year? Or is the result of the at risk students increased by 20%?"

Seven of the respondents thought scenario 7 was too difficult. They felt they did not have enough knowledge for this question and would like to have had more explanation about terms like "diagnostic category". Another respondent thought the last question was too long and complex.

None of the four respondents who commented on scenario 8 made clear why they found it difficult to understand. They used terms like "vague" to describe this item.

Item 14 received feedback from five respondents. Four of these respondents even said that the correct answers were not among the forced choice response options. They probably did not understand that the response options included an underlying reasoning; instead, they took the options literally.

Four respondents did not understand what was meant by "deficiencies" in scenario 19. One respondent posed the question: "Certain shortcomings of food?" This indicates a misinterpretation of the question. Instead, respondents were expected to look at the shortcomings in the reasoning of the news article.


question of part B, whether the respondents could assess the quality of several problems, was often not properly understood.

Conclusions and Discussion

The results of the present study partly replicate those reported by Halpern (2012). First, the confirmatory factor analysis revealed that the model with two correlated latent factors (the constructed response and forced choice formats), each containing the five subscales of the HCTA, best fits the factorial structure of the Dutch HCTA. This supports the findings of Halpern (2012), Hau et al. (2006, April), and Ku (2009), in which the same model fitted the data of a U.S. and Chinese sample. Most importantly, the analysis confirms that the latent factors of the constructed response and the forced choice format are closely related, yet separate in their properties. The separability could reflect the difference between measuring more of the dispositional component with the constructed response format and measuring the cognitive component with the forced choice format. This indicates that the use of both response formats in the HCTA is a valid method to obtain an accurate indication of critical thinking ability. However, the estimated value of Cronbach's Alpha did not confirm the hypothesis of α ≥ .85, nor did the values for the two response formats (constructed response format α = .61 instead of α = .84, and forced choice format α = .64 instead of α = .79). Still, the overall values of Cronbach's Alpha (α = .75) and Guttman's Lambda (λ2 = .77) indicate good reliability of the Dutch HCTA. Taken together, these results confirm not only the quality of the Dutch translation, but also the universality of the two-factor model with five subscales per factor. This justifies the use of the Dutch version of the HCTA in the Netherlands, despite the fact that respondents still reported ambiguities in the scenarios. Once these issues have been resolved, further research on the Dutch HCTA will probably yield more reliable results.


questions to lessen this concern during administration of the RWO-NL. When item nonresponse was replaced with an affirmative answer (that the respondent actually experienced the outcome), a significant negative relationship between the HCTA and the (w)RWO-NL appeared, r = -.17. This relationship is still relatively weak compared to the findings of Butler (2012) and Dwyer et al. (2012), who found correlations of r = -.38 and r = -.28, respectively. That aside, if not answering a question is caused by embarrassment about admitting to the outcome, the hypothesis that a higher score on critical thinking is related to experiencing fewer negative events in daily life is supported. However, this assumption cannot be verified with this study. A closer look at the item nonresponse revealed a significant relationship between HCTA scores and the number of unanswered questions on the RWO-NL in the opposite direction, r(239) = -.20, p = .002. This does not correspond with the proposition that respondents with a higher critical thinking score also gave more socially desirable answers and thus caused more item nonresponse on the RWO-NL. It could be that respondents with a lower HCTA score were less motivated to complete the RWO-NL seriously, and therefore did not answer all questions. Alternatively, respondents with a higher HCTA score may be more aware of the anonymity of the RWO-NL, and may therefore have answered more questions even when they felt embarrassed. Nevertheless, further development and validation of the RWO-NL may yield a more reliable tool for measuring negative life events.


thinking, as in the California Critical Thinking Disposition Inventory (CCTDI) (Facione, 2000), can provide further evidence. The CCTDI distinguishes seven elements of the overall disposition (inquisitiveness, systematicity, analyticity, open-mindedness, maturity of judgment, truth-seeking, self-confidence, and their negative poles), which are measured with 75 Likert-style items. Comparing scores on the CCTDI with scores on the constructed response format of the HCTA may shed some light on whether the disposition is revealed in the constructed response. Third, the online administration of the test was unproctored. Opinions about this method of administration within the cognitive domain are mixed: proctored and unproctored conditions may be equivalent (Lievens & Burke, 2011), or the unproctored condition may yield higher test scores than the proctored condition because no proctor is present (Carstairs & Myors, 2009). Such an effect does not seem to occur in the present study, because the unproctored test scores are not inflated compared with the mean of the U.S. norm sample (M = 109.71, SD = 18.23) (Halpern, 2012). On the contrary, a lower mean was found (M = 108.23, SD = 13.91). The U.S. norm sample has a wider spread in terms of age (M = 29, SD = 12.53), which could mean that these respondents had more years of education. More years of education are related to a higher level of critical thinking (Butler, 2012), which could explain why the present mean is lower than that of the U.S. norm sample. As a fourth limitation, however, the lower mean may also be caused by the translation of the HCTA, where cultural differences could have influenced the test score. The Netherlands is a welfare state, in which different opinions about politics, diets, drugs, and alcohol could distort the scores on the items concerning these topics. Respondents may, for example, react more leniently to the scenario about alcohol abuse because of the lower age at which alcohol may be consumed in the Netherlands, which makes alcohol more socially accepted. This leniency may make Dutch respondents less inclined to report the alcohol abuse to an authority figure in that particular scenario, possibly causing them to score lower on this item. A further reduction in HCTA scores could be due to incorrect syntax. Respondents had the opportunity to give feedback on the HCTA after completing the assessment. Based on this feedback, we infer that complex and incorrect sentence structures and the scientific style of questioning could have affected the comprehensibility of some of the items. We suggest that these issues be addressed before further research is done.
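To put the difference between the present sample and the U.S. norm sample on a common scale, the two means reported above can be expressed as a standardized mean difference. The following Python sketch is illustrative only: the function name is hypothetical, and because the size of the norm sample is not stated here, the two variances are simply averaged rather than weighted by sample size.

```python
from math import sqrt


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d-style effect size with equally weighted variances."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Values reported in the text: Dutch sample versus U.S. norm sample.
d = standardized_mean_difference(108.23, 13.91, 109.71, 18.23)
print(f"d = {d:.2f}")
```

Under this assumption the Dutch mean lies roughly a tenth of a pooled standard deviation below the U.S. norm, so the difference discussed here is small in standardized terms.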


feedback and a thorough review by the authors. Because our sample of respondents had a smaller spread in age and educational level, we recommend including a larger and more diverse sample of respondents. Supervision by a proctor could give more control over the behaviour of the respondents while they complete the tests. In addition, another method for measuring negative life events could be introduced. For instance, an unobtrusive experiment could be developed to observe actual behaviour that indicates good or bad decision making: the respondent is approached by an aggressive salesperson, and the researcher scores whether the respondent shows good or bad decision making. Although this is a very laborious method of data collection, it can circumvent the social desirability issue. Based on these considerations, we recommend re-examining the relationship between HCTA and RWO-NL scores.

Finally, with a valid Dutch critical thinking assessment instrument, it becomes possible to examine the effectiveness of critical thinking instruction. Gains in critical thinking can be measured as an increase in HCTA scores from pretest to posttest. Such a repeated measures design requires the use of version B of the HCTA, which in turn should also be the subject of validation studies for the Dutch population. Further validation research is therefore needed. Still, a thoroughly researched HCTA is a promising tool to assess the 21st century skill of critical thinking among Dutch learners.
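A pretest-posttest gain of the kind described above could be summarized with a paired comparison. The Python sketch below is an assumption about how such an analysis might look, not a procedure taken from the HCTA manual; the function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_gain(pretest, posttest):
    """Return (mean gain, paired t statistic, degrees of freedom)."""
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    n = len(gains)
    mean_gain = mean(gains)
    standard_error = stdev(gains) / sqrt(n)  # stdev uses n - 1
    return mean_gain, mean_gain / standard_error, n - 1


# Hypothetical HCTA totals for four learners (not study data).
pre = [100, 105, 110, 98]
post = [104, 108, 111, 103]
gain, t, df = paired_gain(pre, post)
print(f"mean gain = {gain:.2f}, t({df}) = {t:.2f}")
```

The t statistic would still have to be compared with the t distribution for the given degrees of freedom, and a control group would be needed to attribute any gain to the instruction itself.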

Acknowledgements


References

Ananiadou, K., & Claro, M. (2009). 21st century skills and competences for new millennium learners in OECD countries. OECD Education Working Papers, 41. doi: 10.1787/218525261154

Arbuckle, J. L. (Ed.). (2013). Amos 22 Reference Guide. Crawfordville, FL: Amos Development Corporation.

Black, B. (2012). An overview of a programme of research to support the assessment of Critical Thinking. Thinking Skills and Creativity, 7(2), 122-133. doi: 10.1016/j.tsc.2012.04.003

Bridgeman, B., & Morgan, R. (1996). Success in college for students with discrepancies between performance on multiple-choice and essay tests. Journal of Educational Psychology, 88(2), 333. doi: 10.1037/0022-0663.88.2.333

Bruine de Bruin, W., Parker, A. M., & Fischhoff, B. (2007). Individual differences in adult decision-making competence. Journal of Personality and Social Psychology, 92(5), 938. doi: 10.1037/0022-3514.92.5.938

Butler, H. A. (2012). Halpern Critical Thinking Assessment Predicts Real-World Outcomes of Critical Thinking. Applied Cognitive Psychology, 26(5), 721-729. doi: 10.1002/acp.2851

Butler, H. A., Dwyer, C. P., Hogan, M. J., Franco, A., Rivas, S. F., Saiz, C., & Almeida, L. S. (2012). The Halpern Critical Thinking Assessment and real-world outcomes: Cross-national applications. Thinking Skills and Creativity, 7(2), 112-121. doi: 10.1016/j.tsc.2012.04.001

Carstairs, J., & Myors, B. (2009). Internet testing: A natural experiment reveals test score inflation on a high-stakes, unproctored cognitive test. Computers in Human Behavior, 25(3), 738-742. doi: 10.1016/j.chb.2009.01.011

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255. doi: 10.1207/S15328007SEM0902_5

Clifford, J. S., Boufal, M. M., & Kurtz, J. E. (2004). Personality Traits and Critical Thinking Skills in College Students: Empirical Tests of a Two-Factor Theory. Assessment, 11(2), 169-176. doi: 10.1177/1073191104263250

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98-104. doi: 10.1037/0021-9010.78.1.98

Dwyer, C. P., Hogan, M. J., & Stewart, I. (2012). An evaluation of argument mapping as a method of enhancing critical thinking performance in e-learning environments. Metacognition and Learning, 7(3), 1-26. doi: 10.1007/s11409-012-9092-1

Ennis, R. H. (1996). Critical thinking. Upper Saddle River, NJ: Prentice Hall.

Ennis, R. H., Millman, J., & Tomko, T. N. (1985). Cornell Critical Thinking Tests Level X & Level Z:


Ennis, R. H., & Weir, E. E. (1985). The Ennis-Weir Critical Thinking Essay Test: An Instrument for Teaching and Testing. Boise, ID: Midwest Publications.

Facione, P. A. (1990). The California Critical Thinking Skills Test: College Level. Millbrae, CA: California Academic Press.

Facione, P. A. (1998). Critical thinking: What it is and why it counts. Millbrae, CA: California Academic Press.

Facione, P. A. (2000). The disposition toward critical thinking: Its character, measurement, and relationship to critical thinking skill. Informal Logic, 20(1), 61-84.

Field, A. (2009). Discovering statistics using SPSS. Thousand Oaks, CA: Sage Publications.

Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Disposition, skills, structure training, and metacognitive monitoring. American Psychologist, 53(4), 449-455. doi: 10.1037/0003-066X.53.4.449

Halpern, D. F. (2012). Halpern Critical Thinking Assessment: Test Manual. Mödling, Austria: Schuhfried GmbH.

Hatcher, D. L. (2011). Which test? Whose scores? Comparing standardized critical thinking tests. New Directions for Institutional Research, 2011(149), 29-39. doi: 10.1002/ir.378

Hau, K. T., Halpern, D. F., Marin-Burkhart, L., Ho, I. T., Ku, K. Y. L., & Chan, N. M. (2006, April). Chinese and United States students’ critical thinking: Cross-cultural construct validation of a critical thinking assessment. Paper presented at the American Educational Research Association Annual Meeting, San Francisco, CA.

Hupkens, J. (2011, September 14). Veel media trappen in vals wetenschapsnieuws [Media are misled by false science news]. nrc.next. Retrieved from http://www.nrcnext.nl

Jackson, D. L., Gillaspy Jr, J. A., & Purc-Stephenson, R. (2009). Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods, 14(1), 6-23. doi: 10.1037/a0014694

Kline, P. (2000). The handbook of psychological testing. London: Routledge.

Ku, K. Y. L. (2009). Assessing students’ critical thinking performance: Urging for measurements using multi-response format. Thinking Skills and Creativity, 4(1), 70-76. doi: 10.1016/j.tsc.2009.02.001

Ku, K. Y. L., & Ho, I. T. (2010). Dispositional factors predicting Chinese students’ critical thinking performance. Personality and Individual Differences, 48(1), 54-58. doi: 10.1016/j.paid.2009.08.015

Lievens, F., & Burke, E. (2011). Dealing with the threats inherent in unproctored Internet testing of cognitive ability: Results from a large-scale operational test program. Journal of Occupational


Marin, L. M., & Halpern, D. F. (2011). Pedagogy for developing critical thinking in adolescents: Explicit instruction produces greatest gains. Thinking Skills and Creativity, 6(1), 1-13. doi: 10.1016/j.tsc.2010.08.002

Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11(3), 320-341. doi: 10.1207/s15328007sem1103_2

Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34(4), 207-218.

Moseley, D., Baumfield, V., Elliott, J., Higgins, S., Miller, J., Newton, D. P., & Gregson, M. (2005). Frameworks for thinking: A handbook for teaching and learning. Cambridge, UK: Cambridge University Press.

Rotherham, A. J., & Willingham, D. T. (2010). 21st century skills: The challenges ahead. Educational Leadership, 67(1), 16-21.

Sijtsma, K. (2009). On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha. Psychometrika, 74(1), 107-120. doi: 10.1007/s11336-008-9101-0

SonomaStateUniversity. (1996). ICAT Critical Thinking Essay Test. Rohnert Park, CA: Sonoma State University.

Spector, P. E., Schneider, J. R., Vance, C. A., & Hezlett, S. A. (2000). The relation of cognitive ability and personality traits to assessment center performance. Journal of Applied Social Psychology, 30(7), 1474-1491. doi: 10.1111/j.1559-1816.2000.tb02531.x

Stein, B., & Haynes, A. (2011). Engaging faculty in the assessment and improvement of students' critical thinking using the critical thinking assessment test. Change: The Magazine of Higher Learning, 43(2), 44-49. doi: 10.1080/00091383.2011.550254

Sternberg, R. J. (1986). Critical Thinking: Its Nature, Measurement, and Improvement. Washington, DC: National Inst. of Education.

Taube, K. T. (1997). Critical thinking ability and disposition as factors of performance on a written critical thinking test. The Journal of General Education, 46(2), 129-164.

TilburgUniversity. (2011). Interim Report Regarding the Breach of Scientific Integrity Committed by Prof. D. A. Stapel. Retrieved from http://www.tilburguniversity.edu/nl/nieuws-en-agenda/commissie-levelt/interim-report.pdf.

Voogt, J., & Roblin, N. P. (2010). 21st Century Skills: Discussion paper. University of Twente. Retrieved from http://onderzoek.kennisnet.nl/onderzoeken-totaal/21stecentury

Voogt, J., & Roblin, N. P. (2012). A comparative analysis of international frameworks for 21st century competences: Implications for national curriculum policies. Journal of Curriculum Studies,


Watson, G., & Glaser, E. M. (1980). Manual for the Watson Glaser critical thinking appraisal.


Appendix A

Corrections of the Dutch HCTA before administration

Location: Instructie
Incorrect text: We willen graag begrijpen hoe je denkt over complexe dagdagelijkse situaties.
Corrected text: We willen graag begrijpen hoe je denkt over complexe dagelijkse situaties.

Location: Probleem 1: Deel B
Incorrect text: Je baserend op deze informatie, welk van onderstaande stellingen is het meest plausibel?
Corrected text: Je baserend op deze informatie, welk van onderstaande stellingen is het meest aannemelijk?

Location: Probleem 1: Deel B
Incorrect text: Schoolresultaten zullen waarschijnlijk verbeteren als we adolescenten verhinderen van te roken, ... (zo ook in optie 2 en 4)
Corrected text: Schoolresultaten zullen waarschijnlijk verbeteren als we adolescenten verhinderen te roken, ...

Location: Situatie en probleem 4
Incorrect text: Beide programma's kosten evenveel.
Corrected text: Beide programma's kosten evenveel geld.

Location: Probleem 4: Deel B
Incorrect text: Welk percentage van de deelnemers weegt binnen het jaar terug evenveel als zijn begingewicht?
Corrected text: Welk percentage van de deelnemers weegt binnen een jaar weer evenveel als zijn of haar begingewicht?

Location: Situatie en probleem 5
Incorrect text: "Zoals men kan afleiden uit de verhoging in gemiddeld resultaat bij de studenten, was dit programma was een gigantisch succes."
Corrected text: "Zoals men kan afleiden uit de verhoging in gemiddeld resultaat bij de studenten, was dit programma een gigantisch succes."

Location: Situatie en probleem 6
Incorrect text: Als leerlingen in je tekenlessen dezelfde tekeningen maken als ze zouden gemaakt hebben als ze thuis waren gebleven of niet begeleid werden, ...
Corrected text: Als leerlingen in je tekenlessen dezelfde tekeningen maken als ze gemaakt zouden hebben als ze thuis waren gebleven of niet begeleid werden, ...

Location: Probleem 6: Deel B
Incorrect text: Leerkrachten zitten vaak reeds voor het schooljaar ten einde is doorheen hun materiaalvoorraad voor tekenlessen.


Location: Probleem 7: Deel B
Incorrect text: Het gebruik van deze term suggereert dat het slachtoffer van mishandeling er in zekere zin zelf voor verantwoordelijk is dat hij mishandeld wordt.
Corrected text: Het gebruik van deze term suggereert dat het slachtoffer van mishandeling er in zekere zin zelf voor verantwoordelijk is dat zij mishandeld wordt.

Location: Situatie 8: Deel A
Incorrect text: Beoordeel de redenering van de minister-president over deze kwestie, gebruik maken van een 7-puntenschaal waarin.
Corrected text: Beoordeel de redenering van de minister-president over deze kwestie, gebruik makend van een 7-puntenschaal.

Location: Probleem 8: Deel B
Incorrect text: Welke veronderstelt de baas wanneer hij deze analogie maakt?
Corrected text: Welke veronderstelling maakt de minister-president wanneer hij deze analogie maakt?

Location: Probleem 9: Deel B
Incorrect text: Als de ouders erin slagen in hun opzet en hun voorstel wordt een nieuwe regel in de schoolgemeenschap, wat is dan waarschijnlijk het grootste probleem waarmee ze zullen geconfronteerd worden?
Corrected text: Als de ouders slagen in hun opzet en hun voorstel wordt een nieuwe regel in de schoolgemeenschap, wat is dan waarschijnlijk het grootste probleem waar ze mee geconfronteerd zullen worden?

Location: Probleem 9: Deel B
Incorrect text: Sommige ouders zijn nalatig en leren hun kinderen niet van vriendelijk te zijn tegen anderen.
Corrected text: Sommige ouders zijn nalatig en leren hun kinderen niet vriendelijk te zijn tegen anderen.

Location: Situatie en probleem 10
Incorrect text: Een politicus werd gevraagd om zijn standpunt uit te leggen over het wetsvoorstel dat voorziet dat de staat propere naalden zou geven aan drugsverslaafden, om de verspreiding van ziekten als aids tegen te gaan. Hij antwoordde dat hij zich verzet tegen een ‘propere naalden’-programma omdat dit verkeerd is.


Location: Probleem 10: Deel B
Incorrect text: Hij heeft niet duidelijk gemaakt of hij voor of tegen een ‘propere naalden’-programma is.
Corrected text: Hij heeft niet duidelijk gemaakt of hij voor of tegen een ‘schone naalden’-programma is.

Location: Situatie en probleem 12
Incorrect text: Natuurlijk is het geen goede keuze als je schrik hebt voor wiskunde of graag buiten werkt.
Corrected text: Natuurlijk is het geen goede keuze als je angst hebt voor wiskunde of graag buiten werkt.

Location: Probleem 12: Deel B
Incorrect text: Computerwetenschappen is geen goede keuze als je schrik hebt voor wiskunde
Corrected text: Computerwetenschappen is geen goede keuze als je angst hebt voor wiskunde

Location: Situatie 14: Deel A
Incorrect text: Beschrijf het redeneerwijze van de Immigratiedienst.
Corrected text: Beschrijf de redeneerwijze van de Immigratiedienst.

Location: Situatie en probleem 16
Incorrect text: Ann Marie, een Amerikaanse vrouw, wilt naar Hollywood verhuizen ...
Corrected text: Ann Marie, een Amerikaanse vrouw, wil naar Hollywood verhuizen ...

Location: Probleem 16: Deel B
Incorrect text: De kans dat ze om het even welke at random geselecteerde vrouw een succesvolle actrice zal worden
Corrected text: De kans dat een willekeurig geselecteerde vrouw een succesvolle actrice zal worden

Location: Probleem 18: Deel B
Incorrect text: Welke van de volgende stellingen over de kans dat om het even welke zes nummers de winnende getallen van de Lotto zijn, is waar?
Corrected text: Welke van de volgende stellingen over de winkans van een getallenreeks van de Lotto, zijn waar?

Location: Probleem 21: Deel B
Incorrect text: De vriendin zou van school kunnen gestuurd worden als ze zich zo vaak blijft bezatten.
Corrected text: De vriendin zou van school gestuurd kunnen worden als ze zich zo vaak blijft bezatten.

Location: Situatie en probleem 23
Incorrect text: Je maakt een toets in de les natuurkunde en je stoot op een probleem waarvoor je geen antwoord kan bedenken.


Location: Probleem 23: Deel B
Incorrect text: Schrijf in een gemene nota aan de leerkracht omdat hij zo'n moeilijk probleem gebruikt.
Corrected text: Schrijf een brief aan de leerkracht waarin je aangeeft dat hij te moeilijke vragen of sommen gebruikt.

Location: Probleem 24: Deel B
Incorrect text: Laat de pil op de grond liggen en kijk of de hond hem opeet.


Appendix B

The Dutch HCTA


Instructie

We willen graag begrijpen hoe je denkt over complexe dagelijkse situaties. Alle vragen starten met een korte situatieschets. Nadat je de situatieschets hebt gelezen, krijg je hierover verschillende vragen. Bij sommige vragen moet je zelf een kort antwoord formuleren. Bij andere vragen moet je tussen een aantal alternatieven kiezen.

Hier is een voorbeeldvraag waar je zelf een kort antwoord moet formuleren.

Voorbeeld van een situatieschets:

Na afloop van een televisiedebat over de doodstraf, werden de kijkers aangemoedigd om naar de website van de zender te surfen en online te stemmen of ze "voor" of "tegen" de doodstraf zijn. Binnen het eerste uur "stemden" bijna 1000 mensen op de website, waarbij ongeveer de helft "voor" en de helft "tegen" de doodstraf stemde. Het nieuwsanker van deze zender maakte de resultaten de volgende dag bekend. Hij concludeerde dat de inwoners van het land evenredig verdeeld waren over het al dan niet toepassen van de doodstraf.

Hier is een voorbeeldvraag waar je zelf een kort antwoord moet formuleren:

Ben je het op basis van deze gegevens eens met de conclusie van het nieuwsanker?

ja nee

Geef twee suggesties om deze studie te verbeteren: Eerste suggestie:

Tweede suggestie:

Het antwoord dat bij deze voorbeeldvraag verwacht wordt:

Ben je op basis van deze gegevens eens met de conclusie van het nieuwsanker?

ja nee

Geef twee suggesties om deze studie te verbeteren:

Eerste suggestie: Ik zou een steekproef proberen samen te stellen die meer representatief is voor het land - niet alleen mensen die het internet kunnen gebruiken om vragen te beantwoorden.


Instructie

Nu krijg je een voorbeeldvraag waar je moet kiezen tussen de aangeboden alternatieven. Deze voorbeeldvraag gaat over dezelfde situatieschets beschreven op de vorige pagina. Tracht eerst zelf een antwoord op de vraag te formuleren.

Denk op basis van de informatie in de situatieschets na over elk van de volgende alternatieven en beslis of het alternatief waar of waarschijnlijk waar is. Kruis alle stellingen aan die juist of wellicht juist zijn. Laat de andere blanco.

Veel mensen gingen kort na het einde van de show achter hun computer zitten om te "stemmen".

Ongeveer de helft van alle vrouwen en de helft van alle mannen zijn voorstander van de doodstraf.

Zowel het standpunt pro als het standpunt contra de doodstraf waren even overtuigend in het debat.

Mensen die naar deze show gekeken hebben en vervolgens gestemd hebben op hun computer zijn mogelijk niet representatief voor alle mensen in dit land.

Mensen die gestemd hebben, voelen zich waarschijnlijk sterker betrokken bij deze kwestie (in positieve of negatieve zin) dan degenen die niet gestemd hebben.

Het antwoord dat bij deze voorbeeldvraag verwacht wordt:

Veel mensen gingen kort na het einde van de show achter hun computer zitten om te "stemmen".

Ongeveer de helft van alle vrouwen en de helft van alle mannen zijn voorstander van de doodstraf.

Zowel het standpunt pro als het standpunt contra de doodstraf waren even overtuigend in het debat.

Mensen die naar deze show gekeken hebben en vervolgens gestemd hebben op hun computer zijn mogelijk niet representatief voor alle mensen in dit land.

Mensen die gestemd hebben, voelen zich waarschijnlijk sterker betrokken bij deze kwestie (in positieve of negatieve zin) dan degenen die niet gestemd hebben.

Dit is niet juist. Verklaring: We weten dat voor elke positie telkens de helft van alle mensen gestemd heeft, maar we weten niets over de proportie vrouwen of mannen die voor elke positie gestemd hebben. Dus kunnen we niet concluderen dat deze stelling waar of waarschijnlijk waar is.


Instructie

Nu weet je wat je verondersteld wordt te doen. Het is de bedoeling dat alle vragen duidelijk zijn. Er zijn geen strikvragen.

Denk eraan om je redenering zo kort en duidelijk mogelijk weer te geven. Gelieve niet vooruit te bladeren voor alle vragen zijn beantwoord in het belang van het onderzoek!


Situatie 1: Deel A

In een tijdschrift voor ouders en leerkrachten verscheen recent een rapport dat duidelijk maakte dat adolescenten die sigaretten roken ook geneigd zijn lage resultaten te halen op school. Naarmate het aantal sigaretten dat per dag gerookt werd toenam, daalde het gemiddelde schoolresultaat. Een suggestie die in dit rapport werd gemaakt, was dat we de schoolprestaties zouden kunnen verbeteren door te verhinderen dat adolescenten roken.

Je baserend op deze informatie, zou je dit idee ondersteunen als een manier om de schoolprestaties van rokende adolescenten, te verbeteren?

ja nee

Leg a.u.b. uit waarom wel of waarom niet.


Probleem 1: Deel B - Vraag 1 van 1

In een tijdschrift voor ouders en leerkrachten verscheen recent een rapport dat duidelijk maakte dat adolescenten die sigaretten roken ook geneigd zijn lage resultaten te halen op school. Naarmate het aantal sigaretten dat per dag gerookt werd toenam, daalde het gemiddelde schoolresultaat. Een suggestie die in dit rapport werd gemaakt, was dat we de schoolprestaties zouden kunnen verbeteren door te verhinderen dat adolescenten roken.

Je baserend op deze informatie, welk van onderstaande stellingen is het meest aannemelijk?

Schoolresultaten zullen waarschijnlijk verbeteren als we adolescenten verhinderen te roken, omdat de onderzoekers vaststelden dat als het roken toeneemt, de resultaten dalen.

Schoolresultaten zouden kunnen verbeteren als we adolescenten verhinderen te roken, maar we kunnen hier niet zeker van zijn omdat we alleen weten dat de resultaten verslechteren wanneer het roken toeneemt, maar niet wat er gebeurt wanneer het roken vermindert.

Je kan niet weten of de schoolresultaten zullen verbeteren als we adolescenten verhinderen te roken, want we weten alleen dat er een verband is tussen roken en schoolresultaten, niet of roken een verandering in schoolresultaten veroorzaakt.


Een gerenommeerde krant uit je buurt publiceert verschillende verhalen over criminelen die gruwelijke misdrijven gepleegd hebben nadat ze door de Commissie voor Voorwaardelijke Invrijheidstelling vervroegd werden vrijgelaten uit de gevangenis. Eén boze lezer wil dat alle leden van de Commissie ontslagen worden omwille van de slechte beslissingen die ze nemen.

Stel dat jij zou moeten oordelen over het ontslag van de commissieleden, op welke twee vragen zou je dan een antwoord willen zien om een gefundeerde beslissing te kunnen nemen?

Eerste vraag:


Een gerenommeerde krant uit je buurt publiceert verschillende verhalen over criminelen die gruwelijke misdrijven gepleegd hebben nadat ze door de Commissie voor Voorwaardelijke Invrijheidstelling vervroegd werden vrijgelaten uit de gevangenis. Eén boze lezer wil dat alle leden van de Commissie ontslagen worden omwille van de slechte beslissingen die ze nemen.

Hieronder staan enkele vragen die je misschien zou willen stellen om een gefundeerde beslissing te kunnen nemen. Duid voor elke vraag aan hoe belangrijk ze is voor het nemen van je beslissing.

Welk percentage van degenen die vrijgelaten worden, pleegt nooit een ander ernstig misdrijf?

helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk

Bestaat de Commissie uit links- of rechtsdenkende leden?

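helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk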

Heeft iemand van de Commissieleden familieleden die in de gevangenis zitten?

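helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk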

Welk percentage van vrijgelaten gevangenen die nooit een ander ernstig misdrijf plegen, kennen ze in landen die vergelijkbaar zijn met het onze?

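helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk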


Welk soort informatie gebruikt de Commissie om haar beslissingen te nemen?

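helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk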

Zijn de leden van de Commissie politiek benoemd?

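helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk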

Heeft iemand van de Commissieleden familieleden die ooit in de gevangenis hebben gezeten?

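helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk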


Een kruidenierswinkel is onlangs een gigantische reclamecampagne gestart om haar imago van eerder dure winkel te veranderen in dat van een winkel met lage prijzen. Televisie-, kranten- en radioadvertenties overspoelden de buurt, met als slogan: "De Voedselwereld, kampioen in lage prijzen!". Een maand nadat de reclamecampagne van start ging, werd een studie uitgevoerd in de lokale gemeenschap, waarbij aan de klanten die buitenkwamen bij De Voedselwereld het volgende werd gevraagd: "Welke winkel heeft volgens u de laagste prijzen?" Het onderzoek toonde aan dat meer dan 60% van de respondenten De Voedselwereld als antwoord gaven. De PR-verantwoordelijke rapporteerde overtuigd aan de algemeen directeur van de firma dat de campagne erin geslaagd was om de perceptie die de lokale gemeenschap heeft van De Voedselwereld, te veranderen van "dure supermarkt" in "kampioen in lage prijzen".

Als jij de algemeen directeur was van De Voedselwereld, welke twee veranderingen zou jij dan aanbrengen in deze studie om te bepalen of de reclamecampagne heeft gewerkt?

Eerste verandering:


Een kruidenierswinkel is onlangs een gigantische reclamecampagne gestart om haar imago van eerder dure winkel te veranderen in dat van een winkel met lage prijzen. Televisie-, kranten- en radioadvertenties overspoelden de buurt, met als slogan: "De Voedselwereld, kampioen in lage prijzen!". Een maand nadat de reclamecampagne van start ging, werd een studie uitgevoerd in de lokale gemeenschap, waarbij aan de klanten die buitenkwamen bij De Voedselwereld het volgende werd gevraagd: "Welke winkel heeft volgens u de laagste prijzen?" Het onderzoek toonde aan dat meer dan 60% van de respondenten De Voedselwereld als antwoord gaven. De PR-verantwoordelijke rapporteerde overtuigd aan de algemeen directeur van de firma dat de campagne erin geslaagd was om de perceptie die de lokale gemeenschap heeft van De Voedselwereld, te veranderen van "dure supermarkt" in "kampioen in lage prijzen".

Lees elk van de volgende voorstellen. Duid alle voorstellen aan die de studie volgens jou zouden verbeterd hebben. Laat de andere voorstellen blanco.

Achterhaal het percentage van mensen uit de buurt die hun inkopen doen in supermarkten.

Vraag respondenten of ze de advertenties gehoord of gezien hebben.

Vraag de respondenten of ze liever televisie kijken, de krant lezen of naar de radio luisteren.

Bevraag de buurt om te bepalen hoeveel mensen de voorkeur geven aan merkproducten.

Vraag de klanten of ze graag hun inkopen doen bij De Voedselwereld.

Bevraag de klanten voor de advertenties plaatsvinden en bevraag ze nadien opnieuw.

Bevraag de klanten alvorens ze de winkel binnengaan en niet wanneer ze hem verlaten.

Bevraag ook de klanten die hun inkopen in andere supermarkten doen.

Bel willekeurig gekozen mensen uit de buurt op en vraag hen naar de


Je probeert te beslissen welk van twee afslankingsprogramma's het beste is om een zwaarlijvige vriend van je definitief van zijn overgewicht af te helpen. Je beschikt over de brochures van twee programma's die allebei een goede naam hebben. Het eerste programma beweert dat het gemiddelde gewichtsverlies 12 kilo is. Het tweede programma beweert dat het gemiddelde gewichtsverlies 15 kilo is. Beide programma's kosten evenveel geld.

Welke twee vragen zou je aan de vertegenwoordigers van elk programma stellen om te bepalen welk programma je zal adviseren aan je vriend?

Eerste vraag:


Je probeert te beslissen welk van twee afslankingsprogramma's het beste is om een zwaarlijvige vriend van je definitief van zijn overgewicht af te helpen. Je beschikt over de brochures van twee programma's die allebei een goede naam hebben. Het eerste programma beweert dat het gemiddelde gewichtsverlies 12 kilo is. Het tweede programma beweert dat het gemiddelde gewichtsverlies 15 kilo is. Beide programma's kosten evenveel geld.

Geef elk van de volgende vragen een score volgens hoe bruikbaar de informatie zou zijn voor je beslissing.

Hoeveel mensen volgen uw programma?

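helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk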

Maakt u reclame voor uw programma in de lokale media?

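helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk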

Wordt het programma onderschreven door een filmster of model?

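helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk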

Wat is het gemiddelde gewicht van de deelnemers voor en na het programma?

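helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk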


Welke soort training of opleiding krijgt het begeleidend personeel?

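helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk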

Hoeveel deelnemers stoppen met het programma voor ze het voltooid hebben?

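helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk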

Welk percentage van de deelnemers weegt binnen een jaar weer evenveel als zijn of haar begingewicht?

helemaal niet belangrijk / van zeer weinig belang / een klein beetje belangrijk / van middelmatig belang / belangrijk / heel belangrijk / extreem belangrijk


Een grote universiteit in de VS heeft het moeilijk om studenten uit een bepaald deel van de bevolking aan te trekken en te behouden. Er werd een 'Ga voor de onderscheiding!'-programma ontworpen om het gemiddelde resultaat van deze risicostudenten te verhogen zodat er meer van hen een diploma zouden behalen. Een groot 'Ga voor de onderscheiding'-symbool werd uitgestald bij de dienst waar deze studenten geholpen worden. De studenten kregen ook elk trimester een nieuwsbrief toegezonden vol studietips, verhalen over succesvolle studenten en een groot 'Ga voor de onderscheiding'-logo. Na één jaar werd vastgesteld dat het gemiddelde resultaat van de risicostudenten .2 hoger was dan dat van de studenten die het jaar voordien als risicostudenten geïdentificeerd waren. De directeur van het 'Ga voor de onderscheiding'-programma zei: "Zoals men kan afleiden uit de verhoging in gemiddeld resultaat bij de studenten, was dit programma een gigantisch succes."

Welke informatie uit de situatiebeschrijving ondersteunt het sterkst de bewering van de directeur?