3.7 Notable Learning Potential Measures Originating from South Africa
3.7.1 The Learning Potential Computerised Adaptive Test (LPCAT)
The LPCAT exemplifies a measurement-oriented (psychometric) approach to dynamic assessment. It is a dynamic and adaptive learning potential test focussed on measuring the general fluid reasoning ability or ‘g’ domain (mentioned earlier and elaborated on in a later section of this thesis). It uses non-verbal figural material, as this is seen to show less bias in multicultural assessment. Verbal content, in particular, often underestimates the cognitive ability of African language test-takers. Responding to non-verbal figural pattern items requires common reasoning skills such as identification, comparison, and recognition of relations (De Beer, 2010a).
De Beer (2006) pitches the LPCAT as a solution to some of the concerns surrounding learning potential testing, including the time taken to complete the measures, the lack of reliability and validity information, and the lack of standardisation, as specifically mentioned by Grigorenko and Sternberg (as cited by De Beer). Despite the general positivity towards dynamic testing, these factors have been standing in the way of general practical application and usage, as discussed earlier.
The LPCAT was conceptualised in response to the practical need in South Africa for instruments that can be group-administered, with the purpose of identifying individuals, over a broad spectrum of ability, who show the potential to benefit most from further training and development. The developers linked dynamic assessment with Computerised Adaptive Testing (CAT) based on item response theory (IRT). At the core of IRT methods are three features, namely that item difficulty and individual ability are measured on the same scale; item characteristics are sample-independent; and individual abilities are item-independent (Embretson; Weiss, as cited by De Beer, 2006). This makes possible a form of CAT in which a unique set of items is selected for each test-taker during test administration, so that items presented to individuals are continually and interactively selected from a bank of available items to match the estimated ability of the individual at that point in time (Weiss, as cited by De Beer). IRT furthermore allows for the accurate measurement of difference scores, and CAT shortens testing time (De Beer). IRT-based analysis was used to perform bias analysis of items (in terms of gender, culture, language, and level of education) with a large (N = 2 450) representative sample (De Beer, 2005). Both classical test theory and IRT item analyses were performed, and items that did not meet the criteria in terms of measurement properties or differential item functioning (DIF) were discarded when the final test was compiled. The item characteristic curves of different subgroups were compared to determine the extent of DIF (De Beer, 2005).
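The comparison of subgroup item characteristic curves described above can be illustrated with a short sketch. It is not the procedure reported by De Beer (2005): the logistic model, the area-between-curves index, the ability range, and all item parameters below are assumptions chosen purely for illustration.

```python
import math

def icc(theta, a, b, c=0.0):
    """Item characteristic curve: probability of a correct response at
    ability theta, under an assumed logistic model with discrimination a,
    difficulty b and lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def unsigned_area(params_ref, params_focal, lo=-4.0, hi=4.0, steps=800):
    """Approximate the area between two subgroups' curves for the same item
    (midpoint rule). A larger area suggests more differential item functioning."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * width
        gap = icc(theta, *params_ref) - icc(theta, *params_focal)
        total += abs(gap) * width
    return total

# Hypothetical calibrations of one item in two subgroups, given as (a, b).
reference_group = (1.0, 0.0)
focal_group = (1.0, 0.5)   # the item appears harder for the focal group

print(round(unsigned_area(reference_group, focal_group), 3))
```

An item whose area exceeds some cut-off chosen by the test developer would be a candidate for removal from the bank; the source does not state which cut-off, if any, was applied.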
Two separate, but linked, adaptive tests are used for the pre-test and post-test respectively. The total testing time is nearly one hour. The advantages of computerised adaptive models are that testing time is shortened, and the results are available immediately after completion of the test.
Although the test is computer-based, candidates need only use the space bar and enter key, and as such computer literacy is not a requirement to take the test. The training provided between the pre-test and post-test is aimed at explicating the applicable reasoning strategies, by working through further example questions that set out the basic principles, building blocks, and general strategies for answering the particular types of questions. Questions are not repeated from pre- to post-test (De Beer, 2010a).
As a dynamic CAT, the LPCAT must be computer-administered to permit the interactive selection of appropriate items for each individual, depending on the response pattern and the estimated performance level at the time. In CAT, a bank of pre-calibrated items is available for presentation during the testing process. Unlike standard tests, in which all individuals complete the same items in the same sequence, CAT presents a selection of items unique to each individual, continuously selecting the next item so that its difficulty level matches the individual’s estimated ability level at that point in time. Candidates also receive different numbers of items, within a pre-set minimum and maximum number of items per administration. Test termination is only partially linked to the number of items. It is also linked to the accuracy of measurement, which in turn depends on the psychometric measurement quality of the items presented. The entry level to the pre-test is set; thereafter, the following steps are repeated until the testing is terminated:
• The first item presented is the item that measures best at the predetermined entry level (the best psychometric quality item available in the bank that has a difficulty level closest to that of the initial level of ability).
• When the respondent answers a question, three things happen:
o If the item is answered correctly, the respondent’s estimated ability level is readjusted upwards – assuming that, since the question aimed at the entry level of ability was answered correctly, the respondent has a higher level of ability. If the item is answered incorrectly, the respondent’s estimated ability level is adjusted downwards, based on a similar assumption as above.
o The characteristics of the item presented are also used to calculate an accuracy index, reflecting the accuracy of the ability estimation at that time. A check is done to determine whether the termination criteria are met – if they are, then the test is terminated. The test terminates as soon as the required level of accuracy or the maximum number of items is reached.
o If the test is not terminated, the next question selected will be the one in the bank that measures most accurately and provides the best information at the current newly estimated ability level.
• When the next item is presented, the process repeats, and does so until the termination criteria are met (De Beer, 2005).
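The steps above can be sketched in Python as follows. This is a minimal illustration and not the LPCAT's actual algorithm: the two-parameter logistic model, the expected a posteriori (EAP) ability estimate, the use of the posterior standard deviation as the accuracy index, the target value of 0.5, and all item parameters are assumptions. Only the bank size of 63 and the limits of 8 and 12 items echo the pre-test figures reported for the LPCAT elsewhere in this section.

```python
import math

GRID = [-4.0 + 0.1 * k for k in range(81)]  # ability values used for estimation

def p_correct(theta, a, b):
    """Assumed 2PL model: probability of answering item (a, b) correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information that item (a, b) provides at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate(responses, entry_level):
    """EAP ability estimate and its posterior SD (the 'accuracy index'),
    using a normal prior centred on the entry level."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * (theta - entry_level) ** 2)
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, entry_level=0.0, min_items=8, max_items=12,
            target_se=0.5):
    """Administer items adaptively; return (estimate, accuracy) after each item."""
    remaining = list(bank)
    responses, history = [], []
    theta = entry_level
    while remaining:
        # Select the unused item that measures best at the current estimate.
        item = max(remaining, key=lambda it: information(theta, *it))
        remaining.remove(item)
        responses.append((item, answer(item)))
        # A correct answer moves the estimate up, an incorrect one moves it down.
        theta, se = estimate(responses, entry_level)
        history.append((theta, se))
        # Stop at the required accuracy (after the minimum) or at the maximum.
        n = len(responses)
        if n >= max_items or (n >= min_items and se <= target_se):
            break
    return history

# Hypothetical bank of 63 items with evenly spaced difficulties.
bank = [(1.2, -3.0 + 6.0 * k / 62) for k in range(63)]
# Simulated test-taker who solves every item easier than ability level 1.0.
history = run_cat(bank, lambda item: item[1] < 1.0)
for step, (theta, se) in enumerate(history, start=1):
    print(step, round(theta, 2), round(se, 2))
```

The printed sequence of estimates corresponds to the kind of item-by-item trace that the LPCAT plots in its result graphs, although the values here come from invented item parameters.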
One of the advantages of CAT is that items of appropriate difficulty level are presented throughout testing, thereby not overwhelming or boring participants (De Beer, 2011).
There are two versions of the LPCAT: one with no language text on screen, for which instructions can be read in any of the eleven official South African languages, and one with either English or Afrikaans text on screen. A grade six or seven reading level is required in the language of administration for the text-on-screen versions. The entry level (initial estimated ability level) is lower for the version of the test that has no text, although, due to the adaptive process, this does not affect these individuals in terms of their possible final scores. The same introductory practice examples and example items for training are used for both versions (De Beer, 2005).
The results are presented in graph form, and reports can be generated showing the pre-test score, post-test score, difference score, and a composite score (which is a reasoned combination of the pre- and post-test). The levels of performance in both the pre-test and post-test should be noted, as well as the patterns and gradients of the graphs. The estimated ability/performance levels after answering each question are plotted, and these levels, as well as the number of questions answered, can be seen in both the pre- and post-test plots. In the pre-test, between 8 and 12 questions are administered adaptively from an item bank of 63 questions, while in the post-test, between 10 and 18 questions are administered adaptively from a separate post-test bank containing 125 questions. The performance level at the end of the pre-test is used as the entry level of the post-test, thereby maximising the accuracy of the post-test measurement (De Beer, 2006; 2013).
Stanine and percentile rankings are also provided for the pre- and post-tests; however, these are less useful than the T-scores, which are used to interpret the level of reasoning revealed in the tests in relation to the National Qualifications Framework or academic levels (De Beer, 2005).
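The relationship between the three reporting scales can be illustrated with the standard textbook conversions from a standardised (z) score. This is only a sketch under the assumption of normally distributed norm-group scores; the LPCAT's actual norm tables and its mapping of T-scores to National Qualifications Framework levels are not reproduced here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def norm_scores(z):
    """Convert a z-score into a T-score, percentile rank and stanine,
    assuming a normal distribution in the norm group."""
    t_score = 50.0 + 10.0 * z                  # T-scores: mean 50, SD 10
    percentile = 100.0 * NormalDist().cdf(z)   # percentage scoring below z
    # Stanines: mean 5, SD 2, half-SD-wide bands, limited to the range 1 to 9.
    stanine = min(9, max(1, math.floor(2.0 * z + 5.5)))
    return t_score, percentile, stanine

for z in (-2.0, 0.0, 1.0):
    t, pct, stanine = norm_scores(z)
    print(z, t, round(pct, 1), stanine)
```

The sketch shows why stanines are the coarsest of the three scales: every score within a band half a standard deviation wide receives the same stanine, whereas the T-score preserves the position of the score on the underlying scale.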
Interestingly, the LPCAT compares to early dynamic tests on several dimensions. According to Weiss (as cited by De Beer, 2006) the original dynamic test by Binet and Simon also had a variable entry level, items were scored during administration, results were used for further branching and selection of additional items, and the test also had a variable termination criterion.
Challenges associated with the LPCAT could include limited face validity for individuals at higher educational levels, since its content is unrelated to job performance or training at higher levels; the reliance on computer technology means that the assessment is vulnerable to technical or electricity problems; it does not provide a direct link to a particular career or job level and other information will be needed for career-related guidance and decisions; and ongoing software updates are required. Future developments for the LPCAT include internet-based administration (De Beer, 2010b).
3.7.1.1 The psychometric properties of the LPCAT
There have been numerous empirical studies on the psychometric properties of the LPCAT during its development and validation, as well as in the time since its release in 2000.
The reliability of CATs is not measured in the same way as that of standard (classical) tests, because the test questions are individualised (although the scores obtained are on the same scale that measures the latent trait of the domain). McBride (as cited by De Beer, 2013) indicates that adaptive tests can achieve higher reliability than conventional tests at the upper and lower extremes of the ability scale, and at the same time reach a given level of precision using substantially fewer items than standard tests. The IRT equivalent of the test score reliability and standard error of measurement of classical test theory is the test information function. This reflects the level of information available at a given ability level, as a result of the number and quality of items available at that level in the item bank. The standard error is therefore a function of ability rather than a single value for the entire ability range: it is calculated at various ability levels, based on the amount of information available at each level. LPCAT coefficient alpha reliability levels range between .926 and .981 for subgroups based on gender, culture, language, and level of education for the standardisation sample of 2 450 grade nine and grade eleven learners (De Beer).
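The relationship between test information and the standard error can be written out in general IRT terms. The notation below is an assumption introduced for illustration and is not taken from De Beer: P_i(θ) is the probability of a correct response to item i at ability θ, P_i'(θ) is its derivative with respect to θ, and A is the set of items administered to a given test-taker.

```latex
I(\theta) = \sum_{i \in A} I_i(\theta),
\qquad
I_i(\theta) = \frac{\left[ P_i'(\theta) \right]^2}{P_i(\theta)\,\bigl( 1 - P_i(\theta) \bigr)},
\qquad
SE(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}}
```

Because each administered item adds its own information at the test-taker's ability level, selecting items that are informative at the current estimate lowers the standard error faster than a fixed set of items would, which is consistent with the claim that adaptive tests reach a given level of precision with fewer items.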
The validity of a test refers to the usefulness of a test for various groups in different contexts. It usually requires evidence of the relationships between the test and some other independent measure which reflects the construct or behaviour of concern. Although DA was previously criticised for its lack of empirical psychometric evidence, this has changed somewhat in the twenty-first century. Evidence for the construct and predictive validity of the LPCAT is visible in the results of samples at different educational levels, from low-literate adults to university students, although of all the studies listed by De Beer (2011), only two involve adult groups. These groups were from an industry environment and included members of the designated groups. One in particular stands out, with 194 Black males, where the LPCAT correlated well with Adult Basic Education and Training (ABET) scores and a similar measure (Paper-and-Pencil-Games (PPG)), showing construct and predictive validity (De Beer, 2011).
A study on 52 (mainly White) production employees at a polymers company showed strong relationships between learning potential and English language proficiency. No support was found for the hypotheses relating to the predictive validity of the LPCAT and the English Second Language Proficiency Test when the criteria were training results (Schoeman, De Beer & Visser, 2008). This study also showed that language proficiency is a significant issue in the business environment, as English is the business language and, in general, the lingua franca, and as such impacts on learning, training, and performance (Huysamen; Van Eeden, De Beer, & Coetzee; Van Rooyen, as cited by Schoeman et al., 2008). Most Black people prefer to receive their education in English (Rossouw, as cited by Schoeman et al., 2008). According to Van Eeden, De Beer, and Coetzee (2001), English proficiency seemed to influence performance on both the predictor (LPCAT) and criterion measures (first year academic performance) in a study of N = 224 grade 12 students.
High school grades have been found to correlate with subsequent academic and work performance. Shochet (as cited by Van Eeden, De Beer, & Coetzee, 2001) argues that grades obtained in a disadvantaged school system cannot accurately reflect academic potential. Van der Merwe and De Beer (2006) do note, however, that Matric results correlate with tertiary academic performance, even for disadvantaged students and despite poor schooling standards in South Africa.
Van den Berg (as cited by Van Eeden, De Beer, & Coetzee, 2001) argues that language proficiency is the single most important moderator of test performance, as it reflects familiarity with concepts and access to the language medium through which knowledge has to be gained.
The acquisition of second language literacy is furthermore influenced by proficiency in the first language, the motivation to learn the second language, as well as cultural determinants. This is an important consideration in the South African developmental context (Van Eeden, De Beer, & Coetzee, 2001).
The LPCAT is registered with the Health Professions Council of South Africa (HPCSA) and can be used in contexts in which decision-making involves obtaining information related to required future performance or development and training levels (in terms of the NQF level framework). It has shown statistically and practically significant predictive validity for academic results at different levels. The individual’s level of required functioning can be compared with his/her current and potential level of functioning. Smaller improvement scores indicate that the individual is likely to perform at levels similar to the current one in the future. Larger improvement scores indicate that the individual can be expected to perform at higher levels in the future than those currently shown, provided that effective and relevant development opportunities are provided. The LPCAT scores can be used to determine the appropriate level at which assessment can be targeted, with due consideration to actual academic attainment (De Beer, 2005; 2010a).