CONSIDERATIONS IN THE DEVELOPMENT AND USE OF PERSONNEL SCREENING AND TESTING METHODS

Any type of measurement instrument used in industrial/organizational psychology, including those used in employee screening and selection, must meet certain measurement standards. Two critically important concepts in measurement (that were introduced in Chapter 2) are reliability and validity. Reliability refers to the stability of a measure over time or the consistency of the measure. For example, if we administer a test to a job applicant, we would expect to get essentially the same score on the test if it is taken at two different points in time (and the applicant did not do anything to improve test performance in between). Reliability also refers to the agreement between two or more assessments made of the same event or behavior, such as when two interviewers independently evaluate the appropriateness of a job candidate for a particular position. In other words, a measurement process is said to possess “reliability” if we can “rely” on the scores or measurements to be stable, consistent, and free of random error.

A variety of methods are used for estimating the reliability of a screening instrument. One method is called test–retest reliability. Here, a particular test or other measurement instrument is administered to the same individuals at two different times, usually with a one- to two-week interval between testing sessions. Scores on the first test are then correlated with those on the second test. If the correlation is high (a correlation coefficient approaching +1.0), evidence of reliability (at least stability over time) is empirically established. Of course, the assumption is made that nothing has happened between the two administrations of the test that would cause the scores to change drastically.
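The test–retest calculation can be illustrated with a short Python sketch. The scores below are invented for illustration (six hypothetical applicants tested two weeks apart), and `pearson_r` is simply a helper written for this example; it computes the ordinary Pearson correlation coefficient between the two sets of scores.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six applicants at two testing sessions.
time_1 = [82, 75, 91, 68, 88, 79]
time_2 = [80, 78, 93, 65, 85, 81]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.2f}")
```

A coefficient close to +1.0 in this kind of calculation would be read as evidence of stability over time. The same correlation could be applied to scores from two parallel forms of a test.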

reliability the consistency of a measurement instrument or its stability over time

test–retest reliability a method of determining the stability of a measurement instrument by administering the same measure to the same people at two different times and then correlating the scores

A second method of estimating the reliability of an employment screening measure is the parallel forms method. Here two equivalent tests are constructed, each of which presumably measures the same construct but using different items or questions. Test-takers are administered both forms of the instrument. Reliability is empirically established if the correlation between the two scores is high. Of course, the major drawbacks to this method are the time and difficulty involved in creating two equivalent tests.

Another way to estimate the reliability of a test instrument is by examining its internal consistency. If a test is reliable, each item should measure the same general construct, and thus performance on one item should be consistent with performance on all other items. Two specific methods are used to determine internal consistency. The first is to divide the test items into two equal parts and correlate the summed score on the first half of the items with that on the second half. This is referred to as split-half reliability. A second method, which involves numerous calculations (and which is more commonly used), is to determine the average intercorrelation among all items of the test. The resulting coefficient, referred to as Cronbach’s alpha, is an estimate of the test’s internal consistency. In summary, reliability refers to whether we can “depend” on a set of measurements to be stable and consistent, and several types of empirical evidence (e.g., test–retest, parallel forms, and internal consistency) reflect different aspects of this stability.
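Both internal consistency estimates can be sketched in a few lines of Python. The ratings below are invented (five hypothetical test-takers answering four items), and the function names are just labels chosen for this example. Cronbach's alpha is computed here with the usual variance-based formula, which depends on the number of items and on how strongly the items covary. The Spearman-Brown step is not described in the text above; it is a standard correction, included as an optional extra, that adjusts a split-half correlation for the fact that each half is only half as long as the full test.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_r(rows):
    """Correlate summed scores on the first half of the items with the second half."""
    half = len(rows[0]) // 2
    first = [sum(row[:half]) for row in rows]
    second = [sum(row[half:]) for row in rows]
    return pearson_r(first, second)

def spearman_brown(r_half):
    """Optional correction: estimate full-length reliability from a half-test r."""
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(rows):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: each row is one test-taker, each column one item (1-5 scale).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

r_half = split_half_r(responses)
alpha = cronbach_alpha(responses)
print(f"split-half r = {r_half:.2f}")
print(f"Spearman-Brown corrected = {spearman_brown(r_half):.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```

With real data, many more test-takers would be needed before either coefficient could be trusted; the tiny sample here only shows the arithmetic.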

Validity refers to the accuracy of inferences or projections we draw from measurements; that is, whether a set of measurements allows accurate inferences or projections about “something else.” That “something else” can be a job applicant’s standing on some characteristic or ability, it can be future job success, or it can be whether an employee is meeting performance standards. In the context of employee screening, the term validity most often refers to whether scores on a particular test or screening procedure accurately project future job performance. For example, in employee screening, validity refers to whether a score on an employment test, a judgment made from a hiring interview, or a conclusion drawn from the review of information from a job application does indeed lead to a representative evaluation of an applicant’s qualifications for a job, and whether the specific measure (e.g., test, interview judgment) leads to accurate inferences about the applicant’s criterion status (which is usually, but not always, job performance). Because validity refers to the quality of specific inferences or projections, validity for a specific measurement process (e.g., a specific employment test) can vary depending on what criterion is being predicted. An employment test might therefore be a valid predictor of job performance, but not a valid predictor of another criterion such as rate of absenteeism.

As with reliability, validity is a unitary concept, but there are three important facets of, or types of evidence for, determining the validity of a predictor used in employee selection (see Binning & Barrett, 1989; Schultz, Riggs, & Kottke, 1999). A predictor can be said to yield valid inferences about future performance based on a careful scrutiny of its content. This is referred to as content validity. Content validity refers to whether a predictor measurement process (e.g., test items or interview questions) adequately samples important job behaviors and elements involved in performing a job. Typically, content

parallel forms a method of establishing the reliability of a measurement instrument by correlating scores on two different but equivalent versions of the same instrument

internal consistency a common method of establishing a measurement instrument’s reliability by examining how the various items of the instrument intercorrelate

validity a concept referring to the accuracy of a measurement instrument and its ability to make accurate inferences about a criterion

content validity the ability of the items in a measurement instrument to measure adequately the various characteristics needed to perform a job

validity is established by having experts such as job incumbents or supervisors judge the appropriateness of the test items, taking into account information from the job analysis (Hughes & Prien, 1989). Ideally, the experts should determine that the test does indeed sample the job content in a representative way. It is common for organizations constructing their own screening tests for specific jobs to rely heavily on this content-based evidence of validity. As you can guess, content validity is closely linked to job analysis.

A second type of validity evidence is called construct validity, which refers to whether a predictor test, such as a pencil-and-paper test of mechanical ability used to screen school bus mechanics, actually measures what it is supposed to measure: (a) whether it captures the abstract construct of “mechanical ability” and (b) whether these measurements yield accurate predictions of job performance. Think of it this way: most applicants to college take a predictor test of “scholastic aptitude,” such as the SAT (Scholastic Aptitude Test). Construct validity of the SAT deals with whether this test does indeed measure a person’s aptitude for schoolwork, and whether it allows accurate inferences about future academic success. (Students taking the SAT may agree or disagree with how accurately the SAT measures their personal scholastic aptitude, a view likely related to their scores on the test.) There are two common forms of empirical evidence about construct validity. Well-validated instruments such as the SAT, and standardized employment tests, have established construct validity by demonstrating that these tests correlate positively with the results of other tests of the same construct. This is referred to as convergent validity. In other words, a test of mechanical ability should correlate (converge) with another, different test of mechanical ability. In addition, a pencil-and-paper test of mechanical ability should correlate with a performance-based test of mechanical ability. In establishing a test’s construct validity, researchers are also concerned with divergent, or discriminant, validity: the test should not correlate with tests or measures of constructs that are totally unrelated to mechanical ability. As with content validity, credible conclusions about a test’s construct validity require sound professional judgment about patterns of convergent and discriminant validity.
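The pattern of convergent and discriminant evidence can be sketched with the same correlation calculation. All scores below are invented: a hypothetical pencil-and-paper mechanical test, a hypothetical performance-based mechanical test, and a hypothetical scale assumed to measure an unrelated construct. The variable names are labels for this example only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six test-takers on three measures.
paper_mechanical = [55, 62, 70, 48, 81, 66]     # pencil-and-paper test
hands_on_mechanical = [58, 60, 74, 50, 79, 69]  # performance-based test
unrelated_scale = [30, 42, 31, 40, 36, 35]      # assumed unrelated construct

convergent_r = pearson_r(paper_mechanical, hands_on_mechanical)
discriminant_r = pearson_r(paper_mechanical, unrelated_scale)

print(f"convergent r (two mechanical tests)  = {convergent_r:.2f}")
print(f"discriminant r (unrelated construct) = {discriminant_r:.2f}")
```

A high first coefficient together with a second coefficient near zero is the pattern that supports construct validity. How high and how low the coefficients need to be remains a matter of professional judgment, and a sample this small would not settle the question.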

Criterion-related validity is a third type of validity evidence and is empirically demonstrated by the relationship between test scores and some measurable criterion of job success, such as a measure of work output or quality. There are two common ways that predictor–criterion correlations can be empirically generated. The first is the follow-up method (often referred to as predictive validity). Here, the screening test is administered to applicants without interpreting the scores and without using them to select among applicants. Once the applicants become employees, criterion measures such as job performance assessments are collected. If the test instrument is valid, the test scores should correlate with the criterion measure. Once there is evidence of the predictive validity of the instrument, test scores are used to select the applicants for jobs. The obvious advantage of the predictive validity method is that it demonstrates how scores on the screening instrument actually relate to future job performance. The major drawback to this approach is the time that it takes to establish validity. During this validation period, applicants are tested, but are not hired based on their test scores.

construct validity refers to whether an employment test measures what it is supposed to measure

criterion-related validity the accuracy of a measurement instrument in determining the relationship between scores on the instrument and some criterion of job success

In the second approach, known as the present-employee method (also termed concurrent validity), the test is given to current employees, and their scores are correlated with some criterion of their current performance. Again, a relationship between test scores and criterion scores supports the measure’s validity. Once there is evidence of concurrent validity, a comparison of applicants’ test scores with the incumbents’ scores is possible. Although the concurrent validity method leads to a quicker estimate of validity, it may not be as accurate an assessment of criterion-related validity as the predictive method, because the job incumbents represent a select group, and their test performance is likely to be high, with a restricted range of scores. In other words, there are no test scores for the “poor” job performers, such as workers who were fired or quit their jobs, or applicants who were not chosen for jobs. Interestingly, available research suggests that the estimates of validity derived from both methods are generally comparable (Barrett, Phillips, & Alexander, 1981).
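The concern about a restricted range of scores can be made concrete with a small numerical sketch. The numbers are invented: ten hypothetical people with a screening test score and a later performance rating. The cutoff of 65 that defines the "incumbent" group is an arbitrary assumption for the example. The predictor–criterion correlation is computed once for the whole group and once for only the higher scorers.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical test scores and performance ratings for ten people.
test_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
performance = [3.0, 2.2, 3.6, 3.1, 4.2, 3.5, 4.6, 4.0, 4.3, 4.9]

# Full range of scores, as when everyone tested is later measured on the criterion.
r_full = pearson_r(test_scores, performance)

# Restricted range: keep only people at or above an assumed cutoff of 65.
CUTOFF = 65
kept = [(t, p) for t, p in zip(test_scores, performance) if t >= CUTOFF]
r_restricted = pearson_r([t for t, _ in kept], [p for _, p in kept])

print(f"full-range r       = {r_full:.2f}")
print(f"restricted-range r = {r_restricted:.2f}")
```

In this invented data set the coefficient is smaller in the restricted group, which is the direction of bias the paragraph describes. It is only an illustration of the mechanism; as noted above, the available research suggests that predictive and concurrent estimates are generally comparable in practice.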

All predictors used in employee selection, whether they are evaluations of application materials, employment tests, or judgments made in hiring interviews, must be reliable and valid. Standardized and commercially available psychological tests have typically demonstrated evidence of reliability and validity for use in certain circumstances. However, even with widely used standardized tests, it is critical that their ability to predict job success be established for the particular positions in question and for the specific criterion. It is especially necessary to assure the reliability and validity of nonstandardized screening methods, such as a weighted application form or a test constructed for a specific job.

TYPES OF EMPLOYEE SCREENING TESTS

The majority of employee screening and selection instruments are standardized tests that have been subjected to research aimed at demonstrating their validity and reliability. Most also contain information to ensure that they are administered, scored, and interpreted in a uniform manner. The alternative to the use of standardized tests is for the organization to construct a test for a particular job or class of jobs, and conduct its own studies of the test’s reliability and validity. However, because this is a costly and time-consuming procedure, most employers use standardized screening tests. While many of these tests are published in the research literature, there has been quite a bit of growth in consulting organizations that assist companies in testing and screening. These organizations employ I/O psychologists to create screening tests and other assessments that are proprietary and used in their consulting work. More and more, companies are outsourcing their personnel testing work to these consulting firms.

Test formats

Test formats, or the ways in which tests are administered, can vary greatly. Several distinctions are important when categorizing employment tests.

Individual versus group tests—Individual tests are administered to only one person at a time. In individual tests, the test administrator is usually more involved than in group tests. Typically, tests that require some kind

Stop & Review

What are three facets of validation that are important for employee screening tests?

of sophisticated apparatus, such as a driving simulator, or tests that require constant supervision are administered individually, as are certain intelligence and personality tests. Group tests are designed to be given simultaneously to more than one person, with the administrator usually serving as only a test monitor. The obvious advantage to group tests is the reduced cost for administrator time. More and more, tests of all types are being administered online, so the distinction between individual and group testing is becoming blurred, as many applicants can complete screening instruments online simultaneously.

Speed versus power tests—Speed tests have a fixed time limit. An important focus of a speed test is the number of items completed in the time period provided. A typing test and many of the scholastic achievement tests are examples of speed tests. A power test allows the test-taker sufficient time to complete all items. Typically, power tests have difficult items, with a focus on the percentage of items answered correctly.

Paper-and-pencil versus performance tests—“Paper-and-pencil tests” refers to both paper versions of tests and online tests, which require some form of written reply, in either a forced choice or an open-ended, “essay” format. Many employee screening tests, and nearly all tests in schools, are of this format. Performance tests, such as typing tests and tests of manual dexterity or grip strength, usually involve the manipulation of physical objects.

As mentioned, many written-type tests are now administered via computer (usually Web-based), which allows greater flexibility in how a test can be administered. Certain performance-based tests can also be administered via computer simulations (see box “On the Cutting Edge,” p. 116).

Some employment tests involve sophisticated technology, such as this flight simulator used to train and test airline pilots.

Although the format of an employment test is significant, the most important way of classifying the instruments is in terms of the characteristics or attributes they measure, such as biographical information (biodata instruments), cognitive abilities, mechanical abilities, motor and sensory abilities, job skills and knowledge, or personality traits (see Table 5.1 for examples of these various tests).

TABLE 5.1

Some Standardized and Well-Researched Tests Used in Employee Screening and Selection

Cognitive Ability Tests

Comprehensive Ability Battery (Hakstian & Cattell, 1975–82): Features 20 tests, each designed to measure a single primary cognitive ability, many of which are important in industrial settings. Among the tests are those assessing verbal ability, numerical ability, clerical speed and accuracy, and ability to organize and produce ideas, as well as several memory scales.

Wonderlic Cognitive Ability Test (formerly the Wonderlic Personnel Test) (Wonderlic, 1983): A 50-item, pencil-and-paper test measuring the level of mental ability for employment, which is advertised as the most widely used test of cognitive abilities by employers.

Wechsler Adult Intelligence Scale-Revised or WAIS-R (Wechsler, 1981): A comprehensive group of 11 subtests measuring general levels of intellectual functioning. The WAIS-R is administered individually and takes more than an hour to complete.

Mechanical Ability Tests

Bennett Mechanical Comprehension Test (Bennett, 1980): A 68-item, pencil-and-paper test of ability to understand the physical and mechanical principles in practical situations. Can be group administered; comes in two equivalent forms.

Mechanical Ability Test (Morrisby, 1955): A 35-item, multiple-choice instrument that measures natural mechanical aptitude. Used to predict potential in engineering, assembly work, carpentry, and building trades.

Motor and Sensory Ability Tests

Hand-Tool Dexterity Test (Bennett, 1981): Using a wooden frame, wrenches, and screwdrivers, the test-taker takes apart 12 bolts in a prescribed sequence and reassembles them in another position. This speed test measures manipulative skills important in factory jobs and in jobs servicing mechanical equipment and automobiles.

O’Connor Finger Dexterity Test (O’Connor, 1977): A timed performance test measuring fine motor dexterity needed for fine assembly work and other jobs requiring manipulation of small objects. Test-taker is given a board with symmetrical rows of holes and a cup of pins. The task is to place three pins in each hole as quickly as possible.

Job Skills and Knowledge Tests

Minnesota Clerical Assessment Battery or MCAB (Vale & Prestwood, 1987): A self-administered battery of six subtests measuring the skills and knowledge necessary for clerical and secretarial work. Testing is completely computer-administered. Included are tests of typing, proofreading, filing, business vocabulary, business math, and clerical knowledge.

Employee Screening and Assessment 107

Biodata instruments

As mentioned earlier, biodata refers to background information and personal characteristics that can be used in a systematic fashion to select employees. Developing biodata instruments typically involves taking information that would appear on application forms and other items about background, personal interests, and behavior and using that information to develop a form of forced-choice employment test. Along with items designed to measure basic biographical information, such as education and work history, the biodata instrument might also involve questions of a more personal nature, probing the applicant’s attitudes, values, likes, and dislikes (Breaugh, 2009; Stokes, Mumford, & Owens, 1994). Biodata instruments are unlike the other test instruments we will discuss because there are no standardized biodata instruments. Instead, biodata instruments take a great deal of research to develop and validate. Because biodata instruments are typically designed to screen applicants for one specific job, they are most likely to be used only for higher-level positions. Research indicates that biodata instruments can be effective screening and placement tools (Dean, 2004; Mount, Witt, & Barrick, 2000;

biodata background information and personal characteristics that can be used in employee selection

Purdue Blueprint Reading Test (Owen & Arnold, 1958): A multiple-choice test assessing the ability to read standard blueprints.

Various Tests of Software Skills. Includes knowledge-based and performance-based tests of basic computer operations, word processing, and spreadsheet use.

Personality Tests

California Psychological Inventory or CPI (Gough, 1987): A 480-item, pencil-and-paper inventory of 20 personality dimensions. Has been used in selecting managers, sales personnel, and leadership positions.

Hogan Personnel Selection Series (Hogan & Hogan, 1985): These pencil-and-paper tests assess personality dimensions of applicants and compare their profiles to patterns of successful job incumbents in clerical, sales, and managerial positions. Consists of four inventories: the prospective employee potential inventory, the clerical potential inventory, the sales potential inventory, and the managerial potential inventory.

Sixteen Personality Factors Questionnaire or 16 PF (Cattell, 1986): Similar to the CPI, this test measures 16 basic personality dimensions, some of which are related to successful job performance in certain positions. This general personality inventory has been used extensively in employee screening and selection.

Revised NEO Personality Inventory or NEO-PI-R (Costa & McCrae, 1992): A very popular personality inventory used in employee screening and selection. This inventory measures the five “core” personality constructs of Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A), and Conscientiousness (C).

Bar-On Emotional Quotient Inventory (EQ-I; Bar-On, 1997) and the Mayer–Salovey–Caruso

In document PIO - Riggio 2013.pdf (Page 120-134)