Practice Questions on Validity and Reliability


VALIDITY AND RELIABILITY

1. Which of the following is the best definition of reliability?
a. Reliability refers to whether the data collection process measures what it is supposed to measure.
b. Reliability refers to the degree to which the data collection process covers the entire scope of the content it is supposed to cover.
c. Reliability refers to whether or not the data collection process is appropriate for the people to whom it will be administered.
d. Reliability refers to the consistency with which the data collection process measures whatever it measures.

2. Mary took a test on which she received a score of 75. The teacher's house burned down, and the tests were destroyed. Mary took the same test over again the next day and again received a score of 75.
a. There is evidence to suggest that Mary's test was reliable.
b. There is evidence to suggest that Mary's test was unreliable.
c. There is no evidence upon which to base even a tentative judgment about the reliability of the test.

3. Marvin's final exam was scored by his teacher, who gave him a 64. This would have caused him to fail the course. He protested to the school officials, and two other teachers scored the same test. One of them gave him a 75, and the other an 85.
a. There is evidence that Marvin's test was scored reliably.
b. There is evidence that Marvin's test was scored unreliably.
c. There is no evidence upon which to base even a tentative judgment about the reliability of the scoring process.


4. Ella May's teacher rated her behavior as indicating that she was quite popular. The teacher's aide assigned to the same classroom rated Ella May as being uncooperative when given assignments.
a. There is evidence that the rating process was reliable.
b. There is evidence that the rating process was unreliable.
c. There is no evidence upon which to base even a tentative judgment about the reliability of the rating process.

5. Donald was rated by his teacher as being unable to perform the mathematical skills necessary for the next math unit. Because of this low rating, Donald received a special programmed unit to help him review his skills. A day later, after completing the programmed materials, Donald was rated by the same teacher as able to perform the skills necessary for the next unit.
a. There is evidence that the teacher's rating system was reliable.
b. There is evidence that the rating system was unreliable.
c. There is no evidence upon which to base even a tentative judgment about the reliability of the rating process.

Questions 6 through 8 go together.

6. Miss Curtis was planning to teach a unit on English grammar to her ninth graders. She planned to give one test as a pretest, and another as a posttest. Then she planned to compare the two sets of scores to determine whether or not the students had profited from the unit. Examine the following list of statements and indicate which ones suggest that her tests lacked reliability. (Choose more than one answer.)
a. Each form of the test contained 50 items, worth two points apiece.
b. When the tests were scored by two separate persons, the results were exactly the same.
c. Thirty-five of the items on each of the alternate forms of the test were answered correctly by everyone.


d. Rather than giving highly structured instructions, she allowed the students to ask questions as they went along, and provided information as it was requested.
e. The average score on the posttest was substantially higher than the average score on the pretest.

7. After Miss Curtis had administered both forms of her English grammar test, she decided to revise it. By doing this, she hoped to make it a more reliable test the next year. Examine the following list of statements and indicate which ones would be likely to increase the reliability of the test. (Choose more than one.)
a. She wrote out a detailed set of instructions based on the questions which had arisen this time, and she attached these instructions to the tests.
b. She increased the length of the test from 50 to 75 items on each form.
c. She eliminated several of the items which everyone had answered correctly, because she found that these had included irrelevant clues which enabled the students to get them right. She replaced these with items that she felt contained no irrelevant clues.
d. She eliminated each of the items which had been missed by 40-60% of the students on the pretest and by 70-90% of the students on the posttest.
e. She decided to base many of her new items on contemporary music, since nearly all the students seemed to be interested in such music.

8. Miss Curtis decides to compute statistical reliability to help determine the degree of reliability her tests possess. The tests are multiple choice/true-false in format. She considers them to be criterion-referenced rather than norm-referenced tests. Her main concern is that the decisions she would make on the basis of the results would be based on actual abilities of the students rather than on unique aspects of the testing situation. She is also concerned that any differences between the pretest and the posttest should indicate real differences, rather than merely differences between the two tests. Which of the following types of statistical reliability would help Miss Curtis make useful decisions about her tests? (Choose as many as necessary.)
a. Test-retest reliability.


b. Equivalent-forms reliability.
c. Internal consistency reliability.
d. Interscorer reliability.
e. Interobserver agreement.

9. Which of the following types of statistical reliability require that the same test be administered to the same persons two times? (Choose as many as necessary.)
a. Test-retest reliability.
b. Equivalent-forms reliability.
c. Internal consistency reliability.
d. Interscorer reliability.
e. Interobserver agreement.

10. Which of the following is a major weakness of the statistical techniques for estimating reliability? (Choose only one.)
a. When respondents give different answers because of chance factors such as health problems or luck, this lowers the statistical estimate of reliability.
b. When a large number of persons master a skill and therefore get the answer right, this lowers statistical reliability.
c. Changes in the directions as they are given or as they are perceived by the respondents will lower the statistical reliability.
d. Essay tests receive lower estimates of statistical reliability than more objective tests, because there are more likely to be subjective factors influencing the scoring process.


11. If a teacher has access to two tests which attempt to measure the same research variable, she should almost always choose the test which is the more reliable (provided they take about the same amount of time to administer and score).
a. True.
b. False.

12. Which of the following is the best definition of validity?
a. Validity deals with whether the data collection process actually measures what it purports to be measuring.
b. Validity deals with whether the data collection process is designed at the appropriate level of difficulty.
c. Validity deals with whether the data collection process is consistent in measuring whatever it measures.
d. Validity deals with the question of how subjectivity can best be controlled in the scoring process.
e. Validity deals with the standardization of procedures for administering, scoring, and interpreting data collection processes.

13. All but one of the following are factors which directly influence the validity of a data collection process. Choose the exception.
a. The logical appropriateness of the operational definition.
b. The match between the tasks in the data collection process and the operational definition.
c. The difficulty of the data collection process.
d. The reliability of the data collection process.


14. Mr. Gomez wants to help his students become familiar with educational television. He defines "familiarity" with educational television as meaning that the students will be able to name several of the shows on the local educational television station. To measure this research variable, he asks the students one day to write down the name of as many shows as they can think of which were on the local educational channel the night before. He then concludes that the students who can name more shows are more familiar with educational television than those who name few or no shows accurately. What is the most obvious reason why this measurement strategy is likely to be invalid?
a. It is likely to be unreliable.
b. The task doesn't match the operational definition.
c. The operational definition is logically inappropriate.
d. The task requires that the students be familiar with the local educational television station.

15. Miss Chesterton is teaching her students to use English grammar correctly. She operationally defines using English grammar correctly as meaning that they will follow all the rules of normal English grammar in the compositions they write. On the exam, she determines how well the students have met this goal by requiring them to diagram twenty sentences of varying levels of complexity. What is the most obvious reason why this measurement strategy is likely to be invalid?
a. It is likely to be unreliable.
b. The task doesn't match the operational definition.
c. The operational definition is logically inappropriate.
d. The task requires that the students be capable of following the rules of English grammar in their writing.

16. Professor Carter wants her students to develop a genuine appreciation of Shakespeare's plays. She operationally defines this to mean that the students will be able to recall lines of the plays from memory. She measures this by giving the students several important scenes with lines omitted and having them fill in the missing lines. What is the most obvious reason why this measurement strategy is likely to be invalid?
a. It is likely to be unreliable.
b. The task doesn't match the operational definition.
c. The operational definition is logically inappropriate.
d. The students may not be able to recall the lines.

Questions 17 through 20 are based on the following information.

Mrs. Green wants to measure Kathy's reading comprehension by having her read a story and then relate it to her own experience. Examine each of the following statements (assuming they are all true), and indicate whether each would or would not weaken the validity of Mrs. Green's testing strategy.

17. Even outside reading situations, Kathy has a great deal of trouble relating any stories at all to her personal life.
a. Weakens the validity of the data collection process.
b. Does not weaken the validity of the data collection process.

18. Kathy has trouble understanding the passage.
a. Weakens the validity of the data collection process.
b. Does not weaken the validity of the data collection process.

19. Kathy becomes anxious because she has to take the test aloud in front of the class, and anxiety makes her perform poorly.
a. Weakens the validity of the data collection process.
b. Does not weaken the validity of the data collection process.

20. The passage is extremely short.
a. Weakens the validity of the data collection process.
b. Does not weaken the validity of the data collection process.

21. Ms. Monroe has developed a questionnaire to measure her students' attitudes toward the practicum in her nursing training program. She is concerned about whether the questions apply proportionately to all the aspects of the program. What tool for estimating aspects of validity would help Ms. Monroe make a sound judgment in this regard?
a. Content validity.
b. Criterion-related validity.
c. Construct validity.
d. None of the above.

22. Mr. Shepard has developed a criterion-referenced test on basic mathematic abilities. He wants to be sure it gives appropriate coverage to all the topics covered during the semester. What tool for estimating aspects of validity would help Mr. Shepard make a sound judgment in this regard?
a. Content validity.
b. Criterion-related validity.
c. Construct validity.
d. None of the above.


23. Professor DuParc has developed an observational strategy to measure a person's "independence from peer pressure." What tool for estimating aspects of validity would help Professor DuParc to demonstrate that his strategy really measures "independence from peer pressure" rather than some other characteristic?
a. Content validity.
b. Criterion-related validity.
c. Construct validity.
d. None of the above.

24. Mrs. Masters has been admitting persons into her Advanced Composition course on the basis of their performance in Introductory English. She decides that she could make better selections if she would have the applicants take a special test, and then successful candidates would be those who scored highest on the test. What tool for estimating aspects of validity would help Mrs. Masters demonstrate that her new procedure is better than the old one?
a. Content validity.
b. Criterion-related validity.
c. Construct validity.
d. None of the above.


Review Quiz

1. (d). This is a paraphrase of the definition given in the textbook. If you chose (a), you selected the definition of validity.

2. (a). This doesn't conclusively prove that the test is entirely reliable, but it does give some evidence in that direction, since it demonstrates consistency.

3. (b). If the measurement process were reliable, Marvin should be evaluated about the same on each occasion. He has gotten three different scores on three occasions.

4. (c). The teacher and the aide are rating different characteristics (popularity and cooperativeness), and so it is reasonable that the ratings may be different. There has been no attempt here to measure the same thing twice. There is no evidence on which to base a judgment regarding consistency.

5. (c). Donald's score changed; but since he had training between the two testing occasions, there is no reason to expect the ratings to remain stable. Although there has been an attempt here to measure the same thing twice, there was no reason to expect the ratings to remain the same. There is not enough evidence on which to base a judgment regarding consistency.

6. (c) and (d). Statement (c) indicates that several of the items were excessively easy. Statement (d) indicates that the instructions might change on different testing occasions (because students might ask different questions). Statement (a) indicates a strength (50 items is a reasonably large number of questions). Statement (b) suggests consistency among people scoring the tests (interscorer reliability). Statement (e) gives no real evidence: the scores changed, but we would expect them to change after instruction. Since (except for the test results, which would involve circular reasoning) we don't know whether the instruction was effective or not, we don't know whether the test was reliable.

7. (a), (b), and (c). Statement (a) described a way to standardize the measurement process, thereby eliminating some extraneous influences. Statement (b) would increase reliability by expanding the sample of items - assuming that the additional 25 items were related to the same outcomes as the original 50. Statement (c) would increase reliability by increasing the number of effective items, since excessively easy items do not add to the reliability of a test. Statement (d) describes a bad strategy; eliminating items of medium difficulty on the pretest would actually reduce the reliability of the test. Statement (e) may be a good idea, but it is irrelevant to the concept of reliability.
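The effect of lengthening a test, as in statement (b), is commonly estimated with the Spearman-Brown formula. The quiz does not name this formula, so the sketch below is only an illustration, and the starting reliability of 0.70 for the 50-item form is a made-up value.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after multiplying test length by length_factor.

    Assumes the added items measure the same outcomes as the original ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: the 50-item form has a reliability of 0.70.
# Lengthening it to 75 items is a length factor of 75 / 50 = 1.5.
predicted = spearman_brown(0.70, 75 / 50)
print(round(predicted, 3))  # roughly 0.778, higher than the original 0.70
```

The prediction only holds under the assumption stated in the answer: the 25 new items must relate to the same outcomes as the original 50.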


8. (a) and (b). Since she is concerned about unique aspects of the testing situation, she needs test-retest reliability. Since she wants to compare posttest results to pretest results, she would like to have parallel tests, and so she also needs equivalent-forms reliability.

9. (a). Test-retest is the only one of those listed that fits this description. Equivalent-forms requires giving two forms of the same test to one group of people. Internal consistency requires giving the test just once and then analyzing those results with coefficient alpha. Interscorer reliability requires giving the test just once and then having two persons score it. Interobserver agreement requires two persons to observe the same set of behaviors and to compare their results to see if they agreed on what they observed.
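
As an illustration of the internal-consistency procedure mentioned in answer 9, the following is a minimal sketch of how coefficient alpha could be computed from a single administration of a test. The function name `coefficient_alpha` and the small score matrix are hypothetical and are not taken from the textbook; the formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for one administration of a test.

    scores: list of rows, one row per examinee, each row a list of item scores.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each tuple holds one item's scores across examinees.
    items = list(zip(*scores))
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: four examinees, three items scored 0 or 1.
# Every examinee answers all items the same way, so the items are
# perfectly consistent with one another.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# Here the items disagree with one another for every examinee.
inconsistent = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

alpha_consistent = coefficient_alpha(consistent)
alpha_inconsistent = coefficient_alpha(inconsistent)
print(alpha_consistent, alpha_inconsistent)
```

In this sketch the perfectly consistent data set gives an alpha of 1, while the inconsistent one gives a much lower value, which is the sense in which alpha summarizes consistency from a single testing session.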

10. (b). Restrictions in the range of scores lower statistical reliability. Since restrictions sometimes occur for good reasons (e.g., student mastery of the information), this would be considered a possible weakness of their use. Statements (a), (c), and (d) describe strengths of reliability coefficients, since reliability is supposed to notice and rule out these extraneous factors.
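
The effect of range restriction described in answer 10 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the score model (a true score plus independent random error on each of two administrations), the means and standard deviations, the cutoff of 60, and all names are hypothetical choices made for the illustration, not values from the textbook.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n=2000, seed=0):
    """Two administrations of the same test: true score plus random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    first = [t + rng.gauss(0, 5) for t in true_scores]
    second = [t + rng.gauss(0, 5) for t in true_scores]
    return first, second


first, second = simulate()

# Test-retest correlation for the whole, unrestricted group.
r_full = pearson_r(first, second)

# Restrict the range: keep only examinees who scored 60 or more the first time.
high = [(a, b) for a, b in zip(first, second) if a >= 60]
r_restricted = pearson_r([a for a, _ in high], [b for _, b in high])

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The test itself is identical in both calculations; only the spread of scores differs. The lower coefficient in the restricted group therefore reflects the narrower range rather than a worse instrument.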

11. (b). Validity (not reliability) is the most important factor in test design and selection. It is easy to develop tests with high reliability that lack validity.

12. (a). This is a paraphrase of the textbook's definition of validity. If you chose (c), you chose the definition of reliability.

13. (c). The difficulty of the data collection process may be an important consideration, but it does not directly influence validity. The other three are the factors listed by the textbook as influencing validity.

14. (a). The measurement process is likely to be unreliable (and hence, invalid), because Mr. Gomez has used a single question focusing on a single night. He should sample several nights, ask more questions, or use multiple operational definitions. The operational definition seems appropriate (naming shows sounds close to familiarity), and the task matched the operational definition (he asked students to name shows). To the extent that statement (d) is true, Mr. Gomez would have evidence of validity, not invalidity, since it states exactly what he is trying to measure.

15. (b). Diagramming sentences is not even remotely synonymous with following the rules of normal English

16. (c). Few people would seriously argue that recalling lines from a play is synonymous with appreciating that play. The tasks do match the operational definition, and there is no reason to believe that the test is unreliable; but the faulty operational definition ruins the validity of this data collection process.

17. (a). If Kathy has problems relating stories in general to her own life, then making her perform this task to indicate reading comprehension is invalid. She would be doing two tasks: (a) comprehending (at which she might succeed) and (b) relating to her own life (at which she might fail). By failing at the second task, she would look like she had failed at the first. Other students, who had no trouble relating stories to their personal lives, would be performing only one real task (comprehending), and so failure at that task would indicate a lack of comprehension.

18. (b). This does not weaken the validity of the test. Quite the contrary, it's evidence that the test is measuring what it's supposed to be measuring.

19. (a). The test is supposed to require her to comprehend (presumably under normal circumstances). The task she is actually required to perform is to comprehend under conditions of extreme anxiety.

20. (a). This is likely to weaken the reliability of the test, because it is an inadequate sample of behavior. Weakening reliability is one way to weaken validity.

21. (a). She is concerned that the data collection process covers the entire range of what it should cover. This is a good paraphrase of the definition of content validity.

22. (a). He is concerned that the data collection process covers the entire range of what it should cover. This is a good paraphrase of the definition of content validity.

23. (c). "Independence from peer pressure" is an internalized concept (construct) that Professor DuParc wants to measure. Construct validity would help him demonstrate that he has done so correctly.

24. (c). She is trying to predict performance in the Advanced Composition course. Predictive validity (a form of criterion-related validity) would help her determine whether the special test was useful for
