
Some Guidelines for Academic Quality Rankings

MARGUERITE CLARKE

While rankings are a popular method for comparing the relative quality of higher education institutions, there is much confusion and debate over which indicators to use and how to present the information in ranked format. This article offers some guidelines in both areas as well as questions to consider at each stage of the ranking process.

Nonetheless, just as democracy, according to Winston Churchill, is the worst form of government except for all the others, so quality rankings are the worst device for comparing the quality of … colleges and universities, except for all the others (Webster, 1986, p. 6).

Academic quality rankings are a controversial, but enduring, part of the educational landscape—controversial because not everyone agrees that the quality of a school or a programme can be quantified (Casper, 1996), and enduring because of the lack of other publicly attractive methods for comparing institutions (Sanoff, 1998). While the lack of appealing alternatives has legitimated the use of rankings in the eyes of many (but not all), there is still a lively debate over the issue of how to rank (Hattendorf, 1993).

This article addresses the “how to” issue by offering some general guidelines for academic quality rankings. The discussion is presented in three parts and is couched in an American context. The first section of the paper outlines the conceptual categories that underpin many ranking efforts. It describes the strengths and limitations of some of the methods used to collect information on each, and offers some guidelines for the selection of ranking indicators. The second section examines the popular weight-and-sum approach to presenting this information in ranked format and explores its limitations for the evaluation of educational quality.1 The third section draws on the findings of the previous two and presents a list of things to consider at each stage of the ranking process.

Before proceeding, it is worth defining what an academic quality ranking means. According to Webster (1986), an academic quality ranking:

[M]ust be arranged according to some criterion or set of criteria which the compiler(s) of the list believed measured or reflected academic quality[; and] it must be a list of the best colleges, universities, or departments in a field of study, in numerical order according to their supposed quality, with each school or department having its own individual rank, not just lumped together with other schools into a handful of quality classes, groups, or levels (p. 5).

This definition reveals two characteristics of academic quality rankings that are the key to the discussion that follows. The first characteristic is that the choice of indicators rests with those doing the ranking. Consequently, while certain normative views of academic quality exist, the set of indicators used will vary according to the value system of the person or group doing the ranking. The second characteristic is that schools or programmes must be placed in order on the basis of their relative performance on these indicators. Thus, when more than one indicator is involved, the information must either be combined in order to produce a single ranked list of schools, or schools must be ranked separately on each indicator. The next section focuses on the first of these ranking characteristics.

1. While this section may seem more of a “how not to” than a “how to” one, the discussion raises important issues that should be kept in mind no matter what ranking approach is used.

CONCEPTUALIZING AND MEASURING ACADEMIC QUALITY

The conceptualizations of quality that underpin most ranking efforts can be organized into three categories: student achievements, faculty accomplishments, and institutional academic resources. These categories are portrayed in Table 1 along with some of the methods traditionally used to collect information on each. The data obtained through these methods are organized into indicators. For instance, the information obtained through a survey of reputation is usually recorded in the form of a score for each institution. Together, these scores form a “reputation” indicator that may be used—either alone, or in conjunction with other indicators—to rank schools.

As shown in Table 1, each indicator has strengths and weaknesses that should be kept in mind when determining the appropriateness of its use (see also Hattendorf, 1993). For example, while surveys of reputation can produce rankings with high public credibility, they can also be misleading if the overall reputation of an institution clouds a rater's evaluation of a particular department (e.g., a university's strong overall reputation might lead a rater to give a higher-than-deserved score to a weak department within that university).

Another strength of surveys of reputation is that they tend to be good at identifying the strongest programmes in a particular area. For instance, if one wanted to identify the top-ten graduate schools of business, a survey of reputation might serve the purpose very well. However, these surveys are not effective when trying to sort out all the programmes in a particular area, since not all schools have well-known reputations. Consequently, using this approach to rank all 352 accredited business programmes in the United States could be problematic, since raters may have little or no knowledge of many of the programmes they are being asked to evaluate. The point to keep in mind is that every indicator comes with strengths and weaknesses that need to be considered before including it in a particular ranking effort.

The set of indicators shown in Table 1 does not reflect some of the more recent changes in higher education (e.g., distance learning and computers in the classroom) that may affect how quality is conceived and measured. Thus, it is useful, when choosing indicators, to think in terms of the more general and desirable measurement properties of validity (does the indicator measure what it purports to measure?), reliability (does it do so in a consistent/error-free fashion?), and comparability (can it be interpreted in a similar way across different kinds of programmes or institutions?) (Linn, 1993). Basically, if an indicator is to be included in a ranking, there should be some evidence (either in the form of existing data or data collected by those doing the ranking) that shows that it is appropriate—i.e., valid, reliable, and comparable—for the intended purpose. It is not easy to establish these properties since there can be conflicting evidence, and opinion, as to the appropriateness of an indicator for a given purpose. For example, among the indicators portrayed in Table 1, the validity of the test scores of incoming students as a measure of institutional quality has been criticized on the grounds that it measures only what students bring with them and not what the institution does for students (Seaman, 1998). However, others argue that it is a valid, albeit indirect, indicator of institutional quality since “higher quality” institutions tend to attract the most talented students, and one way of measuring this attraction is through the scores of incoming students on standardized tests (Morse and Flanigan, 2001).

TABLE 1. Categories of academic quality

Faculty accomplishments
  Method/indicator: Surveys of reputation (e.g., ratings of faculty or programme reputation)
  Advantage: They produce results with face validity, i.e., results that most nearly match what the educated public considers the hierarchy of colleges and universities to be
  Disadvantage: The overall reputation of an institution may influence the assessments by raters of the particular department(s) they are being asked to rank

  Method/indicator: Counts of faculty awards, honours, and prizes
  Advantage: They are useful for ranking the best or the better institutions
  Disadvantage: They may be years behind or ahead of reality

  Method/indicator: Counts of faculty citations in citation indexes
  Advantage: Useful in assessing the influence and importance of the publications of faculty members, and not just their sheer volume
  Disadvantage: The citation indexes on which the rankings are based do not distinguish between “good”, “neutral”, or “bad” citations

Student achievements
  Method/indicator: Distinguished alumni and the achievements of graduates after graduation
  Advantage: While only a small percentage of colleges and universities have faculties that produce much research, almost all of them attempt to prepare their students for rewarding careers in later life
  Disadvantage: Usually years, if not decades, behind reality

  Method/indicator: The scores of incoming students on standardized tests
  Advantage: The data are easy to obtain and are a measure by which most institutions can be ranked
  Disadvantage: Based on the academic abilities of students before they enter college, and thus fails to consider anything that these institutions do to educate their students once they enroll

Institutional academic resources
  Method/indicator: Compilation of measures of institutional resources, including educational expenditure per student, faculty–student ratios, and library resources
  Advantage: The data are easy to obtain and are a measure on which all institutions can be compared
  Disadvantage: Offers little or no information about how often and how beneficially students use these resources

Source: Adapted from Webster (1986).

Another way of thinking about the choice of indicators is in terms of inputs, processes, and outputs. All else being equal, process (e.g., teaching quality) and output (e.g., effectiveness of graduates in the workplace) measures are preferred since they are better indications of the quality of the instruction, preparation, and resources offered by an institution. The measures in Table 1 can be grouped into inputs (e.g., the scores of incoming students on standardized tests and library resources) and outputs (e.g., faculty citations and awards). The lack of process measures in this table, and more generally, is explained by their being more difficult to identify and more difficult and costly to measure. However, when available, they can provide very useful information on aspects such as classroom environment and teaching effectiveness. For instance, until recently, The Times Higher Education Supplement rankings of British universities included a “teaching quality” measure that indicated the effectiveness of instruction in different undergraduate departments at these universities (see <http://www.thesis.co.uk/>).

Another guideline to consider when choosing indicators is the objective or subjective nature of the indicators themselves. Objective indicators are those not dependent on the person doing the counting. For example, if two people were asked to compute the student–faculty ratio or the number of books in the library of a particular institution, they would come up with the same result (if given the same formula and no computational errors were made). Subjective indicators are those that can vary depending on who is responding. For example, if two people were asked to rate the overall academic quality of a particular institution (as in a survey of reputation) using the same set of criteria, they might come up with two very different scores because their subjective opinion would enter the process. Not surprisingly, reliability can be more of an issue with subjective indicators.
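To make the distinction concrete, the short sketch below contrasts the two kinds of indicator. The institution figures, the ratings, and the use of the ratings' standard deviation as a rough consistency check are all invented for this example.

```python
from statistics import mean, stdev

# Objective indicator: student-faculty ratio.
# Given the same counts and the same formula, every compiler gets the same value.
def student_faculty_ratio(enrolled_students: int, faculty: int) -> float:
    return enrolled_students / faculty

# Subjective indicator: reputation score averaged over raters.
# Different panels of raters can produce different values for the same school,
# so the spread of the ratings is reported as a crude consistency check.
def reputation_score(ratings: list[float]) -> tuple[float, float]:
    return mean(ratings), stdev(ratings)

# Hypothetical data for one institution.
print(student_faculty_ratio(enrolled_students=12_400, faculty=830))  # identical for any compiler
print(reputation_score([4.1, 3.2, 4.8, 2.9, 3.7]))                   # mean and spread depend on the panel
```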

Other guidelines could be added, but the above list covers some of the more important ones to keep in mind. In addition to these considerations, standardized procedures should be used to collect, store, analyze, and present the information. These controls are necessary because errors at any stage of the process will reduce or eliminate the usefulness of an indicator as a potential measure of academic quality. The next section addresses the second characteristic of rankings outlined at the start of this article—i.e., the choice of method for presenting the information in ranked format.

PRESENTING THE INFORMATION IN RANKED FORMAT

Once the set of indicators has been chosen, a method must be selected for presenting the information in a ranked format. Several options exist, some of which are discussed in other articles in this issue of Higher Education in Europe. Instead of re-examining those approaches, this section explores the popular weight-and-sum approach and discusses its limited usefulness for representing the relative quality of institutions or programmes (see Clarke, 2002a; also Clarke, 2002b). In doing so, some general issues are raised that should be kept in mind when presenting ranked information.

The Weight-and-Sum Approach

The weight-and-sum approach involves assigning a weight to each indicator according to its perceived importance and then using the weights to combine the indicator information into an overall score (Scriven, 1991). While the result of this process is one easy-to-digest number, critics have pointed out several problems, including the fact that the choice of weights is itself a value judgment and thus can vary depending on who is making the decision (Camilli and Firestone, 2000). Depending on the number of criteria and their weights, one dimension may dominate all the others, or several trivial dimensions may swamp more crucial ones (Evaluation News, 1981). Nonetheless, the approach works quite well in certain contexts and has been used for years in the area of product evaluations (see <www.consumerreports.org> for an example).
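For concreteness, the sketch below shows the basic mechanics of a weight-and-sum ranking. All of the school names, indicator values, and weights are invented for illustration: each indicator is standardized as a z-score across schools, multiplied by its weight, and summed into the single score on which schools are ordered.

```python
from statistics import mean, pstdev

# Hypothetical indicator values for four schools (higher = better on every indicator).
schools = {
    "School A": {"reputation": 4.5, "salary": 95_000, "selectivity": 0.82},
    "School B": {"reputation": 4.1, "salary": 88_000, "selectivity": 0.74},
    "School C": {"reputation": 3.8, "salary": 91_000, "selectivity": 0.69},
    "School D": {"reputation": 3.2, "salary": 79_000, "selectivity": 0.55},
}
# Weights chosen by the compiler -- a value judgment, not an empirical finding.
weights = {"reputation": 0.5, "salary": 0.3, "selectivity": 0.2}

def zscores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Standardize each indicator across schools, then weight and sum into one score.
names = list(schools)
standardized = {
    ind: dict(zip(names, zscores([schools[n][ind] for n in names])))
    for ind in weights
}
overall = {
    n: sum(weights[ind] * standardized[ind][n] for ind in weights)
    for n in names
}
for rank, (name, score) in enumerate(sorted(overall.items(), key=lambda kv: -kv[1]), start=1):
    print(rank, name, round(score, 2))
```

Changing the weights dictionary changes the final ordering, which is the sensitivity explored later in this article.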

Despite the popularity of this method for ranking cars and toasters, there is divided opinion as to whether it works for the evaluation of educational quality (Hattendorf, 1993; Scriven, 1991). This lack of consensus arises partly because product ratings are based on easily observable and quantifiable aspects of the product or its performance (e.g., in a car, fuel economy, the warranty, acceleration, and safety features), whereas academic quality ratings tend to be based on institutional attributes that are harder to observe or quantify (e.g., reputation and distinguished alumni). Consequently, it is not as easy to reach consensus on which indicators to use and how to measure them when assessing academic quality as it is when assessing car performance.

Even if one is conceptually comfortable with using this approach to rank institutions, there is little research on the relationships that exist among the various indicators of academic quality, or on their relative importance in the assessment of institutional or programme quality. Therefore, it is difficult to know how to weight them in order to come up with an overall score. This problem is less pronounced for product ratings since there is a clearer sense of the relative importance of different aspects of product performance as well as how these relate to each other. In addition, it is easier to validate a formula used to rank products since there are more easily observable outcome indicators to rely on—e.g., do cars that are ranked highest actually perform better/break down less frequently?

Limitations of the Weight-and-Sum Approach for the Evaluation of Educational Quality

One way to look at the implications of these issues for academic quality rankings is by examining rankings that use the weight-and-sum approach. The data presented here are from the weekly magazine US News and World Report. It is one of the most popular sources of college and graduate school rankings in the United States and has been ranking schools since 1983.2 While the indicators used for each of the college and graduate school rankings vary, the methodology is the same: i.e., the chosen indicators are standardized, weighted, and summed to produce an overall score on which to rank schools or programmes in each category against their peers (Garrett, 2002; Morse and Flanigan, 2001). No research is cited as the basis for the indicators and weights used, and there is no indication that the properties of the indicators, or how they are related to each other, have been examined. However, the importance of obtaining this type of information, particularly if the indicators are going to be combined, can be seen by looking at Tables 2 and 3. These tables are based on data from the US News and World Report 2001 calendar year rankings of the top-fifty business schools and the top-fifty schools of education in the United States (data tend not to be published for schools below the top fifty). In each table, the set of indicators used for that ranking is presented across the top and down the side of the table. The numbers in the table represent the strength of the relationships (or correlations) among the various indicators. The magnitude of a correlation can vary between 0 and 1, with 0 being the weakest and 1 the strongest relationship.

2. The college rankings can be obtained at <http://www.usnews.com/usnews/edu/college/rankings/rankindex.htm>. The graduate school rankings can be obtained at <http://www.usnews.com/usnews/edu/beyond/bcrank.htm>.

TABLE 2. Correlations for 2001 business school indicators*

                                    (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) Reputation (academic)            1     0.90    0.90    0.32    0.24    0.81    0.57   -0.73
(2) Reputation (recruiter)                  1      0.86    0.32    0.18    0.70    0.50   -0.68
(3) Starting salary                                 1      0.52    0.43    0.80    0.49   -0.68
(4) Employed at graduation                                  1      0.49    0.30    0.03   -0.20
(5) Employed three months later                                     1      0.24    0.04   -0.05
(6) Average GMAT scores                                                     1      0.66   -0.69
(7) Average undergraduate GPA                                                       1     -0.58
(8) Percent accepted                                                                        1

* Based on data for the top-fifty schools and rounded to two decimal places.

TABLE 3. Correlations for 2001 education school indicators*

                                          (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)
(1)  Reputation (academic)                 1     0.78    0.27    0.45   -0.28    0.16    0.39    0.44    0.30    0.001
(2)  Reputation (superintendent)                  1      0.21    0.23   -0.15    0.13    0.21    0.39    0.16    0.05
(3)  Verbal GRE scores                                     1      0.61   -0.46    0.14   -0.29   -0.04   -0.08    0.14
(4)  Quantitative GRE scores                                       1     -0.41    0.01   -0.10    0.10    0.09    0.04
(5)  Percent accepted                                                      1     -0.34   -0.11    0.04   -0.48   -0.46
(6)  Ratio of students to faculty                                                  1      0.10    0.02    0.09    0.19
(7)  Number of doctoral degrees granted                                                    1      0.21    0.49    0.01
(8)  Proportion in doctoral programmes                                                             1     -0.12   -0.13
(9)  Total research                                                                                        1      0.63
(10) Research per faculty member                                                                                   1

* Based on data for the top-fifty schools and rounded to two decimal places.

In addition, the direction of the relationship can be positive or negative. A positive number means that as values for one indicator increase, the values for the other indicator also tend to increase. A negative number means that as values for one indicator increase, the values for the other indicator tend to decrease. The strength of the relationship between any two indicators in Table 2 or 3 can be determined by finding the number at which the column location for one and the row location for the other intersect. For example, in Table 2 the correlation between “Starting salary” and “Average Graduate Management Admissions Test (GMAT) scores” is 0.80. This means that schools with higher average GMAT scores tend to have graduates with higher starting salaries. The correlation between “Average GMAT scores” and “Percent accepted” is -0.69, which means that schools that accept fewer applicants tend to have incoming students with higher average test scores (note that this relationship is not as strong as that between “Starting salary” and “Average GMAT scores”).
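Readers who wish to reproduce this kind of analysis with their own data can do so with a few lines of code. The sketch below is illustrative only: it uses pandas, and the school values and column names are invented stand-ins for the published indicators in Table 2.

```python
import pandas as pd

# Hypothetical indicator values for six schools; a real analysis would use the published data.
data = pd.DataFrame({
    "reputation_academic": [4.8, 4.5, 4.1, 3.9, 3.5, 3.1],
    "starting_salary":     [105_000, 98_000, 94_000, 90_000, 84_000, 80_000],
    "avg_gmat":            [720, 710, 690, 680, 660, 650],
    "percent_accepted":    [0.12, 0.18, 0.25, 0.30, 0.38, 0.45],
})

# Pairwise Pearson correlations among the indicators, rounded to two decimals
# as in Tables 2 and 3. Values near +1 or -1 indicate strong relationships.
print(data.corr(method="pearson").round(2))
```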

The most obvious difference between Tables 2 and 3 is in the sizes of the correlations. Specifically, the correlations among the business indicators tend to be larger than those among the education indicators, meaning that there are stronger relationships among the indicators used to assess quality in graduate schools of business. A business school that performs well on one of these quality indicators is also likely to perform well on the rest. It is harder to make this prediction for schools of education since the relationships are not as strong. For instance, a school of education with high average “Quantitative Graduate Record Examination (GRE) scores” for its entering students may or may not also have high “Total research” expenditures (this refers to the amount of funded research being conducted at the school). It is hard to predict, since the relationship between these indicators is quite weak (i.e., r = 0.09). The reason for the different correlation patterns in Tables 2 and 3 is not clear. For example, these differences could be the result of variations in what quality looks like in a school of education versus what it looks like in a school of business, or they may be due to the differential availability of quality-related information for these two types of school.

Either way, these correlations have implications for the weighting of the indicators used to produce an overall score. For instance, the relative sizes of the weights assigned to each of the business indicators are less likely to have an impact upon the final ordering of schools in this ranking, since schools tend to perform similarly on each indicator (i.e., if they do well on one, they also tend to do well on another). Therefore, whether the heaviest weight is given to, for example, “Reputation (academic)” (the reputation scores given to schools by academics) or “Starting salary” will have little effect on the final outcome since schools tend to perform similarly on both indicators (r = 0.90). The same cannot be said for the education indicators where, depending on which indicator receives the most weight, there can be a very different rank ordering of schools. For example, giving the largest weight to “Reputation (superintendent)” (the reputation scores given to schools by superintendents) versus “Total research” (the amount of funded research being conducted at the school) could produce quite different rankings since schools may not perform similarly on both (r = 0.16).
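This sensitivity can be checked directly. The sketch below is an illustration with invented scores for five hypothetical schools (it is not the US News and World Report formula): the same two indicators are combined under a reputation-heavy and a research-heavy weighting, and Spearman's rank correlation is used to compare the two orderings. When the underlying indicators are weakly correlated, the orderings can diverge sharply.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical indicator values for five schools of education (invented for illustration).
df = pd.DataFrame(
    {"reputation": [4.6, 4.2, 3.9, 3.6, 3.0],
     "total_research": [12.0, 30.5, 8.0, 22.0, 15.0]},  # funded research, millions (invented)
    index=["A", "B", "C", "D", "E"],
)
z = (df - df.mean()) / df.std(ddof=0)  # standardize each indicator across schools

# Two plausible weighting schemes that favour different indicators.
rep_heavy = 0.8 * z["reputation"] + 0.2 * z["total_research"]
res_heavy = 0.2 * z["reputation"] + 0.8 * z["total_research"]

ranks = pd.DataFrame({
    "rep_heavy_rank": rep_heavy.rank(ascending=False),
    "res_heavy_rank": res_heavy.rank(ascending=False),
})
rho, _ = spearmanr(ranks["rep_heavy_rank"], ranks["res_heavy_rank"])
print(ranks)
print("Spearman correlation between the two orderings:", round(rho, 2))
```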

These differences also raise questions about the choice of ranking indicators in general. Should the indicators have high or low correlations? Are high correlations a sign of validity (i.e., are they all measuring quality) or of redundancy (i.e., are they all measuring the same thing, and are therefore not all needed)? Are low correlations a sign of invalidity (i.e., are some or all of the indicators not measuring quality?) or of unique information (i.e., are they all measuring different aspects of academic quality?)? In addition, if efficiency is of value, and if the indicators are highly correlated, should some of the indicators be dispensed with, since doing so would probably not affect the final ordering of schools? For instance, since “Reputation (academic)” correlates highly with most of the other indicators used to rank business schools, one could theoretically dispense with the seven other indicators and still arrive at a very similar ordering of schools.

Another implication of the relationships shown in Tables 2 and 3 is seen in the sensitivity of each ranking to changes in the ranking formula. US News and World Report, like many organizations involved in ranking efforts, makes frequent changes to its ranking formulae. These changes are meant to improve the rankings and may involve the addition or removal of indicators, changes to the methodology or definition for an indicator, or changes to the weights used. Any of these changes can have a significant impact on the relative ordering of schools, an impact that is quite separate from real change in the schools' relative performance on the indicators. The impact is magnified with the weight-and-sum approach, particularly when the indicators used are not highly correlated, since this method produces only one score on which to rank schools.

For example, the correlation between the top-fifty business schools according to the US News and World Report in 1995 and 2001 is 0.88, while the correlation between the top-fifty schools of education in 1995 and 2001 is only 0.78. Thus, the list of the top-fifty business schools in 1995 is quite similar to the list in 2001, but the list of the top-fifty schools of education in 1995 varies somewhat from the list published in 2001. There were quite a few changes to the formula for the education rankings during this six-year period (twenty changes in total), far more than to the formula for the business rankings (eight changes in all).

While this phenomenon is one of the reasons behind the greater movement of schools in the education rankings, it is not the only one. For instance, the US News and World Report ranking formula for law schools also experienced a large number of changes during this time period (fourteen changes in all), but the list of the top-fifty law schools in 1995 is very similar to that for 2001 (r = 0.92). The reason is that the indicators used for the law school rankings tend to be quite highly correlated; thus, formula changes tend not to affect the final outcome substantially in terms of the relative ordering of schools.

The weight-and-sum approach attempts to capture in one number an aggregate evaluation of the worth of an institution relative to others. But this overall score can be misleading for several reasons. It implies a comprehensive measure of the quality of the colleges or programmes being ranked even though no currently available data offer sufficient coverage to accomplish this task (Lombardi et al., 2001). In addition, the indicators used may be problematic in terms of their validity, reliability, and comparability for the schools being ranked (Cantor, 1996). All of these issues are magnified when weights are applied to create an overall score (Casper, 1996). The usefulness of the information is further reduced when frequent changes to the formula make it difficult to interpret movement in the relative positions of schools in the rankings from year to year.

One way to reveal the uncertainty behind this overall score is shown in Tables 4 and 5. The formulae used to produce these tables are given in the explanation that follows them, and more detail is provided in Clarke (2002a). Basically, the tables show what happens to the score awarded to a school by US News and World Report when slight changes are made to the indicators and weights being used. The amount of movement in a school's score owing to these changes is used to generate a “standard error” band around the score. This error band can then be used in a t-test to assess whether the overall score of one school is significantly different statistically from that of another.

The results of these school-by-school comparisons for each top-fifty ranking (i.e., business and education) are shown in Tables 4 and 5. In each table, schools are ordered by their US News and World Report overall score across the heading and down the rows (the overall score is located in parentheses after the name of each school). One must read across the row for a school in order to compare its performance with that of the schools listed in the heading of the chart. The symbols indicate whether the overall score of the school in the row is significantly higher than that of the comparison school in the heading (arrow pointing up), significantly lower (arrow pointing down), or whether there is no statistically significant difference between the two schools (shaded cell with circle). A blank diagonal represents the comparison of a school against itself.

TABLE 4. Business school ratings in 2001 as per the US News and World Report methodology

TABLE 5. Education school ratings in 2001 as per the US News and World Report methodology

Regarding Tables 4 and 5: The Jackknife Technique

A regression model is substituted for the US News and World Report formula by using the overall scores for schools in a ranking as the outcome variable and the indicators as the predictor variables (this process basically replaces one linear model with another). The jackknife procedure then removes one indicator at a time from the regression model, recalculating the overall score for each school with the remaining indicators before replacing the indicator and repeating the procedure. The jackknife standard error for a school is obtained from these recalculated values, using the following formula (Efron and Tibshirani, 1993):

$$\widehat{se}_{\text{jackknife}} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat{\theta}_{(i)}-\hat{\theta}_{(\cdot)}\bigr)^{2}}, \qquad \hat{\theta}_{(\cdot)}=\frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)},$$

where n is the number of regression models to be estimated and \(\hat{\theta}_{(i)}\) is the predicted score for a school from the ith regression model with one indicator removed.

This standard error can be used in a t-test to assess whether the score of one school is significantly different statistically from that of another. In order to control for the increased probability of Type I error (finding a significant difference when there is none) owing to the number of comparisons being made, the Bonferroni method for multiple comparisons is used (see Glass and Hopkins, 1996).
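For readers who want to experiment with this procedure, the sketch below is a minimal reimplementation under stated assumptions: the indicator data are simulated rather than the published US News and World Report values, ordinary least squares from scikit-learn stands in for whatever regression software was actually used, and the way the two schools' standard errors are pooled and the degrees of freedom for the pairwise comparison are simplifying choices. It follows the delete-one-indicator jackknife and Bonferroni-adjusted t-test described above.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)

# Invented data: 50 schools, 5 standardized indicators, and a published overall score
# that is (approximately) a weighted sum of the indicators plus noise.
n_schools, n_indicators = 50, 5
X = rng.normal(size=(n_schools, n_indicators))
true_weights = np.array([0.4, 0.25, 0.15, 0.1, 0.1])
overall = X @ true_weights + rng.normal(scale=0.05, size=n_schools)

# Step 1: replace the published formula with a regression of the overall score on the indicators.
full_pred = LinearRegression().fit(X, overall).predict(X)

# Step 2: jackknife over indicators -- drop one indicator at a time, refit, and re-predict.
jack_preds = np.empty((n_indicators, n_schools))
for i in range(n_indicators):
    X_i = np.delete(X, i, axis=1)
    jack_preds[i] = LinearRegression().fit(X_i, overall).predict(X_i)

# Step 3: jackknife standard error of each school's predicted overall score.
n = n_indicators  # number of regression models estimated
theta_bar = jack_preds.mean(axis=0)
se_jack = np.sqrt((n - 1) / n * ((jack_preds - theta_bar) ** 2).sum(axis=0))

# Step 4: pairwise comparisons with a Bonferroni-adjusted critical value.
alpha = 0.05
n_comparisons = n_schools * (n_schools - 1) / 2
critical = t_dist.ppf(1 - (alpha / n_comparisons) / 2, df=2 * (n - 1))  # df is a simplifying choice

def significantly_different(a: int, b: int) -> bool:
    """True if schools a and b have statistically distinguishable overall scores."""
    t_stat = (full_pred[a] - full_pred[b]) / np.sqrt(se_jack[a] ** 2 + se_jack[b] ** 2)
    return abs(t_stat) > critical

ties = sum(not significantly_different(a, b) for a, b in combinations(range(n_schools), 2))
print(f"{ties} of {int(n_comparisons)} pairwise comparisons show no significant difference")
```

The key design point is that the uncertainty captured here comes from varying the ranking formula (which indicators enter the regression), not from sampling different schools.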

If there were no errors around the overall scores for schools, Tables 4 and 5 would consist only of arrows pointing up and down, except for instances in which two schools have the same overall score and are tied in rank. Such is not the case, as evidenced by the amount of shaded area in each table. The fact that there is more shaded area in Table 5 means that it is harder to find “real” differences between the overall scores for schools of education, since the scores of these schools can change quite a bit depending on the indicators and weights being used. The fact that there is less shaded area in Table 4 means that it is easier to find “real” differences between the overall scores for business schools, since the scores of these schools tend to be fairly stable. Even so, it is evident that there are far fewer “real” differences between business schools than is implied by the overall score attributed by US News and World Report.3

3. While the grouping patterns in Tables 4 and 5 are an artifact of the number of schools included in the analyses (e.g., if data for all 341 business schools surveyed were included, slightly different groupings would result), they illustrate the “false precision” of the overall score produced using a weight-and-sum approach.

CONCLUSION

Any effort at creating an academic quality ranking system must grapple with the difficulty of trying to quantify the intangibles of a set of complex teaching, learning, resource, and research phenomena. It is evident that the choice of indicators is a non-arbitrary process that should be guided by, among other things, a knowledge of the strengths and limitations of the indicators being considered as well as their validity, reliability, and comparability for the schools or programmes to be ranked. The choice of a method for presenting this information in ranked format must be guided by, among other things, an understanding of the nature of quality in the schools or programmes being ranked as well as the relationships among the quality indicators that are to be used. In addition to these guidelines, the following questions are offered as a way to think about the methodological issues raised at each stage of the ranking process:

What is Being Ranked?

Are institutions, departments, or programmes being ranked? Depending on the answer, not only will the choice of indicators vary; so too will the unit of measurement. For instance, if graduate schools of business are being ranked, then the indicators used should reflect information on graduate schools of business, not on the institutions in which they are located. Thus, a student–faculty ratio indicator should be based on information for the business school and not for the institution as a whole. In addition, it should be recognized that not all schools in a particular area are similar, and dissimilar schools should not be “lumped” into the same ranking. For example, some schools of education have teacher training as their primary mission, while for others the primary mission is research. Combining both kinds of schools into the same ranking may be very misleading in terms of showing their relative quality. Two separate rankings would be better.

Why is it Being Ranked? Who is the Intended Audience?

The reasons for rankings are many. For example, the purpose of the ranking may be to inform, to act as a spur for improvement, or to provide benchmarks. The audience for rankings also varies, and may include students, parents, institutions, or the general public. Depending on the reason for a ranking and its intended audience, the choice of indicators and approach will vary. For instance, a college ranking that is meant to help students decide where to go to school will most likely be different from a ranking that is meant to provide information to college administrators or to higher education policy-makers.

What Can I Do to Improve the Quality of My Indicators?

In addition to the above guidelines, information utility and reliability can be improved by using multi-year data. Doing so tends to reduce any anomalies (e.g., spikes or dips) in the performance of an institution that may throw off its ranking. Using score ranges (e.g., percentile ranges) rather than point estimates (e.g., averages) can also provide a better picture of the spread of performance in a particular institution or programme.

Indicator quality may also be adversely affected if institutions manipulate the information being used. For instance, schools may inflate their application pools in order to look more selective, thereby improving their performance on a selectivity indicator. Standardized data collection, processing, and reporting techniques can reduce the occurrence of some of these problems. However, the bigger issue to be addressed may be how to reduce the pressures that institutions feel to do well in rankings.


How Will I Present the Information to My Audience?

The choice of a specific ranking methodology is dependent on the particular context and goals of the person or group doing the ranking. Nonetheless, the usefulness of the ranking will be increased if the chosen methodology allows the values and needs of the eventual user to be incorporated into the outcome. For example, the indicator information could be made available on a Web site that allows the user to decide on a formula for creating a final ranked list of schools. In addition to allowing the user to specify the indicators and their relative importance in the final outcome, there could also be an option that allows the user to specify a minimum required performance level on some or all of the indicators.
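A minimal sketch of such a user-driven ranking appears below. The field names, data, and interface are hypothetical (this is not a description of any existing site): the user supplies weights and minimum thresholds, and the list is filtered and re-ranked accordingly.

```python
# Hypothetical per-school indicator data (already standardized to 0-100 scales).
schools = [
    {"name": "School A", "reputation": 88, "placement_rate": 93, "research": 70},
    {"name": "School B", "reputation": 75, "placement_rate": 97, "research": 55},
    {"name": "School C", "reputation": 92, "placement_rate": 84, "research": 90},
]

def custom_ranking(schools, weights, minimums=None):
    """Rank schools by a user-chosen weighted sum, after dropping any school
    that falls below a user-chosen minimum on any indicator."""
    minimums = minimums or {}
    eligible = [
        s for s in schools
        if all(s[ind] >= floor for ind, floor in minimums.items())
    ]
    scored = [
        (sum(w * s[ind] for ind, w in weights.items()), s["name"])
        for s in eligible
    ]
    return sorted(scored, reverse=True)

# A user who values placement above all, and insists on a minimum research score.
print(custom_ranking(schools,
                     weights={"reputation": 0.2, "placement_rate": 0.6, "research": 0.2},
                     minimums={"research": 60}))
```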

How Often Will I Present the Information to My Audience?

Rankings can be as frequent or as infrequent as one wishes. However, since institutions tend to be stable systems whose quality changes only slowly over time, it is doubtful that annual rankings offer any educational value to the average consumer.

How Often Will I Change My Approach?

Rankings need to be responsive to changes in education. Thus, as education changes, so too should the indicators used to represent it. Distance learning and classroom-based technology are examples of developments that have changed some aspects of how we think about and measure academic quality. Nonetheless, it is useful to maintain a subset of ranking indicators that are constant across years so that users can gain some sense of the degree of stability in relative institutional quality over time.

As a final consideration, it should be remembered that transparency is essential to the success of any ranking system. Thus, openness about how the indicators were chosen and about the approach taken to present the information in ranked format, as well as access to the original data, should always be maintained.

REFERENCES

CAMILLI, G., and FIRESTONE, W. A. "Values and State Ratings: An Examination of the State-by-State Education Indicators in Quality Counts", Educational Measurement: Issues and Practice 18 4 (2000): 17–25.

CANTOR, G. "Universities Increasingly Question the Criteria of College Rankings", The Detroit News (1 December 1996): 3.

CASPER, G. "Letter to the Editor of US News and World Report" (23 September 1996), retrieved 17 February 1999 from <http://www-portfolio.stanford.edu:8050/documents/president/961206gcfallow.html>.

CLARKE, M. "Quantifying Quality: What Can the US News and World Report Rankings Tell Us about the Quality of Higher Education?", Education Policy Analysis Archives 10 16 (2002a), <http://epaa.asu.edu/epaa/v10n16/>.

CLARKE, M. "News or Noise: An Analysis of US News and World Report's Ranking Scores", Educational Measurement: Issues and Practice 21 4 (2002b): 39–48.

EFRON, B., and TIBSHIRANI, R. J. An Introduction to the Bootstrap. New York: Chapman and Hall, 1993.

Evaluation News 2 1 (February 1981): 85–90.

GARRETT, G. "Our Method Explained", US News and World Report Best Graduate Schools 2003 Edition. Washington, D.C.: US News and World Report, 2002, pp. 34–35.

GLASS, G. V., and HOPKINS, K. D. Statistical Methods in Education and Psychology, 3rd edn. Needham Heights, Massachusetts: Simon & Schuster, 1996.

HATTENDORF, L. C., ed. Educational Rankings Annual. Detroit: Gale Research, 1993.

LINN, R., ed. Educational Measurement, 3rd edn. Washington, D.C.: American Council on Education, 1993.

LOMBARDI, J., CRAIG, D., CAPALDI, E., GATER, D., and MENDON, S. The Top American Research Universities. Gainesville, Florida: The Center, University of Florida, 2001.

MORSE, R. J., and FLANIGAN, S. M. "How We Rank Schools", US News and World Report America's Best Colleges 2002 Edition. Washington, D.C.: US News and World Report, 2001, pp. 67–70.

SANOFF, A. "Rankings Are Here to Stay: Colleges Can Improve Them", Chronicle of Higher Education 45 2 (4 September 1998): A96.

SCRIVEN, M. Evaluation Thesaurus, 4th edn. Newbury Park: Sage, 1991.

SEAMAN, B. "What Makes a Good College", Time 152 17 (26 October 1998): 1–2.

WEBSTER, D. S. Academic Quality Rankings of American Colleges and Universities. Springfield: Charles C. Thomas, 1986.
