
Zwick, R. (Ed.). (2004). Rethinking the SAT: The Future of Standardized Testing in University Admissions. New York: RoutledgeFalmer.

Pp. xxi + 367
$90 (Cloth)     $25 (Paper)     ISBN 0-415-94835-5.

Reviewed by Ethan Arenson
University of California, Santa Cruz

October 21, 2004

At nearly all colleges and universities, more high school seniors apply than there are spaces available. Institutions therefore need a systematic process for deciding which students are offered admission, and that process must be equitable with respect to subgroups defined by ethnicity, gender, and socioeconomic status. Scores from standardized tests such as the SAT and ACT are key inputs in this process. Numerous studies have shown, however, that standardized tests are not impartial among these subgroups. How can colleges use these imperfect indicators to admit the students most likely to succeed and, at the same time, admit students equitably across these subgroups?

To answer this question, the University of California (UC) hosted a conference that addressed the role of standardized tests in the college admissions process. Rethinking the SAT is a collection of papers presented at this conference. Among the attendees were people from ACT, the Educational Testing Service (ETS), the College Board, and UC. For this volume, Zwick’s goal was to produce a collection of ideas, from the conference, that would be accessible to people from a wide variety of backgrounds: education administrators, teachers, and counselors; parents; members of the press; and policymakers.

Zwick organizes the presentations from this conference into four parts:

  • College admissions testing in the United States.
  • Admissions testing, as it affects the UC system.
  • The relationship between admissions test scores, race, and class.
  • The predictive value of admissions tests.

The remainder of this review consists of key points from each of the four parts of the book, followed by my overall impression of the text.

College Admissions Testing in the United States

Nicholas Lemann, who documented the history of the SAT in his book The Big Test: The Secret History of the American Meritocracy, writes the first chapter. Lemann discusses the origins of the SAT as an aptitude test that James Bryant Conant, president of Harvard University in the mid-twentieth century, hoped would allow more capable students to attend college, irrespective of their academic experiences. This chapter launches an “achievement versus aptitude” argument that runs through most of the chapters in this book and is reflected in Richard Atkinson’s chapter, “Achievement Versus Aptitude in College Admissions.” The essence of the argument is that scores from achievement tests, such as the SAT II: Subject Tests, are a better predictor of academic success (typically defined as grade-point average in the freshman year of college) than are aptitude test scores.

This observation motivated, at least in part, Atkinson’s decision to question the “conventional wisdom” of the SAT I’s ability to correct for grade inflation and the variability in the quality of education in American high schools (p. 15), and recommend that UC eliminate this test as an admission requirement. An important point deserves mention here: Atkinson’s recommendation does not represent, as critics have mistakenly interpreted, a position against standardized testing (p. 16).

Richard Ferguson, chief executive officer of ACT, presents the ACT as an alternative assessment that could be augmented to meet UC’s needs. The ACT is an achievement test that covers “the academic knowledge and skills typically taught in high school. . . and required for success in the first year of college” (p. 26). The ACT consists of four tests: English, mathematics, reading, and science. High school and college faculty, in a national curriculum study that ACT conducts every three years, determine the content and skills that the ACT measures.

Howard Everson, vice-president of the College Board, discusses some of the research and development activities underway for future versions of the SAT. One major change, which ETS’s Ida Lawrence mentions in her chapter with colleagues Tom Van Essen and Carol Jackson, along with former College Board vice-president Gretchen Rigol, is the inclusion of a writing test for 2005. The writing test will consist of multiple-choice questions on grammar and usage, as well as a student essay in which students will be asked to argue their position on a particular issue, and support their position with reasons and evidence. Current research for the future SATs includes developing methods to measure cognitive abilities, learning skills, and multiple intelligences.

David Lohman argues in his chapter that aptitude is a multi-faceted construct. He describes the common notion of aptitude as “reasoning,” and points out that achievement is, in fact, an “important aptitude for future learning” (p. 54). The SAT I over the years has become more “achievement like.” Lohman asserts that “the problem with the current SAT I [with analogy and sentence completion items removed] may not be that it is an aptitude test, but that it is not enough of an aptitude test” (p. 50). In essence, some measure of aptitude deserves consideration to help identify “diamonds in the rough:” students who may not have scored well on achievement tests, yet may be able to succeed in college.

In his commentary on the first section of this book, Michael Kirst points out that a major flaw in the college admissions process is the disconnectedness between the elementary, secondary, and postsecondary levels of education. The lack of a K-16 system of standards and assessments has resulted in the development of K-12 standards that fail to meet, or even contradict, the expectations that colleges have of incoming freshmen: While the emphasis in higher education is an “upward trajectory” of pupils, the primary concern at the secondary level is the satisfaction of state and federal standards (p. 94).

Kirst also identifies “senioritis” as a side effect of the lack of a K-16 system. Because high school students submit college applications in the fall of their senior year, admissions decisions are based on grades from their first three years of secondary school. Many high school seniors then choose to take less challenging courses in their senior year; they have no incentive to take rigorous courses. Sometimes this results in certain students actually being underprepared for college. A more integrated system, Kirst argues, would help ensure that high school students take appropriate and challenging courses up to, and including, their senior year.

Kirst closes his commentary by describing some promising attempts to bridge K-12 standards with postsecondary expectations. One such project is called Standards for Success (S4S). S4S has two key goals (http://www.s4s.org):

  • To identify what knowledge and skills students need to know and be able to do in order to succeed in entry-level university courses.
  • To produce an information database on (state) high school assessments that would improve the relationship between state standards and university expectations.

Georgia, California, and Illinois also have projects that are either currently in place or under development.

The Impact of Admissions Testing on the University of California

Just as UC President Atkinson’s call to eliminate the SAT I made waves across the nation, the future direction that UC takes regarding standardized testing will shape the admissions practices of many colleges nationwide. UC is state-mandated to offer admission—into the university system, though not necessarily to any particular campus—to the top 12.5% of high school graduates (p. 108). The dilemma facing UC is that admitting this top 12.5% results in inequitable admissions practices with respect to certain subgroups (e.g., ethnicity and class). Conversely, an admissions policy that is equitable to these subgroups would require admitting students below the top 12.5% of high school graduates.

The chapter by Dorothy Perry, Michael Brown, and Barbara Sawrey represents UC’s position, developed by the UC Academic Senate Board on Admissions and Relations with Schools (BOARS), on the role of standardized testing in the UC admissions process. At present UC requires one test of language arts and mathematics and three achievement tests. The language arts and mathematics requirement is satisfied by either the ACT or the SAT I. The achievement tests are satisfied by the SAT II subject tests, formerly known as the College Entrance Examination Board Achievement Tests. There are subject tests in 21 different fields, including language, literature, writing, and several natural and social science subjects. Students are required to submit SAT II scores in writing, one of two levels of mathematics, and a third subject of their choice. UC combines test scores with grade point average (GPA) in UC-required courses to determine eligibility for the UC system. Presently, this combined score is a sufficient criterion for admission to three of the nine UC campuses. The other six campuses employ additional criteria that vary by campus but are consistent with university-wide guidelines approved by the faculty (p. 108).

BOARS justifies the need for admissions test scores based on results from predictive validity studies. Similar to the validity studies that appear later in this volume, linear regression methods are used to predict freshman GPA (FGPA) from high school GPA (HSGPA) and test scores. Their findings suggest that, based on the pool of 77,893 students who applied to and attended UC as freshmen between Fall 1996 and Fall 1999, SAT II scores predict FGPA at least as well as SAT I scores, and that “. . . the empirical evidence does not appear so compelling that it should drive a decision to prefer one type of test to another” (p. 117).

Using the same data, Saul Geiser and Roger Studley found the math, writing, and elective SAT II achievement test scores to be better predictors of FGPA than scores from the SAT I. In addition, they reached two other significant conclusions. First, the SAT II is less sensitive to socioeconomic status than the SAT I. In particular, when family income and parental education are kept constant, “. . . the predictive power of the SAT II is undiminished, whereas the relationship between the SAT I and UC freshman grades virtually disappears” (p. 150). Second, replacing SAT II scores with those from the SAT I would have little impact on UC admission rates among different ethnic groups.

In her commentary, Eva Baker compares the UC scenario with the A-level exams that students in the United Kingdom must take for college admission. She points out that, despite the differences between the admissions processes of both countries, the goals for the assessments are the same: to use technically sound assessments that result in fair admissions decisions. From a technical perspective, the ideal assessment would allow the university to make fair and appropriate admissions decisions, while simultaneously connecting university entrance requirements to content standards in the elementary and secondary levels. This dual-purpose for admissions tests, if it can be achieved, would help in establishing the K-16 standards, which Kirst addresses.

The Relationship Between Admissions Test Scores, Race, and Class

It is not uncommon for students from disadvantaged ethnic or socioeconomic backgrounds to be identified as at-risk (of not completing high school). Across the country, schools often participate in one of a variety of “bridge” programs, designed to motivate and prepare students for college. Among these programs are Upward Bound and Advancement via Individual Determination (AVID). Patricia Gándara compares the effectiveness of such programs in preparing students for postsecondary education. She points out that these programs have little, if any, effect on academic achievement. In part, this is due to the fact that such programs are supplementary to the K-12 school system, and that if these programs are to have any substantial impact, they need to change how K-12 schools interact with students: “. . . [S]uccessful programs work to emulate the features of good high schools. . . , but they only do it for part of the day, and often outside of school time” (p. 184).

The chapter by Amy Elizabeth Schmidt and Wayne Camara compares group differences in standardized test scores with other educational measures. Among their findings is the presence of “racial gaps,” not only with standardized test scores, but also with high school and college grades, as well as with college graduation rates. They point out that the persistence of and reasons for these gaps deserve further study. Possible reasons include the quality of schools in minority neighborhoods, poverty, parenting practices, and family background.

Rebecca Zwick’s chapter looks at standardized tests from a different perspective, by addressing whether the SAT is a “wealth test.” She closely examines two hypotheses, frequently offered by testing critics, to explain the association between family income and standardized test scores. The first hypothesis is that middle-class students are, in a sense, conditioned to do well on standardized tests. The second is that test takers from wealthier families are more likely to receive coaching (such as test preparation classes), and as a result score better than test takers from lower- and middle-class families. Zwick’s analyses, which included scores from the SAT I, SAT II, ACT, the National Assessment of Educational Progress, and the California High School Exit Exam, suggest that differences by socioeconomic status are consistent across all of these measures: “So, is the SAT a wealth test? Only in the sense that every measure of previous educational achievement is a wealth test. . . . And contrary to what is often believed, grades and course completion, like test scores, typically show substantial disparities among socioeconomic groups” (p. 213). In short, the data do not support these claims.

In his chapter, Derek Briggs evaluates the effect of coaching on SAT I scores. Contrary to the claims made by test preparation companies, Briggs’s findings suggest that the gains from commercial coaching programs are between 3 and 20 points for the verbal section and between 10 and 28 points for the mathematics section. In both cases, this gain is less than one standard deviation—slight, but not as large as these programs claim.

In his commentary on this section, Michael Martinez identifies two central themes in these four papers. First is that the achievement gap among ethnic and socioeconomic groups is multifaceted, and cannot be resolved with simple policy solutions. Second, this achievement gap is consistent over many achievement measures. He suggests exploring factors that enhance learning outcomes. By factors, he refers to school quality and home environment, each of which is complex. He also suggests a broadening of the construct validity consideration of standardized tests to include cognitive as well as affective qualities: “A complete theory requires an understanding of the relevant repertoire of dispositions that make for success in admission to college, and success in the university and beyond” (p. 242).

The Predictive Value of Admissions Tests

How well do standardized test scores predict academic success beyond high school? The typical approach to answering this question is to perform linear regression analyses with FGPA as the dependent variable and standardized test scores among the predictors. Among their findings, Jennifer Kobrin, Camara, and Glen Milewski report that the differences in predictive validity of SAT II test scores relative to scores from the SAT I vary among ethnic groups.
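
To make this framework concrete, the following is a minimal sketch, in Python, of the kind of regression these validity studies run. The variable names (hsgpa, sat_total, fgpa) and the simulated data are my own placeholders, not the datasets analyzed in these chapters.

```python
# Illustrative sketch only: variable names and simulated data are assumptions,
# not the datasets used in the studies reviewed.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "hsgpa": rng.normal(3.3, 0.4, n).clip(0, 4),
    "sat_total": rng.normal(1100, 180, n).clip(400, 1600),
})
# Simulate a freshman GPA that depends weakly on both predictors plus noise.
df["fgpa"] = (0.5 * df["hsgpa"] + 0.001 * df["sat_total"]
              + rng.normal(0, 0.4, n)).clip(0, 4)

# Fit the standard validity-study regression: FGPA on HSGPA and test score.
X = sm.add_constant(df[["hsgpa", "sat_total"]])
fit = sm.OLS(df["fgpa"], X).fit()
print(fit.summary())  # coefficients and R-squared, the usual evidence reported
```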

In their chapter, Brent Bridgeman, Nancy Burton, and Frederick Cline conclude that substituting SAT II test scores for those from the SAT I would have a threefold impact: effects on diversity within institutions, effects on the fairness of admission policies, and effects on pre-college instruction. Under this substitution, larger numbers of Latino students who took the SAT II Spanish test would be admitted. While the percentage of incoming Latino freshmen would not change, “from the perspective of the Latino applicants, the change would be more dramatic, noticeably increasing their likelihood of admission” (p. 286). A second argument in favor of permitting SAT II Spanish test scores is that any advantage from these scores would offset other disadvantages these applicants would face if the Spanish test were not permitted. Finally, they point out that using SAT II scores in place of SAT I scores could show promise: “Requiring subject tests for admission could lead students to seek more legitimate instruction in the specific subject fields, but it remains to be seen whether these hopes actually would be realized” (p. 287).

John Young examines differential validity and differential prediction with respect to race and gender. He advocates a holistic admissions process rather than decisions based on traditional predictors, whose predictive power varies across races. He also points out that one weakness in the use of FGPA as a criterion is that this measure is not necessarily based on the same set of courses: “Because men and women generally differ in the courses in which they enroll, primarily because of the requirements for different majors, the courses that make up FGPA and cumulative GPA differ by sex” (p. 297).

Rather than considering SAT scores, Julie Noble in her chapter looks at admissions decisions for varying ethnic groups when ACT composite scores are combined with HSGPAs. In addition to investigating differential prediction, she also investigates the differential effects on the probability of academic success, defined as obtaining an FGPA of 2.5 or higher. She concludes that relying on academic measures alone is likely to deny access to capable students from minority backgrounds. She cautions, however, that a process that does not rely enough on standardized test scores would most likely admit students who are not academically prepared for university-level work.

In his chapter, Roger Studley recommends an admissions policy based on a student’s potential achievement, rather than achievement realized through observed indicators. He defines potential achievement as “the maximum achievement he could attain if circumstances were optimal” (p. 248). Potential achievement is based on a statistical adjustment to SAT scores and HSGPA that corrects for other factors such as socioeconomic status (e.g., family income and education, home zip code, and high school attended) and other effects of circumstance. He offers a systematic admissions process as an alternative to the subjective and ad hoc considerations of student circumstance that colleges and universities generally implement. While such a system reduces ethnic disparities, he cautions that this process does not eliminate these differences.

Christina Perez uses the common criticisms of standardized test scores—such as weak predictive power and little information on long-term success in college—to address the appropriateness of standardized tests. She argues that, as some colleges and universities have already done, all institutions should drop test score requirements.

In his commentary, Robert Linn summarizes these contributions by noting that while the results of these studies are consistent with previous findings, both the context underlying the results and their implications have changed. The two major changes stimulating these studies are the elimination of affirmative action policies in certain states and President Atkinson’s proposal to replace SAT I scores with those from the SAT II subject tests. It is also worth mentioning that he finds Studley’s measure of potential achievement deserving of further exploration.

Discussion

In this volume Zwick presents a collection of chapters that is accessible to readers from a wide range of backgrounds. On the back cover of the text, Mark Reckase is quoted as saying “This book should be required reading for all college and university admissions staff.” I agree with Reckase on this. I would also put this book on a list of required readings for graduate students in sociology and in education. This book documents the beginning of an important transition in the history of higher education and in educational testing. In this volume, Everson describes how technology can help us create what Randy Bennett describes as “intelligent assessment”: an integration of constructed-response testing, artificial intelligence, and model-based measurement (p. 81). The development of psychometric methods in cognitive assessment (Nichols, Chipman, & Brennan, 1995) is leading us toward tests that not only provide summative information (as standardized tests presently do) but are also diagnostic in nature and can improve learning outcomes.

I have three comments regarding the methodology of the studies in this volume. In all of the studies, linear regression analyses were conducted in which test scores and HSGPA were used as covariates to predict FGPA. The primary purpose of admissions criteria is to select students who are most likely to succeed in college. Is FGPA an appropriate measure of success in college? As some of the studies indicate, whether students graduate from college should be the criterion, but FGPA is used as a proxy because (1) student graduation data are not readily available, and (2) FGPA is the criterion that nearly all other studies have used. Rather than assuming that FGPA is a proxy for graduating from college, I think that efforts should be made to incorporate graduation indicators into existing databases so that the relationship between these covariates and whether students graduate can be explored directly.
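
If graduation were used as the criterion, the natural analogue to these analyses would be a logistic regression on a binary graduation indicator. The Python sketch below is only an illustration of that suggestion; the variable names and simulated data are hypothetical, and real graduation indicators would have to be merged in from institutional records.

```python
# Hypothetical sketch: model a binary graduation indicator instead of FGPA.
# All names and data are placeholders, not actual institutional records.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
hsgpa = rng.normal(3.3, 0.4, n).clip(0, 4)
sat2_avg = rng.normal(600, 90, n).clip(200, 800)

# Simulate graduation as a Bernoulli outcome whose log-odds rise with both covariates.
log_odds = -6.0 + 1.2 * hsgpa + 0.004 * sat2_avg
graduated = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Logistic regression of graduation on HSGPA and an average SAT II score.
X = sm.add_constant(np.column_stack([hsgpa, sat2_avg]))
fit = sm.Logit(graduated, X).fit()
print(fit.summary())  # coefficients are on the log-odds scale
```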

My second comment regards the underprediction and overprediction results observed in several of these studies. That predicted FGPA underestimates the performance of some subgroups (and overestimates that of others) when the prediction equation is fit to an entire population should come as no surprise. The comparison of predicted FGPAs across subgroups rests on the assumption that these predictions are appropriate estimates of the variability (i.e., the distribution) of FGPA. Mislevy (1993) as well as Lord (1969) demonstrate this point in the context of ability estimation. It seems reasonable to apply their results to a student’s potential to succeed in college, for this potential is also a latent trait.

My final comment pertains to the process by which the regression estimates are obtained. In each of these studies the authors obtained their estimates from the entire sample. A major flaw in whole-sample regression is that the sample data are used twice: once to compute the estimates, and a second time to assess how well the regression fits the data. A better approach is to average results from repeated random half-samples, in which half of the sample is used to compute the regression estimates and the other half is used to estimate how well the regression model fits the data (Arenson, 1999; McLaughlin, Bandeira de Mello, Cole, & Arenson, 2000). Related to random half-sample validation is three-fold cross-validation, which uses repeated random third-samples: one to compute regression estimates, one to fine-tune the regression model, and one to estimate model fit (Draper & Krnjajic, 2004).
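
As a rough illustration of the half-sample idea, the Python sketch below, which uses synthetic data rather than any of the datasets cited, fits the regression on a random half of the sample, evaluates fit on the held-out half, and averages over repetitions; the same skeleton extends to the three-fold variant by splitting the sample into thirds.

```python
# Rough sketch of repeated random half-sample validation with synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 2))                    # stand-ins for HSGPA and a test score
y = X @ np.array([0.4, 0.3]) + rng.normal(scale=0.7, size=n)

holdout_r2 = []
for _ in range(200):
    idx = rng.permutation(n)
    fit_half, test_half = idx[: n // 2], idx[n // 2:]
    model = LinearRegression().fit(X[fit_half], y[fit_half])
    holdout_r2.append(r2_score(y[test_half], model.predict(X[test_half])))

whole_sample_r2 = LinearRegression().fit(X, y).score(X, y)
print(f"whole-sample R^2:             {whole_sample_r2:.3f}")
print(f"mean half-sample holdout R^2: {np.mean(holdout_r2):.3f}")
```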

References

Arenson, E. A. (1999, March). Statistical linkages between state educational assessments and the National Assessment of Educational Progress. Paper presented at the annual meeting of the Sacramento Statistical Association, Sacramento, CA.

Draper, D., & Krnjajic, M. (2004). 3-fold cross-validation as an approach to Bayesian model selection. Manuscript in preparation.

Lord, F. M. (1969). Estimating true score distributions in psychological testing (An empirical Bayes problem). Psychometrika, 34, 259-299.

McLaughlin, D., Bandeira de Mello, V., Cole, S., & Arenson, E. A. (2000, April). Comparison of National Assessment of Educational Progress (NAEP) and statewide assessment results: Report to Maryland on 1996 and 1998 assessments. Palo Alto, CA: American Institutes for Research.

Mislevy, R. J. (1993). Some formulas for use with Bayesian ability estimates. Educational and Psychological Measurement, 53, 315-328.

Nichols, P. D., Chipman, S., & Brennan, R. (Eds.) (1995). Cognitively diagnostic assessment. Hillsdale, NJ: Erlbaum.

About the Reviewer

Ethan Arenson is a doctoral student in statistics at the University of California, Santa Cruz. His interests are in statistics education and educational testing. Prior to graduate school, he was a high school mathematics teacher and worked as an associate scientist for CTB/McGraw-Hill.
