Zwick, R. (Ed.). (2004). Rethinking the SAT: The
Future of Standardized Testing in University Admissions. New York: RoutledgeFalmer.
Pp. xxi + 367
$90 (Cloth) $25 (Paper)
ISBN 0-415-94835-5.
Reviewed by Ethan Arenson
University of California, Santa Cruz
October 21, 2004
At nearly all colleges and universities, more high school seniors
apply for admission than there are spaces available. Not only is it
necessary to have a
systematic decision-making process regarding which students are
offered admission, but such a process needs to be equitable with
respect to subgroups defined by ethnicity, gender, and socioeconomic
status. Scores from standardized tests such as the SAT and ACT are
key inputs in this process. Numerous studies have shown,
however, that standardized tests are not impartial among these
subgroups. How can colleges incorporate these imperfect
indicators to admit students who are most likely to succeed and,
simultaneously, ensure that they admit students equitably with
respect to these subgroups?
To answer this question, the University of California (UC)
hosted a conference that addressed the role of standardized tests
in the college admissions process. Rethinking the SAT is a
collection of papers presented at this conference. Among the
attendees were people from ACT, the Educational Testing Service
(ETS), the College Board, and UC. For this volume, Zwick’s
goal was to produce a collection of ideas, from the conference,
that would be accessible to people from a wide variety of
backgrounds: education administrators, teachers, and counselors;
parents; members of the press; and policymakers.
Zwick organizes the presentations from this conference into
four parts:
- College admissions testing in the United States.
- Admissions testing, as it affects the UC system.
- The relationship between admissions test scores, race, and
class.
- The predictive value of admissions tests.
The remainder of this review consists of key points from each
of the four parts of the book, followed by my overall impression
of the text.
College Admissions Testing in the United
States
Nicholas Lemann, who documented the history of the SAT in his
book The Big Test: The Secret History of the American
Meritocracy, writes the first chapter. Lemann discusses the
origins of the SAT as an aptitude test that James Bryant Conant,
who was president of Harvard University from 1933 to 1953, hoped
would allow more capable students to attend college, irrespective
of their academic experiences. This chapter launches an
“achievement versus aptitude” argument that is common
to most of the chapters in this book, and is reflected in Richard
Atkinson’s chapter, “Achievement Versus Aptitude in
College Admissions.” The essence of the argument is that
scores from achievement tests, such as the SAT II: Subject Tests,
are a better predictor of academic success (typically defined as
grade-point average in the freshman year of college) than are
aptitude test scores.
This observation motivated, at least in part, Atkinson’s
decision to question the “conventional wisdom” of the
SAT I’s ability to correct for grade inflation and the
variability in the quality of education in American high schools
(p. 15), and recommend that UC eliminate this test as an
admission requirement. An important point deserves mention here:
Atkinson’s recommendation does not represent, as critics
have mistakenly interpreted, a position against standardized
testing (p. 16).
Richard Ferguson, chief executive officer of ACT, presents the
ACT as an alternative assessment that could be augmented to meet
UC’s needs. The ACT is an achievement test that covers
“the academic knowledge and skills typically taught in high
school. . . and required for success in the first year of
college” (p. 26). The ACT consists of four tests: English,
mathematics, reading, and science. High school and college
faculty, in a national curriculum study that ACT conducts every
three years, determine the content and skills that the ACT
measures.
Howard Everson, vice-president of the College Board, discusses
some of the research and development activities underway for
future versions of the SAT. One major change, which ETS’s
Ida Lawrence mentions in her chapter with colleagues Tom Van
Essen and Carol Jackson, along with former College Board
vice-president Gretchen Rigol, is the inclusion of a writing test
for 2005. The writing test will consist of multiple-choice
questions on grammar and usage, as well as a student essay in
which students will be asked to argue their position on a
particular issue, and support their position with reasons and
evidence. Current research for the future SATs includes
developing methods to measure cognitive abilities, learning
skills, and multiple intelligences.
David Lohman argues in his chapter that aptitude is a
multi-faceted construct. He describes the common notion of
aptitude as “reasoning,” and points out that
achievement is, in fact, an “important aptitude for future
learning” (p. 54). The SAT I over the years has become more
“achievement like.” Lohman asserts that “the
problem with the current SAT I [with analogy and sentence
completion items removed] may not be that it is an aptitude test,
but that it is not enough of an aptitude test” (p. 50). In
essence, some measure of aptitude deserves consideration to help
identify “diamonds in the rough:” students who may
not have scored well on achievement tests, yet may be able to
succeed in college.
In his commentary on the first section of this book, Michael
Kirst points out that a major flaw in the college admissions
process is the disconnectedness between the elementary,
secondary, and postsecondary levels of education. The lack of a
K-16 system of standards and assessments has resulted in the
development of K-12 standards that fail to meet, or even
contradict, the expectations that colleges have of incoming
freshmen: while the emphasis in higher education is on an
“upward trajectory” of pupils, the primary concern at
the secondary level is the satisfaction of state and federal
standards (p. 94).
Kirst also identifies “senioritis” as a side
effect of the lack of a K-16 system. Because high school students
submit college applications in the fall of their senior year,
admissions decisions are based on grades from their first three
years of secondary school. Many high school seniors then choose
to take less challenging courses in their senior year; they have
no incentive to take rigorous courses. As a result, some students
arrive at college underprepared. A more
integrated system, Kirst argues, would help ensure that high
school students take appropriate and challenging courses up to,
and including, their senior year.
Kirst closes his commentary by describing some promising
attempts to bridge K-12 standards with postsecondary
expectations. One such project is called Standards for Success
(S4S). S4S has two key goals (http://www.s4s.org):
- To identify what students need to know and be able to do in
order to succeed in entry-level university courses.
- To produce an information database on (state) high school
assessments that would improve the relationship between state
standards and university expectations.
Georgia, California, and Illinois also have projects that are
either currently in place or under development.
The Impact of Admissions Testing on the
University of California
Just as UC President Atkinson’s call to eliminate the SAT I made
waves across the nation, the direction that UC takes regarding
standardized testing will influence the admissions practices of many
colleges nationwide. The state mandates that UC offer admission to
the top 12.5% of high school graduates; this guarantees a place in
the university system, though not necessarily at any particular
campus (p. 108). The dilemma facing UC is that admitting this
top 12.5% group results in inequitable admissions practices with
respect to certain subgroups (e.g., ethnicity and class).
Conversely, an admissions policy that is equitable to these
subgroups would result in the need to admit students below the
top 12.5% of high school graduates.
The chapter by Dorothy Perry, Michael Brown, and
Barbara Sawrey represents UC’s position, developed by the
UC Academic Senate Board of Admissions and Relations with Schools
(BOARS), on the role of standardized testing in the UC admissions
process. At present UC requires one test of language arts and
mathematics and three achievement tests. The language arts and
mathematics test is satisfied by either the ACT or the SAT I. The
achievement tests are satisfied by the SAT II subject tests,
formerly known as the College Entrance Examination Board
Achievement Tests. There are subject tests in 21 different
fields, including language, literature, writing, and several
natural and social science subjects. Students are required to
submit SAT II scores in writing, one of two levels of mathematics
tests, and a third subject of their choice. UC combines test
scores with grade point average (GPA) in UC-required courses to
establish eligibility for admission to the UC system. Presently, this combined score is
a sufficient criterion for admission to three of the nine UC
campuses. The other six campuses employ additional criteria that
vary according to campus, but are consistent with university-wide
guidelines that are approved by the faculty (p. 108).
BOARS justifies the need for admissions test
scores based on results from predictive validity studies. Similar
to the validity studies that appear later in this volume, linear
regression methods are used to predict freshman GPA (FGPA) based
on high school GPA (HSGPA) and test scores. Their findings
suggest that, based on the pool of 77,893 students who applied
and attended UC as freshmen between Fall 1996 and Fall 1999, SAT
II scores predict FGPA at least as well as SAT I scores, and that
“. . . the empirical evidence does not appear so compelling
that it should drive a decision to prefer one type of test to
another” (p. 117).
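For readers unfamiliar with how such predictive-validity results are
produced, the following Python sketch illustrates the general
approach: regress FGPA on HSGPA plus one battery of test scores, then
compare explained variance across batteries. The file name and column
names (fgpa, hsgpa, sat1_total, sat2_total) are hypothetical; this is
a sketch of the generic method, not the BOARS analysis itself.

```python
# Minimal sketch of a predictive-validity comparison of the kind BOARS reports.
# The file and column names are hypothetical; this is not the BOARS analysis.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("uc_freshmen.csv")  # one row per enrolled freshman

# Model 1: freshman GPA predicted from high school GPA and the SAT I total.
sat1_model = smf.ols("fgpa ~ hsgpa + sat1_total", data=students).fit()

# Model 2: the same regression with an SAT II composite in place of the SAT I.
sat2_model = smf.ols("fgpa ~ hsgpa + sat2_total", data=students).fit()

# Comparing explained variance is the usual basis for claims such as
# "SAT II scores predict FGPA at least as well as SAT I scores."
print(f"R-squared with SAT I:  {sat1_model.rsquared:.3f}")
print(f"R-squared with SAT II: {sat2_model.rsquared:.3f}")
```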
Using the same data, Saul Geiser and Roger Studley
found the math, writing, and elective SAT II achievement test
scores to be better predictors of FGPA than scores from the SAT
I. In addition, they reached two other significant conclusions.
First, that the SAT II is less sensitive to socioeconomic status
than the SAT I. In particular, when family income and parental
education are kept constant, “. . . the predictive power of
the SAT II is undiminished, whereas the relationship between the
SAT I and UC freshman grades virtually disappears” (p.
150). Second, replacing SAT II scores with those from the SAT I
would have little impact on UC admission rates among different
ethnic groups.
In her commentary, Eva Baker compares the UC
scenario with the A-level exams that students in the United
Kingdom must take for college admission. She points out that,
despite the differences between the admissions processes of both
countries, the goals for the assessments are the same: to use
technically sound assessments that result in fair admissions
decisions. From a technical perspective, the ideal assessment
would allow the university to make fair and appropriate
admissions decisions, while simultaneously connecting university
entrance requirements to content standards in the elementary and
secondary levels. This dual purpose for admissions tests, if it
can be achieved, would help in establishing the K-16 standards,
which Kirst addresses.
The Relationship Between Admissions Test
Scores, Race, and Class
It is not uncommon for students from disadvantaged
ethnic or socioeconomic backgrounds to be identified as
at-risk (of not completing high school). Across the
country, schools often participate in one of a variety of
“bridge” programs, designed to motivate and prepare
students for college. Among these programs are Upward Bound and
Advancement via Individual Determination (AVID). Patricia
Gándara compares the effectiveness of such programs in
preparing students for postsecondary education. She points out
that these programs have little, if any, effect on academic
achievement. In part, this is due to the fact that such programs
are supplementary to the K-12 school system, and that if these
programs are to have any substantial impact, they need to change
how K-12 schools interact with students: “. . .
[S]uccessful programs work to emulate the features of good high
schools. . . , but they only do it for part of the day, and often
outside of school time” (p. 184).
The chapter by Amy Elizabeth Schmidt and Wayne
Camara compares group differences in standardized test scores
with other educational measures. Among their findings is the
presence of “racial gaps,” not only with standardized
test scores, but also with high school and college grades, as
well as with college graduation rates. They point out that the
persistence of and reasons for these gaps deserve further study.
Possible reasons include the quality of schools in minority
neighborhoods, poverty, parenting practices, and family
background.
Rebecca Zwick’s chapter looks at
standardized tests from a different perspective, by addressing
whether the SAT is a “wealth test.” She closely
examines two hypotheses frequently offered by testing critics to
explain the association between family income and
standardized test scores. The first hypothesis is that
middle-class students are, in a sense, conditioned to do well on
standardized tests. The second one is that test takers from
wealthier families are more likely to receive coaching (such as
test preparation classes), and as a result score better than do
test takers from lower- and middle-class families. Zwick’s
analyses, which included scores from the SAT I, SAT II, ACT, the
National Assessment of Educational Progress, and the California
High School Exit Exam, suggest that differences by socioeconomic
status are consistent across all of these measures: “So, is
the SAT a wealth test? Only in the sense that every measure of
previous educational achievement is a wealth test. . . . And
contrary to what is often believed, grades and course completion,
like test scores, typically show substantial disparities among
socioeconomic groups” (p. 213). In short, the data do not
support these claims.
In his chapter, Derek Briggs evaluates the effect
of coaching on SAT I scores. Contrary to the boasts of test
preparation companies, Briggs’s findings suggest that the gains from
commercial coaching programs are between 3 and 20 points on the
verbal section and between 10 and 28 points on the mathematics
section. In both cases the gain is less than one standard deviation:
slight, but not as large as these programs claim.
In his commentary on this section, Michael
Martinez identifies two central themes in these four papers.
First is that the achievement gap among ethnic and socioeconomic
groups is multifaceted, and cannot be resolved with simple policy
solutions. Second, this achievement gap is consistent over many
achievement measures. He suggests exploring factors that enhance
learning outcomes. By factors, he refers to school quality and
home environment, each of which is complex. He also suggests a
broadening of the construct validity consideration of
standardized tests to include cognitive as well as affective
qualities: “A complete theory requires an understanding of
the relevant repertoire of dispositions that make for success in
admission to college, and success in the university and
beyond” (p. 242).
The Predictive Value of Admissions Tests
How well do standardized test scores predict
academic success beyond high school? The typical approach to
answering this question is to perform linear regression
analyses with FGPA as the dependent variable and with
standardized test scores among the predictors. Among their
findings, Jennifer Kobrin, Camara, and Glen Milewski report that
the differences in predictive validity of SAT II test scores
relative to scores from the SAT I vary among ethnic groups.
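Differential prediction of this kind is usually examined by fitting a
pooled regression and then inspecting the prediction errors within
each subgroup. The sketch below assumes a data set with hypothetical
columns ethnicity, fgpa, hsgpa, and sat_score; it illustrates the
general technique, not the specific models used by Kobrin, Camara,
and Milewski.

```python
# Hedged sketch of a differential-prediction check: fit one pooled regression,
# then inspect mean prediction errors by subgroup. File and column names are
# illustrative only.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("applicants.csv")
pooled = smf.ols("fgpa ~ hsgpa + sat_score", data=students).fit()

# Residual = observed FGPA minus predicted FGPA. A positive group mean means
# the pooled model underpredicts that group's grades; a negative mean means
# it overpredicts them.
students["residual"] = students["fgpa"] - pooled.fittedvalues
print(students.groupby("ethnicity")["residual"].agg(["mean", "count"]))
```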
In their chapter, Brent Bridgeman, Nancy Burton,
and Frederick Cline conclude that substituting SAT II test scores
in place of those from the SAT I would result in a threefold
impact: effects on the diversity within institutions, effects on
the fairness of admission policies, and effects on pre-college
instruction. Under this substitution, more of the Latino students
who took the SAT II Spanish test would be admitted. While the
percentage of incoming Latino
freshmen would not change, “from the perspective of the
Latino applicants, the change would be more dramatic, noticeably
increasing their likelihood of admission” (p. 286). A
second argument in favor of permitting SAT II Spanish test scores
is that any advantage from these scores would offset other
disadvantages these students would otherwise face. Finally, they point out
that use of SAT II scores in place of SAT I scores could show
promise: “Requiring subject tests for admission could lead
students to seek more legitimate instruction in the specific
subject fields, but it remains to be seen whether these hopes
actually would be realized” (p. 287).
John Young examines differential validity and
differential prediction with respect to race and gender. He
advocates a holistic admissions process rather than decisions based
on traditional predictors, whose predictive validity varies across
races. He also points out that one weakness in the use of FGPA as
a criterion is the fact that this measure is not necessarily
based on the same set of courses: “Because men and women
generally differ in the courses in which they enroll, primarily
because of the requirements for different majors, the courses
that make up FGPA and cumulative GPA differ by sex” (p.
297).
Rather than consider SAT scores, Julie Noble in
her chapter looks at admissions decisions for varying ethnic
groups when ACT composite scores are combined with HSGPAs. In
addition to investigating differential prediction, she also
investigates the differential effects on the probability of
academic success, defined as obtaining an FGPA of 2.5 or higher.
She concludes that relying on academic measures alone is likely
to deny access to capable students from minority backgrounds. She
cautions, however, that a process that does not rely enough on
standardized test scores would most likely admit
students who are not academically prepared for university-level
work.
In his chapter, Roger Studley recommends an
admissions policy based on a student’s potential
achievement, rather than achievement realized through
observed indicators. He defines potential achievement as
“the maximum achievement he could attain if circumstances
were optimal” (p. 248). Potential achievement is based on a
statistical adjustment to SAT scores and HSGPA that corrects for
other factors such as socioeconomic status (e.g., family income
and education, home zip code, and high school attended) and other
effects of circumstance. He offers a systematic admissions
process as an alternative to the subjective and ad hoc
considerations of student circumstance that colleges and
universities generally implement. While such a system reduces
ethnic disparities, he cautions that this process does not
eliminate these differences.
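Studley’s actual model is more elaborate than a review can convey,
but the core idea of adjusting observed indicators for circumstance
can be sketched as follows: regress the observed score on
socioeconomic covariates and retain the portion that circumstance
does not explain. The column names (sat_score, family_income,
parental_education) are assumptions for illustration, not Studley’s
variables.

```python
# A simplified, hypothetical rendering of a circumstance adjustment; it is
# not Studley's actual model. File and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("applicants.csv")

# Regress the observed SAT score on measured circumstances.
circumstance = smf.ols(
    "sat_score ~ family_income + parental_education", data=students
).fit()

# The residual is the part of each score not accounted for by circumstance;
# adding back the overall mean keeps the adjusted score on the SAT scale.
students["adjusted_sat"] = (
    students["sat_score"] - circumstance.fittedvalues + students["sat_score"].mean()
)
```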
Christina Perez uses the common criticisms of
standardized test scores—such as weak predictive power and
little information on long-term success in college—to
address the appropriateness of standardized tests. She argues
that, as some colleges and universities have already done, all
institutions should drop test score requirements.
In his commentary, Robert Linn summarizes these
contributions by saying that while the results of these studies are
consistent with previous findings, both the context underlying the
results and their implications have changed. The two major changes
stimulating these studies are the
elimination of affirmative action policies in certain states, and
President Atkinson’s proposal to replace SAT I scores with
those from the SAT II subject tests. It is also worth mentioning
that he finds Studley’s measure of potential achievement
deserving of further exploration.
Discussion
In this volume Zwick presents a collection of
chapters that is accessible to readers from a wide range of
backgrounds. On the back cover of the text, Mark Reckase is
quoted as saying “This book should be required reading for
all college and university admissions staff.” I agree with
Reckase on this. I would also put this book on a list of required
readings for graduate students in sociology and in education.
This book documents the beginning of an important transition in
the history of higher education and in educational testing. In
this volume, Everson describes how technology can help us create
what Randy Bennett calls “intelligent
assessment”: an integration of constructed-response
testing, artificial intelligence, and model-based measurement (p.
81). The development of psychometric methods in cognitive
assessment (Nichols, Chipman, & Brennan, 1995) is leading us
towards tests that not only provide summative information (as
standardized tests presently do) but are also diagnostic in nature
and can improve learning outcomes.
I have three comments regarding the methodology of
the studies in this volume. In all of the studies, linear
regression analyses were conducted in which test scores and HSGPA
were used as covariates to predict FGPA. The primary purpose of
admissions criteria is to select students who are most likely to
succeed in college. Is FGPA an appropriate measure of success in
college? As some of the studies indicate, whether students
graduate from college should be the criterion, but FGPA is used
as a proxy because (1) student graduation data are not readily
available, and (2) FGPA is the criterion that nearly all other
studies have used. Rather than assuming that FGPA is a proxy for
graduating from college, I think that efforts need to be made to
incorporate student graduation indicators into existing databases
so that this relationship, between these covariates and whether
students graduate from college, can be properly explored.
My second comment regards the underprediction and
overprediction results observed in several of these studies. The
idea that a prediction equation fit to an entire population
underpredicts FGPA for some subgroups (and overpredicts it for
others) should come as no surprise. The comparison of predicted
FGPAs for subgroups is based on the assumption that these estimates
appropriately capture the variability (i.e., the distribution)
of FGPA. Mislevy (1993) and Lord (1969) demonstrate this
point in the context of ability estimation. It seems reasonable
to apply their results to a student’s potential of
succeeding in college, for this potential is also a latent
trait.
My final comment pertains to the process in which
the regression estimates are obtained. In each of these studies
the authors obtained their estimates based on the entire sample.
A major flaw in whole-sample regression is that the sample data
are used twice: once to compute the estimates, and a second time
to compute how well the regression fits the data. A better
approach is to average results from repeated random half-samples,
in which half of the sample is used to compute regression
estimates, and the second half is used to estimate how well
the regression model fits the data (Arenson, 1999; McLaughlin,
Bandeira de Mello, Cole, & Arenson, 2000). Related to random
half-sample validation is three-fold cross-validation, which uses
repeated random third-samples: one to compute regression
estimates, one to fine-tune the regression model, and one to
estimate model fit (Draper & Krnjajic, 2004).
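To make the half-sample idea concrete, the sketch below repeatedly
fits the FGPA regression on a random half of the sample, scores the
held-out half, and averages the out-of-sample fit. The file and
column names (fgpa, hsgpa, sat_score) are hypothetical; this is a
minimal illustration of the technique, not the analysis from the
cited papers.

```python
# Sketch of repeated random half-sample validation (hypothetical file and
# column names): fit on one half, evaluate on the other, average over runs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("applicants.csv")
rng = np.random.default_rng(0)
out_of_sample_r2 = []

for _ in range(100):
    train_idx = rng.choice(students.index, size=len(students) // 2, replace=False)
    train = students.loc[train_idx]
    test = students.drop(index=train_idx)

    fit = smf.ols("fgpa ~ hsgpa + sat_score", data=train).fit()
    pred = fit.predict(test)

    # Out-of-sample R-squared: how well the held-out half is predicted.
    ss_res = ((test["fgpa"] - pred) ** 2).sum()
    ss_tot = ((test["fgpa"] - test["fgpa"].mean()) ** 2).sum()
    out_of_sample_r2.append(1 - ss_res / ss_tot)

print(f"Mean out-of-sample R-squared over 100 half-samples: "
      f"{np.mean(out_of_sample_r2):.3f}")
```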
References
Arenson, E. A. (1999, March). Statistical linkages between
state educational assessments and the National Assessment of
Educational Progress. Paper presented at the annual meeting of
the Sacramento Statistical Association, Sacramento, CA, March 31,
1999.
Draper, D., & Krnjajic, M. (2004). 3-fold cross-validation
as an approach to Bayesian model selection. Manuscript in preparation.
Lord, F. M. (1969). Estimating true score distributions in
psychological testing (An empirical Bayes problem).
Psychometrika, 34, 259-299.
McLaughlin, D., Bandeira de Mello, V., Cole, S., &
Arenson, E. A. (2000, April). Comparison of National Assessment
of Educational Progress (NAEP) and Statewide Assessment Results:
Report to Maryland on 1996 and 1998 Assessments. Palo Alto, CA:
American Institutes for Research.
Mislevy, R. J. (1993). Some formulas for use with Bayesian
ability estimates. Educational and Psychological Measurement,
53, 315-328.
Nichols, P. D., Chipman, S., & Brennan, R. (Eds.) (1995).
Cognitively diagnostic assessment. Hillsdale, NJ:
Erlbaum.
About the Reviewer
Ethan Arenson is a doctoral student in
statistics at the University of California, Santa Cruz. His
interests are in statistical education and educational testing.
Prior to graduate school, he was a high school mathematics
teacher, and also worked as an associate scientist for
CTB/McGraw-Hill.