by James J. Hennessy and Jeanna Pagnotta, Fordham University
The use of test scores in making major decisions about admissions to teacher preparation programs, eligibility for certification, and eventual retention in teaching positions is the focus of great debate and controversy in the media, in the courts, in state education agencies, and in Congress and the federal Department of Education. The debates generally are not directly about the need for testing; rather, the concerns are focused on how much weight to give to scores in the decision-making process. The underlying assumption is that the tests are measuring important “factors” that have evidence-based links to the decisions of interest; that is to say, that the test has been found to be valid for the specific uses of interest. Yet that assumption cannot be supported in most instances. The present paper focuses on the use of a specific measure, the Graduate Record Examination (GRE), as an element in admissions decisions in master’s level teacher education programs. The broader issues of the relationship between test scores and non-test behavior, such as qualification and competence in the P-12 classroom, remain to be considered.
The Council for the Accreditation of Educator Preparation (CAEP) has produced a set of standards against which teacher education programs will be assessed to determine their eligibility for accreditation by CAEP, and for continued eligibility to prepare candidates for teacher certification in a growing number of states. Its standard on admissions criteria to preparation programs requires the use of standardized, nationally-normed scholastic ability tests, unless there is a validated state examination that may be used until 2020. Beyond that year, the implication in Standard 3.2 is that graduate-level programs will be required to use the GRE in the admissions process, and the standard further requires that the average GRE score for an admitted cohort be in the top third of the GRE score distribution. The standard is silent as to the basis for selecting any particular percentile as the cohort average, and it is indifferent to the distinctions among the distributions of GRE, SAT, and ACT scores. The standard treats the percentile ranks as equivalents, which, as is explained below, is an invalid assumption.
CAEP and its work groups did not present any specific evidence linking performance on the GRE and performance in teacher preparation programs. Our review of the literature, including detailed reviews of data compiled by ETS, could not find a single study that reports an association between GRE scores and grade-point average in graduate teacher preparation programs, or between GRE scores and successful completion of said programs. Certainly, there were no studies that linked GRE scores and success or competence as a classroom teacher. The recent study by Lankford and associates, reported in Educational Researcher (2014), which suggests that SAT scores for certified and entering teachers in New York State have been rising over the past decade, is one of only a few studies attempting to link SAT scores to teacher preparation. The burden of proof of the validity of standardized tests for selection purposes rests with CAEP; as yet, the proof has not been presented.
Even were there any proof of validity, the standard as developed cannot serve as one of the criteria for accreditation. If implemented in its current form, the impact on education schools, and more importantly on the teacher workforce, will be devastating. Implementation in current form would lead in a relatively short time to a teacher workforce that is even more overwhelmingly white and female than the one lamented in the Rationale for the standards. The basis for that claim can be found by taking almost literally the admissions standards delineated in Section 3.2.
Before providing that evidence, a comment must be made about the remarkable imprecision in expression found in the vague phrase “group average” that is used in reference to GPA and nationally normed admissions tests. The failure to specify what “average” refers to obscures the meaning of the standard. To which “average” does the standard refer? Is it the mean? The median? The mode? Further, the failure to recognize that a high school GPA of 3.0 cannot be equated to a college GPA of the same value again clouds the meaning of the standard. If the standard is to be useful, greater precision in language is needed.
That stated, however, the greater concern with the standard is its emphasis on nationally-normed admissions assessments (aka abilities tests) in the selection and admissions process. The basis for setting a performance level on these tests seems to rest primarily on recommendations and suggestions offered in response to opinion polls and by professional organizations, not on empirical findings. In the sole study cited to support the selection of the top third as the “average” on such tests, a score of 1120 on SAT/GRE-like tests is indicated as the separation point between the upper third and the rest of the distribution. The implications of setting the mean on SAT-like measures at that level are as follows:
As depicted in the figure, approximately 3,100,000 youth graduate from high school in the United States each year. Of that group, approximately 1,600,000 take SAT-like tests (52% of graduates). Of that group, approximately 800,000 eventually take GRE-like tests (about 50% of SAT takers; 26% of high school graduates). Well-designed IQ measures center the population mean at 100; for this example, a standard deviation of 15 is used. Numerous studies over many years have reported that the mean IQ of high school graduates is approximately 105. In the attached figure, the mean for high school graduates is placed at 100, thus somewhat underestimating the found values, but allowing for a less cluttered graphic model. The large outer curve in the figure represents the high school graduate IQ distribution.
When SAT scores are equated to IQ scores, the mean SAT (1000) equates to an IQ of 109; the distribution of SAT scores centered on an IQ mean of 109 is represented in the figure. When GRE scores are equated to IQ, the GRE mean (1000 in the older scoring model) equates to an IQ of 115, or one full standard deviation unit above the overall population mean. Were the recommendations in the Standard accepted and a group average at the top third set as an indicator, an SAT score of 1120 would demarcate that point, which corresponds to an IQ of 117. Were the same conditions applied to the GRE, the demarcation point would be the equivalent of an IQ of 120, at or above which fewer than 10% of the population scores. Is this what the Commission is proposing?
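The population shares implied by these IQ equivalents can be checked with a short calculation. The sketch below is illustrative only: it assumes IQ is normally distributed with mean 100 and standard deviation 15, as in the example above, and the function and variable names are our own.

```python
from statistics import NormalDist

# Assumption: IQ modeled as normal with mean 100 and SD 15, per the example in the text.
IQ_DISTRIBUTION = NormalDist(mu=100, sigma=15)


def share_at_or_above(iq_score: float) -> float:
    """Proportion of the modeled population scoring at or above iq_score."""
    return 1.0 - IQ_DISTRIBUTION.cdf(iq_score)


# IQ equivalents quoted in the text (labels are descriptive only).
benchmarks = {
    "mean SAT": 109,
    "mean GRE": 115,
    "SAT top-third cut (1120)": 117,
    "GRE top-third cut": 120,
}

for label, iq_score in benchmarks.items():
    print(f"{label}: IQ {iq_score}, {share_at_or_above(iq_score):.1%} at or above")
```

Under these assumptions the share at or above an IQ of 120 comes out at a little over 9%, consistent with the "fewer than 10%" stated above.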
CAEP, as with many professional organizations, is committed to increasing the ethnic diversity of the teacher workforce. Yet Standard 3.2 may be working in the opposite direction. We sought to determine the racial/ethnic composition of the total pool of GRE test takers who can be expected to meet this standard, scoring at the 70th percentile or above on the GRE Verbal Reasoning. We note that had the Quantitative scale been used instead, the overall findings would not be affected. First, we referred to the ETS Verbal Reasoning Concordance Table to determine the numerical score that corresponds most closely to the 70th percentile. According to this table, a score of 156 corresponds to the 71st percentile on the GRE Verbal Reasoning.
Using a verbal reasoning score of 156 as the target score, we sought to determine the percentage of test takers within each racial/ethnic group scoring 156 or higher. We referred to the ETS document entitled “A Snapshot of the Individuals Who Took the GRE Revised General Test” to determine the number of test takers, the mean verbal reasoning score, and the standard deviation of verbal reasoning scores for each racial/ethnic group who took the GRE.
The racial/ethnic groups included in the total sample of GRE test takers were as follows: American Indian, Asian, Hawaiian/Pacific Islander, Black, Hispanic (including Mexican, Puerto Rican, and Other Hispanic), White, Other, and No Response.
Information regarding the score distribution and skewness of each racial/ethnic group was not available from ETS; thus we assumed a normal distribution for all groups. For each group, the mean verbal score was set as the 50th percentile. The 84th percentile was set as the mean value plus one standard deviation. Using these two scores, the percentile rank of the target score of 156 was estimated along the distribution. For example, the mean verbal score for Black test takers was 147.0, with a standard deviation of 7.2. Therefore, the 84th percentile score for Black test takers was 154.2. Using these scores, the target score of 156 was estimated to correspond to the 88th percentile of Black test takers, meaning that 12% of Black test takers can be expected to score 156 or higher on the GRE Verbal.
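As a cross-check on this kind of estimate, the percentile rank of the target score can also be computed directly from the normal cumulative distribution function. The sketch below is not the procedure used above, which worked from the 50th and 84th percentile anchor points; it simply applies the same normality assumption to the mean (147.0) and standard deviation (7.2) reported for Black test takers. The function name is our own.

```python
from statistics import NormalDist


def percentile_rank(score: float, mean: float, sd: float) -> float:
    """Percentile rank of score in a normal distribution with the given mean and SD."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


TARGET_SCORE = 156                  # GRE Verbal target score from the text
GROUP_MEAN, GROUP_SD = 147.0, 7.2   # values reported in the text for Black test takers

z_score = (TARGET_SCORE - GROUP_MEAN) / GROUP_SD
rank = percentile_rank(TARGET_SCORE, GROUP_MEAN, GROUP_SD)

print(f"z = {z_score:.2f}")
print(f"percentile rank = {rank:.1f}")
print(f"share at or above target = {100.0 - rank:.1f}%")
```

With these inputs the target score sits 1.25 standard deviations above the group mean, which the exact normal curve places at roughly the 89th percentile (about 11% at or above). That is close to the 88th percentile and 12% estimated above; the small gap presumably reflects the approximation used.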
This calculation was repeated for each racial/ethnic group to determine the number of GRE test takers within each group scoring at the 70th percentile or above (156) on the GRE Verbal Reasoning. These values were then added together to calculate an estimate for the total number of GRE test takers scoring at or above the 70th percentile on the GRE Verbal Reasoning.
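The aggregation step can be sketched in the same way. The group names, counts, means, and standard deviations below are placeholders invented purely to show the arithmetic; they are not ETS figures and should not be read as results. The function names are likewise our own, and each group is assumed to be normally distributed, as stated above.

```python
from statistics import NormalDist


def expected_count_at_or_above(n: int, mean: float, sd: float, cutoff: float) -> float:
    """Expected number of a group's n test takers scoring at or above cutoff,
    assuming the group's scores are normally distributed."""
    share = 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)
    return n * share


def composition_at_or_above(groups: dict, cutoff: float):
    """Return each group's proportion of the pool at or above cutoff, and the pool size."""
    counts = {
        name: expected_count_at_or_above(n, mean, sd, cutoff)
        for name, (n, mean, sd) in groups.items()
    }
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}, total


# PLACEHOLDER inputs (n, mean, SD): hypothetical values, NOT taken from the ETS snapshot.
example_groups = {
    "Group A": (10_000, 150.0, 8.0),
    "Group B": (30_000, 154.0, 8.0),
}

shares, pool_size = composition_at_or_above(example_groups, cutoff=156)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the estimated pool at or above 156")
print(f"estimated pool size: {pool_size:.0f}")
```

Replacing the placeholder entries with the per-group counts, means, and standard deviations from the ETS snapshot would yield proportions of the kind a pie chart of the qualifying pool displays.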
The pie chart in the figure above was created to represent visually the racial/ethnic proportions of test takers scoring at or above the target score of 156. The graphic needs little other explanation.
In a recent conversation with a Commission member, a question was raised about those graduate level programs that do not require the GRE as part of the admissions process. He suggested that this should not be a problem because applicants likely took SAT-like tests when applying to undergraduate school, and thus those early scores could be used. He seemed genuinely perplexed to learn that those scores are not part of an applicant’s record to graduate school, nor are they readily available save as recollected self-reports of applicants. Given the paucity of empirical evidence linking the GRE to performance in graduate level teacher education programs, perhaps the Commission might reconsider its position on that measure as it rethinks the wisdom of relying on SAT-like tests in determining the quality of teacher education candidates more generally.