Trends in U.S. mathematics and science achievement are mixed, but somewhat positive on the whole. Students are taking advanced courses in both subjects more often, and their performance is slightly better than, or no worse than, the performance levels set in the 1970s. Larger shares of students, including those from underrepresented racial and ethnic groups, are meeting basic levels of proficiency in both subjects than in past years, although wide gaps in achievement remain between students from these groups and white and Asian students. (See "Do Policies and Socioeconomic Factors Play a Role in Achievement?")
Performance differences among states may reflect any number of factors, including differences in educational policy and in demographic characteristics. The 1996 Policies and Practices Survey, conducted by the Council of Chief State School Officers, provides information on several useful indicators of instructional quality: number of mathematics and science credits required for graduation, status of standards implementation, and requirements for teacher licensing (CCSSO 1996). An examination of these variables revealed no systematic patterns that might account for performance differences among states.
In the area of social and economic factors, there are suggestions from some studies that differences in "opportunity" may be linked to differences in student background and other socioeconomic variables. Several studies have shown that poor and minority students are more likely to attend schools with severely limited resources and less well-prepared teachers, more likely to be sorted into low academic tracks that limit their access to advanced mathematics and science courses, and less likely to attend schools that offer these advanced courses (Oakes, Gamoran, and Page 1992).
Performance in mathematics and science may also be influenced by other demographic characteristics such as family background. A study that examined the relationship between increases in achievement and changes in family characteristics in the 1980s found that gains made by white students could be completely accounted for by improved family circumstances over the years examined, but only one-third of the gains made by black students, and virtually none of the gains made by Hispanic students, were explained by these factors (Grissmer et al. 1994).
Several studies have attributed differences in mathematics and science achievement to the types of courses students complete (Jones et al. 1992 and Gamoran 1986). Acting on the premise that more high-level courses will result in higher achievement, many states and school districts raised graduation requirements in mathematics and science (as well as in other core subjects) following publication of A Nation at Risk by the National Commission on Excellence in Education (1983). Two years before its release, only nine states required two or more years of science and two or more years of mathematics. Fifteen years later, 42 states had put these stricter graduation requirements into place (CCSSO 1996).
Comparisons of U.S. achievement with that of other countries provide another important perspective on how well students and schools are performing. International comparisons reveal that, although U.S. students are performing relatively well in
science compared with the rest of the world, there remains much room for improvement in mathematics. The performance of students in high-scoring nations demonstrates what is possible for students to achieve at the elementary, middle, and high school
levels in this or any country. And, in so doing, student performance overseas provides information educators and policymakers can use in setting appropriate policies, expectations, and goals. Unfortunately, there is no reliable way to determine if
the U.S. standing has improved or worsened in recent years. Comparisons with earlier assessments cannot be made because of methodological differences between the studies, differences in the content tested, and changes in countries participating in
these tests. (For further information on performance assessments in general, see "Assessing Student Performance.")
Assessment, in the educational context, is the process of gathering evidence about a student's knowledge of, ability to use, and disposition toward some subject matter with the purpose of making inferences from that evidence for a variety of ends. A test is a measuring instrument for evaluating and documenting those outcomes. Although simple enough to describe, assessments are not simple to devise, nor have they proven easy to integrate effectively within the instructional programs of large education systems. At their conceptual base, assessments are a complex endeavor, and the inferences that can be drawn from them about individual students, teachers, and schools, as well as whole educational systems, need to be considered with numerous caveats.
There are differences of opinion among educators, researchers, and policymakers about the design and use of standardized and performance-based assessments.
Traditional standardized tests (usually of the short answer variety, and administered, scored, and interpreted in a consistent manner wherever and to whomever given) are the tests most often in place today in states and at the national level. But they do not necessarily measure well those aspects of learning, such as creativity, deep conceptual understanding, and the ability to apply learning in a number of contexts, that many of today's educators deem important or appropriate. Traditional tests of student performance (answering a question with a single correct short answer) are an efficient method to assess large numbers of students at low cost. However, traditional, norm-referenced, multiple-choice tests are criticized for not adequately measuring complex cognitive and performance abilities. Moreover, they have often been used to limit students' access to further learning opportunities (Darling-Hammond 1991, Glaser 1990, and Oakes 1985).
There are a variety of classroom, school and school district, state, and national tests used for numerous purposes. Their assessment functions include the following:
The National Assessment of Educational Progress (NAEP) has been conducted in mathematics and science since the late 1960s and early 1970s. NAEP uses a formal, systematic procedure to obtain a sample of students' knowledge over time and to make generalizations about how student populations are performing. NAEP has attempted to add performance items to its assessment approach in order to assist in measuring not only students' knowledge of mathematics and science, but also their ability to apply that knowledge and to articulate various aspects of problem solving.
Numerous alternative assessment experiments are being implemented and debated in schools and communities across the nation. Different testing alternatives include performance tasks, open-ended questions, portfolios, observation, and student journal writing and self-assessment.
In recent years there has been a conceptual shift in some research and policy circles as to what constitutes "good" assessments of achievement. Some current trends in measuring and analyzing student performance include:
Research findings suggest that achievement tests of any kind are not a good predictor of success. Many forms of bias affect performance on tests: the choice of items, responses deemed appropriate, and the content selected are the product of culturally and contextually determined judgments (García and Pearson in press, Gardner 1983, and Sternberg 1985).
The factors that influence test scores (e.g., opportunities to learn, poverty and social class, test motivation and testing skills, language ability, and educational experiences outside of the classroom) are well-documented. These factors sometimes occur jointly, and sometimes at different times, in the test-taking process, making it impossible to track each systematically. As Oakes et al. (1990) point out, although individual effects can be identified for both race and social class, for example, it is the combination of the two, their multiplicative power, that needs to be examined and measured. But new forms of assessment do not themselves remedy these socioeconomic complexities.
Darling-Hammond (1994b) argues that changing test forms and formats without changing the ways in which assessments are used will not change the outcomes of education. The equitable use of performance assessments depends on both the designs of the tests themselves and how well the assessment practices are interwoven with the progress of school reform and the improvement of teaching.
However, an assessment that attempts to perform too many functions will inevitably do none well. Some functions must be passed over in favor of others, and it is at this point that the test development process can become roiled in miscommunication. It is vital to delineate appropriate roles (student diagnosis, curriculum planning, program evaluation, instructional improvement, accountability, and certification) for different assessments (Linn and Herman 1997). And importantly, whatever test is created must be credible in the eyes of the public.
In analyzing test results, their meaning must not be misunderstood. For example, the results of a test given at various grade levels should not be interpreted as if they were an assessment of the progress of the same students over time (i.e., longitudinal). The results of annual achievement data reflect a (cross-sectional) snapshot of progress at that given time. The tests administered as part of TIMSS provide rich information about the performance of U.S. students compared to those of other countries in mathematics and science, and provide connections for understanding performance within the context of curriculum and instruction at specific grade levels. However, TIMSS data are not longitudinal in nature, meaning that the same students are not being tested in the fourth grade and then, four years later, in the eighth.
Much more research is needed on the fairness and validity of new modes of assessment. In addition to these concerns, investigations into the effects of aligning assessments with rigorous standards for student achievement would benefit a multitude of local, state, and federal audiences. Nonetheless, it is not only the form of the tests that is important in determining the impact of an assessment program on students, teachers, and schools; it is the use to which the results are put (Messick 1989).
This discussion concentrates heavily on various concerns regarding the measurement of achievement at the elementary and secondary levels, where at least some actions have been taken to assess performance; this is in contrast to the postsecondary level, where gaps remain.
High school graduates in the 1990s are much more likely to have completed advanced courses in the sciences such as biology, chemistry, and physics. In 1994, 93 percent of graduates had taken biology compared with 77 percent of 1982 graduates. Similarly, more than half now take chemistry compared with less than one-third in 1982, and one in four now complete physics compared with about one in seven in 1982. Although they remain a minuscule fraction of the total, the proportion of students completing advanced placement courses in these science subjects has also increased.
Female graduates are more likely to have taken biology and chemistry in high school than male students, but less likely to have taken physics. This represents a change in the coursetaking patterns of young women as compared with young men. In 1982, female graduates were about as likely as males to have taken chemistry and substantially less likely than males to have taken physics. (See figure 1-1.)
Students from racial and ethnic groups underrepresented in science made substantial gains in the proportions taking advanced science courses. More than 90 percent of blacks, Hispanics, and Native Americans now complete high school having taken biology. In chemistry, the proportion of blacks completing the course doubled (from 22 to 44 percent), rates for Hispanics nearly tripled (from 16 to 46 percent), and completions by Native Americans rose by more than half (from 26 to 41 percent) between 1982 and 1994. Similarly, progress was made in physics coursetaking between 1982 and 1994, although the proportions of students from black and Hispanic groups remain less than 20 percent. The proportion of blacks taking physics almost doubled, and the percentage of Hispanics nearly tripled. No discernible increase in the proportion of Native Americans completing physics was detected over the 12-year period. All in all and despite the progress, there remains a substantial gap in the proportions of blacks, Hispanics, and Native Americans who take chemistry and physics compared with Asian Americans/Pacific Islanders and whites. (See figure 1-2.)
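The chemistry figures above mix two ways of describing change: relative growth ("doubled," "nearly tripled," "rose by more than half") and the underlying percentage-point difference. The short Python sketch below recomputes both from the 1982 and 1994 rates quoted in the text; the function and variable names are illustrative assumptions, not part of any source dataset.

```python
def relative_change(start, end):
    """Return (ratio of end to start, percent increase over start)."""
    ratio = end / start
    return ratio, (ratio - 1) * 100

# Chemistry completion rates (percent of graduates) for 1982 and 1994,
# exactly as quoted in the paragraph above.
chemistry = {
    "black": (22, 44),
    "Hispanic": (16, 46),
    "Native American": (26, 41),
}

changes = {}
for group, (rate_1982, rate_1994) in chemistry.items():
    ratio, pct_increase = relative_change(rate_1982, rate_1994)
    point_gain = rate_1994 - rate_1982
    changes[group] = (ratio, pct_increase, point_gain)
    print(f"{group}: x{ratio:.2f} ({pct_increase:.0f}% increase), "
          f"{point_gain} percentage points")
```

Reporting the point gain alongside the ratio helps keep the two readings apart: a group that starts from a low base can nearly triple its rate while still ending close to a group that only doubled.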
In the 1970s, science proficiency scores of elementary and secondary students remained largely flat, but, beginning in the mid-1980s, students began to show improvement. (See figure 1-3.) By the mid-1990s, 9-year-olds and 13-year-olds were scoring slightly higher than their counterparts of 1973, and the scores of 17-year-olds had rebounded to the higher 1973 levels.
Of all school subjects, science in particular has been a sticking point in comparisons of student performance between sexes and among racial and ethnic groups. The underrepresentation of women in the science, mathematics, and technology workplace makes sex-based achievement differences a continuing concern among educators. However, national assessments of educational progress reveal that there are no real differences in science proficiency between 9-year-old girls and boys. Thirteen- and 17-year-old boys edge out girls in science performance, but this difference is small and has narrowed for 17-year-olds since the early 1970s. (See appendix table 1-3.)
Of much more compelling concern at the moment are the racial and ethnic differences that remain in science achievement. The performance of black and Hispanic students at all age groups was far below that of whites in 1996, as has been the case for decades. And although the difference between black and white students has declined for 9-year-olds and 13-year-olds since the 1970s, the disparity for 17-year-olds remains virtually unchanged. There has been no change in the difference between Hispanic and white achievement at any age. Average test scores of Native American students based on a related 1996 science assessment were closer to the national average than is the case for black and Hispanic students. Lower achievement is thought to be one reason why minority students make different elective course choices or are screened out of opportunities for more advanced study in science (Oakes 1990).
It is also useful to examine achievement differences across states. Science proficiency was reported on a state-by-state basis for the first time in 1996. (See "The Making of a New Science Assessment.") Figure 1-4 shows how eighth grade students in each participating state compared to the national average. In general, most of the high-scoring states were in the Central, Western, and New England regions of the
country, while the majority of the lower performing states were in the Southeast.
In 1996, in order to better measure the effects of current approaches to science education, the U.S. Department of Education made major changes to subject matter assessment in science through its National Assessment of Educational Progress. The new test represents a departure from earlier ones both in the science that is tested and in the way it is tested. First, factual knowledge is assessed within meaningful scientific contexts. Second, level of performance depends not only on knowledge of facts, but also on the ability of students to integrate this information into a larger body of knowledge, and the capacity of students to use the reasoning processes of science to develop their understanding of the natural world.
The 1996 assessment used a variety of methods for measuring student performance:
The framework from which the assessment was constructed was developed through a consensus process that brought together science teachers, curriculum experts, other educators, policymakers, members of the business community, and the general public. The framework divides science into three major fields: earth, physical, and life sciences. It also assesses such mental processes important for scientific thinking as conceptual understanding, practical reasoning, and investigation by experimentation.
Although the changes introduced in 1996 mark a meaningful and rich new source of information on student performance, comparisons cannot be made with results of earlier assessments. Consequently, this chapter relies on the NAEP trend assessments in science in making comparisons of student performance over time.
Across states, racial and ethnic differences in science proficiency were apparent, and these cross-state differences followed many of the same patterns as overall state-by-state test score differences. That is, students of all races and ethnicities tended to score more highly in states with high overall science performance than in states with consistently lower performance. However, the magnitude of the difference in average scores varied to a surprising degree from one state to another. Average science scores for Hispanic and black populations, for example, fluctuated enormously across different states.
Black students scored below the national average in science in all states. Blacks scored highest in Colorado, but this score was not as high as even the lowest average for whites of any state. The largest achievement gaps between black and white students were in Wisconsin, Connecticut, and New York. Hispanic students in states known for their large Latino populations (California, Texas, Florida, and New York) achieved the national overall average score for Hispanic science proficiency, with the exception of those in New York.
Notwithstanding the substantial cultural differences and variations in geographic settling patterns across these states and within the U.S. Hispanic population, it was most often in Southeastern states that Hispanic student achievement lagged farthest behind. The largest differences between averages for Hispanics and whites were found in Connecticut, New York, and four Southeastern states. (See appendix table 1-4 for science achievement scores for Asians/Pacific Islanders and Native Americans.)
In the recent international comparative study on mathematics and science achievement (TIMSS), U.S. students performed better in science than in mathematics and better at the fourth grade than at the eighth grade level. U.S. fourth graders performed very well on the science assessment: they answered 66 percent of the science items correctly (compared with the international average of 59 percent). The only nation to score significantly higher was South Korea. (See figure 1-5.) In addition, U.S. fourth graders earned scores higher than the international average in all four science content areas: earth science, life science, physical science, and environmental issues/nature of science. (See appendix table 1-5.)
U.S. eighth grade students performed less well relative to other countries in science than fourth graders, scoring just above the international average. Eighth graders in the United States answered 58 percent of the science items correctly, compared with an international average of 56 percent. (See figure 1-6.) Like U.S. fourth graders, scores of U.S. eighth grade students exceeded the international average in all science content areas: earth science, life science, physics, chemistry, and environmental issues/nature of science. (See appendix table 1-6.)
In the United States, boys scored slightly higher than girls in science at the fourth grade, but there was no difference between the sexes at the eighth grade. Among the other countries that participated in the study, boys outperformed girls in science in 40 percent of countries at the fourth grade and in almost half at the eighth grade. (See appendix table 1-7.)
U.S. students are now much more likely to have taken advanced mathematics courses in high school than they were in years past. In 1994, close to 70 percent of seniors had completed geometry, 58 percent had completed algebra 2, and 9 percent had completed calculus. These figures represent a more than 20-point gain in the percentage of students taking algebra 2 and geometry, and about a 5-point increase in calculus since 1982. High school females are now more likely than males to have taken geometry and algebra 2, and about as likely to have completed calculus. (See figure 1-7.)
There remain substantial disparities across racial and ethnic groups in advanced mathematics coursetaking. This gap is apparent in geometry and algebra 2 as well as in the most advanced courses in the college preparatory sequence. In calculus, about one-quarter of Asian Americans/Pacific Islanders completed the course compared with about 10 percent of whites, 6 percent of Hispanics, and 4 percent each of blacks and Native Americans.
However, despite the unequal enrollments, progress has been made in the proportion of students in all racial and ethnic groups taking advanced mathematics. Half or more of white, Hispanic, and Asian American/Pacific Islander students in the class of 1994 completed algebra 2 and geometry, the so-called gatekeeper courses for advanced study in mathematics and science. Large gains were made in groups underrepresented in mathematics between 1982 and 1994. The proportion of black students taking geometry increased from 29 to 58 percent between 1982 and 1994. The proportion of Hispanics went from 26 to 69 percent, and the fraction of Native Americans taking geometry rose from 34 to 60 percent over the period. These groups also experienced 20 to 30 percentage point gains in algebra 2. (See figure 1-8.)
Mathematics performance of U.S. students remained fairly stable during the 1970s and began to improve in the 1980s. The most recent assessments indicate small but significant gains for 9-year-olds and 13-year-olds through 1996. (See figure 1-9.) On the other hand, performance of 17-year-olds remains at the 1973 level after recovering from a slight dip in the 1980s.
Although the achievement of U.S. students in mathematics has shown slight gains over time, there remains a large proportion of students unable to demonstrate anything more than basic levels of knowledge (often associated with NAEP's level 2
performance). (See "The Making of a New Mathematics Assessment.") This is particularly true at grade 12 where just one in six students performed at or above level 3 (level 4 being the highest). At grades 4 and 8, respectively,
approximately one in five and one in four students performed at this level. Despite the disappointing news, this is an improvement from 1990 when substantially fewer students demonstrated level 3 performance.
National Assessment of Educational Progress tests in 1990, 1992, and 1996 differed markedly from earlier assessments in that they were designed to reflect the relatively new content and teaching standards published by the National Council of Teachers of Mathematics (NCTM 1989 and 1991). These newer assessments included questions from the five core content areas defined by the mathematics standards:
The 1990, 1992, and 1996 mathematics assessments also attempt to measure students' cognitive abilities such as those emphasized in the standards: reasoning, problem solving, and communicating with and about mathematics.
At the same time that standards-based assessments were being developed, efforts were made to associate numerical scores on the test with descriptive labels and definitions that capture the levels of knowledge and skill demonstrated by students' overall responses to test items. Results from the 1990 assessment placed performance on a continuum that ranged from knowledge of "simple arithmetic facts" at the low end to knowledge of "multistep problem solving and algebra" at the high end. Results from the 1992 and 1996 NAEPs were reported at one of four proficiency levels that ranged from "below basic" to "advanced." The value and validity of these proficiency levels have been matters of debate since their introduction (U.S. GAO 1993). To permit comparability with reported results without conveying judgments about the capabilities a particular score represents, this chapter reports performance levels simply designated as levels 1 to 4. These levels correspond numerically to the score ranges used in 1990 and 1992 mathematics assessment reports. (See appendix table 1-10.)
However, considerable progress has been made in the 1990s in the proportion of students performing at least at level 2. Between 62 and 69 percent of students in 1996, depending on grade level, were able to perform the more basic levels of mathematics, compared with 52 to 58 percent in 1990. (See figure 1-10.)
In 1996, there were no substantial differences between the proportions of male and female students performing at or above level 2 in mathematics at any grade level. A slightly higher proportion of males than females demonstrated the more advanced performance (level 3) in 4th and 12th grades, but not in 8th grade. (See appendix table 1-10.)
As in science, differences in mathematics achievement across racial and ethnic groups have followed a consistent pattern over the years: white and Asian American/Pacific Islander students generally achieve at significantly higher levels than do black, Hispanic, and Native American students. Despite some gains between 1990 and 1996, the proportion of black, Hispanic, and Native American students who performed at level 2 or above lagged far behind that of whites and Asian Americans/Pacific Islanders. The percentage of black students at level 2 trailed that of white students by about 40 points; the lag was about 30 points for Hispanics and about 20 points for Native Americans. (See appendix table 1-10.)
Larger proportions of white students in all three grades were performing at or above levels 2 and 3 at the end of the six-year period of the assessment than they were in 1990. The percentage of black fourth graders who performed at level 2 or above increased by 13 points between 1990 and 1996. Hispanic and Native American students showed no statistically significant improvement at any grade or at any level of proficiency during that period.
Also between 1990 and 1996, there has been a striking rise in the number of states where 50 percent or more of eighth grade students scored at or above level 2 mathematics proficiency. In 1996, of the 40 states participating in the state-by-state analysis, only students in Alabama, Louisiana, Mississippi, and South Carolina failed to meet this performance criterion. In comparison, in 1992, only 23 of 35 states, and just half of 1990 participating states, could claim 50 percent or more of their students at or above level 2 performance. (See figure 1-11.) However, there were large differences among racial and ethnic groups across states in meeting the 50 percent criterion. In 1996, half or more of white eighth graders in all states achieved level 2 performance; only in Iowa, Montana, and North Dakota did half or more of Hispanic eighth grade students meet the basic level of proficiency; in no state did half or more of black students perform at this level.
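The state counts above can be restated as shares of participating states, which makes the three assessment years easier to compare because the number of participants changed each time. The Python sketch below uses only the counts quoted in the paragraph; for 1990 the text gives no counts, only "just half," so that share is entered directly as an assumption-free 0.5. The names used are illustrative.

```python
def share(meeting, participating):
    """Fraction of participating states meeting the 50 percent criterion."""
    return meeting / participating

shares = {
    1990: 0.5,                # text says "just half"; state counts are not given
    1992: share(23, 35),      # 23 of 35 participating states
    1996: share(40 - 4, 40),  # all but 4 of the 40 participating states
}

for year in sorted(shares):
    print(f"{year}: {shares[year]:.0%} of participating states")
```

Because the set of participating states differs from year to year, these shares describe each year's participants and are not a matched comparison of the same states over time.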
Studies suggest that state economic conditions play some part in mathematics achievement, although a direct and powerful relationship has not been identified. Four states in which less than half of eighth graders functioned at or above level 2 in mathematics (Alabama, Louisiana, Mississippi, and South Carolina) were compared with the six states in which three-quarters or more of students achieved at this level. Comparisons were based on three key variables: poverty rate, educational expenditure, and the percentage of minority students in each state. Comparisons suggest an association between these indicators and mathematics performance. (See text table 1-1.)
As in science, performance in mathematics of U.S. fourth grade students in the 1995 TIMSS study was comparatively better than eighth grade performance, averaging 63 percent of items correctly answered compared with 59 percent internationally. (See figure 1-12.) But, unlike in science, U.S. mathematics performance at fourth grade was far behind that of Singapore, South Korea, Japan, and Hong Kong (whose fourth grade students averaged 73 to 76 percent correct) and a host of other countries. (See figure 1-13.) U.S. eighth graders answered just over half of the items on the mathematics assessment correctly. This was below the international average of 55 percent correct, and students in the highest performing nations (Singapore, South Korea, Japan, Hong Kong, and Flemish-speaking Belgium) averaged 65 percent correct or higher. In most countries, including the United States, there were no differences between the sexes in mathematics performance at the fourth or eighth grade. (See "Mathematics and Science Achievement of the Highest Performers" and appendix table 1-14.)
Achievement can also be evaluated by comparing the top students in different nations. Often, the comparison is based on the proportion of each nation's students scoring in the top 10 percent of the international distribution. As would be expected on the basis of findings already presented, proportionately more students from Singapore, South Korea, and Japan came out on top in both subjects and at both the fourth and eighth grade levels. For example, at the eighth grade level, 45 percent of the students from Singapore scored in the top 10 percent of the international mathematics distribution and 31 percent scored at the top of the science distribution. A smaller percentage of U.S. students made the top cut. In science, 13 percent of eighth grade students and 16 percent of fourth grade students scored in the top 10 percent of their respective international distributions. In mathematics, only 5 percent of U.S. students in eighth grade and 9 percent of students in fourth grade reached the top 10 percent international benchmark. (See appendix table 1-15.)
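The top 10 percent comparison can be illustrated with a simplified Python sketch: pool all scores, find the score that marks the top tenth of the pooled distribution, and then compute the share of each country's students at or above that cutoff. The countries and scores below are hypothetical values invented for illustration (they are not TIMSS data), and pooling every student with equal weight is an assumption; the actual benchmark procedure is not described in this text.

```python
def top_decile_cutoff(all_scores):
    """Lowest score that still falls in the top tenth of the pooled scores."""
    ranked = sorted(all_scores, reverse=True)
    k = max(1, len(ranked) // 10)  # number of scores in the top tenth
    return ranked[k - 1]

def share_at_or_above(scores, cutoff):
    """Fraction of a country's scores at or above the pooled cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

# Hypothetical scores for two made-up countries (NOT TIMSS data).
scores = {
    "Country A": [700, 690, 650, 640, 600, 580, 560, 540, 520, 500],
    "Country B": [660, 610, 590, 570, 550, 530, 510, 490, 470, 450],
}

pooled = [s for country_scores in scores.values() for s in country_scores]
cutoff = top_decile_cutoff(pooled)
shares = {name: share_at_or_above(vals, cutoff) for name, vals in scores.items()}

for name, value in shares.items():
    print(f"{name}: {value:.0%} at or above the pooled cutoff of {cutoff}")
```

By construction about 10 percent of the pooled group clears the cutoff, so a country share above 10 percent signals overrepresentation among top scorers and a share below 10 percent signals underrepresentation, which is how the figures for Singapore and the United States above can be read.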
The performance of students varied over mathematics content areas both within and among countries. In fourth grade mathematics, U.S. students performed at or above the international average in all areas
except measurement. (See appendix table 1-12.) U.S. eighth grade students performed best on algebra, fractions, and data representation/analysis, where performance was on a par with international averages.
They did less well on proportionality, geometry, and measurement. (See appendix table 1-13.)