Long-Term Trends in Math and Science Performance
Benchmarking of Mathematics Performance Against Standards
International Comparisons of Mathematics and Science Achievement
U.S. and internationally comparable achievement data result in a mixed report card for the
United States. Although performance on assessments of mathematics and science achievement by
the National Assessment of Educational Progress (NAEP) has improved since the 1970s, few students
are attaining levels deemed Proficient or Advanced by a national panel of experts, and the performance
of U.S. students continues to rank substantially below that of students in a number of other,
mostly Asian, countries. This cross-national achievement gap appears to widen as students progress
through school. This section describes progress in student performance, both long-term trends
based on NAEP curricular frameworks developed in the late 1960s and more recent trends that
track performance across items aligned with more current standards. International comparisons
are then used to benchmark U.S. performance in these subjects.
Long-Term Trends in Math and Science Performance
Generally, mathematics and science performance on the NAEP long-term trend assessment declined in the 1970s, increased during the 1980s and early 1990s, and has remained mostly stable since that time. (See sidebar, "The NAEP Trends Study.") NAEP mathematics achievement has increased among 9-, 13-, and 17-year-old students since the early 1980s, although most of these gains occurred before 1992. (See figure 11.) Although the average scale scores of 17-year-olds declined by 6 points between 1973 and 1982, scores increased by 9 points between 1982 and 1992 and remained at about the same level through 1999 (National Center for Education Statistics (NCES) 2000e). These gains since 1982 were substantial, equating to about a quarter of the difference between the mathematics scores of 13- and 17-year-olds (an 8-point difference is roughly equivalent to a year of schooling between these ages). Substantial gains were also made by 9- and 13-year-olds between 1982 and 1999: 8 and 13 points, respectively.
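The conversion from scale points to years of schooling used above can be sketched in a few lines. This is a minimal illustration, not NCES methodology: it assumes only the rule of thumb stated in the text (about 8 scale points per year of schooling between ages 13 and 17), and the function name `points_to_years` is hypothetical.

```python
# Rule of thumb from the text: about 8 scale points correspond to one year
# of schooling between ages 13 and 17. This is an approximation only.
POINTS_PER_YEAR = 8.0

def points_to_years(points, points_per_year=POINTS_PER_YEAR):
    """Convert a scale-point difference to approximate years of schooling."""
    return points / points_per_year

# Mathematics gain for 17-year-olds, 1982 to 1992, as reported in the text.
gain_17 = 9
years = points_to_years(gain_17)

# Difference between 13- and 17-year-olds implied by the rule of thumb
# (4 years x 8 points), and the gain expressed as a share of it.
implied_diff_13_17 = (17 - 13) * POINTS_PER_YEAR
share = gain_17 / implied_diff_13_17

print(f"{gain_17} points is roughly {years:.1f} years of schooling")
print(f"share of the 13-to-17 difference: {share:.0%}")
```

Under this assumption the 9-point gain works out to a little over one year of schooling, or somewhat more than a quarter of the implied 32-point difference, which is consistent with the "about a quarter" stated in the text.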
NAEP science performance over the past three decades has generally mirrored that of math: scores
declined during the 1970s but increased in the 1980s and early 1990s. Because the first science
assessments occurred before the first math assessments (1969 for 17-year-olds and 1970 for 13-
and 9-year-olds), science achievement can be tracked over a longer period. Results for 17-year-olds
show an initial 22-point decline between 1969 and 1982. In the decade between 1982 and 1992,
an increase in the average score erased about half of that decline; since 1992, scores have
been stable. (See figure 11.) Although 17-year-olds had higher science scores in 1999 than
their counterparts in 1982, the average 1999 score remained 10 points below the average score
in 1969. Gains since the early 1980s for 13- and 9-year-olds in science have essentially returned
the average scores of these cohorts to levels similar to (for 13-year-olds) or higher than (for
9-year-olds) those posted in 1970.
A persistently wide gap in NAEP scores between low- and high-performing students remains. For
example, the gap between the average mathematics scores of the highest and lowest performing
quartiles for 17-year-old students was 73 points in 1999, a gap similar in size to the difference
between the average scale scores for 17- and 9-year-olds in 1999 (roughly equivalent to eight
years of schooling). Similar gaps have persisted for 9- and 13-year-olds as well. Efforts to
apply uniformly high standards to all children need to confront the large variation in performance
that currently exists in our schools.
Trends in Performance by Sex
Differences in the academic performance of female and male students on the NAEP long-term trend
assessment appear as early as age 9 and persist through age 17. Although girls have consistently
outperformed boys in reading and writing, gaps between the sexes in mathematics and science
performance in the early grades have been much narrower and have varied over time. In 1999,
9-year-old girls had higher average reading scores than boys, although this gap has narrowed
since 1971 (NCES 2000e). In mathematics, higher scores earned by girls in the 1970s shifted
to higher scores earned by boys in the 1990s. In 1999, however, the difference between the scores
of boys and girls was not statistically significant. In science, boys have tended to perform
better than girls at age 9, although, as observed in mathematics, the difference in 1999 was
not statistically significant.
Female and male achievement differences at age 9 remain nearly unchanged at age 13. For example,
in 1999, the average reading proficiency score for a 13-year-old female was 12 scale points
higher than for a 13-year-old male, and females scored at about the same level in math and 6
scale points lower than males in science (NCES 2000e). When 17-year-olds are assessed, female and male differences in reading persist. For example, in 1999, average reading proficiency for 17-year-old females was 13 scale points higher than for males of the same age. This corresponds to about 45 percent of the difference between the average scores of 13- and 17-year-olds in 1999. In other words, the gap in reading proficiency between females and males at age 17 is roughly equivalent to between 1.5 and 2 years of schooling.
In mathematics and science, boys have tended to score higher than girls, although the gaps are narrower than in reading. A gap favoring 17-year-old males in mathematics narrowed from 8 points in 1973 to one that was statistically insignificant in 1999. (See figure 12.) The gap in science at this age narrowed from 16 points in 1973 to 10 points in 1999 (about a year’s worth of science).
Trends in Performance by Race/Ethnicity
NAEP trend data on science and mathematics achievement of 17-year-olds between 1973 and 1999 suggest that the gap between whites and their black and Hispanic peers has narrowed but remains large. Differences in percentile scores by race/ethnicity, that is, the scores at or below which given percentages of a particular group (5, 25, 50, 75, or 95 percent) fall, provide an indication of the size of these gaps. (See figure 13.) For example, in 1999, 75 percent of white 17-year-olds scored 282 or above on the NAEP science test (the 25th percentile score), while only 25 percent of black 17-year-olds and fewer than 50 percent of Hispanic 17-year-olds scored at that level. In mathematics, the gap between blacks and whites appears to be somewhat narrower and the gap between whites and Hispanics somewhat wider. Gains by both high- and low-performing black and Hispanic students have narrowed the wide gaps that were in evidence in 1973, although there is little evidence that the gaps have continued to narrow in the 1990s, and some evidence that the gap between whites and blacks in mathematics has widened (NCES 2000e).
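The percentile comparison described above can be illustrated with a short sketch. The two score lists below are hypothetical values invented for illustration and are NOT NAEP data; the helper names `percentile_score` and `share_at_or_above` are likewise assumptions, and the linear interpolation used here is only one of several common percentile conventions.

```python
def percentile_score(scores, pct):
    """Score at or below which about `pct` percent of the group falls
    (linear interpolation between ordered scores)."""
    ordered = sorted(scores)
    pos = (len(ordered) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def share_at_or_above(scores, cutoff):
    """Fraction of the group scoring at or above `cutoff`."""
    return sum(s >= cutoff for s in scores) / len(scores)

# Hypothetical illustrative scores for two groups (not NAEP data).
group_a = [262, 275, 284, 290, 297, 303, 310, 318, 327, 341]
group_b = [231, 244, 252, 259, 266, 272, 279, 285, 296, 309]

# Use group A's 25th percentile score as the benchmark, as the text does
# with the 25th percentile score of white 17-year-olds.
cutoff = percentile_score(group_a, 25)
share_a = share_at_or_above(group_a, cutoff)
share_b = share_at_or_above(group_b, cutoff)

print(f"benchmark score: {cutoff}")
print(f"group A at or above: {share_a:.0%}; group B at or above: {share_b:.0%}")
```

Reading the gap this way compares whole distributions rather than averages alone: the further the second group's share falls below 75 percent, the larger the gap at that point in the distribution.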
Gaps in mathematics achievement between whites and other racial/ethnic groups exist before students enter high school, but evidence shows that these gaps widen for some groups during high school. In mathematics, the overall differences in 8th- to 12th-grade achievement gains show that blacks learn less than whites during high school, Hispanics and whites do not differ significantly, and Asians learn more than whites on average. However, when one compares blacks and whites completing the same number of math courses, the achievement gains during high school are not measurably (statistically) different. The Asian and white achievement gain differences are also generally reduced among students completing the same number of mathematics courses (NCES 1995). These data do not suggest, however, that course-taking patterns alone lead to similar outcomes. The level of achievement that students from different backgrounds have attained before entering particular courses makes a difference, because parallel gains among students taking the same courses cannot close the gap. For example, NAEP data show that racial/ethnic differences in mathematics persist even among students who have completed similar courses at the time of assessment. The gap in average scores was 21 points between white and black 17-year-olds whose highest math course taken as of the 1996 assessment was algebra II; this gap is similar to the difference in scores observed between all 17-year-olds whose highest math course was algebra II and those whose highest course was geometry (NCES 2000b).
Benchmarking of Mathematics Performance Against Standards
In addition to the long-term trend data described above, NAEP periodically assesses the mathematics and science performance of students against more current frameworks of what students are expected to know in the 4th, 8th, and 12th grades (hereafter referred to as the "National" NAEP). Since 1990, the mathematics assessments have been based on a framework influenced by the National Council of Teachers of Mathematics (NCTM) Curriculum and Evaluation Standards for School Mathematics (NCTM 1989). The assessment framework contains five content strands (number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics, and probability; and algebra and functions). In addition to the five content strands, the assessments examine mathematical abilities (conceptual understanding, procedural knowledge, and problem solving) and mathematical power (reasoning, connections, and communication). Student mathematics performance is summarized on the NAEP mathematics scale, which ranges from 0 to 500. In addition, results for each grade are reported according to three achievement levels developed by the National Assessment Governing Board (NAGB): Basic, Proficient, and Advanced. These achievement levels are based on collective judgments by NAGB about what students should know and be able to do in mathematics. The levels were defined by a broadly representative panel of teachers, education specialists, business and government leaders, and members of the general public. The Basic level denotes partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade. The Proficient level represents solid academic performance as determined by NAGB, and the Advanced level signifies superior performance. Although NCES still considers these proficiency levels developmental, they are used in this section to benchmark student math achievement.
Mathematics Performance by Achievement Level
Although mathematics trends in the NAEP long-term trend study were relatively flat during the 1990s, mathematics performance on the National NAEP increased in the 4th, 8th, and 12th grades between 1990 and 2000. While the average scores of 4th and 8th graders made progress throughout the decade, the scores of 12th graders declined between 1996 and 2000, reducing some of the gain made between 1990 and 1996. The national average scale score for 4th graders in 2000 was 228, an increase of 15 points over the national average for 1990; the average scale score for 8th graders in 2000 was 275, an increase of 12 points; and the average scale score for 12th graders was 301, an increase of 7 points since 1990, but a decrease of 3 points since 1996 (NCES 2001f). The cross-decade increases of 4th and 8th graders are between a third and almost half of a standard deviation in test scores for these grades, roughly equivalent to a gain of between 1.5 and 2 grade levels. While smaller, the 12th-grade gain was still substantial, between 0.5 and 1 grade level.
Although these increases suggest that some progress is being made across areas emphasized in the NCTM mathematics standards, relatively few students scored at the Proficient or Advanced levels set by NAGB for each grade, and more than 30 percent scored below the Basic level. (See figure 14.) For 4th-grade students, the percentage performing at or above the Basic level was 69 percent in 2000 compared with 50 percent in 1990; for 8th-grade students, 66 percent compared with 52 percent; and for 12th-grade students, 65 percent compared with 58 percent. The percentages of students scoring at the Proficient and Advanced levels were much lower: 26 percent of 4th graders, 27 percent of 8th graders, and 17 percent of 12th graders scored at the Proficient level in 2000, and the percentages of students in these grades scoring at the Advanced level in 2000 were 3 percent, 5 percent, and 2 percent, respectively. From NAGB’s perspective, then, as many as one-third of students continue to score below a Basic level of mathematics achievement, and few score at levels considered to be Advanced.
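The "below Basic" shares cited above follow directly from the at-or-above-Basic percentages. A minimal sketch of that arithmetic, using only the figures reported in this section (the variable names are illustrative assumptions):

```python
# Percent of students at or above the Basic level, as reported in the text.
at_or_above_basic = {
    "grade 4": {1990: 50, 2000: 69},
    "grade 8": {1990: 52, 2000: 66},
    "grade 12": {1990: 58, 2000: 65},
}

# Percent below Basic is the complement of percent at or above Basic.
below_basic_2000 = {
    grade: 100 - by_year[2000] for grade, by_year in at_or_above_basic.items()
}

# Percentage-point change in the share at or above Basic, 1990 to 2000.
change_1990_2000 = {
    grade: by_year[2000] - by_year[1990]
    for grade, by_year in at_or_above_basic.items()
}

for grade in at_or_above_basic:
    print(
        f"{grade}: {below_basic_2000[grade]} percent below Basic in 2000, "
        f"{change_1990_2000[grade]:+d} points at or above Basic since 1990"
    )
```

The complements (31, 34, and 35 percent) are the basis for the statement that more than 30 percent of students in each grade scored below Basic in 2000.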
Proficiency levels provide an additional metric to gauge how wide the gaps in scores are between different subgroups. The NAEP sample shows differences in the achievement of boys and girls, students from different racial and ethnic groups, students from different states and jurisdictions, and students receiving and not receiving Title I services.
Proficiency by Sex
Although similar proportions of boys and girls scored at the Basic level or above on the 2000 NAEP mathematics assessment, boys were more likely than girls to score at the Proficient or Advanced levels at the 4th, 8th, and 12th grades. For example, 20 percent of 12th-grade males scored at the Proficient level compared with 14 percent of 12th-grade females, and the percentages scoring at the Advanced level were 3 and 1 percent, respectively. (See text table 11.)
Proficiency by Race/Ethnicity
At each grade level, a larger percentage of white and Asian/Pacific Islander students scored at the Basic, Proficient, and Advanced levels in 2000 than their black, Hispanic, and American Indian/Alaskan Native counterparts. For example, while 34 percent of Asian/Pacific Islander and 20 percent of white 12th graders scored at or above the Proficient level in 2000, only 4 percent of Hispanic, 3 percent of black, and 10 percent of American Indian/Alaskan Native 12th graders scored at that level. Furthermore, there was no evidence in the 2000 assessment of any narrowing of the racial/ethnic group score gaps since 1990. These differences, combined with higher dropout rates for Hispanic, black, and American Indian/Alaskan Native youth, point to considerable disparities in achievement across racial/ethnic groups. However, there is substantial variation for ethnic groups by country of origin (see sidebar, "Variation in Educational Achievement and College Attendance Rates of Asian and Hispanic 1988 8th Graders by Country of Origin") and time since immigration. (The sidebar, "Generational Status and Educational Outcomes Among Asian and Hispanic 1988 8th Graders" compares ethnic groups by timing of immigration.)
Proficiency by Type of Location
At the 4th, 8th, and 12th grades, students in the urban fringe/large town locations had higher scale scores on the NAEP national mathematics assessment than students in central city locations (NCES 2001f). At grades 4 and 8, students in rural/small town locations also outperformed their counterparts in the central city locations. These differences were also reflected in proficiency scores. (See text table 11.) For example, at grade 12, there were higher percentages of students at or above the Proficient level and at or above the Advanced level attending schools in urban fringe/large town locations (19 and 3 percent, respectively) than in rural school locations (12 and 1 percent, respectively). While 16 percent of 12th graders in central city locations scored at or above the Proficient level, only 60 percent scored at or above the Basic level, lower than the 68 percent in urban fringe/large town locations.
Because of slight changes by the Census Bureau in the definitions of these categories, schools were not classified in exactly the same way in 2000 in terms of location type as in previous NAEP assessments. Therefore, comparisons to previous years are not possible (NCES 2001f).
Proficiency by Free/Reduced-Price Lunch Eligibility
There is a wide gap between the NAEP mathematics scores of high- and low-income students, as measured by eligibility for the National School Lunch Program. At the 4th, 8th, and 12th grades, the scale scores for students who are not eligible for the Free/Reduced-Price Lunch Program (i.e., those above the poverty guidelines) are significantly higher than the scores for the students who are eligible for the program. For example, low-income 12th-grade students (those who were eligible for the Free/Reduced-Price Lunch Program) had scale scores similar to high-income 8th-grade students (those who were not eligible for this program). The size of these gaps can also be seen by comparing the percentage of students in each group at or above the Proficient level. While 35 percent of high-income students scored at or above the Proficient level, only 10 percent of their low-income counterparts did so. Furthermore, at each grade level, low-income students were twice as likely or more to score below the Basic level of achievement than were high-income students (NCES 2001f).
Proficiency by State
Wide variability exists across states in the proportion of public 8th-grade students performing at or above the Proficient level, and growth seen at the national level between 1996 and 2000 was not uniform across states. At grade 8, between 8 and 40 percent of students in the 39 states participating in State NAEP were at or above the Proficient level in 2000. As shown in text table 13, 30 percent or more of public 8th-grade students scored at or above the Proficient level in Connecticut, Indiana, Kansas, Maine, Massachusetts, Minnesota, Montana, Nebraska, North Carolina, North Dakota, Ohio, Oregon, and Vermont, and 20 percent or less scored at that level in Alabama, Arkansas, California, Georgia, Hawaii, Louisiana, Mississippi, New Mexico, Oklahoma, South Carolina, Tennessee, and West Virginia. Between 1990 and 2000, the percentage of 8th graders performing at or above the Proficient level increased for 30 out of 31 jurisdictions participating in both years. Some states made more progress than others, however. For example, the percentage of public 8th-grade students scoring at or above the Proficient level more than tripled in North Carolina over this 10-year period (from 9 to 30 percent), while the percentage scoring at that level or higher in North Dakota remained stable (at about 30 percent).
Summary of NAEP Performance
Although science and mathematics achievement has improved since the late 1960s and early 1970s, the percentage of students scoring in mathematics at a level considered proficient is still only about a quarter at the 4th and 8th grades and one in six in 12th grade. The gap in math and science proficiency between whites and Asians/Pacific Islanders and their black, Hispanic, and American Indian/Alaskan Native counterparts is particularly wide, as is the gap between students from low- and high-income backgrounds (as measured by eligibility for the National School Lunch Program). Although the gap between the scores of white and black students narrowed through the 1980s, there is evidence that the gap is now widening. The range between high- and low-performing students within a particular grade is particularly wide, pointing to a challenge for programs designed to hold all students accountable to high standards.
International Comparisons of Mathematics and Science Achievement
Internationally, the relative performance of U.S. students weakens at higher grade levels. On the Third International Mathematics and Science Study (TIMSS), 9-year-olds tended to score above the international average, 13-year-olds near the average, and 17-year-olds below it. Even the most advanced students at the end of secondary school performed poorly compared with students in other countries taking similar advanced mathematics and science courses. This section reviews the mathematics and science performance of U.S. students, drawing primarily on the 1995 TIMSS and the 1999 repeat of this study at the 8th-grade level (TIMSS-R).
The 1995 TIMSS included assessments of 4th- and 8th-grade students as well as students in their final year of secondary school. The study included several components: the assessments, analyses of curriculums for various countries, and an observational video study of mathematics instruction in 8th-grade classes in Germany, Japan, and the United States. In addition to updating the comparison of U.S. math and science achievement in the 8th grade, the design of TIMSS-R made it possible to track changes in achievement and certain background factors from the earlier TIMSS study between the 4th and 8th grades. TIMSS-R also indicates the pace of educational change across nations, informing expectations about what can be achieved (NCES 2000f).
Achievement of 4th- and 8th-Grade American Students in 1995
U.S. 4th-grade students performed at competitive levels in 1995 in both science and mathematics. In science, they scored well above the 26-country international overall average as well as the average in all content areas assessed: earth sciences, life sciences, physical sciences, and environmental issues/nature of science. Only students in South Korea scored at a higher level overall. The 4th-grade assessment in mathematics covered topics in whole numbers; fractions and proportionality; measurement, estimation, and number sense; data representation, analysis, and probability; geometry; and patterns, functions, and relations. U.S. 4th-grade students scored above the international average on this assessment and performed comparatively well in all content areas except measurement (NCES 1997c).
As with 4th-grade students, the TIMSS science assessment taken by 8th-grade students covered earth and life sciences and environmental issues, but it also included content in physics and chemistry. With a mean score of 534 in science, 8th-grade U.S. students scored above the 41-country international average of 516. U.S. students performed at about the international average in chemistry and physics and above average in life sciences, earth sciences, and environmental issues (NCES 1996c).
Mathematics was the weaker area of 8th-grade achievement relative to the performance of students in other countries. The assessment covered fractions and number sense; geometry; algebra; data representation, analysis, and probability; measurement; and proportionality. Overall, 8th-grade U.S. students performed below the 41-country international overall average and at about the international average in algebra, data representation, and fractions and number sense. Performance in geometry, measurement, and proportionality was below the international average.
Change in Relative Performance Between 4th and 8th Grades
Change in the relative performance of U.S. students can be examined by comparing the average mathematics and science scores of U.S. 4th graders in 1995 and 8th graders in 1999 relative to the international average of the 17 nations that participated in 4th-grade TIMSS and 8th-grade TIMSS-R. (See sidebar, "How Comparisons Between 4th Graders in 1995 and 8th Graders in 1999 Are Made.") Figure 15 compares the average scores of the 17 nations between 4th-grade TIMSS and 8th-grade TIMSS-R with the international averages at both grades for each subject. The numbers shown in the figure are differences from the international average for the 17 nations. Nations are sorted into three groups: above the international average, similar to the international average, and below the international average.
The available evidence appears to confirm what had been suggested four years ago: the relative performance of U.S. students in mathematics and science is lower in 8th grade than in 4th grade among this group of nations. In mathematics, the U.S. 4th-grade score in 1995 was similar to the international average of the 17 nations in common between the 4th-grade TIMSS and 8th-grade TIMSS-R. At the 8th-grade level in 1999, the U.S. average in mathematics was below the international average of the 17 nations. Because U.S. 4th graders performed at the international average in 1995 and U.S. 8th graders performed below the international average in 1999 in mathematics, this suggests that the relative performance of the cohort of 1995 U.S. 4th graders in mathematics was lower relative to this group of nations four years later.
In science, the U.S. 4th-grade score in 1995 was above the international average of the 17 nations in common between the 4th-grade TIMSS and 8th-grade TIMSS-R. At the 8th-grade level in 1999, the U.S. average in science was similar to the international average of the 17 nations. Thus, U.S. 4th graders performed above the international average in 1995 and U.S. 8th graders performed at a level similar to the international average in 1999 in science. As in mathematics, this suggests that the relative performance of the cohort of U.S. 4th graders in science was lower relative to this group of nations four years later. The data also suggest that, in science, the relative performance of the cohort of 1995 4th graders in Singapore and Hungary was higher relative to this group of nations in 1999; the relative performance of the cohort of 1995 4th graders in Italy and New Zealand was lower relative to this group of nations four years later; and the relative performance of the cohort of 1995 4th graders in the 12 other nations was unchanged relative to this group of nations four years later.
Mathematics and Science Achievement of 8th Graders in 1999
For most of the 23 nations that participated in 8th grade in both TIMSS and TIMSS-R, including the United States, there was little change in the mathematics and science average scores over the four-year period. There was no change in 8th-grade mathematics achievement between 1995 and 1999 in the United States and in 18 other nations. (See text table 14.) Three nations, Canada, Cyprus, and Latvia, showed an increase in overall mathematics achievement between 1995 and 1999. One nation, the Czech Republic, experienced a decrease in overall math achievement over the same period. In the United States and 17 other nations, there was no change in the science achievement score of 8th graders between 1995 and 1999, while scores increased in four countries and decreased in one.
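The counts in this paragraph can be checked against the 23 participating nations with a short tally. The sketch below uses only the figures given in the text; the dictionary layout and names are illustrative assumptions.

```python
TOTAL_NATIONS = 23  # nations in both 8th-grade TIMSS and TIMSS-R, per the text

# Number of nations by direction of change, 1995 to 1999.
# "no change" counts the United States plus the other nations cited.
changes = {
    "mathematics": {"no change": 1 + 18, "increase": 3, "decrease": 1},
    "science": {"no change": 1 + 17, "increase": 4, "decrease": 1},
}

for subject, tally in changes.items():
    total = sum(tally.values())
    share_unchanged = tally["no change"] / total
    print(f"{subject}: {total} nations, {share_unchanged:.0%} with no change")
```

Both tallies sum to 23, and in each subject roughly four out of five nations showed no measurable change over the four-year period.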
Students’ Achievement in the Final Year of Secondary School
Students’ performance in the final year of secondary school can be considered a measure of what students have learned over the course of their years in school. Assessments were conducted in 21 countries in 1995 to examine performance on the general knowledge of mathematics and science expected of all students and on more specialized content taught only in advanced courses.
Achievement on General Knowledge Assessments. The TIMSS general knowledge assessments were taken by all students in their last year of upper secondary education (12th grade in the United States), including those not taking advanced mathematics and science courses. The science assessment covered earth sciences/life sciences and physical sciences, topics covered in grade 9 in many other countries but not until grade 11 in U.S. schools. On the general science knowledge assessment, U.S. students scored 20 points below the 21-country international average, comparable to the performance of 7 other nations but below the performance of 11 nations participating in the assessment. Only 2 of the 21 countries, Cyprus and South Africa, performed at a significantly lower level than the United States. Countries performing similarly to the United States were Germany, the Russian Federation, France, the Czech Republic, Italy, and Hungary.
A curriculum analysis showed that the general mathematics assessment given to students in their last year of secondary education covered topics comparable to 7th-grade material internationally and 9th-grade material in the United States. Again, U.S. students scored below the international average, outperformed by 14 countries but scoring similarly to Italy, the Russian Federation, Lithuania, and the Czech Republic. As on the general science assessment, only Cyprus and South Africa performed at a lower level. These results suggest that students in the United States are losing ground in mathematics and science to students in many other countries as they progress from elementary to middle to secondary school.
Achievement of Advanced Students. On advanced mathematics and science assessments, U.S. 12th-grade students who had taken advanced coursework in these subjects performed poorly compared with their counterparts in other countries, even though U.S. students are less likely to have taken advanced courses than students at the end of secondary school in other countries. The TIMSS physics assessment was administered to students in other countries who were taking advanced science courses and to U.S. students who were taking or had taken physics I and II, advanced physics, or advanced placement (AP) physics (about 14 percent of the entire age cohort). The assessment covered mechanics and electricity/magnetism as well as particle, quantum, and other areas of modern physics. Compared with their counterparts in other countries, U.S. students performed below the international average of 16 countries on the physics assessment. (See figure 16.) The mean achievement scores of the United States (423) and Austria (435) were at the bottom of the international comparison (average = 501). Students in 14 other countries scored significantly higher than the United States. The subset of U.S. students taking or having taken AP physics scored 474 on the assessment, similar to scores of all advanced science students in nine other countries, and six countries scored higher (scores ranged from 518 to 581). Only Austria performed at a significantly lower level, with an average score of 435 (NCES 1998b). However, U.S. AP physics students represented a much smaller proportion of the age cohort in the United States (about 1 percent of the relevant age cohort) than did the students taking the advanced physics assessment in most of the other countries. For example, the physics assessment was taken by about 14 percent of the relevant age cohort in Canada, 20 percent in France, 8 percent in Germany, and 14 percent in Switzerland (NCES 1998b).
The advanced mathematics assessment was administered to students in other countries who were taking advanced mathematics courses and to U.S. students who were taking or had taken calculus, precalculus, or AP calculus (about 14 percent of the relevant cohort). One-quarter of the items tested calculus knowledge. Other topics included numbers, equations and functions, validation and structure, probability and statistics, and geometry.
The international average on the advanced mathematics assessment was 501. U.S. students, scoring 442, were outperformed by students in 11 nations, whose average scores ranged from 475 to 557. No nation performed significantly below the United States; Italy, the Czech Republic, Germany, and Austria performed at about the same level. (See figure 16.) U.S. students who had taken AP calculus had an average score of 513 and were exceeded only by students in France. Five nations scored significantly lower than the AP calculus students in the United States. Thus, the most advanced mathematics students in the United States (about 5 percent of the relevant age cohort) performed similarly to 10 to 20 percent of the age cohort in most of the other countries. In other words, U.S. calculus students performed at a level similar to a number of other countries, although the percentage of the relevant age cohort (e.g., 17-year-olds) taking the test was significantly lower than in other countries.
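The advanced physics and mathematics scores reported above can be placed on a common footing by expressing each as a difference from the international average of 501. The sketch below does only that arithmetic with figures from the text; the group labels and the function name `difference_from_average` are illustrative assumptions, and a raw difference says nothing about statistical significance, which depends on standard errors not given here.

```python
INTERNATIONAL_AVERAGE = 501  # reported for both advanced assessments

# Mean scores as reported in the text.
scores = {
    "physics": {
        "U.S. advanced science students": 423,
        "U.S. AP physics students": 474,
        "Austria": 435,
    },
    "advanced mathematics": {
        "U.S. advanced mathematics students": 442,
        "U.S. AP calculus students": 513,
    },
}

def difference_from_average(score, average=INTERNATIONAL_AVERAGE):
    """Raw point difference from the international average."""
    return score - average

differences = {
    subject: {group: difference_from_average(s) for group, s in groups.items()}
    for subject, groups in scores.items()
}

for subject, groups in differences.items():
    for group, diff in groups.items():
        print(f"{subject}: {group} {diff:+d} points vs. international average")
```

Only the AP calculus group scores above the international average, and that group represents a much smaller share of the age cohort than the tested populations in most other countries, so the raw differences should be read alongside the cohort percentages cited in the text.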
Summary of International Assessment Results
Data from TIMSS and TIMSS-R show that U.S. students generally perform comparatively better in science than in mathematics; that students in the primary grades demonstrate the strongest performance, especially in science; that students in grade 8 show weaker performance; and that those in grade 12 show weaker performance still, relative to their counterparts in other countries. Furthermore, while the United States tends to have fewer young people taking advanced math and science courses, students who do take them score lower on assessments of advanced mathematics and physics than do students who take advanced courses in other countries.
