One of the central goals of educators and legislators is increasing overall student achievement, with a special focus on increasing learning by low performers. Concern also centers on advancing U.S. performance in relation to that of other countries, especially in mathematics, science, and technical fields. The most commonly used tools for measuring changes in achievement are standardized assessments. (The terms achievement and performance are used interchangeably in this section when discussing scores on these tests.)
This section is divided into two parts. The first examines trends in mathematics and science achievement among public and private school students in the United States, using two kinds of national data. Longitudinal data follow the same group of students over several years, allowing observers to track how individual students learn over time. In some cases, longitudinal test data may also be linked to teaching practices and other factors thought to influence achievement. New test data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K), collected in 2007, allow study of performance changes among a kindergarten cohort through eighth grade and of changes over time in initial achievement gaps among groups of students.
Cross-sectional data, in contrast to longitudinal data, provide information on particular groups' performance measured at different points in time. The National Assessment of Educational Progress (NAEP) data presented in the first part, for example, examine performance of fourth and eighth graders who were sampled in various years between 1990 and 2007. These data indicate whether and how achievement is changing over time for comparable groups of students.
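The distinction between the two kinds of data can be illustrated with a minimal sketch. All scores, student IDs, and function names below are hypothetical and invented purely for illustration; they are not ECLS-K or NAEP values. The sketch only shows which questions each data structure can answer.

```python
from statistics import mean

# Hypothetical toy scores, invented for illustration only (not ECLS-K or NAEP data).
# Longitudinal: the same student IDs are measured in every wave.
longitudinal = {
    "s1": {"grade3": 50, "grade5": 64},
    "s2": {"grade3": 42, "grade5": 61},
    "s3": {"grade3": 58, "grade5": 70},
}

# Cross-sectional: each year draws a different sample from a comparable population.
cross_sectional = {
    2005: [48, 52, 55, 45],
    2007: [51, 54, 57, 50],
}


def individual_gains(panel, start, end):
    """Per-student growth, which only longitudinal data can provide."""
    return {sid: waves[end] - waves[start] for sid, waves in panel.items()}


def group_mean_change(samples, start_year, end_year):
    """Change in the group average between two independently drawn samples."""
    return mean(samples[end_year]) - mean(samples[start_year])


gains = individual_gains(longitudinal, "grade3", "grade5")
change = group_mean_change(cross_sectional, 2005, 2007)
print(gains)
print(change)
```

With longitudinal records, growth can be computed student by student; with cross-sectional samples, only the change in the group average is meaningful, because different students are tested each time.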
The second part of this section compares student achievement in the United States with that in other countries. The latest Trends in International Mathematics and Science Study (TIMSS:2007) allows comparisons of U.S. fourth and eighth graders with their counterparts in other countries. The Program for International Student Assessment (PISA:2006) provides test score data for 15-year-olds in the same subjects. These international assessments are both cross-sectional studies.
Mathematics and Science Performance as Students Progress Through Elementary and Middle Grades
ECLS-K has followed a group of students who first entered kindergarten in fall 1998 over 9 school years. (The mathematics and science education of students who are homeschooled is not addressed in this chapter; see sidebar "Homeschooling in the United States.") The study concluded in spring 2007, when most students were in eighth grade. The sample used in this analysis included roughly 8,000 students. ECLS-K is unusual among major national and international data collections not only in its focus on the earlier years of schooling but also because it allows researchers to examine students' performance in light of variables likely to influence learning. Cognitive tests measured students' mathematics knowledge in kindergarten and grades 1, 3, 5, and 8 and tracked their science understanding in grades 3, 5, and 8. The study also collected demographic and family information from a parent and surveyed teachers and schools for information about school environments, teacher qualifications, and classroom practices.
Gains in Mathematics Test Scores and Gap Changes. Students begin kindergarten with differing levels of mathematics skills, and researchers have suggested several factors that may be related to these initial gaps. A body of research has focused in particular on initial gaps between white and black children. The early home environment, including how well parents prepare children for school (e.g., time spent reading to them), plays a role (Magnuson, Rosenbaum, and Waldfogel 2008; Jencks and Phillips 1998). Other reasons posited include income and education differences among parents (Magnuson, Rosenbaum, and Waldfogel 2008; Campbell et al. 2008), school segregation (Vigdor and Ludwig 2008), access to effective and well-trained teachers (Corcoran and Evans 2008), ability to listen and concentrate, and children's fine motor skills, which need to reach a certain level of development for young children to learn to write and draw (Grissmer and Eiseman 2008).
Students' mathematics achievement was measured on a single scale ranging from 0 to 174 throughout the study, allowing the tracking of achievement growth and comparisons between groups as children progressed through elementary and middle grades. The 1998–99 kindergarten cohort started school with an average mathematics score of 26 and gained 113 points by the spring of eighth grade, to 139 (table
For most characteristics, gaps widened during the early years of school (when the overall score changes were greater) and then stabilized or even narrowed slightly starting at grade 3 or 5, when the rate of overall growth also declined. Students' relative achievement when starting school had an influence on growth and eventual grade 8 scores, shown by the trajectories of those scoring in the lowest, middle two, and highest quartiles in kindergarten (figure
In another example, white children scored 29 on the test given in the fall of their kindergarten year and Asians scored 30, compared with 22 for both black and Hispanic children. The gaps between white and black students and between Hispanic and Asian students widened in the early grades and then stabilized after grade 3.
Gaps based on a few characteristics narrowed a little in later grades: English proficiency in kindergarten, primary language spoken at home, and the white–Hispanic gap. See appendix table
Proficiency in Different Skill Areas. The ECLS-K test data also indicate whether students were proficient in nine mathematics skill areas. (The skills are arranged in a hierarchy such that proficiency in a given area presumes proficiency in the areas below it. See sidebar "Mathematics Skills Areas Assessed" for definitions.) By eighth grade, nearly all students were proficient in ordinality and sequence, addition and subtraction, and multiplication and division (appendix table
Substantial differences among groups appeared in the three highest skill areas (rate and measurement, fractions, and area and volume), and differences grew as the difficulty level increased. For example, 63% of students whose mothers had a bachelor's degree were proficient in fractions, compared with 16% of students whose mothers had not completed high school.
Differences by initial math skills in kindergarten were also considerable for the highest skill area in which students had reached proficiency by eighth grade (table
Some early low achievers in kindergarten did reach proficiency in high skill areas, however: 24% achieved proficiency with rate and measurement, 7% with fractions, and 2% with area and volume, the highest skill area assessed. Thus, although most initial low-scoring students progressed relatively slowly, some managed to overcome obstacles they had at school entry.
High mathematics scores in kindergarten were also strong predictors of proficiency with higher-level mathematical concepts in eighth grade. By grade 8, 37% of those who scored in the highest quartile in kindergarten had achieved proficiency in all of the skill areas shown in table
Gains in Science Test Scores and Gap Changes. ECLS-K science assessments were given in grades 3, 5, and 8 and, as with mathematics, were measured on a single scale, in this case from 0 to 111. The average science score in grade 3 was 51 points, increasing to 83 by grade 8. In general, growth patterns were similar to those found with mathematics over these higher grades: few changes in gap size, and those changes that did occur were minimal (appendix table
Trends in Mathematics and Science Performance in Grades 4 and 8 Through 2007
NAEP includes two assessment programs. The national (or main) NAEP assesses national samples of 4th and 8th grade students at regular intervals and 12th grade students occasionally. These assessments are updated periodically to reflect contemporary standards of what students should know and be able to do in various subjects, including science and mathematics. Student achievement measured by NAEP is documented in an ongoing series of reports, The Nation's Report Card, that began in 1969. A second testing program, the NAEP Long-Term Trend (LTT), is based on nationally representative samples of 9-, 13-, and 17-year-olds. The mathematics content framework for NAEP LTT has remained the same since it was first given in 1973, permitting analyses of trends over more than three decades.
This section briefly summarizes NAEP science trends, reported in detail in Science and Engineering Indicators 2008 (NSB 2008), and then focuses on the new mathematics score data for fourth and eighth graders in 2007 and on trends in these scores from 1990 to 2007. New data are available neither for 12th grade mathematics nor for science in any grade. The NAEP LTT scores in mathematics are also updated through 2008 for three age groups.
NAEP rates students' performance in two ways: average scale scores and the percentage reaching various achievement levels. Scale scores place students along a continuous scale based on their overall performance on the assessment. A single mathematics scale of 0 to 500 points covers both grades 4 and 8. See sidebar "Development and Content of NAEP Mathematics Assessments" for further information on the assessments' content and design. The NAEP website has a searchable database of released NAEP test items (http://nces.ed.gov/nationsreportcard/itmrls).
Science Performance. No new NAEP science data are available for any grade; a science assessment was conducted in early 2009 and data will be available in early 2010, too late for inclusion in this volume. As reported in Science and Engineering Indicators 2008 (NSB 2008), average NAEP science scores increased for 4th graders, held steady for 8th graders, and declined for 12th graders between 1996 and 2005 (NCES 2006a). Rising scores among lower-performing and average fourth graders were the primary drivers of the increase. The proportion of students reaching the proficient level for their grade in science held steady at grades 4 and 8, and declined a bit at grade 12. Proficiency rates were lower among 12th graders than among students in the lower grades.
Mathematics Performance of Fourth and Eighth Graders. The upward achievement trends that occurred through 2005 on the NAEP fourth and eighth grade mathematics tests continued with the 2007 tests. Between 1990 and 2007, the average mathematics score for fourth graders rose from 213 to 240, and for eighth graders from 263 to 281
At both grade levels, students' scores increased in each of the five content areas tested (number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics, and probability; and algebra and functions) (Lee, Grigg, and Dion 2007). Performance also improved across the achievement distribution in both grades, with scores at five selected percentiles of the score distribution (10th, 25th, 50th, 75th, and 90th) all increasing consistently over these years (figure
Achievement trends for nearly all demographic groups reflected the same upward movement (table
The scores of fourth graders in each racial/ethnic group with 1990–2007 data available rose consistently over those 17 years. Black fourth graders had the largest score increase, at 34 points (figure
NAEP 2009 results, released as this volume was going to press, show that the upward trend in fourth grade mathematics scores has halted, that mathematics scores of eighth graders have continued to improve, and that score gaps among racial/ethnic groups are unchanged (NCES 2009a).
Gaps in Mathematics Performance. In most years, boys had marginally higher mathematics scores than girls, and these gaps remained about equal over the 17-year period
Most gaps among racial/ethnic groups that existed in 1990 remained in 2007, but some have narrowed, especially in recent years. The average score gap between white and black fourth graders decreased from 32 to 26 scale points between 1990 and 2007. Among eighth graders, the gap increased from 1990 to 2000 but then decreased from 2000 to 2007. Similarly, the gaps between white and Hispanic students in both grades narrowed from 2000 to 2007.
Score gaps related to family income, as indicated by student eligibility for subsidized lunches, also shrank for fourth graders, both between 1996 (the first year available) and 2007 and between 2000 and 2007. For eighth graders, the gap between low-income and other students was about the same in 1996 and 2007, with some fluctuations in between, although it decreased from 2000 to 2007.
Achievement is also measured in a different way from the scale scores discussed above: the percentages of students scoring at or above the basic and proficient levels and reaching the advanced proficiency level set by the NAEP governing board. Students also improved steadily from 1990 to 2007 on this measure (figure
Long-Term Trends in Mathematics Performance. The NAEP Long-Term Trend assessment program has tested students ages 9, 13, and 17 in mathematics for more than three decades. Unlike the main NAEP assessment, whose frameworks and tests are revised over time to follow changes in common curriculum at targeted grade levels, the LTT assessment for each age group has tested the same knowledge and skills over time.
Since this testing program began, 9- and 13-year-olds raised their scores, while 17-year-olds' scores were essentially flat, with no statistically significant difference between the first test score (304) in 1973 and the most recent (306) in 2008 (appendix table
In each age group, black students gained more points than white students over the earlier part of the period, narrowing the gaps with whites. The gap between blacks and whites for 9-year-olds narrowed from 35 points in 1973 to 26 in 2008. For 13-year-olds, the gap decreased substantially, from 46 to 28 points. For both of the younger age groups, this narrowing occurred mainly through 1986; after that, both racial groups increased their scores at roughly similar rates. Among 17-year-olds, the 1973 gap between blacks and whites of 40 points decreased to 26 points in 2008, with the smallest gap appearing in 1990.
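The gap changes quoted above can be tabulated with a short sketch. The gap values come from the paragraph; the "percent of the starting gap closed" figure is a derived illustration, not a statistic reported in the source, and it says nothing about statistical significance.

```python
# Black-white score gaps (scale points) on the NAEP LTT mathematics assessment,
# as quoted in the text: age -> (gap in 1973, gap in 2008).
ltt_gaps = {9: (35, 26), 13: (46, 28), 17: (40, 26)}


def gap_change(start, end):
    """Return (points narrowed, percent of the starting gap closed)."""
    narrowed = start - end
    return narrowed, round(100 * narrowed / start, 1)


summary = {age: gap_change(*gaps) for age, gaps in ltt_gaps.items()}
for age, (points, pct) in summary.items():
    print(f"Age {age}: gap narrowed by {points} points ({pct}% of the 1973 gap)")
```

On these figures, the largest narrowing in both absolute and relative terms is among 13-year-olds, consistent with the "decreased substantially" description above.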
Hispanic students at all three ages gained more points over time than did whites on the mathematics assessments, particularly 13- and 17-year-olds. The score gaps with their white peers thus appeared to decrease, but none of those changes was significant, in part due to relatively small Hispanic sample sizes in some years.
Parents' educational attainment, a measure of socioeconomic status, was collected from 13- and 17-year-olds. At all levels of parental education, 13-year-olds' achievement increased over the 35 years, while 17-year-olds' performance improved only among students whose parents had not finished high school.
Two recent assessments place U.S. student achievement in mathematics and science in an international context: the Trends in International Mathematics and Science Study and the Program for International Student Assessment. TIMSS and PISA differ in several fundamental ways; see sidebar "Differences Between TIMSS and PISA Assessments." Reports on TIMSS and PISA test results typically compare U.S. performance with that of all participating countries or with that of all members of the Group of Eight (G-8) or Organisation for Economic Co-operation and Development (OECD) (Gonzales et al. 2008; Miller et al. 2009; Gonzales et al. 2004; Baldi et al. 2007). The differences in the characteristics of countries that participate in these two studies, however, confound comparisons between the United States' relative standing on the two assessments.
This section compares U.S. performance to that of a subset of nations that either have advanced economies that compete globally in fields related to science, technology, engineering, and mathematics (STEM) or have developing economies with rapidly growing capabilities in these areas. Most of the selected countries were included because of their current capabilities in science and technology. A few Asian countries that are seeking to develop such capacity were also included to highlight student performance in these highly dynamic countries. (This geographic focus is maintained where possible in the international sections of other chapters.) Not all of the 28 selected nations participated in each assessment, so the number available for comparison with the United States differs by test. Scores for all participating nations are shown in appendix tables.
Results from the two assessments are contradictory: U.S. average scores on TIMSS tend to place the United States around the middle of the group of selected nations, and in mathematics, the United States improved over time. In contrast, U.S. scores on PISA were generally near the bottom of the group, and the U.S. standing relative to other nations declined in both mathematics and science. Some of these performance differences may be explained by the differences in the tests and which countries participate (see sidebars "Differences Between TIMSS and PISA Assessments" and "Sample Items From TIMSS and PISA Assessments").
Mathematics Performance of U.S. Fourth and Eighth Graders on TIMSS
The fourth grade TIMSS mathematics exam covers three content areas: number, geometric shapes and measures, and data display. The eighth grade assessment addresses four content domains: number, algebra, geometry, and data and chance.
Performance Trends. Over the 12 years since the first TIMSS mathematics assessments in 1995, U.S. fourth and eighth graders raised their scores and international ranking (Gonzales et al. 2008). The fourth grade average of 529 in 2007 was 11 points higher than in 1995. For eighth graders, the U.S. average of 508 in 2007 reflected a 16-point rise over 1995's score (figure
Not only did U.S. fourth graders' mathematics scores increase, but the U.S. position relative to selected other nations also shifted upward from 1995 to 2007. Of the selected nations whose fourth graders participated in both the 1995 and 2007 TIMSS, four outscored the United States in 1995, compared with three in 2007 (figure
Performance on the 2007 TIMSS Mathematics Tests. The fourth grade tests focused on three content domains: number, geometric shapes and measures, and data display (about half the assessment emphasized the number domain, including introductory algebra). For eighth grade, the four content domains were number, algebra, geometry, and data and chance. The cognitive domains addressed in TIMSS are the same for both grades—knowing, applying, and reasoning.
U.S. fourth graders' average score on the 2007 TIMSS mathematics assessment (529) was just below the combined average for 14 selected nations (534)
The U.S. eighth grade average mathematics score of 508 was also below the combined average (514) for 16 selected nations and below 5 nations' individual averages (table
Although U.S. students as a whole did not lead the world in TIMSS mathematics, two U.S. states that participated individually (Massachusetts and Minnesota) provide examples of high performance (see sidebar "Two States' Performance on TIMSS: 2007"). Scores at the 90th percentile present another way to examine high-achieving students (those who scored higher than 90% of all test takers). In mathematics, the 90th percentile score for U.S. fourth graders was 625, lower than that of six other nations (table
Science Performance of U.S. Fourth and Eighth Graders: TIMSS
Performance Trends. In contrast to the mathematics trends, which showed improvement in both grades, the average scores of U.S. students on the TIMSS science assessment have remained flat since 1995. Fourth graders have lost ground internationally, whereas eighth graders slightly improved their position relative to other nations (Gonzales et al. 2008). At fourth grade, the United States outperformed six of seven selected nations in 1995 but only two of them in 2007. In addition, the single comparison nation that did better than the United States in 1995 (Japan) was joined by Singapore and Hong Kong in 2007.
The trend in U.S. standing of eighth graders was slightly upward: nations scoring higher than the United States on the science assessment dropped from eight in 1995 to six in 2007. In addition, the United States had not outperformed any of the 10 other nations in 1995 but outscored 2 of them in 2007 (Sweden and Norway).
Performance on the 2007 TIMSS Science Tests. The fourth grade science tests focused on three content areas: life, physical, and earth sciences. At eighth grade, content areas expanded to four: biology, chemistry, physics, and earth sciences. The cognitive domains underlying test development were the same for both grades: knowing, applying, and reasoning. The fourth grade tests emphasize knowing more than the eighth grade tests, while reasoning is a greater focus in eighth grade.
On the 2007 TIMSS science test for fourth graders, four of the comparison nations scored higher and six scored lower than the United States, putting the United States just above the middle of the group (table
The U.S. 90th percentile score for fourth graders was 643, ranking lower than in 2 other nations and higher than in 8, or above the midpoint for these 15 nations (Gonzales et al. 2008). The difference between Singapore (whose fourth graders led all countries) and the United States at the 90th percentile was 58 points. In eighth grade, U.S. students at the 90th percentile in science scored roughly in the middle of the group—lower than in six other nations and higher than in five. See sidebar "Linking NAEP and TIMSS Results."
Mathematics Performance of U.S. 15-Year-Olds: PISA
Performance Trends. In contrast to the TIMSS results, U.S. 15-year-olds' performance consistently dropped on the PISA tests of mathematical and scientific literacy in relation to student performance in other nations. The U.S. mathematics average of 474 in 2006 was 19 points lower than in 2000, when the first PISA exams were given, but changes in the tests mean that the scores cannot be directly compared (OECD 2001; Baldi et al. 2007). While the United States scored below 7 nations in 2000, it scored below 15 nations in 2006 (of 19 nations with data available for both years).
Performance on the 2006 PISA Mathematics Test. PISA assesses 15-year-old students in all OECD nations and a range of other nations every 3 years on literacy in mathematics, science, and reading. The mathematics test covers four content areas: space and shape, change and relationships, quantity, and uncertainty. A main mathematics skill tested is problem solving (explored in greatest depth in 2003, when math was PISA's main focus). Sjøberg (2007) and Goldstein (2004) discuss PISA's content, including challenges and critiques.
On the most recent PISA tests, the U.S. score was 474, below 18 of the selected nations' scores (table
The U.S. score at the 90th percentile in mathematics was 593, lower than that in 18 other nations that participated in the PISA exam and higher than in another 3 nations (Thailand, Indonesia, and Brazil) (Baldi et al. 2007). None of the OECD member nations had a lower 90th percentile score than the United States.
Science Performance of U.S. 15-Year-Olds: PISA
Performance Trends. The U.S. rank among selected nations declined on the PISA scientific literacy test, as on the mathematics assessment. In 2000, the United States scored below 6 other selected nations (out of 19 participating in both years), but in 2006, that number doubled to 12 (figure
Performance on the 2006 PISA Science Test. To measure scientific literacy, PISA includes three skill areas: identifying and understanding scientific issues, explaining phenomena scientifically, and using scientific evidence. Students were tested on their grasp of essential scientific concepts and theories in four content areas: physical systems, living systems, earth and space systems, and technology systems. Test items probed whether students understood how scientists obtain evidence (scientific means of inquiry) and how scientists use data. The test scores range from 1 to 1,000, and the mean for the 2006 science test was set at 500. The score scale is divided into six distinct proficiency levels that measure competence in science concepts and reasoning; each proficiency level encompasses roughly 75 points (OECD 2007). To put score differences in context, the average gain from one grade to the next was 38 points, or roughly half a full proficiency level. (This one-grade gain was measured using data from nations with sufficient numbers of 15-year-olds in two consecutive grades.)
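The score-difference context above can be expressed as a short sketch. The two constants are the figures given in the text (roughly 75 points per proficiency level and a 38-point average one-grade gain); the function name and the conversion itself are illustrative assumptions, a rough heuristic rather than an official PISA procedure.

```python
LEVEL_WIDTH = 75  # approximate points per proficiency level (from the text)
GRADE_GAIN = 38   # average gain from one grade to the next, in points (from the text)


def in_context(score_diff):
    """Express a score difference as proficiency levels and as grade-years (rough heuristic)."""
    return {
        "levels": round(score_diff / LEVEL_WIDTH, 2),
        "grades": round(score_diff / GRADE_GAIN, 2),
    }


one_grade = in_context(GRADE_GAIN)
one_level = in_context(LEVEL_WIDTH)
print(one_grade)  # a one-grade gain is about half a proficiency level
print(one_level)  # a full level is roughly two grades' worth of gain
```

Under these assumptions, a one-grade gain corresponds to about 0.51 of a proficiency level, matching the "roughly half" description in the text.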
The science literacy performance of U.S. 15-year-olds in 2006 placed the United States below 15 of 24 other nations and above 4, far below the midpoint (table
The U.S. 90th percentile score in scientific literacy was 628, below the corresponding score in 10 of the 24 nations with data, but above it in 9, putting U.S. top-scoring students just below the middle of the 90th percentile science score distribution for these selected nations. Thus, U.S. high achievers in science placed in a better position relative to other countries than did U.S. students on average.