Student Learning in Mathematics and Science
- Early Formal Learning: Kindergarten Through Third Grade
- Performance of U.S. Students in Grades 4, 8, and 12
- International Comparisons of Mathematics and Science Performance
The current performance of U.S. elementary and secondary students in mathematics and science is both encouraging and disappointing. Average mathematics scores on national assessments rose during the 1990s and early 2000s, and gains were widespread, with many demographic subgroups registering higher achievement. Performance in science has not improved recently, however. Substantial achievement gaps among some demographic subpopulations of students persist in both mathematics and science, and most 4th, 8th, and 12th grade students do not perform at levels considered proficient for their grade. On international assessments, recent data show that U.S. students performed above international averages that include scores from both developed and developing countries on tests closely aligned to the way mathematics and science are presented to them in the classroom. However, they performed below international averages for the 30 Organisation for Economic Co-operation and Development (OECD) nations in applying mathematical and scientific skills to situations they might encounter outside of a classroom.
This section presents findings from recent national and international studies of U.S. student achievement in mathematics and science and compares them with earlier study results. It begins with a discussion of student performance during the primary grades, followed by a review of assessment results for students in grades 4, 8, and 12. The section ends by placing U.S. student achievement in a broader international context.
The mathematics and science performance of U.S. students in upper-elementary and secondary grades has been reported since the late 1960s (Campbell, Hombo, and Mazzeo 2000). Much less has been known about student learning in these subjects during the first years of formal education, but this is changing with the release of initial findings from an ongoing study of students who began kindergarten in 1998 (Early Childhood Longitudinal Study, Kindergarten Class of 1998–99, ECLS–K).
Kindergarten: Mathematics Skills and Knowledge
Children begin formal schooling with varying levels of mathematics skills, and over the course of the kindergarten year, the percentage of students proficient in specific skill areas increases (West, Denton, and Germino-Hausken 2000; West, Denton, and Reaney 2000). In 1998, most beginning kindergartners (93%) could recognize single-digit numbers and basic shapes in the fall, and almost all (99%) demonstrated these skills in the spring (figure
Disparities among subpopulations of students were evident when they started kindergarten. Mathematics performance was related to several student background factors, and the association between social disadvantages and performance was cumulative. Lower proportions of black and Hispanic students were proficient at each skill level compared with their white and Asian/Pacific Islander peers (appendix table
As students progressed through kindergarten, gaps in basic mathematics skills decreased, but disparities in the more sophisticated skills increased. For example, by the end of kindergarten, blacks and Hispanics narrowed the proficiency gap with whites and Asians/Pacific Islanders in recognizing single-digit numbers and shapes and in comparing the relative size of objects (figure
The First 4 Years of School
Mathematics. After 4 years of formal schooling, when most students were at the end of third grade, some performance gaps had widened (Rathbun and West 2004) (figure
Other research has shown that the widening of achievement gaps as students progress through school is, at least in part, a result of differential learning growth and loss during the summer (Alexander, Entwisle, and Olson 2001; Borman and Boulay 2004; Cooper et al. 1996). For example, although lower- and higher-income primary grade students made similar gains in mathematics during the school year, lower-income students experienced declines in mathematics skills during summer breaks, whereas higher-income students experienced gains (Alexander, Entwisle, and Olson 2001). These findings have been attributed to greater ability among higher-income parents to provide their children with mathematically stimulating materials and activities during the summer.
Studies of upper-elementary and secondary students dating back to the late 1960s have documented some sex differences in science and mathematics performance (e.g., Campbell, Hombo, and Mazzeo 2000; NCES 2003a and 2003b). The ECLS–K study, the first national study of primary grade students, found no sex differences in average overall mathematics performance during the first 4 years of schooling (Rathbun and West 2004; West, Denton, and Germino-Hausken 2000; West, Denton, and Reaney 2000). However, at the end of third grade, boys were more likely than girls to demonstrate proficiency in the advanced mathematics skills of place value concepts and knowledge of rate and measurement to solve word problems (appendix table
The ECLS–K study examined associations between mathematics performance and two aspects of students' early school experiences: whether they attended public or private schools, and whether they attended full- or half-day kindergarten. Performance differences in mathematics by school type were evident as students started formal schooling (West, Denton, and Germino-Hausken 2000). Students beginning kindergarten in private schools had stronger mathematics skills than those at public schools. Although achievement differences persisted through the third grade, the growth rate in mathematics did not differ. Therefore, performance gaps between public and private school students did not increase (Rathbun and West 2004). Students in full-day kindergartens experienced greater gains in mathematics compared with their peers in half-day classes (Watson and West 2004). At the end of third grade, however, the benefit of full-day kindergarten could no longer be detected (Rathbun, West, and Germino-Hausken 2004).
Science. The ECLS–K study began assessing students in science in spring 2002, when most were in third grade. The assessment placed equal emphasis on life science, earth and space science, and physical science and asked students to demonstrate understanding of the physical and natural world, make inferences, and understand relationships (Rathbun and West 2004). Students were also required to interpret scientific data, form hypotheses, and develop plans to investigate scientific questions. Performance gaps observed in mathematics were also generally found in science (appendix table
Many of the same performance gaps in mathematics and science achievement found among primary students also exist among upper-elementary and secondary students. Although mathematics performance in particular improved through the 1990s and early 2000s for many subgroups, substantial achievement gaps persist and, as will be detailed below, in some cases, have grown wider.
The National Assessment of Educational Progress (NAEP), also known as the "Nation's Report Card," has charted the academic performance of U.S. students in the upper-elementary and secondary grades since 1969. This volume reports on recent trends, from 1990 to 2003 for mathematics and from 1996 to 2000 for science. Previous Science and Engineering Indicators described long-term trends in mathematics and science results dating back to the first NAEP assessments. Long-term trends in mathematics achievement from the 2004 administration were released too late for the text of this chapter but are reviewed briefly in the sidebar "Long-Term Trends in Student Mathematics Achievement" at the conclusion of this section.
The NAEP assessments are based on frameworks developed through a national consensus process that involves educators, policymakers, assessment and curriculum experts, and the public. The frameworks are then approved by the National Assessment Governing Board (NAGB) (NCES 2003a). The mathematics assessment contains five broad content strands (number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics, and probability; and algebra and functions). It also assesses mathematical ability (conceptual understanding, procedural knowledge, and problem solving) and mathematical power (reasoning, connections, and communication). The science framework includes a content dimension divided into three major fields of science (earth, life, and physical), and a cognitive dimension covering conceptual understanding, scientific investigation, and practical reasoning (NCES 2001).
Student performance on the NAEP is measured with scale scores as well as achievement levels. The scale scores place students on a continuous ability scale based on their overall performance. For mathematics, the scale ranges from 0 to 500 across the three grades. For science, the scale ranges from 0 to 300 within each grade.
The achievement levels are set by NAGB based on recommendations from panels of educators and members of the public, and describe what students should know and be able to do at the basic, proficient, and advanced levels (NCES 2003a). The basic level represents partial mastery of the knowledge and skills needed to perform proficiently at each grade level. The proficient level represents solid academic performance and the advanced level represents superior performance. This review of NAEP results focuses on the proficient level (for definitions of the proficient level for grades 4, 8, and 12, see sidebars "Proficient Level in Mathematics in Grades 4, 8, and 12" and "Proficient Level in Science in Grades 4, 8, and 12").
Disagreement exists about whether NAEP has appropriately defined these levels. A study commissioned by the National Academy of Sciences judged the process used to set these levels "fundamentally flawed" (Pellegrino, Jones, and Mitchell 1998), and NAGB acknowledges that considerable controversy remains over setting achievement levels (Bourque and Byrd 2000). Both the National Center for Education Statistics (NCES) and NAGB believe the levels are useful for understanding trends in achievement, but they warn readers to use and interpret the levels with caution (NCES 2003a).
In this section, the NAEP results are examined in a number of ways, including changes in average scores and the proportion of students reaching the proficient level, both overall and among subgroups of students. In addition, achievement gaps between demographic subpopulations and changes in those gaps are reviewed. Examining a set of measures reveals more about student performance than examining just one measure (Barton 2004). For example, without examining changes in achievement for high-, middle-, and low-achieving students, it would be impossible to know whether a rise in average scores resulted from increased scores among only high-achieving students or whether it reflects broader improvements.
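The point about examining several measures can be illustrated with a minimal sketch. All scores below are hypothetical and chosen only for illustration; they are not NAEP data, and the `summarize` helper is an assumption of this example rather than part of any NAEP methodology. The sketch shows two scenarios that raise the average by the same amount, one driven only by the highest scorers and one by an across-the-board gain, and how percentiles distinguish them.

```python
from statistics import mean, quantiles

def summarize(scores):
    """Return the mean and the 10th, 50th, and 90th percentiles of a list of scores."""
    # quantiles(..., n=10) returns nine cut points: index 0 is the 10th
    # percentile, index 4 the median, and index 8 the 90th percentile.
    cuts = quantiles(scores, n=10, method="inclusive")
    return {"mean": mean(scores), "p10": cuts[0], "p50": cuts[4], "p90": cuts[8]}

# Hypothetical baseline scores for ten students (illustrative only).
baseline = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]

# Scenario 1 (hypothetical): only the three highest scorers gain 20 points each.
top_only = baseline[:7] + [s + 20 for s in baseline[7:]]

# Scenario 2 (hypothetical): every student gains 6 points.
broad = [s + 6 for s in baseline]

for label, scores in [("baseline", baseline), ("top only", top_only), ("broad", broad)]:
    print(label, summarize(scores))
```

In this sketch both scenarios produce the same average, but only the broad scenario moves the 10th percentile, which is why average scores are read alongside results for low-, middle-, and high-achieving students.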
The average mathematics scores of fourth and eighth grade students increased from 1990 (the first year in which the current assessment was given) to 2003 (NCES 2001, 2003a) (figure
Improvements in average mathematics scores were generally mirrored by increases in the percentage of students scoring at or above the proficient level for their grade (figure
Although gains in mathematics achievement are encouraging, most students still do not demonstrate solid mathematics skills and knowledge for their grade. In the latest NAEP mathematics assessments (2003 for grades 4 and 8, and 2000 for grade 12), only about one-third of 4th and 8th graders, and even fewer 12th graders (16%), reached the proficient level (figure
Recent trend lines for science are shorter than those for mathematics, and they suggest less improvement. Although average mathematics scores of fourth and eighth grade students increased from 1996 to 2000 (appendix table
In results similar to the 2003 mathematics findings, only about one-third of fourth and eighth grade students reached the proficient level in science for their grade in 2000 (figure
Achievement Gaps Between Demographic Subgroups
Gender Achievement Gaps. The most recent NAEP assessments report only small sex differences in mathematics and science performance at grades 4, 8, and 12, with boys performing slightly better than girls (appendix tables
Racial/Ethnic Achievement Gaps. Substantial performance gaps exist between some racial/ethnic subgroups. At each grade level, white and Asian/Pacific Islander students performed better than black, Hispanic, and American Indian/Alaska Native students in both mathematics and science, in terms of both average scores and the percentage of students reaching the proficient level (figure
More subtle racial/ethnic differences in achievement were also observed. For example, Asians/Pacific Islanders demonstrated slightly higher performance than whites in mathematics at each grade level, but the reverse was true for science at grades 4 and 8. In addition, in some instances, American Indian/Alaska Native and Hispanic students registered slightly higher performances than did black students (see sidebar "Projected School-Age Population of the United States").
Family Income Achievement Gaps. Mathematics and science performance also differed by family income (as measured by whether or not a student was eligible for the free or reduced-priced school lunch program) (figure
Two mathematics and science assessments conducted in 2003 place U.S. student achievement in these subjects in an international context: the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA). Results from the two assessment programs paint a complex picture. As detailed below, U.S. students scored above international averages on the TIMSS assessment and below international averages on the PISA assessment. The two programs are designed to serve different purposes, and each provides unique information about U.S. student performance relative to other countries in mathematics and science (Scott 2004). The differences in design and purpose of the assessments should be kept in mind when reviewing these divergent results.
One such difference is the grade/age of the students assessed. TIMSS provides data on mathematics and science achievement of students in primary and middle grades (grades 4 and 8 in the United States). PISA reports the performance of students in secondary schools by sampling 15-year-olds, an age near the end of compulsory schooling in many countries.
Another difference between TIMSS and PISA is the relationship of the assessments to mathematics and science curriculum. TIMSS measures student mastery of curriculum-based knowledge and skills. Mathematics and science content experts and educators from many countries developed the framework behind the TIMSS assessment, and representatives from each participating country were asked to review and comment. The goal is to assess the mathematics and science content and skills that students are taught in school. It is important to note that many of the participating countries have centralized, nationally mandated curriculums, whereas in the United States, curriculum, in the form of content standards, is developed at the state and local levels (Schmidt et al. 2001).
PISA, on the other hand, places more emphasis on students' ability to apply scientific and mathematical concepts and thinking skills to problems they might encounter, particularly in situations outside of a classroom. To some degree, PISA mathematics questions tend to demand more complex reasoning and problem-solving skills than those in TIMSS (Neidorf et al. forthcoming) (see sidebar "Sample Mathematics and Science Items From the Curriculum-Based TIMSS Assessment and the Literacy-Based PISA Assessment").
A third difference is the composition of the participating countries. The 46 countries participating in the 2003 TIMSS include 13 highly industrialized nations, as well as many industrializing and developing ones. TIMSS international averages are based on all of these participating countries. In contrast, the PISA results reviewed in this chapter are based on average scores from 30 OECD countries. Thus, although the TIMSS averages include scores from both developed and developing countries, the PISA averages reflect only the performance of industrialized countries. In addition to comparing the performance of U.S. students to these two sets of international averages, the text and tables
TIMSS 2003 Results for Students in Grades 4 and 8: Curriculum-Based Knowledge in Mathematics and Science
Curriculum-Based Mathematics Performance. In 2003, the average curriculum-based mathematics score of U.S. fourth and eighth grade students exceeded the TIMSS international averages for these two grades, which included scores from both developed and developing countries (Gonzales et al. 2004) (appendix tables
TIMSS also was conducted in 1995, permitting an examination of changes in performance over time. The average mathematics score of U.S. fourth graders on this curriculum-based assessment did not change from 1995 to 2003, but eighth graders' scores improved (data not shown, see Gonzales et al. 2004). Based on these results and on changes in average performance in some of the other countries (both improvement and decline), the relative ranking of the United States in mathematics declined slightly at grade 4 but improved slightly at grade 8.
Curriculum-Based Science Performance. Examination of science results shows that in 2003, the average science score of U.S. fourth and eighth grade students was higher than the TIMSS international averages, which were based on scores from both developed and developing countries (Gonzales et al. 2004) (appendix tables
Mirroring results for mathematics, average science scores of fourth graders did not change from 1995 to 2003, but science performance among eighth graders improved over this period (data not shown, see Gonzales et al. 2004). The relative ranking of U.S. students in science fell slightly between 1995 and 2003 for grade 4 but rose slightly for grade 8.
PISA 2003 Assessments of Mathematics and Science Literacy of 15-Year-Olds
Although TIMSS measures how well students have mastered the mathematical and scientific content presented in school, PISA assesses students' literacy in these subjects (Lemke et al. 2004). PISA uses the term literacy to denote the program's goal of assessing how well students can apply their knowledge and skills to problems they might encounter, particularly in situations outside of a classroom.
In 2003, U.S. 15-year-olds performed below the OECD average in both mathematics and science literacy (appendix tables
U.S. students' average science literacy scores did not change from 2000, the first year PISA was administered, to 2003 (data not shown, see Lemke et al. 2004). However, several other OECD countries registered improvements in science, and as a result, the relative position of the United States compared with the OECD average declined. In 2000, the average science literacy score of U.S. 15-year-olds did not differ from the OECD average, but in 2003, it was lower. U.S. performance in mathematics did not change from 2000 to 2003, and in both years, the U.S. average fell below the OECD average.
 The ECLS-K assessment measures students' overall mathematics achievement through both scale scores and their specific mathematics skills and knowledge as measured through a set of proficiency scores. The scale scores place students on a continuous ability scale based on their overall performance on the assessment, whereas the proficiency scores are based on clusters of items assessing particular skills and report whether students mastered those skills. When describing gains over the kindergarten year, this review focuses on proficiency in specific areas. When reporting on growth in achievement from kindergarten to third grade, scale scores are discussed. For more information on the ECLS assessment battery and scoring, including the Item Response Theory (IRT) methodology used, see Rathbun and West (2004) and West, Denton, and Reaney (2000).
 The studies reviewed in this chapter report combined results for Asians and Pacific Islanders. It is important to note that this category combines groups that have very different cultural and historical backgrounds, and whose achievement varies widely.
 In later years of the ECLS-K study, family income below the federal poverty level was substituted for the welfare assistance risk factor. Students were classified as having no family risk factors, one risk factor, or two or more risk factors.
 About 10% of the cohort was in second grade, and another 1% was in another grade. For the sake of simplicity, the students in the 2002 followup are referred to as third graders.
 Trends in mathematics and science performance by gender are not easily summarized, with girls outperforming boys in some age groups and boys outperforming girls in other cases. See Science and Engineering Indicators – 2004, page 1-7, for more details on long-term trends in mathematics and science performance of males and females. See sidebar in this issue "Long-Term Trends in Student Mathematics Achievement."
 Students were identified as attending private schools continuously, attending public schools continuously, or attending a combination of private and public schools between the beginning of kindergarten and the end of third grade. There were no statistically significant differences in gains in average mathematics scores across these three groups.
 Because students have been assessed in science only once in the ECLS, the study has thus far produced less information on science learning. As of yet, only science scale scores have been reported. As the study continues to follow these students, future reports will likely provide more detail on science achievement.
 NAEP consists of three assessment programs. The long-term trend assessment is based on nationally representative samples of 9-, 13-, and 17-year-olds. It has remained the same since it was first given in 1969 in science and 1973 in mathematics, permitting analyses of trends over three decades. A second testing program, the national or main NAEP, assesses national samples of 4th, 8th, and 12th grade students. The national assessments are updated periodically to reflect contemporary standards of what students should know and be able to do in a subject. The third program, the state NAEP, is similar to the national NAEP but involves representative samples of students from participating states.
 These recent trends are based on data from the national NAEP program. The current national mathematics assessment was first administered in 1990 and was given again in 1992, 1996, 2000, and 2003. In 2003, only fourth and eighth grade students were assessed. The current national science assessment was first administered in 1996 and was given again in 2000 and 2005. The 2005 results were not available in time for inclusion.
 The 2002 and 2004 volumes reviewed trends in science from 1969 to 1999 and in mathematics from 1973 to 1999. The long-term trend assessment in mathematics was administered again in 2004, but those data were not released in time to be included in the text of this chapter (see sidebar "Long-Term Trends in Student Mathematics Achievement"). The long-term trend assessment in science has not been given since 1999.
 NAEP is in the process of changing the way it includes students with disabilities and limited English proficiency in assessments. Before 1996, these students were not allowed to use testing accommodations (e.g., extended time, one-on-one testing, bilingual dictionary); as a result, many did not participate. In 1996 and 2000, the assessment was administered to split samples of "accommodations not permitted" and "accommodations permitted." In 2003, the NAEP mathematics assessment completed the transition to an "accommodations permitted" test.
 Using eligibility for the free or reduced-price lunch program as a proxy for family poverty is not as reliable in the higher grades because older students may attach stigma to receiving a school lunch subsidy.
 Sample size was insufficient to permit reliable mathematics estimates for American Indian/Alaska Natives prior to 1996 for grades 4 and 12 and prior to 2000 for grade 8.
 NCES did not publish 2000 science scores for fourth grade Asian/Pacific Islander students because of accuracy and precision concerns; therefore, those scores are not included.
 In science, the apparent difference at grade 12 in average scale scores by gender was not statistically significant. However, a greater proportion of 12th grade boys reached the proficient level in science than did girls.
 The primary grade assessed in each country was "the upper of the two adjacent grades with the most 9-year-olds" (Mullis et al. 2005). In the United States, and most other countries, this was the fourth grade. The middle grade assessed was defined as the "upper of the two adjacent grades with the most 13-year-olds." In the United States and most countries, this was the eighth grade. Students in their final year of secondary school (12th grade in the United States) were assessed with TIMSS in 1995. For a review of those results, see page 1-14 in Science and Engineering Indicators – 2004 or Takahira et al. (1998). Subsequent TIMSS administrations have focused on the middle grades.
 To be assessed in TIMSS, the specific content domains and topics had to be included in the curricula of "a significant number of participating countries" (Mullis et al. 2005). It is important to note that whereas the TIMSS program identified common mathematics and science curriculum across participating countries, there are many differences in the way countries delivered that curriculum and in their breadth of coverage (Sherman, Honegger, and McGivern 2003).
 Of the 14 other countries that participated in both the 1995 and 2003 grade 4 TIMSS mathematics assessments, the United States was outperformed by four countries in 1995 and by seven countries in 2003. Of the 21 other countries that participated in both the 1995 and 2003 grade 8 mathematics assessments, 12 had average scores higher than the U.S. average score in 1995 and 7 had higher scores in 2003.
 Of the 14 other countries that participated in both the 1995 and 2003 grade 4 TIMSS science assessments, only 1 had a higher average score than the United States in 1995, but 2 did in 2003. At grade 8, of the 21 countries that participated in both years, 9 had higher average scores than the United States in 1995, whereas 5 did in 2003.
 Forty-one countries participated in the 2003 PISA assessment—30 OECD member countries and 11 non-OECD countries. This section summarizes a report released by NCES (2004c) that presents PISA results from a U.S. perspective. That report omitted data from the United Kingdom because of low response rates and from Brazil because these data were not yet available. That report and this section compare U.S. averages first to OECD averages (i.e., average of national averages from the 29 OECD countries for which data were available, including the United States) and, second, to individual country averages (both OECD and non-OECD countries).
 Data for both 2000 and 2003 are available for 26 OECD countries, including the United States. Of these countries, nine improved their science scores and five registered declines.
 Comparing change in mathematics performance is complicated by the fact that the 2003 PISA assessment was more extensive than the 2000 assessment. In 2000, two content areas were assessed: space and shape and change and relationship. In 2003, those two areas, along with two additional content areas (quantity and uncertainty), were tested. Thus, change in mathematics performance can be examined only for the two content areas assessed in both years. The average scores for U.S. students did not change from 2000 to 2003 on either the space and shape or the change and relationship content areas. Of the 25 other countries that participated in both assessment years, 18 outperformed the United States in the space and shape area in 2003 compared with 19 in 2000. In the change and relationship area, 17 countries outperformed the United States in 2003, and 14 did in 2000.
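The notes above describe the OECD average as an average of national averages, which means each country counts equally regardless of how many students it enrolls. The following minimal sketch illustrates that arithmetic. The country names, scores, and student counts are hypothetical and are not PISA results, and the student-weighted figure is computed only for contrast; it is not a statistic reported in the studies reviewed here.

```python
from statistics import mean

# Hypothetical national averages and student counts (illustrative only).
countries = {
    "Country A": {"avg_score": 520.0, "students": 100_000},
    "Country B": {"avg_score": 480.0, "students": 900_000},
    "Country C": {"avg_score": 500.0, "students": 200_000},
}

def average_of_national_averages(data):
    """Unweighted mean of country averages: every country counts once."""
    return mean(c["avg_score"] for c in data.values())

def student_weighted_average(data):
    """Mean weighted by student counts, shown only for contrast."""
    total_students = sum(c["students"] for c in data.values())
    weighted_sum = sum(c["avg_score"] * c["students"] for c in data.values())
    return weighted_sum / total_students

unweighted = average_of_national_averages(countries)
weighted = student_weighted_average(countries)
print(f"average of national averages: {unweighted:.1f}")
print(f"student-weighted average:     {weighted:.1f}")
```

Under these made-up numbers the large, lower-scoring country pulls the student-weighted figure below the average of national averages, which is one reason the basis of an international average should be kept in mind when comparing a single country against it.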