[11] Survey items that test factual knowledge sometimes use readily comprehensible language even at the cost of some scientific imprecision. This may prompt some highly knowledgeable respondents to feel that the items blur or neglect important distinctions, and in a few cases may lead respondents to answer questions incorrectly. In addition, the items do not reflect the ways that even established scientific knowledge evolves as scientists accumulate new evidence. Although the text of the factual knowledge questions may suggest a fixed body of knowledge, it is more accurate to see scientists as making continual, often subtle, modifications in how they understand existing data in light of new evidence.

[12] Early NSF surveys used additional factual knowledge indicators, which were combined to form an aggregate indicator. Bann and Schwerin (2004) performed statistical analyses on this and other groups of indicators to produce shorter scales that involved fewer questions and required less time to administer, but were functionally equivalent to the scales that used additional items (e.g., had similar measurement properties and yielded performance patterns that correlated with similar demographic characteristics). For factual knowledge, Bann and Schwerin produced two alternative scales that, except for one item, used identical questions. One of these scales was administered in 2004, and the other was substituted in 2006. Appendix table 7-4 presents trend data using each scale. To enable aggregated comparisons of 2004 and 2006 results, the table also includes the average numbers of correct answers to the group of overlapping items from those 2 years.
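The overlapping-item comparison described in this note can be sketched as follows. This is only an illustration, not the procedure used to build appendix table 7-4: the item labels (`q1`, `q_a`, and so on), the response records, and the function name `mean_correct_on_overlap` are all hypothetical, and no survey weighting is applied.

```python
def mean_correct_on_overlap(responses, items_a, items_b):
    """Average number of correct answers per respondent, counting only
    the items that appear on both scales."""
    overlap = set(items_a) & set(items_b)
    # A missing item or a "don't know" (anything other than True) counts as not correct.
    scores = [sum(1 for item in overlap if r.get(item) is True) for r in responses]
    return sum(scores) / len(scores)


# Hypothetical scales that share every item except one.
items_2004 = ["q1", "q2", "q3", "q_a"]
items_2006 = ["q1", "q2", "q3", "q_b"]

# Hypothetical respondents: True means the item was answered correctly.
responses_2004 = [
    {"q1": True, "q2": False, "q3": True, "q_a": True},
    {"q1": True, "q2": True, "q3": True, "q_a": False},
]
responses_2006 = [
    {"q1": False, "q2": True, "q3": True, "q_b": True},
    {"q1": True, "q2": True, "q3": False, "q_b": True},
]

mean_2004 = mean_correct_on_overlap(responses_2004, items_2004, items_2006)
mean_2006 = mean_correct_on_overlap(responses_2006, items_2004, items_2006)
```

Because the non-shared items (`q_a` and `q_b` here) are excluded from both years, the two averages are computed over the same set of questions and can be compared directly.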

[13] The two nanotechnology questions were asked only of respondents who said they had some familiarity with nanotechnology, and a sizable majority of the respondents who ventured a substantive answer (i.e., not "don't know") answered the questions correctly. To measure nanotechnology knowledge more reliably, researchers would prefer a scale with more than two questions.

[14] Even small, apparently nonsubstantive differences in question wording can affect survey responses. U.S. surveys, for example, have asked respondents whether or not it is true that "it is the father's gene that decides whether the baby is a boy or a girl." In contrast, the 2005 Eurobarometer asked whether or not it is true that "it is the mother's genes that decide whether the baby is a boy or a girl." To a scientifically knowledgeable respondent, these questions are equivalent. To other respondents, however, they may not be. Research has shown that some survey respondents have an "acquiescence bias": when given the opportunity to do so, they tend to provide positive responses to questions and are therefore more likely to answer true than false (Schaeffer and Presser 2003). Because the correct answer to the U.S. question is "true" and the correct answer to the Eurobarometer question is "false," the U.S. question is probably easier to answer correctly; in other words, in two equally knowledgeable populations, more people would get the U.S. question right. Although Americans score better on this topic than Europeans, it is possible that acquiescence bias contributes to this difference as much as, or more than, scientific knowledge does.
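The reasoning in this note can be made concrete with a small worked calculation. The model below is an assumption made purely for illustration and does not come from the surveys: a share `p_know` of respondents knows the answer and responds correctly, and everyone else guesses, saying "true" with probability `p_say_true` (a value above 0.5 represents acquiescence bias). The parameter values are hypothetical.

```python
def expected_correct_rate(p_know, p_say_true, correct_answer_is_true):
    """Expected share of correct answers under a simple know-or-guess model."""
    # Guessers are right when their guess happens to match the correct answer.
    p_guess_right = p_say_true if correct_answer_is_true else 1.0 - p_say_true
    return p_know + (1.0 - p_know) * p_guess_right


P_KNOW = 0.5      # hypothetical: identical knowledge in both populations
P_SAY_TRUE = 0.6  # hypothetical: guessers lean toward answering "true"

# U.S. wording ("father's gene"): the correct answer is "true".
us_rate = expected_correct_rate(P_KNOW, P_SAY_TRUE, correct_answer_is_true=True)

# Eurobarometer wording ("mother's genes"): the correct answer is "false".
eurobarometer_rate = expected_correct_rate(P_KNOW, P_SAY_TRUE, correct_answer_is_true=False)
```

With these illustrative values, the model gives about 0.80 for the U.S. wording and about 0.70 for the Eurobarometer wording, even though both populations are assumed to be equally knowledgeable. The gap comes entirely from which answer the guessers favor.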

[15] In its own international comparison of scientific literacy, Japan ranked itself 10th among the 14 countries it evaluated (National Institute of Science and Technology Policy 2002).

[16] Early NSF surveys used additional questions to measure understanding of probability. Through a process similar to that described in endnote 12, Bann and Schwerin (2004) identified a smaller number of questions that could be administered to develop a comparable indicator. These questions were administered in 2004 and 2006, and appendix tables 7-9 and 7-10 record combined probability responses using these questions; appendix table 7-9 also shows responses to individual probability questions in each year.