Changing U.S. Output of Scientific Articles: 1988-2003

Methodological Issues


Numerous technical issues arise in counting the journal articles that nations and institutions produce. Three such issues are especially consequential: the journal database, expanding and fixed journal sets, and whole counts versus fractional counts.

The Journal Database

The first issue is which articles to count. This report presents counts of S&E articles, notes, and reviews published in scientific and technical journals tracked by Thomson ISI[4] and indexed in the Science Citation Index and Social Sciences Citation Index. Counts exclude all letters to the editor, news pieces, editorials, and other content whose central purpose is not presentation or discussion of scientific data, theory, methods, apparatus, or experiments.

Thomson ISI tracks a set of more than 5,000 internationally recognized journals that it has determined to be the most influential in the world. S&E articles in these journals build primarily on one another to form the scientific literature. These journals contain approximately 15 million citations, about 85% of which are to articles in journals in the Thomson ISI database. Coverage extends to electronic journals, including print journals with electronic versions and electronic-only journals. Journals of regional or local importance may not be covered, which may be especially salient for research in engineering/technology, psychology, the social sciences, the health sciences, and the professional fields,[5] as well as for nations with a small or applied science base. Thomson ISI covers non-English-language journals only if they provide article abstracts in English, which limits its coverage of such journals.

Relative to other bibliometric databases, Thomson ISI indexes a wider range of S&E fields and contains more complete data on the institutional affiliations of an article's authors. For particular fields, however, other databases provide more complete coverage. Although the body of this report relies exclusively on Thomson ISI data, appendix table 1 contains comparable data from several other bibliometric databases; brief descriptions of those databases are provided in the table notes. These databases show trends generally similar to those in the Thomson ISI data: flattening U.S. output and continued growth by the EU-15 and Asian countries starting in the mid-1990s.

Expanding and Fixed Journal Sets

A second issue is how to deal with changes in the set of journals tracked by Thomson ISI. Over time, many new journals emerge and attain influence, while a few older journals decline or stop publication. Because the global S&E research enterprise is growing, the net direction of change is toward more articles and more journals in the Thomson ISI database. The database grew from 4,460 journals in 1988 to 5,262 in 2001, and many of the journals indexed published more articles per issue and more issues per year toward the end of the period than they did in previous years.[6]

At any given time, the expanding set of journals tracked by Thomson ISI is the most suitable indicator of the mix of journals and articles. Patterns of authorship and citation in this set reflect the fields, nations, and institutions in which high-quality research is being produced. However, an expanding set of journals poses problems for trend analyses. Changes in the expanding set over time can result not only from changes in how and where scientists and engineers perform research, but also from changes in the journals Thomson ISI chooses to include or the depth of its coverage in different fields or languages. In addition, an expanding universe of articles makes changes in the shares attributable to different parts of the research community less readily interpretable.

One alternative analytic strategy is to follow a fixed set of journals that existed throughout the period under study. Changes over time within this set are likely to reflect real output changes rather than variation in the depth of Thomson ISI's coverage of different sources of output. However, comparisons within a fixed set of journals have a major limitation. Because new research communities often spawn new journals to disseminate their research findings, a fixed journal set underrepresents, perhaps severely, the types of research that were not already well established at the outset of the period. The longer the period being studied, the less adequate a fixed journal set becomes as a representation of the world's articles throughout the period.
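The idea of a fixed journal set can be illustrated with a minimal sketch. It assumes a hypothetical mapping from each year to the set of journal identifiers indexed in that year; the journal names and years below are invented for illustration and do not come from the Thomson ISI database. The fixed set is simply the intersection of the yearly sets.

```python
# Hypothetical coverage data: year -> journals indexed in that year.
# These identifiers are illustrative, not real Thomson ISI records.
journals_by_year = {
    1988: {"J1", "J2", "J3"},
    1995: {"J1", "J2", "J4"},
    2003: {"J1", "J2", "J4", "J5"},
}

def fixed_journal_set(coverage):
    """Return the journals that were indexed in every year of the period."""
    years = sorted(coverage)
    fixed = set(coverage[years[0]])
    for year in years[1:]:
        fixed &= coverage[year]  # keep only journals also present this year
    return fixed

fixed = fixed_journal_set(journals_by_year)

# Journals that entered after the first year never appear in the fixed set,
# which is the underrepresentation of newer research described above.
late_entrants = journals_by_year[2003] - fixed
```

In this toy data, J4 and J5 are indexed in 2003 but excluded from the fixed set because they were not present in 1988.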

In view of this limitation, and because the expanding set is more representative of the universe of high-quality research articles in any given year, this report presents data on the expanding journal set. SRS has conducted parallel analyses on both journal sets and found very similar patterns and trends. Appendix table 2 presents trend data drawn from a fixed set of journals restricted to those that were in the Thomson ISI data set from 1985 through 2003.

Fractional and Whole Counts

The third issue is how to attribute articles to nations, institutional sectors, institutions, and fields. The Thomson ISI database contains data on the institutional affiliations of the researchers who receive authorship credit for articles in the journals it indexes. However, apart from names and institutional affiliations, it contains almost no information about the authors themselves—their disciplines, citizenship, age, sex, rank, or status within their institutions, and so forth. Although descriptions of the data in this report may refer to "authorship," unless specified otherwise, this means institutional authorship—that is, the institutional affiliation(s) of the individual(s) in the list of authors.

SRS engages a contractor, ipIQ, Inc., to extract Thomson ISI data into a Science Indicators database. From this database, ipIQ generates article and citation counts and authorship information. To assign credit for contributions, ipIQ records each institutional address listed in an article's author list. It also records the number of author names listed on the article but does not link author names with institutions. For U.S. institutions, ipIQ classifies addresses by institutional sector: academia, federal government, state government, industry, nonprofit, and federally funded research and development centers (FFRDCs). Articles with a foreign institutional address are attributed to their country of origin but are not allocated to particular institutions or sectors. Finally, ipIQ assigns an article to a field of research on the basis of the journal in which the article appears; the field classification of the journal, in turn, is based on the patterns of the journal's citations. ipIQ's field classification is used in most of the analyses presented in this report.

When only one institution is credited as author of an article, attribution is simple: regardless of how many people collaborated in writing the article, the institution receives one credit for it. When articles are the product of collaboration between authors at different institutions, institutional credit may be assigned in two different ways: whole counting and fractional counting. Each has different advantages, and both are used in the analyses presented in this report.

In whole counting, each institution that appears in the author list receives one credit for an article. When articles are authored by collaborating institutions from multiple countries, each country receives one count for its participation, regardless of the number of its collaborating institutions. For example, the United States and France would each receive one credit for an article coauthored by one French institution and two U.S. institutions. Thus, the number of credits for an article varies, depending on the number of institutional authors or the number of countries represented among the collaborating institutions. When institutions collaborate, a single article is counted more than once. As a result, the sum of the whole counts attributed to institutions in a given U.S. institutional sector does not equal the whole count for the sector itself. Likewise, the sum of U.S. sector counts exceeds the U.S. country count, and the sum of country counts exceeds the world count.
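The whole-counting example above can be worked through in a short sketch. It assumes a hypothetical article record made up of (country, institution) pairs, one per institutional address; the record layout and institution names are illustrative and are not ipIQ's actual data format.

```python
# Hypothetical article: one (country, institution) pair per address.
article = [
    ("United States", "U.S. Institution A"),
    ("United States", "U.S. Institution B"),
    ("France", "French Institution C"),
]

def whole_counts(addresses):
    """Give one credit to each distinct institution and each distinct country."""
    institution_credit = {inst: 1 for _, inst in addresses}
    country_credit = {country: 1 for country, _ in addresses}
    return institution_credit, country_credit

institution_credit, country_credit = whole_counts(article)

# One article yields three institution credits but only two country credits,
# so sums of whole counts at different levels do not agree with one another.
total_institution_credits = sum(institution_credit.values())
total_country_credits = sum(country_credit.values())
```

The United States receives a single country credit even though two of its institutions took part, while the article as a whole is counted once at the world level.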

In fractional counting, each article receives a single credit, regardless of how many institutions earn authorship credit. When more than one institutional author is involved, credit for the article is divided equally among the institutions that appear in the author list.[7] Thus, in collaborations among different institutions, each institution receives a "fractional" credit that represents its share of an article. In the prior example, the United States and France would receive 2/3 and 1/3 of a credit, respectively, for their coauthorship of the article. The same logic applies in dividing credit for cross-sectoral and international collaborations. Because each article is counted only once, the fractional counts for institutions within a U.S. institutional sector sum to the fractional count for the entire sector, and the fractional counts for the various sectors sum to the count for the entire country.
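A minimal sketch of fractional counting for the same example follows. It assumes a hypothetical record of (country, institution) pairs with one entry per institutional address, and it assumes that each address represents a distinct institution; neither assumption is a description of ipIQ's actual processing.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical article: one (country, institution) pair per address.
article = [
    ("United States", "U.S. Institution A"),
    ("United States", "U.S. Institution B"),
    ("France", "French Institution C"),
]

def fractional_counts(addresses):
    """Split one credit equally across addresses, then sum shares by country."""
    share = Fraction(1, len(addresses))  # assumes one address per institution
    institution_credit = defaultdict(Fraction)
    country_credit = defaultdict(Fraction)
    for country, institution in addresses:
        institution_credit[institution] += share
        country_credit[country] += share
    return dict(institution_credit), dict(country_credit)

institution_credit, country_credit = fractional_counts(article)

# Institution shares and country shares each add up to exactly one article.
total_credit = sum(country_credit.values())
```

Exact fractions are used instead of floating-point numbers so that the shares sum to exactly 1, which mirrors the additivity property described above.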

Whole counts are useful indicators of how often an institution, sector, or country is involved in producing articles. Fractional counts are useful in highlighting patterns and trends in the shares of credit attributable to different institutions, sectors, or countries. Neither method adequately captures the many factors that affect how the research community allocates credit for articles. Taken together, the two counting methods provide related, although sometimes different, perspectives on recent trends in the production of S&E articles in the United States and abroad.

[4] Thomson ISI changed its name to Thomson Scientific in June 2006. This report uses the company's name at the time of the study. All data from the Thomson ISI database presented in this report derive from the Science Indicators database prepared for the National Science Foundation by ipIQ, Inc. (formerly CHI Research, Inc.).

[5] The professional fields include communication, education, information and library science, law, management and business, miscellaneous professional fields, and social work.

[6] Among the journals included in the Thomson ISI database since 1985, the average annual number of articles per journal rose from 102 in 1986 to 142 in 1999.

[7] This method is used even for articles with a very large number of institutional addresses. Thomson ISI does not truncate the list of institutional addresses. ipIQ did truncate after 25 addresses between 1988 and 1991, but this practice affected less than 1% of articles during this period.

