Comparability of Statistics

This section summarizes the statistical revisions made because of changes in survey procedures and practices[13]. It is divided into two parts. The first focuses on the current-year survey, discussing recent survey improvements and their effects on current-year and immediate prior-year statistics. The second describes revisions made to statistics produced from pre-1992 surveys.

Current-Year Considerations

Recent Survey Improvements[14]
Before the 1992 survey, the sample of firms surveyed was selected at irregular intervals[15]. In intervening years a panel of the largest firms known to perform R&D was surveyed. For example, a sample of about 14,000 firms was selected for the 1987 survey. For the 1988 through 1991 studies, about 1,700 of these firms were resurveyed annually; the other firms did not receive another questionnaire, and their R&D data were estimated. This sample design was adequate during the early years of the survey because the performance of R&D remained concentrated in the manufacturing industries. However, as more firms began to perform R&D, the old sample design proved increasingly deficient because it did not capture births of new R&D-performing firms; the entry of fledgling R&D performers into the marketplace was simply missed during panel years.

Additionally, beginning in the early 1970s, the need for more detailed R&D information for nonmanufacturers was recognized. At that time, statistics for the broad industry classifications "miscellaneous business services" and "miscellaneous services" were added to the list of industry groups for which statistics were published. By 1975 about 3 percent of total R&D was performed by firms in nonmanufacturing industries.

During the mid-1980s there was evidence that an increasing number of nonmanufacturing firms were conducting a significant amount of R&D, and again the number of industries used to develop the statistics for nonmanufacturers was increased. Consequently, the annual reports in this series for 1987 and since have included separate R&D estimates for firms in the communication, utility, engineering, architectural, research, development, testing, computer programming, and data processing service industries; hospitals; and medical labs. Approximately 9 percent of the estimated industrial R&D performance during 1987 was undertaken by nonmanufacturing firms.

Beyond expanding the list of industries for which statistics were published, these observations made clear that the sample design itself should be changed: to reflect the widening population of R&D performers among firms in the nonmanufacturing industries and small firms in all industries, to account better for births of R&D-performing firms, and to produce statistics that are generally more reliable. Therefore, beginning with the 1992 survey, NSF decided to (1) draw new samples with broader coverage annually and (2) increase the sample size to approximately 23,000 firms[16]. As a result of the sample redesign, the nonmanufacturing share for 1992 was estimated to be 25 percent of total R&D[17].

Revisions to Immediate Prior-Year Statistics
As has been the practice throughout the history of the survey, results from the current-year survey are used not only to develop current-year statistics, but also to revise immediate prior-year statistics. Differences between originally developed statistics and revised statistics occur for three reasons: industry shifts, data revisions, and, of particular importance in the discussion of the 1992 survey results, the effects of a new sample. Table B-11 quantifies these effects for each industry and industry grouping.

Industry shifts. The movement of a company from one industry into another can be caused by several factors: changes in a company's payroll composition, which is used to determine the industry classification code (see discussion above under "Frame Creation"); changes in the industry classification system itself; and changes in the way the industry classification code is assigned or revised during survey processing. These are described below.

Payroll composition. A company's payroll composition changes because of a number of events. Among them are (1) the growth or decline of product or service lines; (2) the merger of two or more companies; (3) the acquisition of one company by another; (4) divestitures; or (5) the formation of conglomerates. With annual sampling, when it is determined that a company's payroll composition and therefore its industry classification has changed, the company's data are reclassified into the new industry beginning in the year of the change. Prior to annual sampling, firms were not subject to annual reclassification. Most of the shifts in R&D performance between industries detailed in table B-11 undoubtedly stemmed from changes in companies' payroll composition.

Industry classification system. From time to time the standard industrial classification (SIC) coding system, which is used by most Federal Government agencies that publish industry statistics, is revised to reflect the changing composition of U.S. industry. For statistics developed for 1988-91 from the 1988-91 surveys, companies retained the industry classifications assigned for the 1987 sample. These classifications were based on the 1977 SIC system. The last major revision of the SIC system was for 1987, so this new system was used to classify companies in the 1992 survey. Consequently, the 1992 statistics and revised 1991 statistics in this report were developed using the 1987 SIC system and minor data shifts are attributable to the system change. For example, the 1987 system expanded SIC 30, rubber products, to include a variety of specific plastic products that may have been classified elsewhere using the 1977 system.

Processing changes. Finally, in response to perceived changes in the amount and dispersion of R&D among industries, and to the findings of various quality improvement initiatives and other research undertakings, the sponsor of the survey, in consultation with the compiling agent, from time to time seeks to improve the coverage of the survey by revising the method used to classify firms. Research has shown that these processing changes have no impact on the aggregated statistics and only a minor impact on individual industry estimates[18]. The current method used to classify firms is discussed above under "Frame Creation." Methods used for past surveys are discussed in the technical paper cited below[19].

As table B-11 shows, in the aggregate, industry shifts had no effect on the revised 1991 estimate of total R&D. However, the effects are evident among the industry groupings. Most affected were statistics for the electrical equipment (SIC 36), transportation equipment (SIC 37), and nonmanufacturing industries. Approximately $6.9 billion of R&D previously reported for manufacturing industries was shifted to nonmanufacturing industries in the revised 1991 statistics.

Data revisions. Changes to reported data can come from two sources: from respondents (see discussion above under "Survey Questionnaires") and from analysts involved in survey and statistical processing. Respondents from companies that were in both the 1991 and 1992 surveys may have revised previously reported data for 1991. Analysts, while performing followup, may have corrected incorrectly reported data or supplied missing data for 1991. Data revisions accounted for $1.0 billion or 6.8 percent of the $14.7 billion revision to the 1991 estimate of total R&D.

Sample design. Changes to the sample design can dramatically affect revisions to immediate prior-year estimates. By far the most profound influence on the revisions to the 1991 statistics was the new sample design. It accounted for $13.7 billion or 93.2 percent of the $14.7 billion revision to total R&D with most of this amount ($11.4 billion) attributable to the wider sampling of the nonmanufacturing industries[20].
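The percentages quoted for the components of the 1991 revision can be checked with simple arithmetic. The short Python sketch below is illustrative only: the dollar figures are those given in the text (in billions), the zero entry for industry shifts reflects the statement that shifts had no aggregate effect, and the variable names are hypothetical. The nonmanufacturing share of the sample-design effect is derived here and is not a figure stated in the report.

```python
# Components of the revision to the 1991 estimate of total R&D,
# in billions of dollars, as quoted in the text.
total_revision = 14.7
components = {
    "industry shifts": 0.0,   # no effect in the aggregate
    "data revisions": 1.0,
    "sample design": 13.7,
}

# Share of the total revision attributable to each component, in percent.
shares = {
    name: round(100.0 * amount / total_revision, 1)
    for name, amount in components.items()
}

# Portion of the sample-design effect attributed to wider sampling of
# nonmanufacturing industries ($11.4 billion in the text). This ratio is
# derived here for illustration; the report does not state it.
nonmfg_share_of_sample_effect = round(100.0 * 11.4 / components["sample design"], 1)

for name, pct in shares.items():
    print(f"{name}: {pct} percent of the total revision")
print(f"nonmanufacturing portion of sample-design effect: "
      f"{nonmfg_share_of_sample_effect} percent")
```

The components sum to the $14.7 billion total, and the computed shares for data revisions and sample design agree with the 6.8 and 93.2 percent figures given in the text.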

To summarize, differences between originally published and revised 1991 statistics stem from industry shifts, data revisions, and the new sample. Of the three, the new sample had the largest effect because it included a larger number of industries, especially among nonmanufacturing classifications, and potentially a larger number of firms, especially small firms, in all classifications. Comparing the 1991 panel and the 1992 sample, the sample reflected changes in the universe that could not be accounted for by the panel. The frame for the sample included industries that were not represented previously under the assumption that companies in those industries contributed little or no R&D activity[21]. Further, data for small R&D performers were imputed for the panel years and used in the original 1991 statistics. The revised 1991 estimates from the 1992 survey included actually reported data for many more small companies.

Linking Current-Year Statistics with Statistics from Previous Surveys

Time Series Analyses
As discussed earlier, the statistics resulting from the survey are better indicators of changes in, rather than absolute levels of, R&D spending and personnel. Nevertheless, the statistics are often considered as a continuous time series that has been prepared using the same collection, processing, and tabulation methods. Such uniformity of preparation has not been the case. Since the survey was first fielded, improvements have been made to increase the reliability of the statistics and to make the survey results more useful. To that end, existing practices have been changed and new procedures have been instituted. Preserving the comparability of the statistics has nonetheless been an important consideration whenever improvements have been made. Changes to survey definitions, the industry classification system, and the procedure used to assign industry codes to multiestablishment companies[22] have had some, though not substantial, effects on the comparability of statistics[23]. The aspect of the survey that had a greater effect on comparability was the selection of samples at irregular intervals (i.e., 1967, 1971, 1976, 1981, 1987, 1992) and the use of a subset or panel of the last sample drawn to develop statistics for intervening years. As discussed above, this practice introduced cyclical deterioration of the statistics.

To compensate for this deterioration, periodic revisions have been made to the statistics produced from the panels surveyed between sample years. Early in the survey's history, various methods were used to make these revisions[24]. Since 1976, a linking procedure called "wedging" has been used[25]. Simply described, in wedging the 2 sample years on each end of a series of estimates serve as benchmarks in the algorithms used to adjust the estimates for the intervening years.

Wedging Methodology
For a full discussion of the mathematical algorithm used for the wedging process that linked statistics from the 1992 survey with those from the 1987 survey, see the technical memorandum cited below[26]. In general, the memorandum states that wedging

takes full advantage of the fact that in the first year of a new panel [when a new sample is selected], both current year and prior-year estimates are derived. Thus, two independent estimates exist for the prior year. The estimates from the new panel are treated as superior primarily because the new panel is based on updated classifications [the industry classifications in the prior panel are frozen] and is more fully representative of the current universe (the prior panel suffers from panel deterioration, especially a lack of birth updating). The limitations in the prior panel caused by these factors are naturally assumed to increase with time, so that in the revised series, we desire a gradual increase in the level of revision over time which culminates in the real difference observed between the two independent sample estimates of the prior year. At the same time, we desire that the annual movement of the original series be preserved to the degree possible in the revised series.

To that end, the wedging algorithm does not change estimates from sample years and adjusts estimates from panel years, recognizing that deterioration of the panel is progressive over time.
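The general idea can be illustrated with a simple linear wedge. The Python sketch below is not the algorithm documented in the technical memorandum; it is a minimal illustration under stated assumptions: the revision is applied as a ratio that grows linearly from zero in the earlier sample year to the full observed ratio in the final panel year, for which the new sample supplies an independent estimate. The function name `wedge_linear` and all figures in the usage example are hypothetical.

```python
def wedge_linear(original, start_year, end_year, revised_end):
    """Illustrative linear wedge (an assumption, not the documented method).

    original    -- dict mapping year -> estimate from the old sample/panel,
                   covering every year from start_year through end_year
    start_year  -- earlier sample year; its estimate is left unchanged
    end_year    -- last panel year, for which the new sample also
                   provides an independent estimate
    revised_end -- the new-sample estimate for end_year
    """
    span = end_year - start_year
    if span <= 0:
        raise ValueError("end_year must be later than start_year")

    # Full revision ratio observed between the two independent
    # estimates for the final year.
    ratio = revised_end / original[end_year]

    wedged = {}
    for year in range(start_year, end_year + 1):
        # Weight rises from 0 at the sample year to 1 at the final year,
        # reflecting progressive deterioration of the panel.
        weight = (year - start_year) / span
        wedged[year] = original[year] * (1.0 + (ratio - 1.0) * weight)
    return wedged


# Hypothetical figures, for illustration only.
original_series = {1987: 100.0, 1988: 104.0, 1989: 108.0,
                   1990: 112.0, 1991: 116.0}
wedged_series = wedge_linear(original_series, 1987, 1991, revised_end=130.5)

for year in sorted(wedged_series):
    print(year, round(wedged_series[year], 2))
```

In this sketch the sample-year estimate is unchanged, the final-year estimate equals the new-sample estimate, and the adjustment to the years in between grows steadily, so the year-to-year movement of the original series is largely retained.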

Wedged Versus Not-Wedged Statistics
One of the primary reasons for the decision to select a new sample annually rather than at irregular intervals was to avoid the need to apply global revision processes like wedging. Consequently, the 1992 survey is intended to be the last one for which wedging is an issue. For users who are interested, 18 of the detailed statistical tables in section A are reproduced below. Tables N-1 through N-18 are identical to the section A tables except that they contain statistics that are not wedged for 1988-90.

Revisions to Historical Statistics

Throughout the history of the survey, during regular survey processing, all immediate prior-year statistics have been subject to revision with results from the current year's survey. Changes to older statistics, however, usually have been limited to revisions because of changes in the industry classification of companies caused by changes in payroll composition detected when a new sample was drawn. Various methodologies have been adopted over the years to revise, or backcast, the data when revisions to historical statistics have become necessary.

Documented revisions to the historical statistics from post-1967 surveys are summarized in Research and Development in Industry: 1991 (NSF 94-325). Detailed descriptions of the specific revisions made to the statistics from pre-1967 surveys are scarce. However, summaries of some of the major revisions are included in the technical paper cited below[27].

