Section B.
Technical Notes


Survey Methodology[1]

Reporting Unit

The reporting unit for the Survey of Industrial Research and Development is the enterprise, firm, or company, all used synonymously, and defined as a business organization of one or more establishments under common ownership or control. The survey includes two groups of enterprises: (1) companies known to conduct R&D, and (2) a sample representation of companies for which information on the extent of R&D activity is uncertain.

Frame Creation

The Standard Statistical Establishment List (SSEL), a Bureau of the Census compilation that contains information on more than 3 million establishments with paid employees, was the target population from which the frames used to select the 1995 and 1996 survey samples were created (see tables B-1 and B-1a for target population and sample sizes). For companies with more than one establishment, data were summed to the company level. The firm was then assigned a single standard industrial classification (SIC) code based on the activity of the establishment(s) having the highest dollar value of payroll. This assignment was done on a hierarchical basis. The enterprise was first assigned to the economic division (manufacturing or nonmanufacturing) with the highest payroll, then to the two-digit SIC code with the highest payroll within the assigned division, then to the three-digit SIC code with the highest payroll within the assigned two-digit industry.
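The hierarchical assignment can be illustrated with a minimal Python sketch. The function name `assign_sic`, the tuple layout, and the tie-breaking behavior (first group encountered wins) are assumptions made for illustration; they are not taken from the survey documentation.

```python
from collections import defaultdict


def assign_sic(establishments):
    """Assign one three-digit SIC code to a company.

    `establishments` is a list of (division, sic3, payroll) tuples, where
    `division` is "manufacturing" or "nonmanufacturing" and `sic3` is a
    three-character string such as "283". (Hypothetical layout.)
    """
    def top(pairs):
        # Sum payroll by key and return the key with the largest total.
        totals = defaultdict(float)
        for key, payroll in pairs:
            totals[key] += payroll
        return max(totals, key=totals.get)

    # Step 1: economic division with the highest payroll.
    division = top((d, p) for d, _, p in establishments)
    in_division = [(s, p) for d, s, p in establishments if d == division]
    # Step 2: two-digit SIC with the highest payroll within that division.
    sic2 = top((s[:2], p) for s, p in in_division)
    # Step 3: three-digit SIC with the highest payroll within that two-digit group.
    sic3 = top((s, p) for s, p in in_division if s[:2] == sic2)
    return division, sic2, sic3
```

In this sketch, a company with establishments `("manufacturing", "283", 40)`, `("manufacturing", "284", 35)`, `("manufacturing", "357", 60)`, and `("nonmanufacturing", "737", 70)` is assigned to SIC 283, even though neither its single largest establishment (737) nor its largest three-digit manufacturing industry (357) falls there. This is the effect of working down the hierarchy rather than picking the largest three-digit code directly.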

The frames from which the survey samples were drawn included all for-profit companies classified in nonfarm industries. For surveys prior to 1992, the frame was limited to companies above certain size criteria based on number of employees.[2] These criteria varied by industry. Also, some industries were excluded from the frame because it was believed that these industries contributed little or no R&D activity to the final survey estimates. For the 1992 sample, new industries were added to the frame[3] and the size criteria were lowered considerably and applied uniformly to firms in all industries. As a result, nearly 2 million enterprises with 5 or more employees were given a chance of selection. For comparison, the frame for the 1987 sample included 154,000 companies of specified sizes and industries. The frames used to select the 1995 and 1996 samples were similar to the ones used to select the 1992, 1993, and 1994 samples.

A fundamental change, initiated in 1995 and repeated in 1996, was the redefinition of the sampling strata. For the survey years 1992 through 1994, 165 sampling strata were established, each stratum corresponding to one or more three-digit SIC codes. The objective was to select sufficient representation of industries to determine whether alternative or expanded publication levels were warranted. The strata for the 1995 and 1996 surveys were defined to correspond to publication level industry aggregations. A total of 40 such levels were defined, corresponding to the original 25 groupings of manufacturing industries used as strata in sample designs before 1992 and to 15 new groupings of nonmanufacturing industries. Companies were assigned to strata based on their three-digit SIC codes.

The criteria for identifying companies selected with certainty for the survey were further modified in 1996. With a fixed total sample size, there was some concern that the representation of the very large noncertainty universe by a smaller sample each year would be inadequate. Prior to 1994, companies with 1,000 or more employees had been selected with certainty, but it was observed that the level of spending varied considerably and that many of these companies reported no R&D expenditures each year. For these reasons, beginning in 1995, these companies were given chances of selection based upon the size of their R&D spending if they were in the previous survey or upon an estimated R&D value if they were not. To further limit the growth occurring each year in the number of certainty cases within the total sample, the certainty criterion (the size of their R&D spending) was raised for the 1996 survey from $1 million to $5 million.

The partitioning of the frame into "large" and "small" company components and the use of simple random sampling (SRS) for the small company partition were retained for 1995, but the method of partitioning was changed for 1996. This feature was first introduced in 1994 in response to a study of 1992 survey results, which showed that a disproportionate number of small companies were being selected for the sample, often with very large weights. These small companies seldom reported R&D activity. This disproportion was a result of the minimum probability rule (see below) used as part of the independent probability proportionate to size (PPS) sampling procedure employed exclusively prior to 1994. This rule increased the probabilities of selection for several hundred thousand of these smaller companies. With SRS, these smaller companies can be sampled more efficiently than with independent PPS sampling since there is little variability in their size.

For 1995, total company payroll was the basis for the split between "large/small" partitions. For each industry grouping, the largest companies representing the top 90 percent of the total payroll for the industry grouping were included in the PPS frame. The balance of smaller companies comprising the remaining 10 percent of payroll for the industry grouping were included in the SRS frame. A benefit of this design change was a reduction in the maximum allowable weight for selected companies (weighting and maximum weights are discussed below).
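A minimal sketch of the 1995 payroll-based split for one industry grouping follows. The function name `split_by_payroll` and the treatment of the company that crosses the 90-percent boundary (here it is placed in the large partition) are assumptions; the text does not say how the boundary case was handled.

```python
def split_by_payroll(companies, large_share=0.90):
    """Split one industry grouping into large (PPS) and small (SRS) frames.

    `companies` is a list of (company_id, payroll) tuples (hypothetical layout).
    Companies are ranked by payroll; the largest ones are placed in the large
    partition until they account for `large_share` of total payroll.
    """
    total = sum(payroll for _, payroll in companies)
    ranked = sorted(companies, key=lambda c: c[1], reverse=True)
    large, small, running = [], [], 0.0
    for company_id, payroll in ranked:
        # Assumption: the company that crosses the threshold stays "large".
        if running < large_share * total:
            large.append(company_id)
        else:
            small.append(company_id)
        running += payroll
    return large, small
```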

For 1996, total company employment was the basis for the split between partitions. The total company employment levels defining the partitions were based on the relative contribution to total R&D expenditures of companies in different employment size groups in both the manufacturing and nonmanufacturing sectors. In the manufacturing sector, all companies with total employment of 50 or more were included in the large company partition. In the nonmanufacturing sector, all companies with total employment of 15 or more were included in the large company partition. Companies in the respective sectors with employment below these values were included in the small company partition. The large company partition contained about 560,000 companies and the small company partition about 1.3 million companies. These counts were comparable to those in the 1995 partition (656,000 and 1.2 million, respectively).
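The 1996 rule reduces to a threshold test on total employment. The sketch below uses the thresholds stated above; the names `EMPLOYMENT_THRESHOLDS` and `partition_1996` are illustrative only.

```python
# Employment cutoffs for the large company partition, as stated in the text.
EMPLOYMENT_THRESHOLDS = {"manufacturing": 50, "nonmanufacturing": 15}


def partition_1996(sector, total_employment):
    """Return "large" or "small" for a company under the 1996 rule."""
    if total_employment >= EMPLOYMENT_THRESHOLDS[sector]:
        return "large"
    return "small"
```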

One final modification in the frame development for 1996 was the designation of "zero industries" in the large company partition. Zero industries were those three-digit SIC industries having no R&D expenditures reported in the survey years 1992-94, the years when estimates by three-digit SIC industry were formed. It was decided to keep these industries in the scope of the survey, but to draw only a limited sample from them since it seemed unlikely that R&D expenditures would be reported. SRS was used to control the number of companies selected within these industries.

Sample Selection

Probability Proportionate to Size

For 1995, the distribution of companies by payroll and estimated R&D in the large partition of the sample was skewed as in earlier frames. Because of this skewness, PPS sampling used in previous designs was an appropriate selection technique for this group. That is, large companies had a higher probability of selection than did small companies. For this survey it would have been ideal if company size could have been determined by its R&D expenditures. Unfortunately, except for the companies that were in a previous survey or for which there was information from external sources, it was impossible to know the R&D expenditures for every firm in the universe. Consequently, the probability of selection for most companies was based on estimated R&D expenditures.

Since total payroll was known for each company in the universe, it was possible to estimate R&D from payroll using relationships derived from 1995 survey data. Imputation factors relating these two variables were made for each industry grouping. To impute R&D for a given company, the imputation factors were applied to the company payroll in each industry grouping. A final measure was obtained by adding the industry grouping components. The effect, in general, was to give firms with large payrolls higher probabilities of selection in agreement with the assumption that larger companies were more likely to perform R&D.
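The imputation step can be sketched as a payroll-weighted sum. The function name `impute_rd`, the dictionary layout, and the factor values in the usage note are hypothetical; the actual imputation factors are not given here.

```python
def impute_rd(payroll_by_group, factors):
    """Estimate a company's R&D from its payroll.

    `payroll_by_group` maps an industry grouping to the company's payroll in
    that grouping; `factors` maps an industry grouping to its imputation
    factor (R&D per unit of payroll). Both layouts are assumptions.
    """
    # Apply each grouping's factor to the payroll in that grouping,
    # then add the components to get the company-level measure.
    return sum(factors[group] * payroll
               for group, payroll in payroll_by_group.items())
```

For example, a company with payroll of 1,000 in one grouping (factor 0.10) and 500 in another (factor 0.04) would be assigned an estimated R&D value of about 120 under these made-up factors.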

Estimated R&D values were computed for companies in the small company partition as well. The aggregate of reported and estimated R&D from each company in both the large and small company partitions represented a total universe measure of 1995 R&D expenditures. However, assigning R&D to every company resulted in an overstatement of this measure. To adjust for the overstatement, the universe measure was scaled down using factors developed from the relationship of the universe measure of 1994 R&D and the 1994 survey estimate. These factors, computed at levels corresponding to published industry levels, were used to adjust originally imputed R&D values so that the new frame total for R&D at these levels approximated the 1994 published values. This adjustment provided for better allocation of the sample among these levels.
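One simple way to picture the scaling step is as a ratio adjustment within each published industry level. The sketch below is an assumption about the form of the adjustment (a single factor applied to every value in the level); the text does not give the exact computation, and the name `scale_to_benchmark` is hypothetical.

```python
def scale_to_benchmark(frame_values, benchmark_total):
    """Scale R&D values in one industry level to a benchmark total.

    `frame_values` maps company_id to its assigned R&D value in the frame;
    `benchmark_total` is the prior-year published estimate for that level.
    Returns the scaled values and the factor that was applied.
    """
    frame_total = sum(frame_values.values())
    factor = benchmark_total / frame_total
    scaled = {cid: value * factor for cid, value in frame_values.items()}
    return scaled, factor
```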

A significant revision in the procedure for selecting samples from the partitions changed the development and presentation of estimates from the 1996 survey. A sample of companies in the large company partition was selected using PPS sampling in each of the 40 strata as in 1995. The sample of companies in the small company partition was selected using SRS in only 2 strata rather than 40 as in 1995. Companies classified in manufacturing industries were selected to represent the group of all manufacturing industries rather than each manufacturing industry group. Likewise, companies classified in nonmanufacturing industries were selected to represent the group of all nonmanufacturing industries.

The purpose of selecting small companies from only two strata was to reduce the variability in industry estimates contributed from the random year-to-year selection of the companies in an industry and the associated high sampling weights. Consequently, estimates for industry groups within manufacturing and nonmanufacturing are not possible from these two strata. The statistics for the detailed industry groups are based only on the sample from the large company partition. Estimates from the small company partition are included in statistics for total manufacturing, total nonmanufacturing, and all industries. For completeness, the estimates also are added to the categories "other manufacturing" and "other nonmanufacturing."

Simple Random Sampling

Only two strata were defined for samples in the small company partition, manufacturing and nonmanufacturing. The use of SRS implied that each company within a stratum had an equal probability of selection. The total sample allocated to the small company partition was dependent upon the total sample specified for the survey and upon the total sample necessary to satisfy criteria established for the large company partition. Once determined, the allocation of this total by stratum was made proportionate to the stratum's payroll contribution to the entire partition.
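The proportionate allocation can be sketched as follows. The function name `allocate_proportional` and the figures in the usage note are illustrative; the sketch returns unrounded sizes because the rounding rule is not described.

```python
def allocate_proportional(total_sample, payroll_by_stratum):
    """Allocate a total sample across strata in proportion to payroll.

    `payroll_by_stratum` maps a stratum name to its total payroll.
    Returns unrounded sample sizes (rounding rule not specified in the text).
    """
    total_payroll = sum(payroll_by_stratum.values())
    return {stratum: total_sample * payroll / total_payroll
            for stratum, payroll in payroll_by_stratum.items()}
```

With a hypothetical total of 6,000 companies and payroll shares of 30 percent (manufacturing) and 70 percent (nonmanufacturing), the allocation would be 1,800 and 4,200 companies.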

Sample Stratification and Relative Standard Error Constraints

The particular sample selected for each survey year was one of a large number of the same type and size that by chance might have been selected. Statistics resulting from the different samples would differ somewhat from each other. These differences are represented by estimates of sampling error. The smaller the sampling error, the more precise the statistic.

The large company partition was of primary concern, since it was believed that nearly all of the R&D activity would be identified from this sector. To control sampling error in the statistics resulting from this portion of the frame, parameters were specified to allocate the sample across various levels, or strata, that corresponded to the 40 industry groupings discussed earlier. These parameters permitted the sample size to be varied to achieve a desired level of sampling error for each stratum and were assigned so that estimated errors of total R&D expenditures for industries in these strata did not exceed certain levels. Sample sizes among the strata were constrained only by the limit placed on the total sample size dictated by the available budget.

The practice, first implemented in the 1995 survey and continued in the 1996 survey, of establishing sampling strata corresponding to published industry groupings meant that more efficient samples could be selected for these groups than had resulted when using the 165-strata design. Even the expansion of the number of nonmanufacturing publication groupings resulted in fewer sampling strata. The earlier designs defined 25 strata of three-digit-SIC manufacturing industries, but published only one category of nonmanufacturing industries. In the 1995 and 1996 designs, 15 nonmanufacturing strata were defined for sampling and for publication levels. Since there was no mandate in either year to make a major reduction in the 1994 sample size of 17,600 for the large company partition, it was possible to establish much tighter relative standard error constraints on the smaller number of sampling strata. Thus, in 1996, 33 strata were assigned a relative standard error constraint of 1 percent while 7 strata were assigned a relative standard error constraint of ½ percent. These constraints resulted in an expected sample size of about 8,900 companies from the large company partition. The minimum probability rule (see below) was adjusted so as to raise the expected sample size closer to the 18,000 level.

A limitation of the sample allocation process for the large partition should be noted. The sampling errors used to control the sample size in each stratum are based on a universe total that, in large part, was improvised. That is, as previously noted, an R&D value was assigned to every company in the frame, even though most of these companies actually may not have had R&D expenditures. The value assigned was imputed for the majority of companies in the frame and, as a consequence, the estimated universe total and the distribution of individual company values, even after scaling, did not necessarily reflect the true distribution. Estimates of sampling variability were nevertheless based on this distribution. The presumption was that actual variation in the sample design would be less than that estimated, because many of the sampled companies have true R&D values of zero, not the widely varying values that were imputed using total payroll as a predictor of R&D. Previous sample selections indicate that in general this presumption holds, but exceptions have occurred when companies with large sampling weights have reported large amounts of R&D spending. Thus, in general, the 1-percent and ½-percent error levels described earlier are conservative. See tables B-2 and B-2a for the actual standard error estimates for selected items by industry.

For the 1995 small company partition, the same 40 strata were identified. Also included was a separate stratum of approximately 6,260 companies that could not be classified into an SIC code and therefore could not be assigned to a stratum because of incomplete industry identification in the SSEL. As was done for 1994, a small number of companies was selected from this group in the hopes that an accurate industry identification could be obtained at a later point. The initial sample size specified for the small company partition was 5,500 companies. The sample initially allocated to a given stratum was proportionate to its share of total payroll for the small partition. For the 1996 small company partition, two strata (manufacturing and nonmanufacturing) were identified. As for 1994 and 1995, a small number of companies was selected from the group of unclassifiable companies. Ultimately, a final sample of 6,466 companies was selected from the small company partition. The sample initially allocated to each of the two strata was proportionate to that stratum's share of total payroll for the small company partition.

In addition to sampling error, the estimates are subject to nonsampling error. Errors are grouped into five categories: specification, coverage, response, nonresponse, and processing. For detailed discussions on the sources, control, and measurement of each of these types of error, see the technical reports.[4]

Sample Size

The target sample size initially specified for the 1995 and 1996 surveys was 24,000 companies, and, as described above, was based primarily on compliance with predetermined sampling error constraints established for the large partition. The actual sample size for 1995 was 23,752 and for 1996 was 24,964 companies. These samples differed from the target for several reasons. First, the frames for the large company partition in both samples were subjected to independent sampling. Each company in the frames had an independent chance of selection, based on its assigned probability, i.e., selection of a company was completely independent of the selection of any other company. In independent (or Poisson) sampling, sample size itself is a random variable and the actual sample size will vary around the target or "expected" sample size. Theoretically, a sample of size zero or a sample the size of the entire universe is possible, but the probabilities of these extremes are so small that these are nearly impossible situations. In strata where the expected sample size is 50 or more, the actual sample probably will be within a fairly narrow range so that increased variability is not a real problem. However, in strata where the expected sample is small (i.e., fewer than 10) it is possible to grossly oversample or undersample the strata. In practice, the size of the originally drawn sample is usually quite close to the specified size. However, if there is too much deviation, the selection can be repeated until it is closer to the target.
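Independent (Poisson) sampling and the resulting randomness in sample size can be sketched in a few lines. The function names are hypothetical, and the random number generator is passed in so that a draw can be repeated; none of this is taken from the survey's actual processing system.

```python
import random


def poisson_sample(probs, rng):
    """Select each company independently with its own probability.

    `probs` maps company_id to its selection probability; `rng` is a
    random.Random instance. Returns the list of selected company ids.
    """
    return [cid for cid, p in probs.items() if rng.random() < p]


def expected_size(probs):
    """Expected sample size: the sum of the selection probabilities."""
    return sum(probs.values())


def size_variance(probs):
    """Variance of the sample size: each company is an independent
    Bernoulli trial, so the variances p * (1 - p) add."""
    return sum(p * (1.0 - p) for p in probs.values())
```

Calling `poisson_sample(probs, random.Random(seed))` with different seeds gives samples whose sizes scatter around `expected_size(probs)`. In relative terms the scatter is larger when the expected size is small, which is why small strata are the ones at risk of being badly oversampled or undersampled.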

Second, a minimum probability rule was imposed for both partitions in both the 1995 and 1996 samples. As noted earlier, for the large company partition, probabilities of selection proportionate to size were assigned to each company, where size is the reported or imputed R&D value assigned to each company. Selected companies received a sample weight that was the inverse of their probability of selection. Selected companies that ultimately report R&D expenditures vastly larger than their assigned values can have adverse effects on the statistics, which are based on the weighted value of survey responses. To lessen the effects on the final statistics, the maximum weight of a company was controlled by specifying a minimum probability that could be assigned to the company. If the probability, based on company size, was less than the minimum probability, then it was reset to this minimum value. The consequence of raising these original probabilities to the minimum probability was to raise the expected sample size. Similarly, a maximum weight for each stratum was established for SRS of the small partition. If the sample size initially allocated to a stratum resulted in a stratum weight above this maximum value, then the sample size was increased until the maximum weight was achieved. It is likely that most of the difference between the size of the target sample and the sample actually selected was because of the minimum probability rule.
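The minimum probability rule and its SRS counterpart can be sketched as follows. The function names are hypothetical. The sketch assumes that the minimum probability is the reciprocal of the maximum weight, and that the SRS stratum weight is the stratum size divided by the sample size; both are standard relationships but are not spelled out in the text.

```python
import math


def apply_minimum_probability(probs, max_weight):
    """Raise any selection probability below 1 / max_weight to that floor."""
    min_prob = 1.0 / max_weight
    return {cid: max(p, min_prob) for cid, p in probs.items()}


def weights(probs):
    """Sample weight is the inverse of the selection probability."""
    return {cid: 1.0 / p for cid, p in probs.items()}


def srs_sample_size(stratum_size, allocated, max_weight):
    """Increase an SRS allocation until stratum_size / n <= max_weight."""
    needed = math.ceil(stratum_size / max_weight)
    return max(allocated, needed)
```

With a maximum weight of 50, a company whose size-based probability is 0.004 (weight 250) is reset to 0.02 (weight 50). Because probabilities are only ever raised, the sum of the probabilities, which is the expected sample size, can only grow.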

Third, between the time that the frame was created and the survey was prepared for mailing, the operational status of some companies changed. That is, they were merged with or acquired by another company, or they were no longer in business. Before preparing the survey for mailing, the operational status was updated to identify these changes. As a result, the number of companies mailed a survey form was somewhat smaller than the number of companies initially selected for the survey.

And finally, for 1995, a minimum sample size was established for each stratum of the small company partition. If the proportionately allocated sample size fell below the minimum value for a given stratum, then the sample size was set equal to this value. For 1996, the definition for the small company strata was changed (discussed under "Frame Creation" above) and collapsed to the manufacturing and nonmanufacturing levels. Separate samples were selected for both of these small company strata. Because only two samples were drawn from these strata, compared with the 25+ that were drawn for the 1995 sample, the minimum sample size constraint was not necessary for 1996.

Weighting and Maximum Weights

Weights were applied to each company record to produce national estimates for both 1995 and 1996. Within the PPS partitions of the samples, company records were given weights up to a maximum of 50; for companies within the SRS partitions of the samples, company records were given weights up to a maximum of 300.

Survey Questionnaires

Two questionnaires are used each year to collect data for the survey. For large firms known to perform R&D, a detailed questionnaire, form RD-1L, is used to collect data for odd-numbered years and an abbreviated version, form RD-1S, is used to collect data for even-numbered years. The questionnaires are cycled in this manner to reduce reporting burden on survey respondents.

Form RD-1L requests data on sales or receipts, total employment, employment of scientists and engineers, expenditures for R&D performed within the company with Federal funds and with company and other funds, character of work (basic research, applied research, and development), company-sponsored R&D expenditures in foreign countries, R&D performed under contract by others, expenditures for pollution abatement and energy R&D, detail on R&D by product field, Federal R&D support to the firm by contracting agency, domestic R&D expenditures by State, and foreign R&D by country. Form RD-1S requests the same information except for the last four items. Because companies receiving forms RD-1L and RD-1S generally have participated in previous surveys, computer imprinted data reported by the company for the previous year are supplied for reference.

To further limit reporting burden on small R&D performers and on firms that are included in the sample for the first time, an even more abbreviated form is used each year. Form RD-1A collects data only on R&D, sales, employment, and operational status and includes a screening item that allows respondents to indicate that they do not perform R&D before completing the questionnaires. No prior-year information is available since the majority of the companies have not reported previously.

Beginning in 1996, the collection of data on R&D performed under contract by others was expanded. Previously, data were collected only on nonfederally funded R&D performed under contract by others. In 1996, data on federally funded and total R&D contracted-out were collected to better measure the amount of R&D performed both within and between companies.

For the 1995 survey, about 2,700 companies that reported $1 million or more in R&D spending in the 1994 survey received form RD-1S and nearly 20,800 received form RD-1A. Of the 23,500 firms, approximately 4,800 reported R&D expenditures. For the 1996 survey, about 2,600 companies that reported $1 million or more in R&D spending in the 1995 survey received form RD-1S and over 22,300 received form RD-1A. Of the 24,900 firms, approximately 4,000 reported R&D expenditures. Both questionnaires and their accompanying instructions are reproduced in section C, Survey Documents.

Follow-up for Survey Nonresponse

The 1995 and 1996 survey questionnaires were mailed in March 1996 and April 1997, respectively, and recipients were asked to respond within 60 days. Thirty days later, letters were mailed to all survey recipients reminding them that their completed questionnaire was due within the next 30 days. After 60 days, follow-up letters were sent to all firms that did not respond. Three additional follow-up mailings were made to persistent nonrespondents, after 90, 120, and 150 days.

In addition to the mailings, telephone follow-up was used to encourage response from those firms ranked among the 300 largest R&D performers, based on total R&D expenditures reported in the previous survey. Tables B-3 and B-3a show the number of companies in each industry or industry group that received a questionnaire and the percentage of companies that responded to the survey.

Imputation for Item Nonresponse

For various reasons, many firms chose to return the survey questionnaires with one or more blank items.[5] For instance, the internal accounting procedures of the firm may not have allowed it to quantify the character-of-work distribution of R&D (i.e., basic research, applied research, and development). In addition, some firms, as a matter of policy, refused to answer any voluntary questions.[6]

When respondents did not provide the requested information, estimates for the missing data were made using imputation algorithms. In general, the imputation algorithms computed values for missing items by applying the average percentage change for the target item in the nonresponding firm's industry to the item's prior-year value for that firm, reported or imputed. This approach, with minor variation, was used for most items.[7] Tables B-4 and B-4a contain imputation rates for the principal survey items.
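The general imputation approach can be sketched as below. The function name `impute_item` is hypothetical, and the way the "average percentage change" is computed here (the ratio of current-year to prior-year totals among firms in the industry that reported the item in both years) is an assumption; the text does not specify the averaging method.

```python
def impute_item(prior_value, industry_reporters):
    """Impute a missing item from the firm's prior-year value.

    `prior_value` is the firm's prior-year value for the item (reported or
    imputed). `industry_reporters` is a list of (prior, current) pairs for
    firms in the same industry that reported the item in both years.
    """
    prior_total = sum(prior for prior, _ in industry_reporters)
    current_total = sum(current for _, current in industry_reporters)
    # Assumption: industry change measured as a ratio of totals.
    industry_change = current_total / prior_total
    return prior_value * industry_change
```

If reporters in the industry grew from a combined 300 to 330 (a 10-percent increase), a nonresponding firm with a prior-year value of 50 would be imputed a value of about 55.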

Response Rates and Mandatory Versus Voluntary Reporting

Current survey reporting requirements divide survey items into two groups: mandatory and voluntary. Response to four data items on the questionnaires (total R&D expenditures, Federal R&D funds, net sales, and total employment) is mandatory; response to the remaining items is voluntary. During the 1990 survey cycle, NSF conducted a test of the effect of reporting on a completely voluntary basis to determine if combining both mandatory and voluntary items on one questionnaire influences response rates. For this test, the 1990 sample was divided into two panels of approximately equal size. One panel, the mandatory panel, was asked to report as usual (four mandatory items and the remainder voluntary), and the other panel was asked to report all items on a completely voluntary basis. The result of the test was a decrease in the overall survey response rate to 80 percent from levels of 88 percent in 1989 and 89 percent in 1988. The response rates for the mandatory and voluntary panels were 89 percent and 69 percent, respectively. Detailed results of the test were published in Research and Development in Industry: 1990. For firms that reported R&D expenditures in 1995 and 1996, tables B-5 and B-5a show the percentage that also reported data for other selected items.

Character of work

Response to questions about character of work (basic research, applied research, and development) declined in the mid-1980's, and, as a result, imputation rates increased. The general imputation procedure described above became increasingly dependent upon information imputed in prior years, thereby distancing current-year estimates from any reported information. Because of the increasing dependence on imputed data, NSF chose not to publish character-of-work estimates in 1986. Consequently, the imputation procedure used to develop these estimates was revised in 1987 for use with 1986 and later data and differs from the general imputation approach. The new method calculates the character-of-work distribution for a nonresponding firm only if that firm reported a distribution within a 5-year period, extending from 2 years before to 2 years after the year requiring imputation. Imputation for a given year is initially performed in the year the data are collected and is based on a character-of-work distribution reported in either of the 2 previous years, if any. It is again performed using new data collected in the next 2 years. If reported data followed no previously imputed or reported data, previous period estimates were inserted based on the currently reported information. Likewise, if reported data did not follow 2 years of imputed data, the 2 years of previously imputed data were removed. Thus, character-of-work estimates were revised as newly reported information became available and were not final for 2 years following their initial publication.

Beginning with 1995, previously estimated values were not removed for firms that did not report in the third year, nor were estimates made for the 2 previous years for firms reporting after 2 years of nonresponse. This process was changed because in the prior period revisions were minimal. Estimates continue to be made for 2 consecutive years of nonresponse and discontinued if the firm does not report character of work in the third year.

If no reported data are available for a firm, character-of-work estimates are not imputed. As a consequence, only a portion of the total estimated R&D expenditures is distributed at the firm level. Those expenditures not meeting the requirements of the new imputation methodology are placed in a "not distributed" category. Tables B-6, B-7, B-8, and B-9 show the character-of-work estimates along with the "not distributed" component for 1993, 1994, 1995, and 1996, respectively. NSF's objective in conducting the survey has always been to provide estimates for the entire population of firms performing R&D in the United States. However, the revised imputation procedure would no longer produce such estimates because of the "not distributed" component. So, a baseline estimation method was developed to allocate the "not distributed" amounts among the character-of-work components. In the baseline estimation method, the "not distributed" expenditures are allocated by industry group to basic research, applied research, and development categories, using the percentage splits in the distributed category for that industry. The allocation is done at the lowest level of published industry detail only; higher levels are derived by aggregation, just as national totals are derived by aggregation of individual industry estimates. This method results in higher performance shares for basic and applied research and a lower share for development than would have been calculated using the previous method.[8] The estimates of basic research, applied research, and development provided in the tables in section A of this report were calculated using the baseline estimation method.
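The baseline allocation for a single industry can be sketched as a pro rata split. The function name `baseline_allocate` and the figures in the usage note are illustrative, not survey data.

```python
def baseline_allocate(distributed, not_distributed):
    """Allocate an industry's "not distributed" R&D pro rata.

    `distributed` maps each character-of-work category (basic research,
    applied research, development) to the industry's distributed amount;
    `not_distributed` is the industry's undistributed amount.
    Returns the category totals after allocation.
    """
    total = sum(distributed.values())
    return {category: amount + not_distributed * amount / total
            for category, amount in distributed.items()}
```

For an industry with distributed amounts of 10 (basic), 30 (applied), and 60 (development) and 50 not distributed, the allocated totals are 15, 45, and 90. Totals for higher levels would then be obtained by summing these industry-level results rather than by repeating the split at the aggregate level.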

Comparability of Statistics

This section summarizes the survey procedures and practices that may have affected the comparability of statistics produced from the Survey of Industrial Research and Development over time and with other statistical series.[9]

Revisions to Historical and Immediate Prior-Year Statistics

Changes to historical statistics usually have been made because of changes in the industry classification of companies caused by changes in payroll composition detected when a new sample was drawn. Various methodologies have been adopted over the years to revise, or backcast, the data when revisions to historical statistics have become necessary. Documented revisions to the historical statistics from post-1967 surveys are summarized in Research and Development in Industry: 1991 (NSF 94-325). Detailed descriptions of the specific revisions made to the statistics from pre-1967 surveys are scarce. However, summaries of some of the major revisions are included in the technical paper cited below.[10]

Routine revision of previously published immediate prior-year statistics was discontinued beginning with the 1995 survey. The practice throughout the history of the survey was to use results from the current-year survey not only to develop current-year statistics, but also to revise immediate prior-year statistics. Changes to reported data came from three sources: respondents, analysts involved in survey and statistical processing, and the industry reclassification process. Because of annual sampling, the continual strengthening of sampling methodology, and improvements in data verification, processing, and nonresponse follow-up, and because it is not clear that respondents or those who processed the survey results had any better information than they had when the data were first reported, it was determined that routinely revising published survey statistics increased the potential for error and often confused users of the statistics. For these reasons, the systematic revision of immediate prior-year statistics was discontinued. Now revisions are made to historical and immediate prior-year statistics only if egregious errors are discovered.

Year-to-Year Changes

Comparability from year to year may be affected by a new sample design, annual sample selection, and industry shifts.

Sample Design

Changes to the sample design can affect comparability of year-to-year estimates. By far the most profound influence on statistics from recent surveys occurred when the new sample design for the 1992 survey was introduced. Revisions to the 1991 statistics were dramatic (see Research and Development in Industry: 1992 for a detailed discussion). While the allocation of the sample was changed somewhat, the sample designs used for the 1993-96 surveys were comparable in terms of size and coverage to the 1992 sample design.

Annual Sample Selection

With the introduction of annual sampling in 1992, more year-to-year change has resulted than when survey panels were used. There are two reasons why this is so. First, when panels were used, changes in the classification of companies not surveyed were not reflected in the year-to-year movement. Prior to annual sampling, the wedging operation, which was performed when a new sample was selected, was a means of adjusting the data series to account for the changes in classification that occurred in the frame (see the discussion on wedging below). Second, yearly correlation of R&D data is lost when independent samples are drawn each year.

Industry Shifts

The industry classification of companies is redefined each year with the creation of the sampling frame. By redefining the frame, the sample reflects current distributions of companies by size and industry. During this process, a company may move from one industry into another because of several factors: changes in a company's payroll composition, which is used to determine the industry classification code (see discussion above under "Frame Creation"); changes in the industry classification system itself; or changes in the way the industry classification code is assigned or revised during survey processing.

A company's payroll composition changes because of a number of events. Among them are (1) the growth or decline of product or service lines; (2) the merger of two or more companies; (3) the acquisition of one company by another; (4) divestitures; or (5) the formation of conglomerates. Since the introduction of annual sampling in 1992, a company's industry designation can be reclassified yearly, although this is unlikely. When a company is reclassified, the result is that a downward movement in R&D expenditures in one industry is balanced by an upward movement in another industry from one year to the next.

From time to time, the SIC coding system, which is used by most Federal Government agencies that publish industry statistics, is revised to reflect the changing composition of U.S. industry. For statistics developed for 1988-91 from the 1988-91 surveys, companies retained the industry classifications assigned for the 1987 sample. These classifications were based on the 1977 SIC system. The last major revision of the SIC system was for 1987. This new system was used to classify companies in the post-1991 surveys.

Finally, the method used to classify firms during survey processing was revised slightly in 1992. Research has shown that the impact on individual industry estimates has been minor.[11] The current method used to classify firms is discussed above under "Frame Creation." Methods used for past surveys are discussed in the technical paper cited below.[12]

Capturing Small and Nonmanufacturing R&D Performers[13]

Before the 1992 survey, the sample of firms surveyed was selected at irregular intervals.[14] In intervening years, a panel of the largest firms known to perform R&D was surveyed. For example, a sample of about 14,000 firms was selected for the 1987 survey. For the 1988 through 1991 studies, about 1,700 of these firms were annually resurveyed; the other firms did not receive another questionnaire and their R&D data were estimated. This sample design was adequate during the early years of the survey because the performance of R&D remained concentrated in relatively few manufacturing industries. However, as more and more firms began entering the R&D-performing arena, the old sample design proved increasingly deficient because it did not capture births of new R&D-performing firms. The entry of fledgling R&D performers into the marketplace was simply missed during panel years. Additionally, beginning in the early 1970's, the need for more detailed R&D information for nonmanufacturers was recognized. At that time, the broad industry classifications "miscellaneous business services" and "miscellaneous services" were added to the list of industry groups for which statistics were published. By 1975, about 3 percent of total R&D was performed by firms in nonmanufacturing industries.

During the mid-1980's, there was evidence that an increasing number of nonmanufacturing firms were conducting a significant amount of R&D, and again the number of industries used to develop the statistics for nonmanufacturers was increased. Consequently, since 1987 the annual reports in this series have included separate R&D estimates for firms in the communication, utility, engineering, architectural, research, development, testing, computer programming, and data processing service industries; hospitals; and medical labs. Approximately 9 percent of the estimated industrial R&D performance during 1987 was undertaken by nonmanufacturing firms.

After the list of industries for which statistics were published was expanded, it became clear that the sample design itself should be changed to reflect the widening population of R&D performers among firms in the nonmanufacturing industries[15] and small firms in all industries, to account better for births of R&D-performing firms, and to produce statistics that are generally more reliable. Beginning with the 1992 survey, NSF decided to (1) draw new samples with broader coverage annually, and (2) increase the sample size to approximately 23,000 firms.[16] As a result of the sample redesign, the reported nonmanufacturing share for 1992 was, and continues to be, estimated at approximately 25 percent of total R&D.

Time Series Analyses

As discussed earlier, the statistics resulting from the survey are better indicators of changes in, rather than absolute levels of, R&D spending and personnel. Nevertheless, the statistics are often considered as a continuous time series that has been prepared using the same collection, processing, and tabulation methods. Such uniformity during preparation has not been the case. Since the survey was first fielded, improvements have been made to increase the reliability of the statistics and to make the survey results more useful. To that end, existing practices have been changed and new procedures have been instituted. Preservation of the comparability of the statistics has been an important consideration when improvements have been made, however. Changes to survey definitions, the industry classification system, and the procedure used to assign industry codes to multiestablishment companies[17] have had some, though not substantial, effects on the comparability of statistics.[18]

The aspect of the survey that had a greater effect on comparability was the selection of samples at irregular intervals (i.e., 1967, 1971, 1976, 1981, 1987, and 1992) and the use of a subset or panel of the last sample drawn to develop statistics for intervening years. As discussed earlier, this practice introduced cyclical deterioration of the statistics. As compensation for this deterioration, periodic revisions have been made to the statistics produced from the panels surveyed between sample years. Early in the survey's history, various methods were used to make these revisions.[19] After 1976 and until 1992 with the advent of annual sampling, a linking procedure called wedging was used.[20] Simply described, in wedging, the 2 sample years on each end of a series of estimates served as benchmarks in the algorithms used to adjust the estimates for the intervening years.

Wedging Methodology

For a full discussion of the mathematical algorithm used for the wedging process that linked statistics from the 1992 survey with those from the 1987 survey, see the technical memorandum cited below.[21] In general, the memorandum states that wedging,

takes full advantage of the fact that in the first year of a new panel [when a new sample is selected], both current-year and prior-year estimates are derived. Thus, two independent estimates exist for the prior year. The estimates from the new panel are treated as superior primarily because the new panel is based on updated classifications [the industry classifications in the prior panel are frozen] and is more fully representative of the current universe (the prior panel suffers from panel deterioration, especially a lack of birth updating). The limitations in the prior panel caused by these factors are naturally assumed to increase with time, so that in the revised series, we desire a gradual increase in the level of revision over time which culminates in the real difference observed between the two independent sample estimates of the prior year. At the same time, we desire that the annual movement of the original series be preserved to the degree possible in the revised series.

To that end, the wedging algorithm does not change estimates from sample years and adjusts estimates from panel years, recognizing that deterioration of the panel is progressive over time.
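
The memorandum's algorithm is not reproduced in this report. As a rough illustration only, the sketch below assumes the simplest scheme consistent with the description above: a multiplicative adjustment that is zero in the earlier sample year and grows linearly until it reaches the full ratio between the two independent estimates for the prior year. The function name wedge_series, the linear ramp, and the figures in the usage note are assumptions made for illustration and should not be read as the Census Bureau's actual method.

```python
def wedge_series(original, new_prior_year_estimate):
    """Illustrative linear wedge (an assumption, not the documented algorithm).

    original: estimates from the old sample, in year order. original[0] is
        the old sample year (a benchmark, left unchanged); original[1:] are
        the panel years; original[-1] is the last panel year, for which the
        new sample also supplies an independent estimate.
    new_prior_year_estimate: the new sample's estimate for that last year.
    """
    n = len(original) - 1
    if n < 1:
        raise ValueError("need the sample year plus at least one panel year")
    if original[-1] == 0:
        raise ValueError("cannot form a ratio against a zero estimate")

    # Full revision observed between the two independent estimates.
    full_ratio = new_prior_year_estimate / original[-1]

    revised = []
    for k, value in enumerate(original):
        # The revision grows from 0 (sample year) to the full amount
        # (last panel year), reflecting progressive panel deterioration.
        factor = 1.0 + (full_ratio - 1.0) * k / n
        revised.append(value * factor)
    return revised
```

With hypothetical figures, wedge_series([100.0, 100.0, 100.0], 120.0) leaves the sample-year value at 100, raises the middle year to about 110, and brings the last panel year to about 120. Because each year is scaled by a slowly changing factor, the annual movement of the original series is largely retained.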

Wedged Versus Not-Wedged Statistics

One of the primary reasons for the decision to select a new sample annually rather than at irregular intervals was to avoid applying global revision processes such as wedging. Consequently, the 1992 survey was intended to be the last one affected by the wedging procedure.

Comparisons to Other Statistical Series

The NSF collects data on federally financed R&D from both Federal funding agencies and performers of the work (industry, Federal labs, universities, and other nonprofit organizations). As reported by Federal agencies, NSF publishes data on Federal R&D budget authority and outlays, in addition to Federal obligations. These terms are defined below:[22]

For the reasons cited above, national R&D expenditure totals in NSF's National Patterns of R&D Resources report series are constructed primarily based on data reported by performers and include estimates of Federal R&D funding to these sectors. But until performer-reported survey data on Federal R&D expenditures are available from industry and academia, data collected from the Federal agency funders of R&D are used to project R&D performance. When survey data from the performers subsequently are tabulated (as they are in this report), these statistics replace the projections based on funder expectations. Historically, the two survey systems have tracked fairly closely. For example, in 1980 performers reported using $29.5 billion in Federal R&D funding, and Federal agencies reported total R&D funding between $29.2 billion in outlays and $29.8 billion in obligations.[24] In recent years, however, the two series have diverged considerably.[25] The difference in the Federal R&D totals appears to be concentrated in funding of industry (primarily aircraft and missile firms) by the Department of Defense. Overall, industrial firms have reported significant declines in Federal R&D support since 1990 (see table A-1), while Federal agencies reported level or slightly increased funding of industrial R&D.[26] NSF is examining the causal factors of these divergent trends.

Survey Definitions

Cost Per R&D Scientist or Engineer

The arithmetic mean of the numbers of full-time-equivalent (FTE) scientists and engineers engaged in the performance of R&D reported for January in 2 consecutive years divided into the total R&D expenditures of the earlier year, with the ratio attributed to the earlier year. For example, the mean of the numbers of FTE R&D scientists and engineers in January 1995 and January 1996 is divided into total 1995 R&D expenditures for a total cost per R&D scientist or engineer in 1995.
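
The calculation in this definition can be sketched as follows. The function name and the figures in the usage note are hypothetical and are not taken from the survey.

```python
def cost_per_rd_scientist(total_rd_expenditures, fte_january_same_year,
                          fte_january_next_year):
    """Cost per R&D scientist or engineer, attributed to the earlier year.

    The denominator is the arithmetic mean of the FTE counts reported for
    January of 2 consecutive years; the numerator is total R&D
    expenditures of the earlier year.
    """
    mean_fte = (fte_january_same_year + fte_january_next_year) / 2
    if mean_fte <= 0:
        raise ValueError("mean FTE count must be positive")
    return total_rd_expenditures / mean_fte
```

For example, with hypothetical 1995 expenditures of $1,000,000 and FTE counts of 9 in January 1995 and 11 in January 1996, cost_per_rd_scientist(1_000_000, 9, 11) gives $100,000 per R&D scientist or engineer for 1995.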

Employment, FTE R&D Scientists and Engineers

Persons employed by the company during the January following the survey year who are engaged in scientific or engineering work at a level that requires knowledge of engineering or of the physical, biological, mathematical, statistical, or computer sciences equivalent at least to that acquired through completion of a 4-year college program with a major in one of those fields. The statistics in this report show the FTE employment. FTE employment is the number of scientists and engineers in the company who are assigned full time plus a prorated number of employees working part-time on R&D.
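
As a minimal sketch of the FTE count, assume each part-time employee is prorated by the fraction of a full-time schedule spent on R&D. The definition does not specify the proration rule, so this fraction, the function name, and the figures in the usage note are assumptions.

```python
def fte_employment(full_time_count, part_time_fractions):
    """Full-time staff plus a prorated count of part-time R&D staff.

    part_time_fractions holds one value per part-time employee: the
    assumed share of a full-time schedule that the person spends on R&D.
    """
    for fraction in part_time_fractions:
        if not 0 < fraction < 1:
            raise ValueError("each part-time fraction must be between 0 and 1")
    return full_time_count + sum(part_time_fractions)
```

For example, 10 full-time scientists and engineers plus three part-time employees working 0.5, 0.25, and 0.25 of a full-time schedule on R&D give fte_employment(10, [0.5, 0.25, 0.25]), or 11 FTE.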

Employment, Total

Number of persons domestically employed by R&D-performing companies in all activities during the pay period that includes the 12th of March.

Federally Funded R&D Centers (FFRDC's)

R&D-performing organizations administered by industrial, educational, or other institutions on a nonprofit basis, exclusively or substantially financed by the Federal Government. R&D expenditures of the FFRDC's that are industry-administered are included with the Federal R&D data of the industry classification of each of the administering firms. The industry-administered FFRDC's included in the 1995 and 1996 surveys are listed as follows.

FFRDC's Supported by the Department of Energy:

Energy Technology Engineering Center
Rockwell International Corp.
Canoga Park, CA

Idaho National Engineering Laboratory
Lockheed Martin Corp.
Idaho Falls, ID

Oak Ridge National Laboratory
Lockheed Martin Corp.
Oak Ridge, TN

Sandia National Laboratories
Lockheed Martin Corp.
Albuquerque, NM

Savannah River Laboratory
Westinghouse Corp.
Aiken, SC

FFRDC Supported by the Department of Health and Human Services, National Institutes of Health:

NCI Frederick Cancer Research Facility
Science Applications International Corporation (SAIC)
Advanced Bioscience Laboratories, Inc.
Frederick, MD

Funds for R&D, Company and Other

The cost of R&D actually performed within the company and funded by the company itself or by other non-Federal sources, not including the cost of R&D supported by companies but contracted to outside organizations such as research institutions, universities and colleges, nonprofit organizations, or, to avoid double-counting, other companies.

Funds for R&D, Federal

Receipts for R&D performed by the company under Federal R&D contracts or subcontracts and R&D portions of Federal procurement contracts and subcontracts.

Funds for R&D, Total

Operating expenses incurred by a company in the conduct of R&D in its own laboratories or other company-owned or -operated facilities, including wages and salaries; materials and supplies; property and other taxes; maintenance and repairs; depreciation; and an appropriate share of overhead, not including capital expenditures.

Industrial Research and Development

The pursuit of a planned search for new knowledge, whether or not the search has reference to a specific commercial objective, although such investigations may be in fields of present or potential interest to the reporting company (basic research); the application of existing knowledge having specific commercial objectives with respect to products or processes (applied research); or the application of existing knowledge concerned with translating research findings or other scientific knowledge into products or processes (development) by persons trained, either formally or by experience, in engineering or in the physical, biological, mathematical, statistical, or computer sciences and employed by a publicly or privately owned firm engaged in for-profit activity in the United States. Industrial R&D includes the design and development of prototypes and processes and excludes quality control, routine product testing, market research, sales promotion, sales service, other nontechnological activities or routine technical services, and research in the social sciences or psychology.

Net Sales and Receipts

Dollar values for goods sold or services rendered by R&D-performing companies to customers (outside the company), including the Federal Government, less such items as returns, allowances, freight, charges, and excise taxes. Domestic intracompany transfers and sales by foreign subsidiaries are excluded, but transfers to foreign subsidiaries and export sales to foreign companies are included.


Footnotes

[1] Information for this section was provided by the Manufacturing and Construction Division of the Bureau of the Census, the collecting and compiling agent for the National Science Foundation (NSF). Copies of the technical papers cited can be obtained by contacting NSF's Research and Development Statistics Program in the Division of Science Resources Studies at the address given in section A, Introduction.

[2] See the Bureau of the Census technical memorandum entitled "Evaluation of Total Employment Cut-Offs in the Survey of Industrial Research and Development," Nov 3, 1994.

[3] These industries are listed and discussed later in this section under Comparability of Statistics.

[4] U.S. Department of Commerce, Bureau of the Census, Documentation of Nonsampling Issues in the Survey of Industrial Research and Development, RR94/03 (Washington, DC, Sept. 1994) and U.S. Department of Commerce, Bureau of the Census, A Study of Processing Errors in the Survey of Industrial Research and Development, ESMD-9403 (Washington, DC, Sept. 1994).

[5] For detailed discussions on the sources, control, and measurement of error resulting from item nonresponse, see the technical report: U.S. Department of Commerce, Bureau of the Census, Documentation of Nonsampling Error Issues in the Survey of Industrial Research and Development, RR94/03 (Washington, DC, Sept. 21, 1994). For a general discussion of the problems stemming from item nonresponse, see the technical report: National Science Foundation, Estimating Basic and Applied Research and Development in Industry: A Preliminary Review of Survey Procedures, NSF 90-322 (Washington, DC, 1990).

[6] All but four items (total R&D, Federal R&D, net sales, and total employment, which are included in the Census Bureau's annual mandatory statistical program) are voluntary. See further discussion under Response Rates and Mandatory Versus Voluntary Reporting, later in this section.

[7] For detailed descriptions and analyses of the imputation methods and algorithms used, see the technical report: U.S. Department of Commerce, Bureau of the Census, An Evaluation of Imputation Methods for the Survey of Industrial Research and Development, ESMD-9404 (Washington, DC, Sept. 1994).

[8] See the NSF technical report cited above for an explanation of the uncertainties in the data and a quantification of their sensitivity to the choice of various possible imputation procedures.

[9] See also the technical paper U.S. Department of Commerce, Bureau of the Census, Documentation of the Survey Design for the Survey of Industrial Research and Development: A Historical Perspective (Washington, DC, 1995).

[10] U.S. Department of Commerce, Bureau of the Census, Survey Design of the Survey of Industrial Research and Development: A Historical Perspective (Washington, DC, 1995).

[11] The effects of recent changes in the way companies are classified during survey processing are discussed in detail in the Bureau of the Census technical memoranda entitled "Reclassification of Companies in the 1992 Survey of Industrial Research and Development for the Generation of the 'Analytical' Series," Oct. 25, 1994, and "Comparison of Company Coding Between 1992 and 1993 for the Survey of Industrial Research and Development," Nov. 3, 1994.

[12] U.S. Department of Commerce, Bureau of the Census, Documentation of the Survey Design for the Survey of Industrial Research and Development: A Historical Perspective (Washington, DC, 1995).

[13] See also National Science Foundation, SRS Data Brief, 1992 R&D Spending by U.S. Firms Rises, NSF Survey Improved (NSF 94-325) (Arlington, VA, Sept. 9, 1994).

[14] During the early years of the survey, until 1967, samples were selected every 5 years. Subsequent samples were selected for 1971, 1976, 1981, and 1987.

[15] For the 1992 survey, 25 new nonmanufacturing industries and industry groups were added to the sample frame: agricultural services (SIC 07); fishing, hunting, and trapping (09); wholesale trade-nondurables (51); stationery and office supply stores (5112); industrial and personal service paper (5113); groceries and related products (514); chemicals and allied products (516); miscellaneous nondurable goods (519); home furniture, furnishings, and equipment stores (57); radio, TV, consumer electronics, and music stores (573); eating and drinking places (581); miscellaneous retail (59); nonstore retailers (596); real estate (65); holding and other investment offices (67); hotels, rooming houses, camps, and other lodging places (70); automotive repair, services, and parking (75); miscellaneous repair services (76); amusement and recreation services (79); health services (80); offices and clinics of medical doctors (801); offices and clinics of other health practitioners (804); miscellaneous health and allied services not elsewhere classified (809); engineering, accounting, research, management, and related services (87); and management and public relations services (874).

[16] Annual sampling also remedies the cyclical deterioration of the statistics that results from changes in a company's payroll composition because of product line and corporate structural changes.

[17] For discussions of each of these, see the Bureau of the Census technical memorandum entitled "Wedging Considerations for the 1992 Research and Development (R&D) Survey," June 10, 1994.

[18] See the Bureau of the Census technical memoranda entitled "Reclassification of Companies in the 1992 Survey of Industrial Research and Development for the Generation of 'Analytical' Series," Oct. 25, 1994, and "Effects of the 1987 SIC Revision on Company Classification in the Survey of Industrial Research and Development (R&D)," Dec. 6, 1993.

[19] See U.S. Department of Commerce, Bureau of the Census, Survey Design of the Survey of Industrial Research and Development: A Historical Perspective (Washington, DC, 1995).

[20] The process was dubbed wedging because of the wedgelike area produced on a graph that compares originally reported statistics with the revised statistics that result after linking.

[21] Bureau of the Census technical memorandum, "Wedging Considerations for the 1992 Research and Development (R&D) Survey," June 10, 1994.

[22] See also NSF, Federal Funds for Research and Development: Fiscal Years 1994-96, NSF 97-302 (Arlington, VA, 1997).

[23] See NSF, Federal R&D Funding by Budget Function: Fiscal Years 1994-96 (Budget Function), NSF 95-342 (Arlington, VA, 1995).

[24] NSF, National Patterns of R&D Resources: 1996, NSF 96-333 (Arlington, VA, 1996).

[25] Ibid.

[26] Ibid.

