Survey Methodology[4]

Reporting Unit
Frame Creation
Sample Selection
Survey Questionnaires
Followup for Survey Nonresponse
Imputation for Item Nonresponse
Response Rates and Mandatory Versus Voluntary Reporting

Reporting Unit

The reporting unit for the Survey of Industrial Research and Development is the enterprise, or company, defined as a business organization of one or more establishments under common ownership or control. The survey includes two groups of enterprises: (i) companies known to conduct research and development (R&D) and (ii) a sample representation of companies for which information on the extent of R&D activity is uncertain.

Frame Creation

The Standard Statistical Establishment List (SSEL), a Bureau of the Census compilation that contains information on over 3 million establishments with paid employees, was the universe from which the frame for selecting the 1992 survey sample was created (see table B-1 for universe and sample sizes). For companies with more than one establishment, data were summed to the company level. Each company was then assigned a single standard industrial classification (SIC) code based on the activity of the establishment(s) having the highest payroll. This assignment was done hierarchically: the company was first assigned to the economic division (manufacturing or nonmanufacturing) with the highest payroll, then to the 2-digit SIC code with the highest payroll within the assigned division, and then to the 3-digit SIC code with the highest payroll within the assigned 2-digit industry.
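The hierarchical assignment can be sketched in Python as follows. This is an illustration only, not the Census Bureau's processing code: the input layout (a list of SIC code and payroll pairs for each company) and the names assign_sic and division are hypothetical, and manufacturing is taken to be SIC major groups 20 through 39.

```python
from collections import defaultdict


def division(sic):
    """Economic division of a SIC code; manufacturing is major groups 20-39."""
    return "manufacturing" if 20 <= int(sic[:2]) <= 39 else "nonmanufacturing"


def assign_sic(establishments):
    """Assign one 3-digit SIC code to a company, hierarchically by payroll.

    establishments: list of (sic_code, payroll) pairs, one per establishment,
    where sic_code is a string of at least 3 digits (hypothetical layout).
    """
    def largest(payroll_by_key):
        # Key with the highest summed payroll (the first one wins a tie).
        return max(payroll_by_key, key=payroll_by_key.get)

    # Step 1: economic division with the highest payroll.
    by_division = defaultdict(float)
    for sic, payroll in establishments:
        by_division[division(sic)] += payroll
    chosen_division = largest(by_division)

    # Step 2: 2-digit industry with the highest payroll within that division.
    by_2digit = defaultdict(float)
    for sic, payroll in establishments:
        if division(sic) == chosen_division:
            by_2digit[sic[:2]] += payroll
    chosen_2digit = largest(by_2digit)

    # Step 3: 3-digit industry with the highest payroll within that 2-digit group.
    by_3digit = defaultdict(float)
    for sic, payroll in establishments:
        if sic[:2] == chosen_2digit:
            by_3digit[sic[:3]] += payroll
    return largest(by_3digit)


company = [("2834", 50.0), ("2836", 30.0), ("3571", 60.0), ("7372", 100.0)]
print(assign_sic(company))  # prints 283
```

In this hypothetical example the single largest establishment by payroll is in SIC 7372, but because the company's combined manufacturing payroll exceeds its nonmanufacturing payroll, the hierarchy assigns it to 3-digit industry 283.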

The frame from which the survey sample was drawn included all for-profit companies classified in nonfarm industries. For surveys prior to 1992, the frame was limited to companies above certain size criteria based on number of employees. These criteria varied by industry. Also, some industries were excluded from the frame because it was believed that these industries contributed little or no R&D activity to the final survey estimates. For the 1992 sample, new industries were added to the frame[5] and the size criteria were lowered considerably and applied uniformly to firms in all industries. As a result, nearly 2 million enterprises with 5 or more employees were given a chance of selection. For comparison, the frame for the 1987 sample included 154,000 companies of specified sizes and industries.

External information about the likelihood that a company conducted R&D was used to identify nearly 10,000 companies that were included in the survey sample with certainty. External sources included prior R&D surveys, directories that include company information on R&D reported to the Securities and Exchange Commission, commercially available directories of R&D performing companies, Department of Defense directories of contracts awarded for R&D, and various publications and newsletters that highlight firms conducting R&D. In addition, all companies in the frame with 1,000 employees or more were selected with certainty.

Sample Selection

Probability Proportionate to Size
As with most types of economic surveys, the sample was selected using probabilities proportionate to size; that is, large companies had a higher probability of selection than small companies. For this survey, it would have been ideal if company size could have been measured by the amount of R&D expenditures. Unfortunately, except for companies that were in a previous survey or for which there was information from external sources, it was impossible to know the R&D expenditures of firms in the universe. Consequently, R&D expenditures had to be estimated for most companies, and the probabilities of selection were based on these estimated values.
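Selection probabilities proportionate to size can be computed as in the following textbook sketch. It assumes that each company's size measure is its estimated R&D, that certainty companies receive probability 1, and that any company whose proportional probability would reach 1 is also moved into the certainty group before the remaining expected sample is spread over the other companies. The function name pps_probabilities and the capping loop are illustrative assumptions; the source does not describe the Census Bureau's exact computation, which also reflected stratum-level precision targets.

```python
def pps_probabilities(sizes, n, certainty=()):
    """Selection probabilities proportionate to size, with expected sample size n.

    sizes: size measure per company (here, estimated R&D), all positive.
    certainty: indexes of companies selected with certainty (probability 1).
    """
    fixed = set(certainty)
    while True:
        rest = [i for i in range(len(sizes)) if i not in fixed]
        n_rest = n - len(fixed)          # expected selections left to spread
        total = sum(sizes[i] for i in rest)
        if not rest or n_rest <= 0 or total <= 0:
            break
        # A company whose proportional share would reach 1 is taken with certainty.
        over = [i for i in rest if n_rest * sizes[i] / total >= 1.0]
        if not over:
            break
        fixed.update(over)

    probs = [1.0 if i in fixed else 0.0 for i in range(len(sizes))]
    if rest and n_rest > 0 and total > 0:
        for i in rest:
            probs[i] = n_rest * sizes[i] / total   # proportionate to size
    return probs


print(pps_probabilities([100, 50, 10, 10, 10, 10, 10], n=3))
# [1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 0.2]
```

The probabilities always sum to the target expected sample size, so larger companies are more likely to be selected without changing the expected number of selections.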

Because total employment was known for each company in the universe, it was possible to use an already-observed relationship between employment and R&D to estimate R&D expenditures for companies in the frame. This was the same strategy employed in the 1981 and 1987 sampling operations. For the 1992 sample, data collected in the 1991 survey were used to derive this relationship separately for single-unit companies and multiestablishment companies. The effect in all cases was to give firms with large numbers of employees a higher probability of selection, since it was assumed that large companies were more likely to perform R&D and that the amount of R&D was proportionate to the size of the company.
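The source states only that an employment-to-R&D relationship was derived from 1991 survey data separately for single-unit and multiestablishment companies. Because the text assumes that R&D is proportionate to company size, one simple way to express such a relationship is a ratio of totals (R&D dollars per employee) for each company type, sketched below. The functional form, the type labels, and the names fit_rd_ratios and predict_rd are assumptions, not the documented model.

```python
from collections import defaultdict


def fit_rd_ratios(prior_reports):
    """R&D dollars per employee for each company type, as a ratio of totals.

    prior_reports: iterable of (company_type, employment, rd) from the prior
    survey, where company_type is e.g. "single" or "multi" (hypothetical labels).
    """
    employment = defaultdict(float)
    rd = defaultdict(float)
    for company_type, emp, rd_value in prior_reports:
        employment[company_type] += emp
        rd[company_type] += rd_value
    return {t: rd[t] / employment[t] for t in employment if employment[t] > 0}


def predict_rd(ratios, company_type, employment):
    """Estimated R&D for a frame company: proportionate to its employment."""
    return ratios[company_type] * employment


prior = [("single", 10, 20.0), ("single", 30, 60.0),
         ("multi", 1000, 5000.0), ("multi", 3000, 7000.0)]
ratios = fit_rd_ratios(prior)
print(ratios)                              # {'single': 2.0, 'multi': 3.0}
print(predict_rd(ratios, "multi", 500))    # 1500.0
```

Fitting the two company types separately lets a multiestablishment company and a single-unit company with the same employment receive different estimated R&D values.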

Sample Stratification and Relative Standard Error Constraints
The particular sample selected was one of a large number of the same type and size that by chance might have been selected. Statistics resulting from the different samples would differ somewhat from each other. These differences are represented by estimates of sampling error. The smaller the sampling error, the more precise the statistic.

To control sampling error in the statistics resulting from this survey, parameters were specified to allocate the sample across various levels, or strata, that corresponded to industry groupings. These parameters permitted the sample size to be varied to achieve a desired level of sampling error for each stratum and were assigned so that estimated errors of total R&D for industries in these strata did not exceed certain levels. Sample sizes among the strata were constrained only by the limit that the available budget placed on the total sample size.
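One way to make these stratum precision constraints concrete is shown below. It assumes the Horvitz-Thompson estimator (each response weighted by the inverse of its selection probability, as in footnote 7) and the independent selection described under Sample Size and Weighting. Here U_h is the set of frame companies in stratum h, s_h the sampled companies, y_i the R&D of company i (at design time, its imputed value), pi_i its selection probability, and c_h a target relative standard error for the stratum; these symbols are introduced for illustration and are not the source's notation.

```latex
\hat{Y}_h = \sum_{i \in s_h} \frac{y_i}{\pi_i},
\qquad
V(\hat{Y}_h) = \sum_{i \in U_h} \frac{1 - \pi_i}{\pi_i}\, y_i^{2},
\qquad
\mathrm{RSE}(\hat{Y}_h) = \frac{\sqrt{V(\hat{Y}_h)}}{Y_h} \le c_h,
\quad Y_h = \sum_{i \in U_h} y_i
```

Certainty companies (probability 1) contribute nothing to the variance, and raising the selection probabilities within a stratum lowers its variance, which is why the constraints can be met by varying sample sizes across strata.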

For sample selections prior to 1992, the strata designations were the published industry categories. The sample was allocated across these industry categories to provide high, medium, and low levels of precision. For the 1992 sample the criteria for this allocation were modified. In order to gather information to review and evaluate the appropriateness of the published industry groupings, the allocation of the sample was controlled for levels of industry detail below those traditionally published. The result was that the frame was partitioned into 95 manufacturing industry strata and 25 nonmanufacturing strata.

Each industry was allocated to one of three groups. The first group was formulated to analyze the distribution of data in manufacturing industries. In this group, each 3-digit manufacturing industry was considered a separate stratum. The second group was formulated to improve coverage and to identify emerging industries. In this group, each selected 2-digit or 3-digit nonmanufacturing industry was considered a separate stratum. These industries were selected either because statistics had been published for them previously or because they had high concentrations of scientists and engineers as reported in occupational surveys. The third group was a large stratum of companies in nonmanufacturing industries that had not been included in previous sampling frames or for which there was little indication of R&D activity.

Once the strata were defined, the following criteria were used to achieve the target sampling error for total R&D.

Based on the desired precision represented by these sampling error estimates, the criteria suggested a total sample size of approximately 23,000.

A limitation of the sample allocation process should be noted. Sampling errors were controlled by using a universe total that was, in large part, constructed from imputed values. That is, as previously noted, an R&D value was assigned to every company in the frame, even though many of these companies may not actually have had R&D expenditures. Because the value assigned was imputed for the majority of companies in the frame, neither the estimated universe total nor the distribution of individual company values necessarily reflected the true distribution. Estimates of sampling variability were nevertheless based on this distribution. The presumption, which had been confirmed using the previous sample selection, was that actual variation in the sample design would be less than estimated, because many of the sampled companies had true R&D values of zero rather than the widely varying values imputed using total employment as a predictor of R&D. Thus, the 2-percent and 5-percent error levels described earlier are conservative. (See table B-2 for a list by industry of the actual standard error estimates for total R&D.)

In addition to sampling error, the estimates are subject to nonsampling error. Errors are grouped into five categories: specification, coverage, response, nonresponse, and processing. For detailed discussions on the sources, control, and measurement of each of these types of error, see the technical reports cited below[6].

Sample Size and Weighting
The sample was selected with a target sample size of 23,000 and with other parameters set to ensure compliance with the standard error constraints. An actual sample of 23,376 was selected. The actual sample size differed from the target for two reasons. First, the sample frame was subjected to independent sampling: each company in the frame had an independent chance of selection based on its assigned probability, i.e., the selection of one company was completely independent of the selection of any other company. In independent sampling, sample size itself is a random variable. Theoretically, a sample of size zero or a sample the size of the entire universe is possible, but the probabilities of these extremes are so small that they are practically impossible. The actual sample size is usually quite close to the specified size. If it deviates too much, the selection is simply executed again.
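Independent (Poisson) sampling, including the redraw step, can be sketched as below. The tolerance parameter and the function name poisson_sample are hypothetical; the source does not say how large a deviation triggered a new draw.

```python
import random


def poisson_sample(probs, tolerance, rng):
    """Independent (Poisson) sampling with a redraw rule.

    Each company i is selected with probability probs[i], independently of
    every other company, so the sample size is itself random with expected
    value sum(probs). If the realized size differs from that expectation by
    more than `tolerance` (a hypothetical parameter), the draw is repeated.
    The tolerance must be wide enough that an acceptable draw is likely.
    """
    expected = sum(probs)
    while True:
        # rng.random() lies in [0.0, 1.0), so probability 1.0 always selects.
        sample = [i for i, p in enumerate(probs) if rng.random() < p]
        if abs(len(sample) - expected) <= tolerance:
            return sample


rng = random.Random(1992)
probs = [1.0] * 10 + [0.25] * 400     # expected size: 10 + 400 * 0.25 = 110
sample = poisson_sample(probs, tolerance=15, rng=rng)
print(len(sample))  # depends on the seed, but always within 110 +/- 15
```

Certainty companies are always included, while the number of other companies selected varies from draw to draw around its expected value.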

Second, a minimum probability rule was imposed. As noted earlier, a probability of selection proportionate to size was assigned to each company, where size is the imputed R&D value assigned to that company. Selected companies that report actual R&D expenditures vastly larger than their assigned values can have adverse effects on the statistics, which are based on the weighted value of survey responses[7]. To lessen these effects on the final statistics, the maximum weight a company could assume was arbitrarily controlled by specifying a minimum probability of selection. If a company's size-based probability was less than this arbitrarily set minimum, the probability was set equal to the minimum value. Raising these original probabilities to the minimum probability increased the expected sample size. It is likely that most of the difference between the target sample size and the size of the sample actually selected was due to this rule.
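The minimum probability rule and its effect on weights and expected sample size can be sketched as follows. The function name and the floor value of 0.01 are hypothetical; the source does not report the minimum actually used.

```python
def apply_minimum_probability(probs, p_min):
    """Raise every selection probability below p_min up to p_min.

    A sampled company's weight is the inverse of its probability (footnote 7),
    so after flooring no weight can exceed 1 / p_min.
    """
    floored = [max(p, p_min) for p in probs]
    weights = [1.0 / p for p in floored]
    return floored, weights


probs = [1.0, 0.5, 0.01, 0.002, 0.002]
floored, weights = apply_minimum_probability(probs, p_min=0.01)
# Expected sample size rises (about 1.514 to about 1.53) ...
print(sum(probs), sum(floored))
# ... while the largest possible weight falls (about 500 to about 100).
print(max(1.0 / p for p in probs), max(weights))
```

The trade-off is the one the text describes: capping weights protects the estimates from an unexpectedly large reporter, at the cost of a larger expected sample.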

Survey Questionnaires

Two questionnaires are used each year to collect data for the survey. For large firms known to perform R&D, a detailed questionnaire, Form RD-1L, is used to collect data for odd-numbered years and an abbreviated version, Form RD-1S, is used to collect data for the even-numbered years. The questionnaires are cycled in this manner to reduce reporting burden on survey respondents.

The Form RD-1L requests data on sales or receipts, total employment, employment of scientists and engineers, expenditures for R&D performed within the company with Federal funds and with company and other funds, character of work (basic research, applied research, and development), company-sponsored R&D expenditures in foreign countries, R&D performed under contract to others, expenditures for pollution abatement and energy R&D, detail on R&D by product field, Federal R&D support to the firm by contracting agency, domestic R&D expenditures by State, and foreign R&D by country. The Form RD-1S requests the same information except for the last four items. Because companies receiving Forms RD-1L and RD-1S generally have participated in previous surveys, computer-imprinted data reported by the company for the previous year are supplied for reference. Companies are encouraged to revise or update these imprinted data if they have more current information.

To further limit reporting burden on small R&D performers and on firms that are included in the sample for the first time, an even more abbreviated form is used each year. Form RD-1A collects data only on R&D, sales, employment, and operational status and includes a screening item that allows respondents to indicate that they do not perform R&D. No prior-year information is available since the majority of the companies have not reported previously[8].

For the 1992 survey, about 1,600 companies received Form RD-1S and nearly 22,000 received Form RD-1A. Of the 22,000 firms, 1,760 reported R&D expenditures. Both questionnaires and the instructions provided to respondents are reproduced in section C, Survey Documents.

Followup for Survey Nonresponse

The 1992 survey questionnaires were mailed in May 1993, and recipients were asked to respond within 60 days. Thirty days later, letters were mailed to all survey recipients reminding them that their completed questionnaire was due within the next 30 days. After 60 days, followup letters were sent to all nonresponding firms. Two additional followup mailings were made to persistent nonrespondents, after 90 and 120 days. The 90-day followup mailing included a replacement questionnaire.

In addition to the mailings, telephone followup was used to encourage response from those firms ranked among the 300 largest R&D performers, based on total R&D expenditures reported in the previous survey. Telephone followup was also used for these firms during the initial data edit phase of survey operations if data items were missing or unclear.

Imputation for Item Nonresponse

For various reasons, many firms returned the survey questionnaires with one or more blank items[9]. For instance, a firm's internal accounting procedures may not have allowed it to quantify the portion of its R&D devoted to pollution abatement. In addition, some firms, as a matter of policy, refused to answer any voluntary questions[10].

When respondents did not provide the requested information, estimates for the missing data were made using imputation algorithms. In general, the imputation algorithms computed values for missing items by applying the average percentage change for the target item in the nonresponding firm's industry to the item's prior-year value for that firm, reported or imputed. This approach, with minor variation, was used for most items[11]. Table B-3 contains imputation rates for the principal survey items.
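The general item imputation rule can be sketched as below. The source says only that the average percentage change for the item in the firm's industry is applied to the firm's prior-year value; computing that average as the unweighted mean of firm-level changes among respondents that reported the item in both years is an assumption, as are the function names.

```python
def industry_average_change(pairs):
    """Mean percentage change (as a fraction) for one item in one industry.

    pairs: (prior_value, current_value) for respondents in the industry that
    reported the item in both years; an unweighted mean is an assumption.
    """
    changes = [(current - prior) / prior for prior, current in pairs if prior > 0]
    if not changes:
        raise ValueError("no usable respondents in this industry")
    return sum(changes) / len(changes)


def impute_item(prior_value, average_change):
    """Carry the firm's prior-year value (reported or imputed) forward."""
    return prior_value * (1.0 + average_change)


change = industry_average_change([(100.0, 110.0), (200.0, 200.0), (50.0, 60.0)])
print(round(change, 4))                      # 0.1
print(round(impute_item(400.0, change), 2))  # 440.0
```

Because the prior-year value may itself be imputed, repeated application can carry a firm forward for several years without any reported data, the dependence that the character-of-work discussion below addresses.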

Character of work
Response to questions about character of work (basic research, applied research, and development) declined in the mid-1980s and, as a result, imputation rates increased. The general imputation procedure described above became increasingly dependent upon information imputed in prior years, thereby distancing current-year estimates from any reported information. Because of the increasing dependence on imputed data, NSF chose not to publish character-of-work estimates in 1986. Consequently, the imputation procedure used to develop these estimates was revised in 1987 for use with 1986 and later data and differs from the general imputation approach. The new method calculates the character-of-work distribution for a nonresponding firm only if that firm reported a distribution within a 5-year period, extending from 2 years before to 2 years after the year requiring imputation. Imputation for a given year is initially performed in the year the data are collected and is based on a character-of-work distribution reported in either of the 2 previous years, if any. It is again performed using new data collected in the next 2 years. Thus, character-of-work estimates are revised as newly reported information becomes available and are not final for 2 years following their initial publication.
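The character-of-work rule (use a distribution the firm reported within 2 years on either side of the target year, drawing on later years only once their data have been collected) can be sketched as below. The source does not say which report is used when several qualify; choosing the closest year, and the later one on a tie, is an assumption, as are the function name and data layout.

```python
def impute_character_of_work(reported, target_year, latest_year):
    """Character-of-work shares for a firm that did not report in target_year.

    reported: {year: (basic, applied, development)} shares the firm reported.
    latest_year: most recent survey year whose data have been collected.
    Returns None when no report falls within 2 years of target_year, in which
    case the firm's R&D stays in the "not distributed" category.
    """
    eligible = [year for year in reported
                if year != target_year
                and abs(year - target_year) <= 2
                and year <= latest_year]
    if not eligible:
        return None
    # Assumption: use the closest eligible year, preferring the later one on a tie.
    best = min(eligible, key=lambda year: (abs(year - target_year), -year))
    return reported[best]


reported = {1990: (0.10, 0.20, 0.70), 1993: (0.05, 0.25, 0.70)}
print(impute_character_of_work(reported, 1992, 1992))  # (0.1, 0.2, 0.7) from 1990
print(impute_character_of_work(reported, 1992, 1994))  # (0.05, 0.25, 0.7) from 1993
print(impute_character_of_work(reported, 1996, 1996))  # None: not distributed
```

The first two calls illustrate the revision process: the 1992 estimate initially rests on the 1990 report and is replaced once the 1993 report has been collected.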

If no reported data are available for a firm, character-of-work estimates are not imputed. As a consequence, only a portion of the total estimated R&D expenditures is distributed at the firm level. Expenditures not meeting the requirements of the new imputation methodology are placed in a "not distributed" category. Tables B-4 through B-8 show the character-of-work estimates, along with the "not distributed" component, for 1988 through 1992, respectively.

NSF's objective in conducting the survey has always been to provide estimates for the entire population of firms performing R&D in the United States; however, because of the "not distributed" component, the revised imputation procedure would no longer produce such estimates. Therefore, a baseline estimation method was developed to allocate the "not distributed" amounts among the character-of-work components. In the baseline estimation method, the "not distributed" expenditures are allocated, by industry group, to the basic research, applied research, and development categories, using the percentage splits in the distributed category for that industry. The allocation is done only at the lowest level of published industry detail; higher levels are derived by aggregation, just as national totals are derived by aggregating individual industry estimates. This method results in higher shares for basic and applied research, and a lower share for development, than would have been calculated using the previous method[12]. The estimates of basic research, applied research, and development provided in section A of this report were calculated using the baseline estimation method.
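The baseline estimation method (allocate each industry's "not distributed" amount in proportion to that industry's own distributed split, then sum industries to obtain higher-level totals) can be sketched as below. The function names and dollar figures are hypothetical.

```python
def allocate_not_distributed(distributed, not_distributed):
    """Spread one industry's "not distributed" R&D using its own split.

    distributed: {"basic": x, "applied": y, "development": z} amounts for one
    industry at the lowest published level of detail.
    """
    total = sum(distributed.values())
    if total <= 0:
        raise ValueError("industry has no distributed R&D to supply a split")
    return {k: v + not_distributed * v / total for k, v in distributed.items()}


def aggregate(industries):
    """Higher-level totals are simple sums of the allocated industry figures."""
    totals = {}
    for allocation in industries:
        for k, v in allocation.items():
            totals[k] = totals.get(k, 0.0) + v
    return totals


a = allocate_not_distributed({"basic": 10.0, "applied": 30.0, "development": 60.0}, 50.0)
b = allocate_not_distributed({"basic": 0.0, "applied": 20.0, "development": 180.0}, 100.0)
print(aggregate([a, b]))
# {'basic': 15.0, 'applied': 75.0, 'development': 360.0}
```

Allocating at the lowest level first means each industry's "not distributed" amount follows that industry's own mix of basic research, applied research, and development, rather than a national average split.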

Response Rates and Mandatory Versus Voluntary Reporting

Detailed unit and item response rates are shown in tables B-9 and B-10, respectively. Table B-9 shows the number of companies in each industry or group of industries that received a questionnaire and the percentage that responded to the survey. Table B-10 shows the percentage of firms with R&D expenditures that also reported data for selected items or groups of items.

Current survey reporting requirements divide survey items into two groups: mandatory and voluntary. Response to four data items on the questionnaires (total R&D expenditures, Federal R&D funds, net sales, and total employment) is mandatory, whereas response to the remaining items is voluntary. During the 1990 survey cycle, NSF conducted a test of the effect of reporting on a completely voluntary basis to determine if combining both mandatory and voluntary items on one questionnaire influences response rates. For this test, the 1990 sample was divided into two panels of approximately equal size. One panel, the mandatory panel, was asked to report as usual (four mandatory items and the remainder voluntary), and the other panel, the voluntary panel, was asked to report all items on a completely voluntary basis. The result of the test was a decrease in the overall survey response rate to 80 percent from levels of 88 percent in 1989 and 89 percent in 1988. The response rates for the mandatory and voluntary panels were 89 percent and 69 percent, respectively. Detailed results of the test were published in Research and Development in Industry: 1990.

4. Information for this section was provided by the Industry Division of the Bureau of the Census, the collecting and compiling agent for the National Science Foundation. Copies of the technical papers cited can be obtained by contacting NSF's Science and Engineering Activities Program in the Division of Science Resources Studies at the address given in the General Notes preceding section A.
5. These industries are listed and discussed under Comparability of Statistics later in this section.
6. U.S. Department of Commerce, Bureau of the Census, Documentation of Nonsampling Issues in the Survey of Industrial Research and Development, RR94/03 (Washington, DC, September 1994) and U.S. Department of Commerce, Bureau of the Census, A Study of Processing Error in the Survey of Industrial Research and Development, ESMD-9403 (Washington, DC, September 1994).
7. The weight given to a company selected for the survey is the inverse of its probability of selection. Companies selected for the sample with certainty (see "Frame Creation" above) represented only themselves and each had a weight of 1.0.
8. For the 1992 survey, companies were asked to report R&D expenditures for both the current and previous years. For subsequent years, only current year data will be requested.
9. For detailed discussions on the sources, control and measurement of error resulting from item nonresponse, see the technical report: U.S. Department of Commerce, Bureau of the Census, Documentation of Nonsampling Error Issues in the Survey of Industrial Research and Development, RR94/03 (Washington, DC, September 21, 1994). For a general discussion of the problems stemming from item nonresponse, see the technical report: National Science Foundation, Estimating Basic and Applied Research and Development in Industry: A Preliminary Review of Survey Procedures, NSF 90-322 (Washington, DC, 1990).
10. All items except four (total R&D, Federal R&D, net sales, and total employment, which are included in the Census Bureau's annual mandatory statistical program) are voluntary. See further discussion under Response Rates and Mandatory Versus Voluntary Reporting below.
11. For detailed descriptions and analyses of the imputation methods and algorithms used, see the technical report: U.S. Department of Commerce, Bureau of the Census, An Evaluation of Imputation Methods for the Survey of Industrial Research and Development, ESMD-9404 (Washington, DC, September 1994).
12. See the NSF technical report cited in footnote 9 (NSF 90-322) for an explanation of the uncertainties in the data and an assessment of their sensitivity to the choice among various possible imputation procedures.
