
## General

National estimates of occupational employment in the scientific, technical, and engineering fields for nonmanufacturing industries were based on data from the 1993 Occupational Employment Statistics (OES) Survey. The OES Program is a Federal-state cooperative effort in which each state conducts its own survey to produce its estimates. The Bureau of Labor Statistics (BLS) provided each state with survey procedures, technical support, and trouble-shooting assistance. The government agencies participating in this program were the 50 State Employment Security Agencies (SESAs) plus the District of Columbia, Puerto Rico, Guam, and American Samoa. For this report, estimates at the national level were produced by BLS-Washington based on data from the fifty states plus the District of Columbia. State-level estimates can be obtained from the individual SESAs.

## Scope of Survey

The BLS nonmanufacturing industries survey covers establishments in SIC codes 10, 12-17, 60-65, 67, 70, 72, 73, 75, 76, 78-80 (except 806), 81, 83, 84, 86, 87, and 89. The reference date of this survey was the week that included April 12, May 12, or June 12, 1993. The reference date for any particular unit in the survey depended on its SIC code. See the chart below for those SICs covered in this publication.

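| SIC code | Reference date | SIC code | Reference date |
|---|---|---|---|
| 10 | May 12 | 72 | May 12 |
| 12 | May 12 | 73 | Apr 12 |
| 13 | May 12 | 75 | May 12 |
| 14 | May 12 | 76 | May 12 |
| 15 | May 12 | 78 | June 12 |
| 16 | May 12 | 79 | Apr 12 |
| 17 | May 12 | 80 | Apr 12 |
| 60 | May 12 | 81 | May 12 |
| 61 | May 12 | 83 | Apr 12 |
| 62 | May 12 | 84 | Apr 12 |
| 63 | May 12 | 86 | Apr 12 |
| 64 | May 12 | 87 | June 12 |
| 65 | May 12 | 89 | June 12 |
| 67 | May 12 | | |
| 70 | May 12 | | |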

## Method of Collection

Survey schedules were initially mailed out to the personnel offices of almost all sampled establishments. Some of the larger establishments, however, received a personal visit.

Two additional mailings were sent to nonrespondents at approximately six-week intervals. Nonrespondents that were critical to the survey because of their size received a follow-up telephone call or personal visit.

## Sampling Procedures

The sampling frame for the OES survey was a list of units reported to the state's Unemployment Insurance (U.I.) files. The reference date of the sampling frame was the second quarter of 1992.

Within each state, the universe was stratified by SIC and size class where size class was defined as follows:

| Size class | Employees |
|---|---|
| 1 | 1-4 |
| 2 | 5-9 |
| 3 | 10-19 |
| 4 | 20-49 |
| 5 | 50-99 |
| 6 | 100-249 |
| 7 | 250-499 |
| 8 | 500-999 |
| 9 | 1000+ |

U.I. reporting units with fewer than 5 employees were not sampled in most states; instead, units with 5-9 employees were assigned a larger weight to account for employment in size class 1. U.I. reporting units with 250 or more employees were included in the sample with certainty. The sample sizes needed to calculate state estimates at a targeted relative standard error of 10, 15, or 20 percent for one standard deviation were developed for each SIC across its non-certainty size classes. The sample size for each SIC was determined by calculating averages of occupational rates and averages of coefficients of variation (CVs) for a given set of typical occupations using data from the previous survey round. Within each SIC, the sample size was then allocated proportionally across size classes based on size class employment. The sample was selected systematically with equal probability within each state/(area)/SIC/size class cell.^{[1]}

The states were given the option of selecting three target relative standard errors in designing their samples. Many states took advantage of this flexibility by varying target relative standard error across SICs in order to balance the cost and reliability of their estimates.

The above allocation resulted in a total initial sample size of 305,366 U.I. reporting units nationally.
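The allocation and selection steps described above can be sketched in Python. This is an illustrative sketch only, not the BLS procedure: the function names, the rounding rule, and the random-start systematic selection are assumptions, and it covers only the non-certainty size classes (units with 250 or more employees would be taken with certainty).

```python
import random

# Lower employment bounds of size classes 1-9, taken from the table in this section.
SIZE_CLASS_LOWER_BOUNDS = [1, 5, 10, 20, 50, 100, 250, 500, 1000]


def size_class(employees):
    """Return the size class (1-9) for an establishment's employment."""
    return sum(1 for lower in SIZE_CLASS_LOWER_BOUNDS if employees >= lower)


def allocate_proportionally(total_sample, employment_by_class):
    """Split a SIC-level sample size across size classes in proportion to
    size class employment. Simple rounding is an assumption; the rounded
    pieces may not add up exactly to total_sample."""
    total_emp = sum(employment_by_class.values())
    return {cls: round(total_sample * emp / total_emp)
            for cls, emp in employment_by_class.items()}


def systematic_sample(units, n, rng):
    """Equal-probability systematic selection of n units from an ordered
    list, using a random start and a fixed (possibly fractional) skip."""
    if n >= len(units):
        return list(units)
    step = len(units) / n
    start = rng.random() * step
    last = len(units) - 1
    return [units[min(int(start + i * step), last)] for i in range(n)]


# Hypothetical cell: 100 frame units, 10 to be selected.
rng = random.Random(0)
allocation = allocate_proportionally(10, {2: 100, 3: 300, 4: 600})
picked = systematic_sample(list(range(100)), 10, rng)
print(allocation, picked)
```

Within a cell every unit has the same selection probability, so the design weight before nonresponse adjustment is simply the number of frame units divided by the number selected.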

## Response

Of those sampled, 290,877 were eligible units (i.e., respondents, refusals, unusables, and nonrespondents). Usable responses were obtained from 226,627 units, producing a response rate of 77.9 percent based on units and 72.9 percent based on weighted employment.
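The unit-based response rate quoted above follows directly from the two counts. A short check, using only figures stated in this section (the variable names are illustrative):

```python
eligible_units = 290_877      # respondents, refusals, unusables, and nonrespondents
usable_responses = 226_627

# Unit response rate: usable responses as a percentage of eligible units.
unit_response_rate = 100 * usable_responses / eligible_units
print(f"{unit_response_rate:.1f} percent")  # 77.9 percent
```

The employment-weighted rate of 72.9 percent cannot be reproduced this way, since it depends on unit weights and employment that are not given in the text.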

See the table below for additional details.^{[2]}

## Occupational Estimates

Weights were determined for sample units that had usable responses. Each weight was composed of two factors: the reciprocal of the probability of selection and a nonresponse adjustment factor (NRAF).

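For questionnaires that were not returned or were otherwise unusable, an NRAF was calculated to impute for the missing data. This factor was the ratio

$$\text{NRAF} = \frac{\text{sample employment of all eligible units in the cell}}{\text{sample employment of usable respondents in the cell}}$$

It was calculated for each state/three-digit SIC/size class sampling cell.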

The sample employment used to calculate the NRAF was obtained from the sampling frame. If the NRAF in a cell was greater than a predetermined maximum factor (the latter increases as the number of respondents in a cell increases), the cell was collapsed with other homogeneous cells in the industry until the NRAF for the combined cell was not greater than the appropriate maximum factor. If the collapsing procedure terminated (i.e., no more cells were available for collapse) before satisfying the constraint above, then the most recent maximum factor was used. Note that homogeneous cells were adjacent size cells within a state and SIC. The final weight assigned to each usable unit in the sample was the product of the NRAF and the reciprocal of the probability of selection.
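The collapsing rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the BLS implementation: it takes the NRAF to be the frame employment of all eligible sample units divided by that of the usable respondents, it collapses only toward the next larger size class, and `max_factor` is a hypothetical cap (the actual maximum factors are not given in the text).

```python
def max_factor(n_respondents):
    """Hypothetical cap that increases with the number of respondents.
    The real BLS maximum factors are not stated in the text."""
    return min(1.5 + 0.1 * n_respondents, 4.0)


def nraf_for_cells(cells, cap=max_factor):
    """cells: list of dicts for one state/SIC, ordered by size class, each
    with 'eligible_emp', 'respondent_emp', and 'n_respondents'.
    Returns one NRAF per cell after collapsing adjacent size cells."""
    factors = [None] * len(cells)
    i = 0
    while i < len(cells):
        j = i  # current group is cells[i..j]
        while True:
            group = cells[i:j + 1]
            eligible = sum(c["eligible_emp"] for c in group)
            responded = sum(c["respondent_emp"] for c in group)
            limit = cap(sum(c["n_respondents"] for c in group))
            nraf = eligible / responded if responded > 0 else float("inf")
            # Stop when the constraint holds or no adjacent cell is left.
            if nraf <= limit or j == len(cells) - 1:
                break
            j += 1
        for k in range(i, j + 1):
            # If collapsing ran out of cells, fall back to the latest cap.
            factors[k] = min(nraf, limit)
        i = j + 1
    return factors


def final_weight(selection_probability, nraf):
    """Final weight: NRAF times the reciprocal of the selection probability."""
    return nraf / selection_probability


# Hypothetical cells for one state/SIC.
cells = [
    {"eligible_emp": 100, "respondent_emp": 80, "n_respondents": 8},
    {"eligible_emp": 200, "respondent_emp": 40, "n_respondents": 2},
    {"eligible_emp": 300, "respondent_emp": 260, "n_respondents": 10},
]
print(nraf_for_cells(cells))
```

In this example the second cell's own factor (5.0) exceeds its cap, so it is combined with the third cell and both receive the combined factor of 500/300.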

A separate ratio estimate of occupational employment was used to develop national estimates. The auxiliary variable used was the 1992 population value of total employment. This variable is also referred to as cell benchmark employment, denoted by $M_{ij}$. The term

$$\frac{M_{ij}}{\sum_{k} W_{ijk} E_{ijk}}$$

is known as the benchmark factor. It is the ratio of cell benchmark employment to cell weighted reported total employment. The estimation formula below produced final estimates ($\hat{P}_{ij}$) of occupational employment through benchmarking, that is, the process of multiplying the cell's weighted reported occupational employment ($\sum_{k} W_{ijk} P_{ijk}$) by its benchmark factor.

$$\hat{P}_{ij} = \left(\sum_{k} W_{ijk} P_{ijk}\right) \frac{M_{ij}}{\sum_{k} W_{ijk} E_{ijk}}$$

where

- $\hat{P}_{ij}$ = estimated employment for occupation P in industry i and size class j

- $i$ = a three-digit industry

- $j$ = size class

- $k$ = establishment

- $W_{ijk}$ = weight for establishment k in industry i and size class j after adjusting for nonresponse

- $P_{ijk}$ = reported employment for occupation P in establishment k within industry i and size class j

- $E_{ijk}$ = reported total employment for establishment k in industry i and size class j

- $M_{ij}$ = population value of total employment for industry i and size class j.

The estimated employment for an occupation at the three-digit industry i level was obtained by summing the occupational employment estimates $\hat{P}_{ij}$ across all size levels j within industry i.

$$\hat{P}_{i} = \sum_{j=1}^{L_{i}} \hat{P}_{ij}$$

where $L_{i}$ was the number of size levels j in industry i.

Similarly, the estimated employment for an occupation at the two-digit industry g level was obtained by summing the occupational employment estimates $\hat{P}_{i}$ across all three-digit industries i within two-digit industry g.

$$\hat{P}_{g} = \sum_{i=1}^{L_{g}} \hat{P}_{i}$$

where $L_{g}$ was the number of three-digit industries i in industry g.

It is important to note, however, that because of publishability requirements, rounding adjustments were made such that occupational employment estimates at the three-digit industry level may not sum to the two-digit level estimates.
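The benchmarking and summation steps described in this section can be sketched in Python. This is a minimal illustration: the record layout, function names, and sample figures are assumptions, and three-digit SIC codes are held as strings so that the first two characters give the two-digit industry.

```python
from collections import defaultdict


def benchmarked_estimates(establishments, benchmark):
    """establishments: iterable of dicts with 'industry' (three-digit SIC),
    'size_class', 'weight' (W), 'occ_emp' (P), and 'total_emp' (E).
    benchmark: dict mapping (industry, size_class) to benchmark employment M.
    Returns the benchmarked occupational estimate for each cell."""
    weighted_occ = defaultdict(float)
    weighted_total = defaultdict(float)
    for est in establishments:
        cell = (est["industry"], est["size_class"])
        weighted_occ[cell] += est["weight"] * est["occ_emp"]
        weighted_total[cell] += est["weight"] * est["total_emp"]
    # Weighted occupational employment times the benchmark factor M / sum(W * E).
    return {cell: weighted_occ[cell] * benchmark[cell] / weighted_total[cell]
            for cell in weighted_occ}


def sum_to_three_digit(cell_estimates):
    """Sum cell estimates over size classes within each three-digit industry."""
    totals = defaultdict(float)
    for (industry, _size_class), value in cell_estimates.items():
        totals[industry] += value
    return dict(totals)


def sum_to_two_digit(industry_estimates):
    """Sum three-digit estimates within each two-digit industry."""
    totals = defaultdict(float)
    for industry, value in industry_estimates.items():
        totals[industry[:2]] += value
    return dict(totals)


# Hypothetical data for one occupation.
sample = [
    {"industry": "871", "size_class": 4, "weight": 10, "occ_emp": 2, "total_emp": 30},
    {"industry": "871", "size_class": 4, "weight": 10, "occ_emp": 4, "total_emp": 50},
    {"industry": "871", "size_class": 5, "weight": 5, "occ_emp": 10, "total_emp": 80},
    {"industry": "873", "size_class": 4, "weight": 8, "occ_emp": 1, "total_emp": 25},
]
bench = {("871", 4): 1000, ("871", 5): 500, ("873", 4): 300}

cells = benchmarked_estimates(sample, bench)
three_digit = sum_to_three_digit(cells)
two_digit = sum_to_two_digit(three_digit)
print(cells, three_digit, two_digit)
```

The sketch omits the rounding adjustments made for publication, so its three-digit figures always sum exactly to the two-digit figure.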

## Variance Estimates

Estimates of sampling error were calculated on survey estimates to allow users to determine whether or not the occupational estimates were reliable enough for their needs. Only a probability sample can be used to estimate sampling error from a sample.

The formulas used to estimate the variance, a common measure of sampling error, were based on the sample design and on the method of estimation. In the OES survey, the formula used to estimate the variance of occupational employment was a subsample replication technique called the jackknife random group. The jackknife derives R estimates of total occupational employment from R subsamples of the parent sample by excluding one random group at a time. The jackknife then estimates the variance of the parent sample's employment estimator from the variability between the R employment estimates.

The variance for an occupational employment estimate at the three-digit industry i/size class j level is

$$S^{2}(\hat{P}_{ij}) = \frac{R-1}{R} \sum_{r=1}^{R} \left(\hat{P}_{ijr} - \bar{P}_{ij}\right)^{2}$$

where

- $S^{2}(\hat{P}_{ij})$ = estimated variance of $\hat{P}_{ij}$

- $R$ = number of random groups

- $\hat{P}_{ij}$ = estimated employment for occupation P in industry i and size class j

- $\hat{P}_{ijr}$ = estimated employment for occupation P in industry i, size class j, and subsample r

- $\bar{P}_{ij}$ = estimated mean employment for occupation P in industry i and size class j across R subsamples

The above formula for variance has been simplified. The actual formula includes corrections for finite populations.
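The delete-one-group idea can be sketched in Python. This is an illustrative sketch, not the BLS production formula: it assumes the standard delete-a-group jackknife scaling of (R - 1)/R, omits the finite population corrections mentioned in the text, and uses a hypothetical `ratio_estimate` helper with made-up data.

```python
def jackknife_variance(estimator, random_groups):
    """Delete-a-group jackknife variance.
    random_groups: list of R lists of sample units.
    estimator: function mapping a list of units to an estimate.
    Assumes the standard (R - 1) / R scaling, with no finite population
    correction."""
    R = len(random_groups)
    replicates = []
    for r in range(R):
        # Subsample r: the parent sample with random group r excluded.
        kept = [unit for g, group in enumerate(random_groups) if g != r
                for unit in group]
        replicates.append(estimator(kept))
    mean = sum(replicates) / R
    return (R - 1) / R * sum((p - mean) ** 2 for p in replicates)


def ratio_estimate(units, benchmark_emp=1000.0):
    """Hypothetical benchmarked estimate for one cell.
    units: list of (weight, occupational employment, total employment)."""
    weighted_occ = sum(w * p for w, p, e in units)
    weighted_total = sum(w * e for w, p, e in units)
    return benchmark_emp * weighted_occ / weighted_total


# Three hypothetical random groups of one establishment each.
groups = [[(10, 2, 30)], [(10, 4, 50)], [(10, 3, 40)]]
cell_variance = jackknife_variance(ratio_estimate, groups)
print(cell_variance)
```

Because the ratio estimate rescales to the benchmark, each subsample estimate needs no separate reweighting after a group is dropped. Cell-level variances computed this way would then be summed across size classes and industries.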

The variance for an occupational employment estimate at the three-digit industry i level is obtained by summing the variances $S^{2}(\hat{P}_{ij})$ across all size levels j within industry i.

$$S^{2}(\hat{P}_{i}) = \sum_{j=1}^{L_{i}} S^{2}(\hat{P}_{ij})$$

where $L_{i}$ is the number of size levels j in industry i.

Similarly, the variance for an occupational employment estimate at the two-digit industry g level is obtained by summing the variances $S^{2}(\hat{P}_{i})$ across all three-digit industries i within industry g.

$$S^{2}(\hat{P}_{g}) = \sum_{i=1}^{L_{g}} S^{2}(\hat{P}_{i})$$

where $L_{g}$ is the number of three-digit industries i in industry g.

## Reliability of Estimates

Estimates developed from the sample may differ from the results of a complete census of all the establishments in the sampling frame. Two types of error, sampling and nonsampling, are possible in an estimate based on a sample survey. Sampling error occurs because observations are made only on a sample, not on the entire population. Nonsampling error can be attributed to many sources, e.g., an inability to obtain information about all cases in the sample, differences in the respondents' interpretation of questions, inability or unwillingness of respondents to provide correct information, errors made in recording, coding, or processing the data, errors made in estimating values for missing data, and failure to represent all units in the population.

Sampling error arises because the particular sample used in this survey is only one of a large number of possible samples of the same size that could have been selected with the same sample design. Estimates derived from those different samples would differ simply as a result of random effects. Relative standard errors that are a measure of that sampling error effect are presented in this publication. The relative standard errors of a survey estimate measure the variation among the estimates from all possible samples. The relative standard error is the standard error of the estimate divided by the employment estimate for that occupation. Thus, it shows the size of the standard error relative to the occupational estimate itself.

Use of the relative standard error enables the analyst to construct a confidence interval around the occupational estimate. The confidence interval includes the average value of the estimates obtained from all possible samples (of that size and design) at a confidence level specified by the analyst. If no nonsampling error is present (which is unlikely) the interval will contain the true value with the confidence level specified.

To construct the confidence interval, divide the relative error shown in the table by 100 and multiply the result by the occupational estimate. The confidence interval is the occupational estimate, plus or minus the number resulting from the calculation described above. This estimate yields a confidence level of approximately 68%. That is, the "true value" (neglecting nonsampling error) will be contained in the interval 68% of the time. Most analysts prefer to have a confidence level higher than 68%. If a 90% confidence level is desired, multiply the number produced from the calculation in the first sentence above by 1.6. For a 95% confidence level, multiply by 1.96. For almost full confidence (99%), multiply by 2.57.

For example, suppose the occupational employment estimate for chemist is 5,000 with an associated relative standard error of 3 shown in the table. The 68% confidence interval will then be the chemist estimate plus or minus (3/100) x 5,000 = 150. The "true value" will be contained in the interval of 4,850 to 5,150 about 68% of the time. For 90% confidence, multiply 150 by 1.6 to get 240. The 90% confidence interval is 4,760 to 5,240.

It is important to remember that nonsampling error can have important effects on the accuracy of the estimate. Unfortunately, nonsampling errors can be very difficult to measure, and measures of them are not available.^{[3]}

The relative standard errors primarily indicate the magnitude of the sampling error. They do not measure nonsampling error, including any biases in the data. Many edit and quality control procedures are used to reduce the nonsampling error caused by mistakes in recording, coding, and processing the data. The adjustments made for nonrespondents assumed that the characteristics of the nonrespondents are the same as those of the respondents at a given level. To the extent that this is not true, bias is introduced in the data. The magnitude of this bias is not known.

Particular care should be exercised in the interpretation of small estimates or small differences between estimates, because of relatively large sampling errors and the unknown magnitude of the biases.
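The confidence-interval recipe given in this section can be written as a short Python function. The multipliers are the ones quoted in the text; the function and variable names are illustrative, and the figures are those of the chemist example.

```python
# Multipliers quoted in the text for each confidence level (percent).
MULTIPLIERS = {68: 1.0, 90: 1.6, 95: 1.96, 99: 2.57}


def confidence_interval(estimate, relative_error_pct, level=68):
    """Return (lower, upper) bounds around an occupational estimate.
    relative_error_pct is the relative error as shown in the tables,
    i.e. expressed in percent. Sampling error only."""
    half_width = MULTIPLIERS[level] * relative_error_pct * estimate / 100
    return estimate - half_width, estimate + half_width


# Chemist example: estimate of 5,000 with a relative error of 3.
for level in (68, 90, 95, 99):
    low, high = confidence_interval(5000, 3, level)
    print(f"{level}%: {low:,.1f} to {high:,.1f}")
```

These intervals reflect sampling error alone; nonsampling error and bias are not captured by the relative standard error.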

## Quality Control Measures

As described above, the OES Survey is a Federal-state cooperative effort in which states conduct their own surveys. A major concern with a cooperative program like OES is accommodating state-specific publication needs with limited resources while standardizing survey procedures across all fifty states and the District of Columbia in order to produce quality estimates. The control on sources of nonsampling error in this decentralized environment can be particularly difficult. In addition, the review and validation function is spread across eight regional offices, thus leading to procedural differences between regions. Examples of quality control measures employed by BLS are the Survey Processing and Management System (SPAM) and the Estimates Delivery System (EDS), which were developed to provide a consistent and automated framework for state procedures and to reduce the workload at state, regional, and national levels.

By standardizing data processing activities (i.e., validating the sample frame, allocating and selecting the sample, refining mailing addresses, addressing envelopes and mailers, editing and updating questionnaires, producing management reports, and producing estimates) across all states, the use of SPAM and EDS has also standardized the survey methodology. This has significantly reduced the number of errors on the data files as well as the time needed to review them.

Other quality control measures used in the OES survey include

- follow-up solicitations of nonrespondents (especially critical nonrespondents),

- review of schedules to verify the accuracy and reasonableness of the reported data,

- adjustments of atypical data reporters,

- validation of the nonresponse adjustment factors,

- validation of the benchmark employment figures and of the benchmark factors, and

- validation of the analytical tables of estimates (at the two and three-digit SIC levels).

Footnotes

^{[1]} Note: Some states opted to further stratify their samples by substate areas.

^{[2]} Subsequent to the closeout date for national estimates, additional data were collected by the states and used to prepare their respective estimates. Consequently, the response rates in most states were higher than the response rate used to develop national estimates.

^{[3]} Although the term "relative standard error" is used throughout the text, the term "relative error" is used in the tables. BLS asserts that the terms are statistically equivalent for the OES survey.