Design Options for SESTAT for the Current Decade: Statistical Issues
Implications for Variance Estimation
An important feature of probability sampling methods is that they permit the calculation of the sampling errors associated with the survey estimates. None of the four design options has a significant advantage over the others in regard to variance estimation. As long as appropriate variance stratum and unit identifiers are available in the data and related survey control files, reasonably good variance estimators can be developed using any of a number of well-known techniques. Because the four options all involve the same basic design elements (i.e., use of unclustered census samples plus essentially independent samples of recent college graduates and doctorate recipients), the same general approach can be used (with minor modifications) for all four options. The important aspects to capture in the variance estimator are (a) the relevant features of the designs used to select the various samples and (b) the weighting and estimation procedures used to develop estimates from the integrated survey data. Although the mechanics of weighting and variance estimation would be more complicated under option 3 (because it involves "parallel" samples of slightly different design that must be weighted separately and then combined using composite estimation), the additional processing burden associated with this option should not be a major factor in choosing among the alternative designs.
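The composite estimation step mentioned for option 3 can be sketched as follows. This is a minimal illustration, not the SESTAT procedure: it assumes the two "parallel" samples yield independent estimates of the same quantity, and the function name `composite`, the parameter `lam`, and the default variance-minimizing weight are all hypothetical choices made for the example.

```python
def composite(est_a, var_a, est_b, var_b, lam=None):
    """Combine two independent estimates of the same quantity.

    lam is the weight placed on estimate A. If it is omitted, the
    variance-minimizing weight var_b / (var_a + var_b) is used
    (an assumption of this sketch, not taken from the source).
    """
    if lam is None:
        lam = var_b / (var_a + var_b)
    est = lam * est_a + (1.0 - lam) * est_b
    # Independence assumed: no covariance term between the two samples.
    var = lam ** 2 * var_a + (1.0 - lam) ** 2 * var_b
    return est, var, lam


# Hypothetical inputs: two separately weighted estimates and their variances.
est, var, lam = composite(1000.0, 400.0, 1100.0, 100.0)
```

Under these assumptions the composite variance is never larger than the smaller of the two input variances, which is why the extra weighting step adds processing burden but not estimation risk.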
In general, sampling errors may be estimated either by using analytical variance formulas based on a Taylor Series approach or by using replication techniques such as jackknife repeated replication or balanced repeated replication (BRR). The Taylor Series approach is straightforward for simple linear estimates such as the expansion estimate of a population total, but the variance formulas can be complex for nonlinear statistics. Replication methods (see, e.g., McCarthy 1966 or Wolter 1985) provide a relatively simple way of calculating variances: the same estimator is recomputed on each replicate, so no statistic-specific variance formula needs to be derived. In particular, because the weighting adjustments can be repeated for each replicate, their impact can be reflected approximately in the variance estimates obtained by replication methods.
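To make the replication idea concrete, the sketch below computes BRR variance estimates for a linear statistic (a weighted total) and a nonlinear one (a weighted mean). It assumes a toy design with two variance units per variance stratum; the data, the names `SAMPLE` and `HADAMARD_4`, and the helper functions are hypothetical and do not reproduce the SESTAT weighting.

```python
# Hypothetical toy data: 3 variance strata, 2 variance units each.
# Each record is (stratum, unit, weight, y).
SAMPLE = [
    (0, 0, 10.0, 1.0), (0, 1, 10.0, 0.0),
    (1, 0, 20.0, 1.0), (1, 1, 20.0, 1.0),
    (2, 0, 15.0, 0.0), (2, 1, 15.0, 1.0),
]

# Hadamard matrix of order 4. Column 0 is constant, so columns 1-3
# supply the balanced half-sample pattern for the 3 strata.
HADAMARD_4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def weighted_total(records, weights):
    return sum(w * rec[3] for rec, w in zip(records, weights))


def weighted_mean(records, weights):
    return weighted_total(records, weights) / sum(weights)


def brr_replicate_weights(records, hadamard):
    """Double the weight of the selected unit in each stratum, zero the other."""
    replicates = []
    for row in hadamard:
        rep = []
        for stratum, unit, weight, _ in records:
            selected_unit = 0 if row[stratum + 1] == 1 else 1
            rep.append(2.0 * weight if unit == selected_unit else 0.0)
        replicates.append(rep)
    return replicates


def brr_variance(records, hadamard, estimator):
    """Average squared deviation of replicate estimates from the full-sample estimate."""
    full = estimator(records, [rec[2] for rec in records])
    reps = brr_replicate_weights(records, hadamard)
    return sum((estimator(records, rw) - full) ** 2 for rw in reps) / len(reps)


v_total = brr_variance(SAMPLE, HADAMARD_4, weighted_total)
v_mean = brr_variance(SAMPLE, HADAMARD_4, weighted_mean)
```

For the linear total, the balanced replicates reproduce the paired-difference variance (the sum over strata of the squared difference between the two weighted unit totals), which is 325 for this toy data. The same function handles the weighted mean without any additional derivation, which is the practical appeal of replication for nonlinear statistics.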
As indicated in the section "Variance Estimation in SESTAT," the SESTAT surveys of the 1990s employed a variety of replication techniques for variance estimation. The BRR method was used for the NSCG and the SDR, whereas a jackknife approach was used for the NSRCG. Although valid variance estimates are generated when different variance estimation methods are used for the different components of the SESTAT integrated database (in fact, it would be possible to use Taylor Series approximations for one component and replication for another), using the same technique for all components would provide analysts with a unified approach for variance estimation. Unfortunately, this may be impractical because it would require recalculating existing replicate weights for one or more components of the SESTAT database.
For example, suppose that jackknife replication is to be used for variance estimation across all components. Under option 1, a completely new census sample would be selected for fielding in 2003. For this component of the SESTAT database, it would be straightforward to develop the required jackknife replicate weights. This is also true for the NSRCG component (where jackknife replication is currently being used). However, the SDR has used BRR since the beginning of the integrated database with the 1993 surveys. To develop the required set of jackknife replicate weights for the SDR, it would be necessary to first construct jackknife replicates for each existing "panel" (cohort) in the SDR. Once the jackknife replicates had been constructed, all of the weighting adjustments applied to the full sample would have to be repeated for each replicate (separately for each panel within the SDR). Although it is theoretically possible to construct the jackknife replicate weights in this manner, the work involved would be difficult, time-consuming, and prone to error. Thus, it may be preferable to simply continue with the current BRR approach for the SDR.
As a simplification, under the assumption that the total sample can be assigned to appropriate variance units and variance strata, it is possible to replicate the existing full sample weights directly (without replicating the weighting adjustments that have occurred in the previous rounds). The resulting replicate weights will not reflect all of the adjustments that have been made, but may nonetheless provide a reasonably good approximation of the variance. If this simplification does not seriously affect the integrity of the variance estimates, it may provide a practical solution to the problem of retroactively creating replicate weights for an existing sample.
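The difference between the two routes described above (repeating the weighting adjustments for every replicate versus replicating the final weights directly) can be sketched as follows. This is an illustration under strong assumptions: a single hypothetical nonresponse adjustment within cells stands in for the full sequence of SDR adjustments, the panel structure is ignored, and the toy data, field names, and function names are all invented for the example.

```python
from collections import defaultdict

# Hypothetical toy sample: 2 variance strata x 2 variance units.
SAMPLE = [
    {"stratum": 0, "unit": 0, "cell": "A", "base": 10.0, "resp": True,  "y": 1.0},
    {"stratum": 0, "unit": 0, "cell": "B", "base": 10.0, "resp": False, "y": 0.0},
    {"stratum": 0, "unit": 1, "cell": "A", "base": 10.0, "resp": True,  "y": 0.0},
    {"stratum": 0, "unit": 1, "cell": "B", "base": 10.0, "resp": True,  "y": 1.0},
    {"stratum": 1, "unit": 0, "cell": "A", "base": 20.0, "resp": False, "y": 0.0},
    {"stratum": 1, "unit": 0, "cell": "B", "base": 20.0, "resp": True,  "y": 1.0},
    {"stratum": 1, "unit": 1, "cell": "A", "base": 20.0, "resp": True,  "y": 1.0},
    {"stratum": 1, "unit": 1, "cell": "B", "base": 20.0, "resp": True,  "y": 0.0},
]


def nonresponse_adjust(records, weights):
    """Scale respondent weights so each cell keeps its full weighted total."""
    total, resp = defaultdict(float), defaultdict(float)
    for r, w in zip(records, weights):
        total[r["cell"]] += w
        if r["resp"]:
            resp[r["cell"]] += w
    adjusted = []
    for r, w in zip(records, weights):
        if r["resp"] and resp[r["cell"]] > 0:
            adjusted.append(w * total[r["cell"]] / resp[r["cell"]])
        else:
            adjusted.append(0.0)  # nonrespondents carry no final weight
    return adjusted


def jackknife_factors(records):
    """Yield (coefficient, factors) for each delete-one-unit replicate."""
    units = defaultdict(set)
    for r in records:
        units[r["stratum"]].add(r["unit"])
    for h in sorted(units):
        n_h = len(units[h])
        for j in sorted(units[h]):
            factors = [
                1.0 if r["stratum"] != h
                else 0.0 if r["unit"] == j
                else n_h / (n_h - 1)
                for r in records
            ]
            yield (n_h - 1) / n_h, factors


def weighted_mean(records, weights):
    return sum(w * r["y"] for r, w in zip(records, weights)) / sum(weights)


def jackknife_variance(records, readjust):
    base = [r["base"] for r in records]
    full = nonresponse_adjust(records, base)
    theta = weighted_mean(records, full)
    var = 0.0
    for coef, factors in jackknife_factors(records):
        if readjust:
            # Full approach: perturb the base weights, then redo the adjustment.
            rep = nonresponse_adjust(records, [b * f for b, f in zip(base, factors)])
        else:
            # Shortcut: perturb the final full-sample weights directly.
            rep = [w * f for w, f in zip(full, factors)]
        var += coef * (weighted_mean(records, rep) - theta) ** 2
    return var


v_full = jackknife_variance(SAMPLE, readjust=True)
v_short = jackknife_variance(SAMPLE, readjust=False)
```

Comparing `v_full` with `v_short` on a component where both can be computed is one way to judge whether the shortcut "seriously affects the integrity of the variance estimates" before relying on it for a sample whose earlier adjustments cannot easily be replayed.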