National Science Foundation Division of Science Resources Statistics

SESTAT Survey Design and Methodology

 

Missing Data Imputation

A completed interview was defined as a questionnaire in which all designated "critical" questions, such as degrees received and occupation, were answered. When possible, telephone follow-up was used to obtain answers to critical items for otherwise complete questionnaires. (See "Editing Guidelines and Procedures" for further details.)
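The completeness rule can be illustrated with a minimal Python sketch. A record counts as a completed interview when every critical item has an answer, and records that fail the check are the candidates for telephone follow-up. The item names `degrees_received` and `occupation` and the helpers `is_complete` and `needs_followup` are hypothetical. The full list of critical items is not given here, and the sketch assumes that a missing answer is stored as `None` or as an absent key.

```python
# Hypothetical critical items; the full SESTAT list is not given here.
CRITICAL_ITEMS = ("degrees_received", "occupation")


def is_complete(record: dict) -> bool:
    """True when every critical item has a non-missing answer (None or absent = missing)."""
    return all(record.get(item) is not None for item in CRITICAL_ITEMS)


def needs_followup(records: list[dict]) -> list[dict]:
    """Records missing at least one critical item (candidates for telephone follow-up)."""
    return [r for r in records if not is_complete(r)]
```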

Except for items with verbatim responses, missing data for noncritical items were replaced, or "imputed." Imputation began only after all logical editing was complete. The specific procedure used to impute missing data is known as sequential hot deck imputation. Hot deck imputation replaces a missing value for a particular data item with an existing response taken from the record of another individual (the "donor") who is considered "similar" to the individual whose record has the missing value (the "recipient"). In sequential hot deck imputation, the donor is typically the nearest record in the sorted file that has an existing response for the item and is similar to the recipient.
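The core of sequential hot deck imputation can be sketched in a few lines of Python. This is an illustrative sketch, not the SESTAT production code. The function name `sequential_hot_deck` is hypothetical. The sketch assumes that missing values are stored as `None` and that the input is already sorted so neighboring records are similar. It takes the donor to be the nearest preceding record with a reported value, which is a common variant of the method. A gap at the very start of the file has no preceding donor, so it borrows the first reported value that follows it.

```python
from typing import Optional, Sequence


def sequential_hot_deck(values: Sequence[Optional[object]]) -> list[Optional[object]]:
    """Fill None entries with the nearest preceding reported value (the donor).

    Assumes `values` is already sorted so that neighboring records are similar.
    """
    result = list(values)
    donor = None
    for i, v in enumerate(result):
        if v is None:
            result[i] = donor  # stays None if no donor has been seen yet
        else:
            donor = v  # only reported values become donors, never imputed ones
    # Leading gap: no preceding donor exists, so use the first reported value.
    first = next((v for v in values if v is not None), None)
    for i, v in enumerate(result):
        if v is not None:
            break
        result[i] = first
    return result
```

For example, `sequential_hot_deck([None, 3, None, None, 7, None])` returns `[3, 3, 3, 3, 7, 7]`. The donor is updated only from values that were reported in the input, so an imputed value is never passed on to a later record.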

To ensure that adjacent data records were similar, the records for each component survey were grouped into imputation classes on the basis of variables thought to be strongly or even uniquely associated with the data item to be imputed. A donor record was selected only from those records that belonged to the same imputation class as the recipient record.
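One way to restrict donors to the recipient's imputation class is to keep a separate "last reported value" for each class while walking the file in order. In this hypothetical Python sketch, `hot_deck_by_class`, the `class_key` callback, and the example fields `field` and `salary` are illustrative assumptions. A recipient whose class has no earlier donor is left missing here. A production system would need some fallback rule for that case.

```python
from typing import Callable, Hashable, Optional


def hot_deck_by_class(
    records: list[dict],
    item: str,
    class_key: Callable[[dict], Hashable],
) -> list[dict]:
    """Impute `item` for each record using only donors from its own imputation class.

    Records are visited in file order; each recipient takes the nearest preceding
    reported value from the same class. Input records are not modified.
    """
    last_reported: dict[Hashable, Optional[object]] = {}
    out = []
    for original in records:
        rec = dict(original)  # copy so the caller's data are left untouched
        cls = class_key(rec)
        if rec.get(item) is None:
            # Recipient: borrow from the latest donor in the same class (None if none yet).
            rec[item] = last_reported.get(cls)
        else:
            # Reported value: becomes the donor for later records in this class.
            last_reported[cls] = rec[item]
        out.append(rec)
    return out
```

For example, with records classed by `field`, a missing `salary` in a "bio" record can only be filled from an earlier "bio" record. An "eng" record can never serve as its donor, even when it is the adjacent record in the file.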

Before imputation, data records within each imputation class were also sorted by variables thought to be associated both with the response to the data item and with the propensity not to respond to it. Serpentine sorting was used because it keeps adjacent data records as similar as possible. In serpentine sorting, the sort order of each lower-level sort variable is reversed every time a boundary is crossed for a higher-level sort variable. As a result, the records on either side of a boundary have similar values, rather than values from opposite ends of the range.
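One possible implementation of a serpentine sort is sketched below in Python. The first sort variable is sorted in ascending order. Each lower-level variable then alternates between ascending and descending order every time the combination of higher-level values changes, with groups counted across the whole file. This is one reading of "the sort order is reversed as boundaries are crossed." The function name `serpentine_sort` and the dictionary-based record layout are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def serpentine_sort(records: list[dict], keys: list[str]) -> list[dict]:
    """Hierarchically sort records so that adjacent records stay similar.

    keys[0] is sorted ascending; each later key alternates ascending/descending
    every time the combination of the higher-level keys changes.
    """
    if not keys:
        return list(records)
    ordered = sorted(records, key=itemgetter(keys[0]))
    for level in range(1, len(keys)):
        # Records sharing all higher-level values are contiguous after the previous pass.
        prefix = itemgetter(*keys[:level])
        this_key = itemgetter(keys[level])
        result: list[dict] = []
        for n, (_, group) in enumerate(groupby(ordered, key=prefix)):
            # Reverse direction at every boundary: even-numbered groups ascend, odd ones descend.
            result.extend(sorted(group, key=this_key, reverse=(n % 2 == 1)))
        ordered = result
    return ordered
```

For example, sorting by region and then age puts region 1 in ascending age order and region 2 in descending order. The oldest respondents in region 1 therefore sit next to the oldest in region 2, rather than next to the youngest.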
