nsf.gov - NCSES Comparison of the National Science Foundation's Scientists and Engineers Statistical Data System (SESTAT) with the Bureau of Labor Statistics' Current Population Survey (CPS) - US National Science Foundation (NSF)
National Science Foundation National Center for Science and Engineering Statistics
Comparison of the National Science Foundation's Scientists and Engineers Statistical Data System (SESTAT) with the Bureau of Labor Statistics' Current Population Survey (CPS)

Appendix A. Detailed Comparison of SESTAT and CPS Procedures for Collecting and Processing Data


This appendix provides a detailed comparison of the SESTAT and CPS procedures for collecting and processing data, as summarized in "Data Collection and Processing Procedures." The two systems use different modes of data collection. SESTAT data are collected primarily by mail and telephone, with some follow-up by personal interview. Most CPS data are collected through personal interviews during the first and fifth months of participation and through telephone interviews during the other months of participation; all CPS data are collected using CATI or CAPI. In addition to these differences in data collection mode, other important differences include the following:

  • Different strategies for asking questions (i.e., question wording and flow)
  • Extent to which proxy reports are allowed
  • Length of time between the survey reference week and data collection
  • Different coding schema and definitions
  • Different rules for imputing missing data

This appendix examines aspects of the SESTAT and CPS designs that could give rise to differences in the estimates from the two systems. The discussion is divided into six sections: (1) data collection procedures, (2) data processing procedures, (3) questions about academic degree, (4) questions about employment status, (5) questions about occupation, and (6) questions about other respondent classifications. In each section, a brief description of the SESTAT and CPS procedures is provided, followed by an explanation of how these procedures might affect the estimates.

Data Collection Procedures

The three surveys included in SESTAT use mail and telephone modes of data collection in different ways. NSCG and SDR data are collected using self-administered questionnaires delivered by mail, with CATI telephone follow-up. For a small percentage of NSCG cases, field interviewers visited sample members and conducted a CAPI interview. Most NSCG and SDR data are collected by mail; in 1997, 79% of NSCG responses and 77% of SDR responses were completed by mail. NSRCG is primarily a CATI survey with some responses collected by mail. In the 1997 NSRCG, 97% of the responses were CATI.[1] For the 1997 SESTAT overall, 61% of responses were obtained by mail, 37% by CATI, and 2% by CAPI.

CPS relies on a combination of CAPI and CATI data collection. Sample households are surveyed eight times during a 16-month period. The design calls for each household to be surveyed monthly from month 1 through month 4, and again from month 13 through month 16. CAPI interviews are the primary mode of collection in month 1 and in month 13; telephone interviews (CATI) are conducted in the other 6 months.
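The rotation pattern just described can be sketched in a few lines of Python. This is only an illustration of the schedule as stated in the text; the function name and the tuple layout are assumptions made for this sketch and do not come from any CPS software.

```python
def cps_interview_schedule():
    """Return (month_in_cycle, interview_number, primary_mode) for one household.

    Follows the pattern in the text: interviews in months 1-4 and 13-16,
    with CAPI as the primary mode in months 1 and 13 and CATI otherwise.
    """
    months = list(range(1, 5)) + list(range(13, 17))
    schedule = []
    for interview_number, month in enumerate(months, start=1):
        mode = "CAPI" if month in (1, 13) else "CATI"
        schedule.append((month, interview_number, mode))
    return schedule


schedule = cps_interview_schedule()
for month, number, mode in schedule:
    print(f"month {month:2d}: interview {number} by {mode}")
```

Month 13 is the household's fifth interview, which is why the personal interviews are also described as falling in the first and fifth months of participation.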

CPS and SESTAT use computer-assisted technologies to differing degrees. Both the CPS and SESTAT CATI systems conduct internal consistency checks during the survey administration. These computer-generated edit checks produce edit screens that ask the respondent to resolve or clarify any discrepancies in the data. In addition, CPS programs use "dependent interviewing," in which responses to selected questions collected on each household member during a prior month are used during subsequent rounds of data collection. The SESTAT system does not use dependent interviewing. A small number of cross-year edits are conducted using data collected in prior survey cycles, but they are generally conducted after data collection.

CPS and SESTAT also differ in the use of proxy respondents; all SESTAT sample members are asked to provide self-reports, whereas CPS relies heavily on proxy reports. The SESTAT mail surveys are addressed to the sample members, who provide the bulk of the survey responses. During telephone follow-up, SESTAT interviewers are allowed to complete the interviews with someone other than the sample member only in special limited situations.[2] Historically, these special situations have represented only a handful of respondents.

In CPS, a household respondent is identified at the beginning of each interview. These household respondents are asked to provide information about each eligible household member. Questions asked at the first month of interviewing (but not necessarily in every month) include the member's sex, race/ethnicity, date of birth and age, current military status, and highest level of education completed or highest degree received. Interviewers are encouraged to ask individual household members to self-report on labor force participation. However, interviewers are under tight time constraints and are just as strongly encouraged to collect as much information as possible in one contact (thereby minimizing callbacks). Typically, just under one-half of the data collected on labor force participation are provided by proxies.

Both SESTAT and CPS require respondents to focus on a 1-week period of time (i.e., the reference week) as they answer the survey questions. In 1997, SESTAT questionnaires asked respondents to focus on the week bounded by 13 April and 19 April. CPS always asks respondents to focus on the week of the month that includes the 12th; in April 1997, that week was bounded by 6 April and 12 April.
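The two reference weeks can be reproduced with a short date calculation. The sketch below assumes that a reference week runs from Sunday through Saturday, which is consistent with both date ranges given above; the function names are illustrative only.

```python
from datetime import date, timedelta


def week_containing(day):
    """Return (Sunday, Saturday) of the Sunday-to-Saturday week containing `day`."""
    # date.weekday() returns Monday=0 ... Sunday=6, so shift to count from Sunday.
    days_since_sunday = (day.weekday() + 1) % 7
    start = day - timedelta(days=days_since_sunday)
    return start, start + timedelta(days=6)


def cps_reference_week(year, month):
    """CPS rule from the text: the week of the month that includes the 12th."""
    return week_containing(date(year, month, 12))


cps_week = cps_reference_week(1997, 4)
sestat_week = week_containing(date(1997, 4, 15))  # SESTAT "week of April 15, 1997"
print("CPS:   ", cps_week[0], "to", cps_week[1])
print("SESTAT:", sestat_week[0], "to", sestat_week[1])
```

Under this assumption the 1997 SESTAT reference week begins the day after the April 1997 CPS reference week ends.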

The lengths of the data collection periods scheduled for CPS and SESTAT are very different. CPS collects data for about 1 week after the reference week. SESTAT surveys, on the other hand, last for several months after the reference week. For example, the 1997 NSCG mailed the first questionnaires roughly 4 weeks after the reference week and closed CATI follow-up of nonrespondents 5 months after the reference week. The 1997 NSRCG began data collection roughly 4 weeks after the reference week and nonresponse follow-up continued into the 10th month after the reference week. The 1997 SDR mailed the first questionnaires about 6 weeks after the reference week. CATI collection was initiated 6 months after the reference week and nonresponse follow-up continued until the 11th month after the reference week.

The design features highlighted here could produce differences between estimates from the two survey systems. Although no investigation comparing the two designs has been conducted, methodological research indicates that variation in such design features can alter response patterns. For example, the self-administered paper-and-pencil questionnaires used in the SESTAT survey system give the respondent control over the order in which questions are presented and answered. In the SESTAT CATI component and in CPS, interviewers retain control of the order in which questions are presented. This difference can affect survey estimates by causing changes to the meaning of the survey questions. Dillman (2000) writes,

Schwarz (1996) has detailed how, in the normal give-and-take of regular conversations, people tend to give answers that take into account things they have already said… Although the carryover is probably most extensive from questions that immediately follow one another, there is limited evidence that effects also occur when questions are widely separated. Consequently, it is important to recognize early on that a questionnaire cannot be viewed as a compilation of completely independent questions that have no effects on one another. Not only must each question be evaluated on the basis of its individual content, but also with regard to the larger context that often adds or subtracts meaning. (p. 91)

Differences between the SESTAT and CPS data collection procedures might influence the survey estimates in several other ways, which are explained below. Discussions of these issues can be found in Groves (1989), Biemer (1991), Fowler (1993, 1995), Mangione (1995), and Dillman (2000). However, it is also important to note that the reinterview studies conducted on the SESTAT surveys did not find large differences between the initial interview and the second interview. Although this does not indicate that the respondent-reported information is correct, it does indicate consistency and can be used as a measure of data reliability.

Recording Accuracy

Interviewers for both SESTAT and CPS receive extensive hands-on training in questionnaire administration, whereas sample members who are asked to self-administer the SESTAT questionnaires receive no training. This factor may lead to differences in reporting errors between the CPS and SESTAT surveys. However, as mentioned earlier, the reinterview studies have generally found consistent responses in the SESTAT data.

All CPS responses are collected using computer-assisted technology, whereas only 37% of the 1997 SESTAT responses were collected using this technology. These systems take advantage of built-in range checks and internal consistency checks. Out-of-scope and discrepant responses can be reviewed and resolved with the respondent during the interview. Although all SESTAT responses are subjected to editing once they are received, telephone follow-up to resolve discrepancies is conducted only for a small number of critical items. However, it is important to note that the primary variables used in this report, including degree level completed, working during the reference week, and occupation, are critical SESTAT items. Any editing problems with the critical items are resolved with the respondent during the initial interview or during data retrieval contacts. Additionally, in SESTAT, the critical items are divided into "critical completes" and "critical callbacks." Any cases for which critical complete items are not resolved with the respondent are considered incomplete and are not included in the final data system. Editing rules and imputation are used to resolve any items that are not considered "critical" in the SESTAT system.

Reliance on Self-Reports Versus Proxy Reports

Because CPS collects roughly one-half of the data (including labor force data) from household proxies, and any "knowledgeable adult" living in the household is eligible to serve as a proxy, proxy reporting could lead to some differences between SESTAT and CPS data.

Availability of Visual Aids

The self-administered SESTAT surveys and the CAPI-administered months of CPS (one-quarter of the April 1997 CPS respondent households) allow for the visual presentation of materials to the respondents. In the SESTAT mail questionnaires, respondents can read the question and all response options and can review the lists of occupation and education codes. In the CAPI CPS, respondents can view response options (i.e., show cards) for at least some items. In the CATI surveys, respondents must rely entirely on auditory perception. However, CPS rarely asks questions by CATI for which visual aids were used in CAPI.

English Language Skills

Both SESTAT and CPS include respondents for whom English is not their primary language. Most SESTAT respondents are graduates of U.S. colleges and therefore have some English language skills.[3] By comparison, CPS includes some respondents with limited English skills and is likely to capture more individuals without U.S. degrees than SESTAT. Language problems are expected to be more of an issue for prebachelor's degree respondents than for postbachelor's respondents. Cases with language problems are also affected by the mode of data collection. Although the self-administering respondents can enlist the help of family or friends or refer to a dictionary if they need assistance with the survey, it is less likely that CATI respondents will do so. To the extent that this is true, the SESTAT CATI component and CPS will either count these individuals as nonrespondents or include the data they provide, which may have higher levels of response error than mail surveys. CPS relies on interviewers who live in the areas where they interview. These interviewers can collect data in some languages other than English.

Level of Detailed Responses to Open-Ended Questions

Both the SESTAT and CPS surveys include open-ended questions. Respondents to self-administered questionnaires often do not provide the same level of detailed responses to open-ended items as they do when interviewers administer the questionnaires because interviewers are trained on the intent of the question and on proper probing techniques. To the extent that coders must work from answers that lack the detail probing would have elicited, the likelihood of coding error increases. Occupation is the main data item used in the report that is collected (at least partially) with open-ended questions. The collection and coding of occupation is discussed in detail in the section "Occupation."

Social Desirability

Many studies have shown that survey respondents tend to report higher levels of socially undesirable behaviors, attitudes, and characteristics when they are allowed to respond without interviewer involvement. (This is not always the case, however. See Fowler [1993], p. 58.) In a study of mode effects on the 1993 NSCG conducted by the U.S. Census Bureau (Keathley, Riker, and Hicks 1995),[4] social desirability was cited as a possible reason for differences found in the reporting of labor force status between the mail and telephone groups. However, this was considered to be a small factor in the differences. Because few of the questions in SESTAT and CPS would be considered sensitive by most respondents, social desirability issues are expected to have only a small effect in both survey systems.

Relationship Between Reference Week and Survey Day

Memory decay is a problem in any survey that collects data on dynamic characteristics such as education, employment, and income. The more time that elapses between the reference week and the survey administration, the more likely it is that a response error will occur. As described earlier, SESTAT and CPS study designs are very different in this regard—SESTAT data collection periods continue for several months and the CPS data collection period is for only 1 week. However, it is important to note that the response variance study conducted by the U.S. Census Bureau for the 1993 NSCG found that the NSCG displayed good reliability, with an index of inconsistency exceeding 50 for only one question and an index of 30 or more for one-quarter of the questions.
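For readers unfamiliar with the index of inconsistency, the sketch below shows one common formulation for a yes/no item: the share of cases whose answers differ between the original interview and the reinterview, divided by the share expected if the two answers were unrelated, times 100. This is an assumption about the general technique; the text does not state which variant the 1993 NSCG study used, and the counts in the example are invented for illustration.

```python
def index_of_inconsistency(yes_yes, yes_no, no_yes, no_no):
    """Index of inconsistency (0-100 scale) for a dichotomous item.

    Arguments are counts of cases by (original answer, reinterview answer).
    """
    n = yes_yes + yes_no + no_yes + no_no
    gross_difference_rate = (yes_no + no_yes) / n
    p1 = (yes_yes + yes_no) / n  # share answering "yes" in the original interview
    p2 = (yes_yes + no_yes) / n  # share answering "yes" in the reinterview
    expected = p1 * (1 - p2) + p2 * (1 - p1)
    if expected == 0:
        raise ValueError("index is undefined when every answer is identical")
    return 100 * gross_difference_rate / expected


# Invented counts: 20 of 100 cases change their answer between interviews.
example_index = index_of_inconsistency(yes_yes=40, yes_no=10, no_yes=10, no_no=40)
print(round(example_index, 1))
```

Lower values indicate more consistent reporting; perfect agreement between the two interviews gives an index of 0.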

Record Checks

Self-administration typically gives respondents more time than telephone surveys to consult records that might improve response accuracy (see Fowler [1993], p. 58). On the other hand, the short time period between the interviewer-administered CPS survey and the reference week might minimize the need for record checks. However, the interviewer-administered SESTAT interviews, which might occur months after the reference week, might suffer from memory decay and record checks would probably be used infrequently. The SESTAT mail surveys, since they are self-administered, would benefit insofar as respondents consult their records. Although this is true in typical survey work, few variables other than salary would be affected by record check differences in SESTAT versus CPS comparisons.

Item Nonresponse

Paper-and-pencil questionnaire self-administration typically results in a larger amount of missing data than either CAPI or CATI administration. The 1993 NSCG mode effect study found that for the 12 characteristics included in the study, the mail group had significantly higher item nonresponse rates than the telephone group, with few exceptions. However, most of the questions had low nonresponse rates for both groups (less than 3%). Also, as discussed previously, the primary data items used in this report, including working during the reference week, looking for work during the reference week, and occupation, are critical complete items in SESTAT, meaning they have zero item nonresponse. The additional SESTAT question used to determine labor force status, whether the respondent is on layoff from a job, generally has an item nonresponse rate of 1% or less. For CPS data, the item nonresponse for January 1997 data is 0.3% for labor force status and 1.7% for occupation. Methods used by SESTAT and CPS to handle item nonresponse are discussed in the section "Statistical Issues."

Data Processing Procedures

In many respects, SESTAT and CPS use similar data processing procedures. Data from CATI and CAPI interviews are examined during the interviews through the use of programmed range checks and internal consistency checks. SESTAT mail questionnaires are processed through a sophisticated computer editing system. Both survey systems conduct postcollection editing for range checks and skip error edits using computerized systems. In addition, the SESTAT computer editing system conducts additional checks, including mark-one edits for questions with more than one response marked, consistency edits, and cross-editing with previous cycle data for a small number of items. The SESTAT surveys also conduct computer-assisted backcoding of "other specify" responses.

SESTAT has additional rules for "best coding" for occupation and field of study to correct respondent recording errors, such as not making a code entry or not reviewing the entire list before making a selection. However, coders are not allowed to change the respondent's chosen code unless evidence exists—based on answers to other questions—that the recorded response is incorrect. SESTAT occupation and education codes are assigned using a computer-assisted system. CPS relies heavily on computerized range checks and internal consistency checks that are contained in the CAPI and CATI programs. However, textual responses to questions on industry and occupation are edited and coded by coders using a computer-assisted system. The section "Occupation" contains additional information on the collection and processing of occupation data.

Although SESTAT and CPS follow many of the same steps in data processing, the techniques and rules for resolving problem cases vary. For example, SESTAT counts as a "noninterview" all cases that are missing one or more critical complete items (after attempted telephone follow-up), but CPS has no such rule. Other important differences in the coding of occupation are discussed in the section "Occupation."
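The SESTAT noninterview rule amounts to a simple completeness check, sketched below. The item names are hypothetical labels chosen for this example, and the caller is assumed to pass only the critical complete items that apply to the case (for example, occupation applies only to respondents who were working).

```python
def sestat_case_status(responses, applicable_critical_items):
    """Classify a case as 'complete' or 'noninterview'.

    A case is a noninterview if any applicable critical complete item is
    still missing (None or absent) after follow-up has been attempted.
    """
    missing = [
        item for item in applicable_critical_items
        if responses.get(item) is None
    ]
    status = "noninterview" if missing else "complete"
    return status, missing


# Hypothetical item names; the real SESTAT variable names are not given in the text.
critical = ["worked_in_reference_week", "occupation"]
complete_case = {"worked_in_reference_week": "Y", "occupation": "Chemist", "salary": None}
incomplete_case = {"worked_in_reference_week": "Y", "occupation": None}

print(sestat_case_status(complete_case, critical))
print(sestat_case_status(incomplete_case, critical))
```

In the first case the missing salary does not matter because it is not a critical complete item; such items are resolved through editing rules and imputation instead.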

Academic Degree

Unlike SESTAT, CPS includes respondents both with and without bachelor's or higher degrees (see "Coverage Issues"). Following is a discussion of the collection of education data in the two survey systems. Both CPS and SESTAT collect information on completed degrees, but SESTAT collects more detailed education data. CPS respondents are asked to report the highest level of education completed as follows:

CPS Education Items

What is the highest level of school (name/you) (has/have) completed or the highest degree (name/you) (has/have) received?[5]

(Previous education level entry displayed – after first month)

Less than 1st grade
1st, 2nd, 3rd, or 4th grade
5th or 6th grade
7th or 8th grade
9th grade
10th grade
11th grade
12th grade No Diploma
High school graduate, high school diploma, or equivalent (for example, GED)
Some college but no degree
Associate's degree in college occupational/vocational program
Associate's degree in college academic program
Bachelor's degree (for example, BA, AB, BS)
Master's degree (for example, MA, MS, MEng, MEd, MSW)
Professional school degree (for example, MD, DDS, DVM)
Doctorate degree (for example, PhD, EdD)

Two types of edit screens can be displayed for this item. First, if the respondent reports a lower level degree than reported in an earlier interview, the respondent is asked to resolve the discrepancy. Second, if a household member's age is less than expected for a reported degree level, then an edit screen is displayed. CPS collects no information about the field of degree or about the institution that awarded the degree.
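The two edit checks can be illustrated with a small sketch. It assumes that the response categories rank in the order they are listed above, and the minimum ages are placeholder values invented for this example; the text does not give the thresholds the CPS instrument uses.

```python
# Response categories in the order listed in the CPS item (assumed to be the ranking).
EDUCATION_LEVELS = [
    "Less than 1st grade",
    "1st, 2nd, 3rd, or 4th grade",
    "5th or 6th grade",
    "7th or 8th grade",
    "9th grade",
    "10th grade",
    "11th grade",
    "12th grade No Diploma",
    "High school graduate",
    "Some college but no degree",
    "Associate's degree (occupational/vocational)",
    "Associate's degree (academic)",
    "Bachelor's degree",
    "Master's degree",
    "Professional school degree",
    "Doctorate degree",
]

# Placeholder thresholds for illustration only; not the actual CPS values.
MIN_EXPECTED_AGE = {
    "Bachelor's degree": 20,
    "Master's degree": 22,
    "Professional school degree": 24,
    "Doctorate degree": 24,
}


def education_edit_flags(current, age, previous=None):
    """Return the edit screens that would be triggered for one household member."""
    flags = []
    rank = EDUCATION_LEVELS.index
    if previous is not None and rank(current) < rank(previous):
        flags.append("lower_than_earlier_report")
    minimum_age = MIN_EXPECTED_AGE.get(current)
    if minimum_age is not None and age < minimum_age:
        flags.append("age_below_expected_for_degree")
    return flags


print(education_edit_flags("Bachelor's degree", age=35, previous="Master's degree"))
print(education_edit_flags("Doctorate degree", age=19))
```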

To be eligible for SESTAT, a sample member must have completed a bachelor's or higher degree in any field. In the NSCG and NSRCG surveys, education history is collected during the baseline survey cycle and includes high school graduation year and state/country, whether an associate's degree was completed, and number of bachelor's and higher degrees received.[6]  For bachelor's and higher degrees, detailed information on the most recent degree, second most recent degree, and first bachelor's degree is also collected during the baseline survey. This degree history (including 2-year degrees) is updated with new degree information during the follow-up surveys. For the SDR, a complete degree history is collected with the Survey of Earned Doctorates (SED), which is used as the sampling frame for the SDR and updated with new degree information during each SDR survey cycle. In all three SESTAT surveys, data collected for bachelor's and higher degrees include the college/university that awarded the degree, month and year awarded, degree level, and the major field of study. In the follow-up surveys, data are also collected for postbaccalaureate and postmaster's certificates. The questions used to collect degree level in the baseline and follow-up surveys are as shown below.[7]

Baseline SESTAT Surveys

What type of degree did you receive?

Master's (including MBA)
Doctorate (e.g., PhD, DSC, DSc, EdD)
Other professional degree (e.g., JD, LLB, ThD, MD, DDS) – specify
Other – specify

Follow-up Survey (as found on the 1997 SESTAT questionnaires)

If you were taking courses or enrolled in a college or university between April 1995 and April 1997, toward what degree or certificate, if any, were you (or are you) working?

Mark (X) this box if no specific degree or certificate and skip to (question number)

If more than one applies, mark the highest level

Bachelor's degree
Post baccalaureate certificate
Master's degree (including MBA)
Post master's certificate
Doctorate (e.g., PhD, DSC, DSc, EdD)
Other professional degree (e.g., JD, LLB, ThD, MD, DDS) – specify
Other – specify

Between April 1995 and April 1997, did you complete a degree or certificate?


(IF YES) What degree or certificate did you receive? Enter number of appropriate type of degree/certificate received from (question number) above.

Type of Degree/Certificate: ________________

For degree level, both CPS and SESTAT use the same categories of bachelor's, master's, doctorate, and other professional degree; therefore, these data are expected to be consistent across survey systems. The main difference between CPS and SESTAT in the collection of degree information is that CPS collects the highest level of school or degree completed and does not collect the field of degree, whereas SESTAT collects the school, level, date, and field for degrees at the bachelor's level and higher.

Employment Status

Both survey systems collect data on workforce participation, including principal and secondary jobs, during the survey reference week. Although both survey systems ask similar questions about working for pay or profit during the survey reference week, the battery of questions used to determine labor force status is not the same in the two survey systems. The questions and formulas used in each survey system are shown below.

CPS Labor Force Questions and Definition (from U.S. Census Bureau 2000: figure 5-1)

  1. Does anyone in this household have a business or a farm?

  2. LAST WEEK, did you do ANY work for (either) pay (or profit)?

    Parenthetical filled in if there is a business or farm in the household. If 1 is "yes" and 2 is "no," ask 3. If 1 is "no" and 2 is "no," ask 4.

  3. LAST WEEK, did you do any unpaid work in the family business or farm?

    If 2 and 3 are both "no," ask 4.

  4. LAST WEEK (in addition to the business), did you have a job, either full or part time? Include any job from which you were temporarily absent.

    Parenthetical filled in if there is a business or farm in the household.

    If 4 is "no," ask 5.

  5. LAST WEEK, were you on layoff from a job?

    If 5 is "yes," ask 6. If 5 is "no," ask 8.

  6. Has your employer given you a date to return to work?

    If "no," ask 7.

  7. Have you been given any indication that you will be recalled to work within the next 6 months?

    If "no," ask 8.

  8. Have you been doing anything to find work during the last 4 weeks?

    If "yes," ask 9.

  9. What are all of the things you have done to find work during the last 4 weeks?

    Individuals are classified as employed if they say "yes" to questions 2, 3 (and work 15 hours or more in the reference week or receive profits from the business/farm), or 4.

    Individuals who are available to work are classified as unemployed if they say "yes" to 5 and either 6 or 7, or if they say "yes" to 8 and provide a job search method that could have brought them into contact with a potential employer in 9.
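The CPS classification rules above can be written as a single decision function. This is a simplified sketch: the parameter names are invented for the example, the skip pattern between questions is not modeled, and anyone who is neither employed nor unemployed is treated as not in the labor force.

```python
def cps_labor_force_status(
    worked_for_pay=False,        # question 2
    unpaid_family_work=False,    # question 3
    unpaid_hours=0,              # hours of unpaid family work in the reference week
    receives_profits=False,      # receives profits from the business/farm
    has_job=False,               # question 4 (includes temporary absence)
    on_layoff=False,             # question 5
    return_date_given=False,     # question 6
    recall_expected=False,       # question 7
    searched=False,              # question 8
    active_search_method=False,  # question 9: method could reach an employer
    available=True,              # available to work
):
    """Classify one person using the CPS rules summarized in the text."""
    if worked_for_pay or has_job:
        return "employed"
    if unpaid_family_work and (unpaid_hours >= 15 or receives_profits):
        return "employed"
    if available:
        if on_layoff and (return_date_given or recall_expected):
            return "unemployed"
        if searched and active_search_method:
            return "unemployed"
    return "not in labor force"


print(cps_labor_force_status(worked_for_pay=True))
print(cps_labor_force_status(on_layoff=True, recall_expected=True))
print(cps_labor_force_status(on_layoff=True))
```

The last call shows that a layoff without a return date or an expected recall does not by itself make a person unemployed under these rules.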

SESTAT Labor Force Questions[8]

A1.   Were you working for pay (or profit) during the week of April 15, 1997? This includes being self-employed or temporarily absent from a job (e.g., illness, vacation, or parental leave), even if unpaid.

STUDENTS: Do NOT count financial aid awards with no work requirement

Yes — SKIP to A7

A2.   (IF NO) Did you look for work during the four weeks preceding April 15, 1997 (that is, anytime between March 19 and April 15, 1997)?


A3.   What were your reasons for not working during the week of April 15?

Mark (X) all that apply

Retired (Year retired: 19 |__| |__|)
On layoff from a job
Family responsibilities
Chronic illness or permanent disability
Suitable job not available
Did not need or want to work
Other — Specify

SESTAT Labor Force Definition (from SESTAT website definition of LFSTAT variable)[9]:

Status                     Definition

1 (Employed)               Working during reference week (A1 = Y)

2 (Unemployed)             Not working during reference week (A1 = N) and
                           Not looking for work (A2 = N) and
                           Reason for not working is layoff from job (A3(2) = Y)

2 (Unemployed)             Not working during reference week (A1 = N) and
                           Looking for work (A2 = Y) and
                           Reason for not working is not layoff from job (A3(2) = N)

2 (Unemployed)             Not working during reference week (A1 = N) and
                           Looking for work (A2 = Y) and
                           Reason for not working is layoff from job (A3(2) = Y)

3 (Not in Labor Force)     Not working during reference week (A1 = N) and
                           Not looking for work (A2 = N) and
                           Reason for not working is not layoff from job (A3(2) = N)
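The LFSTAT definition reduces to three cases, as the sketch below shows. The function and parameter names are illustrative; only the status codes and the conditions on A1, A2, and A3(2) come from the definition above.

```python
def sestat_lfstat(working, looking_for_work, on_layoff):
    """Return the LFSTAT code: 1 employed, 2 unemployed, 3 not in labor force.

    working          -> A1 = Y
    looking_for_work -> A2 = Y
    on_layoff        -> A3(2) = Y (layoff marked as a reason for not working)
    """
    if working:
        return 1
    if looking_for_work or on_layoff:
        return 2
    return 3


# Enumerate the four not-working combinations from the definition.
for looking in (False, True):
    for layoff in (False, True):
        print(f"A1=N, A2={'Y' if looking else 'N'}, A3(2)={'Y' if layoff else 'N'}"
              f" -> {sestat_lfstat(False, looking, layoff)}")
```

The three "unemployed" rows collapse into one condition: not working, and either looking for work or on layoff.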

Despite these differences, the definition of "employed" is similar in the two survey systems. In CPS, an individual is classified as employed if during the reference week (1) the person did any work for pay or profit; (2) the person did unpaid work in a family business or farm and worked 15 hours or more per week or received profits; or (3) the person had a job, including any job from which he or she was temporarily absent. For the SESTAT labor force variable (LFSTAT), an individual is classified as employed if he or she was working for pay or profit during the reference week, including being self-employed or temporarily absent from a job, even if unpaid. The main difference is that CPS specifically asks about work on a family business or farm and classifies the individual as employed if working 15 hours or more per week or receiving profits, whereas SESTAT simply instructs the individual to include self-employment.

The two survey systems contain additional differences in how they define "unemployed." In CPS, an individual who is not working is classified as unemployed if (1) the person is on layoff from a job and has been given a date to return to work or has been given any indication of being recalled to work within the next 6 months or (2) the person has been trying to find work during the last 4 weeks and lists a job search method that could have brought him or her into contact with a potential employer. For the SESTAT labor force variable, an individual who is not working is classified as unemployed if (1) the person is on layoff from a job or (2) the person was looking for work during the 4 weeks preceding the reference week.

Both SESTAT and CPS collect information on full-time or part-time employment status during the survey reference week. In both survey systems, full-time or part-time status can be determined for either principal job alone or for all jobs combined. CPS collects the number of hours worked per week on the main job and the number of hours worked per week on all other jobs. SESTAT collects the number of hours worked per week on the main job and the full-time or part-time status for all jobs combined. In this report, full time is defined as working 35 or more hours per week for all jobs combined. The following questions are used to collect full-time and part-time employment data:

CPS Full-Time or Part-Time Employment Questions

How many hours per week (do/does) (name/you) USUALLY work at (your/his/her) (job?/main job?) By main job we mean the one at which (you/he/she) usually (work/works) the most hours.

How many hours per week (do/does) (you/he/she) USUALLY work at (your/his/her) other (job/jobs)?

SESTAT Full-Time or Part-Time Employment Questions (from the 1997 NSCG mail survey)[10]

A7.   Counting all jobs held during the week of April 15, 1997, did you USUALLY work…

A total of 35 or more hours per week — SKIP to A10
Fewer than 35 hours per week

(The questionnaire section for collecting principal job information starts with the following statement: The next set of questions ask about your work on your principal job during the week of April 15, 1997.)

A39.   During a typical week on this job, how many hours did you usually work?
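The report's full-time definition (35 or more hours per week for all jobs combined) can be derived from each system's questions roughly as follows. This sketch assumes that the CPS value is obtained by adding usual hours on the main job and on other jobs, and that the SESTAT value is read directly from question A7; the function names and response labels are illustrative.

```python
FULL_TIME_HOURS = 35  # threshold used in the report, all jobs combined


def cps_is_full_time(main_job_hours, other_jobs_hours=0):
    """CPS: add usual weekly hours on the main job and on all other jobs."""
    return main_job_hours + other_jobs_hours >= FULL_TIME_HOURS


def sestat_is_full_time(a7_response):
    """SESTAT: question A7 already reports the status for all jobs combined."""
    responses = {"35 or more": True, "fewer than 35": False}
    return responses[a7_response]


print(cps_is_full_time(30, 10))
print(cps_is_full_time(30))
print(sestat_is_full_time("35 or more"))
```

The first call shows a person who is part time on the main job alone but full time once all jobs are counted, which is why the basis of the definition matters.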

Occupation

Occupation data for CPS and SESTAT are compared here in terms of data collection, code assignment, and coding schemes. In CPS, industry and occupation information is collected using open-ended questions and dependent interviewing. During the initial CPS interviews, this information is collected using the following open-ended questions:

What kind of business or industry is this?
READ IF NECESSARY: What do they make or do where (you/he/she) (work/works/worked)?

What kind of work (do/does/did) (name/you) do, that is, what (is/was) (your/his/her) occupation? (For example: plumber, typist, farmer)

What (are/were) (your/his/her) usual activities or duties at this job?

Dependent interviewing for the industry and occupation questions is used for households that were included in the sample the previous month. Respondents are given the name of the employer they reported in the previous month and asked if they still work for that employer. If the answer is "no," respondents are asked the independent questions on industry and occupation. If the answer is "yes," respondents are asked, "Have the usual activities and duties of your job changed since last month?" If the answer is "yes" (the duties have changed), respondents are asked the independent questions on occupation and activities or duties. If the duties have not changed, respondents are asked to verify the previous month's description through the question, "Last month, you were reported as (previous month's occupation or kind of work performed) and your usual activities were (previous month's duties). Is this an accurate description of your current job?" If the answer is "yes," the previous month's occupation is brought forward and no coding is required. If the answer is "no," respondents are asked the independent questions on occupation and activities or duties.
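The dependent interviewing flow can be summarized as a small decision function. The names and return labels below are invented for this sketch; it models only the branching described in the paragraph above, not the CPS instrument itself.

```python
ASK_INDUSTRY_AND_OCCUPATION = "ask independent industry and occupation questions"
ASK_OCCUPATION_AND_DUTIES = "ask independent occupation and duties questions"
CARRY_FORWARD = "carry forward previous month's occupation (no coding needed)"


def cps_occupation_path(in_sample_last_month, same_employer=False,
                        duties_changed=False, description_accurate=False):
    """Return the path a case takes through the occupation questions."""
    if not in_sample_last_month:
        return ASK_INDUSTRY_AND_OCCUPATION  # initial interview: no prior data
    if not same_employer:
        return ASK_INDUSTRY_AND_OCCUPATION
    if duties_changed:
        return ASK_OCCUPATION_AND_DUTIES
    if description_accurate:
        return CARRY_FORWARD
    return ASK_OCCUPATION_AND_DUTIES


print(cps_occupation_path(in_sample_last_month=False))
print(cps_occupation_path(True, same_employer=True, description_accurate=True))
```

Only the carry-forward path skips the coding step, so stable jobs keep the same occupation code from month to month.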

After collection, the new monthly CPS cases and cases in which the industry or occupation changed from the previous month are sent for coding by the industry/occupation coders. These specialized coders use a computer-assisted coding system and assign codes based primarily on the respondent-provided verbatim occupation, industry sector, and job duties. Ten percent of each month's cases are selected to go through a quality assurance system to evaluate the work of each coder. The selected cases are verified by another coder after the monthly processing is completed. CPS coders have a short time to complete their work and are evaluated on speed. They may also be penalized for inconsistent coding identified during the quality assurance process.

In the SESTAT surveys, the respondent is asked to provide two types of occupation information: (1) a verbatim description of the occupation and (2) a self-selected occupation code. First, the respondent is asked to describe his or her occupation in an open-ended question format. In the mail questionnaire, the respondent is then asked to select a job code from a printed list of approximately 120 codes. These SESTAT codes are organized in categories listed in alphabetical order by major occupational group. The questions used on the mail survey to collect occupation are as follows:

What kind of work were you doing on your principal job held during the week of April 15, (survey year) — that is, what was your occupation?

Please be as specific as possible, including any area of specialization.

Example: High school teacher — Math (for NSCG and NSRCG)
Example: College professor – Electrical Engineering (for SDR)


Using the JOB CODES (List B: pages x-x), choose the code that BEST describes the work you were doing on your principal job during the week of April 15, (survey year).

CODE           |___|___|___|               NOTE: Job codes range from 010 to 500

In the SESTAT CATI survey, the respondents are first asked for the verbatim occupation description, as in the mail survey. The SESTAT interviewers are trained to collect as detailed a response as possible, including a job title. The CATI system then uses an occupation dictionary that was developed for the SESTAT system. Once the interviewer has finished collecting the occupation verbatim and description data, the CATI system compares the description to the occupation dictionary. This dictionary contains two sections, an autocoding section and a branching section. If the description matches an occupation in the autocoding section, CATI will automatically assign the appropriate three-digit SESTAT occupation code, and no additional occupation information is collected from the respondent. If the description matches an occupation in the branching section, CATI will display the appropriate screen that offers a reduced number of categories. The respondent will then be asked to select the category on that screen that best describes his or her occupation. Based on this selection, CATI assigns the appropriate three-digit code or proceeds through a sequence of additional occupation coding screens until a code can be assigned. If the occupation description does not match any of the dictionary items, then CATI will display the standard main heading screen. This screen offers the respondent a choice of eight main headings, such as "sales or marketing" or "scientific or engineering occupations," as well as an "other" option. The respondent will be asked to select a main heading. CATI will then continue to display occupation coding screens until a three-digit code can be assigned. The code assigned through one of these processes is considered the respondent's self-selected occupation code.
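
The routing performed by the CATI occupation dictionary can be sketched as a lookup with three possible results. This Python example is a simplified illustration, not the SESTAT implementation. The dictionary entries, the three-digit code, the screen name, and the matching rule (an exact match after normalizing case and whitespace) are all assumptions, because the source does not describe how descriptions are matched.

```python
from typing import Dict, Optional, Tuple

# Hypothetical dictionary contents; real SESTAT entries and codes are not given in the text.
AUTOCODING_SECTION: Dict[str, str] = {"chemical engineer": "111"}
BRANCHING_SECTION: Dict[str, str] = {"engineer": "ENGINEERING_SCREEN"}


def normalize(description: str) -> str:
    """Lowercase the description and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(description.lower().split())


def route_occupation(
    description: str,
    autocoding: Dict[str, str] = AUTOCODING_SECTION,
    branching: Dict[str, str] = BRANCHING_SECTION,
) -> Tuple[str, Optional[str]]:
    """Return (action, value) for a verbatim occupation description."""
    key = normalize(description)
    if key in autocoding:
        # A code is assigned automatically; no further questions are asked.
        return ("autocode", autocoding[key])
    if key in branching:
        # A screen with a reduced set of categories is shown to the respondent.
        return ("branch", branching[key])
    # No match: fall back to the standard main heading screen.
    return ("main_heading", None)
```

In the "branch" and "main_heading" cases, the respondent continues through further coding screens until a three-digit code is assigned; that iterative step is omitted here.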

After survey collection, both mail and CATI SESTAT surveys undergo a "best coding" process; the purpose is to verify the respondent's self-reported occupation code.[11]  Trained occupation coders assign a "best code" to each occupation after considering the open-ended response and self-selected code provided by the respondent, as well as other relevant survey data such as employer type, number of people supervised, educational degrees, work activities, and salary. Coders receive detailed training and instructions, and at least 10% of their work is verified. SESTAT coders are encouraged to take the time needed to review and analyze all the available information and choose the best code. Coders are not penalized for choosing a different code than the one chosen during verification.
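
The exclusion rule for the 1999 NSCG given in footnote 11 can be expressed as a short function. This is a sketch under stated assumptions: the function and variable names are hypothetical, and codes are held as three-character strings so that leading zeros (as in 055) are preserved.

```python
from typing import Optional

# Self-codes that are always sent to best coding, per footnote 11.
ALWAYS_BEST_CODED = {"055", "099", "254", "255", "500"}


def best_code_without_review(
    best_1995: Optional[str],
    self_1997: Optional[str],
    best_1997: Optional[str],
    self_1999: Optional[str],
) -> Optional[str]:
    """Return the 1999 best code if the case can skip best coding, else None."""
    codes = [best_1995, self_1997, best_1997, self_1999]
    # All four codes must be present (not blank).
    if any(code is None or code.strip() == "" for code in codes):
        return None
    # Certain self-codes are never exempted from review.
    if self_1999 in ALWAYS_BEST_CODED:
        return None
    # All four codes must agree.
    if len(set(codes)) != 1:
        return None
    # 1999 best code is set equal to the 1999 self-code.
    return self_1999
```

A return value of None means the case is sent to the occupation coders as usual.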

In the SESTAT surveys, occupation data are collected independently during each survey cycle. Unlike CPS, no dependent interviewing is used for SESTAT occupation data collection. However, SESTAT coders on follow-up surveys are instructed to consider the best occupation code assigned in the previous cycle under certain conditions: the respondent reports working for the same employer and in the same job as the previous cycle, the start date of the current job is before the reference date of the previous cycle, and the previous and current verbatim descriptions appear reasonably similar. The difference between CPS and SESTAT is that SESTAT coders are trained to consider the previous code, whereas CPS uses the previous code without review by a coder. Another important difference is that the previous CPS data are generally 1 month old, whereas the previous SESTAT data are 2 years old.
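
The conditions under which a SESTAT coder is instructed to consider the previous cycle's best code can be written as a single check. The Python sketch below is illustrative; the class and field names are assumptions, and the "reasonably similar" comparison of verbatim descriptions is a coder judgment that is represented here simply as a boolean input.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FollowUpCase:
    """Hypothetical record for one respondent in a follow-up cycle."""
    same_employer_and_job: bool          # respondent reports same employer and same job
    current_job_start: date              # start date of the current job
    previous_reference_date: date        # reference date of the previous cycle
    verbatims_reasonably_similar: bool   # coder's judgment of the two descriptions


def may_consider_previous_best_code(case: FollowUpCase) -> bool:
    """True only when all three conditions described in the text hold."""
    return (
        case.same_employer_and_job
        and case.current_job_start < case.previous_reference_date
        and case.verbatims_reasonably_similar
    )
```

Even when the check passes, the previous code is only an input to the coder's decision; it is not carried forward automatically, as it is in CPS.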

Different occupational taxonomies are used in CPS and SESTAT, but both taxonomies were developed from the 1980 Standard Occupational Classification (SOC) maintained by the Bureau of Labor Statistics.[12]  Therefore, the two taxonomies are generally consistent. However, the SESTAT system uses broad categories for non-S&E jobs and more specific categories for S&E jobs. The CPS data are coded in both detailed and broad classifications for all jobs. Although the taxonomies are generally consistent, differences in training instructions and decision rules could result in differences in occupational classifications across the two survey systems. In particular, SESTAT occupation coders are trained to review the respondent's self-selected code and assign a new code only when it is more accurate; when it is not possible to determine a better code, the respondent's self-selected code should be maintained. Coders are instructed not to change responses unless sufficient evidence exists that the respondent has made a mistake and the information provided allows the assignment of a better code. In contrast, CPS coders do not have a self-selected code and instead assign codes based on the verbatim responses, duties, and industry sector.

In summary, both CPS and SESTAT collect verbatim occupation descriptions using similar questions. However, the rest of the occupation coding process is different for the two survey systems. In CPS, an additional open-ended question is asked to collect job activities and duties. The CPS occupation coders use the occupation description, duties, and industry to assign a code after collection. During follow-up months, CPS uses dependent interviewing. In SESTAT, respondents are asked to select their own occupation code, either from a printed list or through a series of CATI screens. SESTAT occupation coders review these codes along with the occupation descriptions, employer name, employer sector, work activities, supervisory responsibilities, salary, educational history, and information on the respondent's job during the previous survey round. These different collection and code assignment processes could cause differences in the quality of occupation data for the two survey systems. In addition, differences in coder training, emphasis on coding speed, and coding decision rules could also result in differences in the data.


Other Respondent Classifications

Respondent characteristics that can be used in analysis include sex, age, and race/ethnicity. Again, SESTAT and CPS collect these data using slightly different methods. The main difference is that CPS collects data by proxy, whereas SESTAT does not. Recent studies have shown that people report ethnicity differently (for example, according to a person's age).


The way CPS and SESTAT collect data on sex differs slightly. SESTAT data come from the sampling frames or the baseline surveys. For NSCG cases sampled from the decennial census, the sex variable comes from the census long form and the information is carried forward into the SESTAT data system. For the SDR, the information is collected on the SED and carried over into the data system, but it was verified in 1993 because there had been a major redesign in the survey system and some cases were very old. For the NSRCG, the mail survey asks the respondent to choose male or female and the CATI survey instructs the interviewer to code without asking or to ask if necessary. For all SESTAT follow-up surveys, this item is asked periodically for sample person verification but the original value from the sampling frame or baseline survey is always carried forward in the data system. The CPS survey instructs the interviewer to code sex without asking unless the interviewer is unable to determine by voice or appearance.


Both SESTAT and CPS determine age by collecting the respondent's birth date and resolving any discrepancies with birth date information collected in earlier survey cycles. The SESTAT interviews (both self-administered and interviewer administered) include the question, "What is your birth date?" Respondents are asked to provide the month, day, and year of their birth. This question is included in each survey cycle. During follow-up surveys, the birth date collection is used for sample person verification. If the respondent gives a birth date that is not consistent with a prior cycle, then editing and data retrieval procedures are conducted to resolve the discrepancy. These data are recoded into "age" for analysis.

CPS respondents are asked, "What is (name's/your) date of birth?" Again, interviewers record the month, day, and year and the data are recoded into "age" for analysis. The CPS survey provides the interviewer with data from previous interviews with the household. This information allows the interviewers to ask for clarification if the answers are discrepant. CPS also uses a scripted follow-up question after asking for the birth date, "As of last week, that would make (name/you) ((age)/approximately (age)/less than 1/over 98) (years/year) old. Is that correct?"[13]
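
The recoding of birth date into age, together with the age limits noted in footnote 13, can be sketched as follows. The source does not specify the exact computation, so this example assumes age in completed years as of a reference date; the function names are hypothetical.

```python
from datetime import date


def age_in_completed_years(birth: date, reference: date) -> int:
    """Age in whole years as of the reference date (assumed definition)."""
    # Has the birthday already occurred in the reference year?
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def cps_recorded_age(age: int) -> int:
    """CPS codes "90" for anyone age 90 or older (footnote 13)."""
    return min(age, 90)


def sestat_age_eligible(age: int) -> bool:
    """SESTAT excludes anyone age 76 or older (footnote 13)."""
    return age < 76
```

For example, with the 1997 SESTAT reference date of April 15, 1997, a respondent born on April 16, 1950, has not yet had a birthday in the reference year.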

Race and Ethnicity

The SESTAT data for race/ethnicity[14] come from the sampling frames or the baseline surveys. For NSCG cases sampled from the decennial census, race/ethnicity comes from the census long form and the information is carried forward into the SESTAT data system. For the SDR, the information is collected on the SED and carried over into the data system, but it was verified in the 1993 cycle. Race/ethnicity is collected in the NSRCG survey each cycle. In all of these surveys, Hispanic origin is collected in a separate question from race. The census form asks the race question before the question on Hispanic origin, whereas SED and NSRCG ask Hispanic origin first. The Hispanic question has slight wording differences between the census long form, SED, and NSRCG. The census form includes subcategories of Hispanic as part of the same question, but SED and NSRCG collect this information in a separate question.

The race categories for NSRCG are white, black or African American, Asian or Pacific Islander, American Indian or Alaskan Native, and Other – specify. The responses to the "Other" category are backcoded into existing categories after data collection according to the SESTAT backcoding rules. The race categories for SED are the same with a slight wording difference, and there is no "Other" category. The 1990 census form contains categories similar to NSRCG except the census form uses the designation "Black or Negro," asks American Indians to record the tribe, lists Eskimo and Aleut as separate categories, and lists subcategories for Asian/Pacific Islander.

CPS asks respondents to select a race category from the same list of options used in the NSRCG surveys, with slight wording differences. CPS asks the race question before the ethnicity question. In addition, CPS collects the verbatim race responses provided by the respondents but edits any such responses back into the four main race groups. As described in the CPS Glossary of Subject Concepts, persons of Hispanic origin are determined on the basis of a question that asks for self-identification of the person's origin or descent. Respondents are asked to select their origin (or the origin of some other household member) from a "flash card" listing 20 ethnic origins. Persons of Hispanic origin are those who indicated that their origin was Mexican American, Chicano, Mexican (Mexicano), Puerto Rican, Cuban, Central or South American, or other Hispanic.[15]
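
The CPS determination of Hispanic origin described above amounts to a membership test against the listed origins. The sketch below is illustrative only: the function name and the string labels are assumptions, and the actual flash card lists 20 origins that are not reproduced in the text.

```python
# Origins that CPS treats as Hispanic, as listed in the text above.
HISPANIC_ORIGINS = {
    "mexican american",
    "chicano",
    "mexican (mexicano)",
    "puerto rican",
    "cuban",
    "central or south american",
    "other hispanic",
}


def is_hispanic_origin(selected_origin: str) -> bool:
    """True if the origin selected from the flash card is one of the Hispanic origins."""
    return selected_origin.strip().lower() in HISPANIC_ORIGINS
```

Because origin is asked separately from race, this flag is independent of the race category assigned to the same person.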



[1] During the 1997 survey cycle, data for individuals originally sampled as part of the 1993 and 1995 NSRCG surveys were collected in the NSRCG panel survey. The survey instrument used to collect data was the same as the NSCG instrument, except it included some additional education questions. These data were analyzed as part of the NSCG component and are considered part of that component, but the data collection method was similar to NSRCG, with most responses collected by CATI. In the 1997 NSRCG panel survey, 96% of the responses were collected by CATI.

[2] The SESTAT guidelines include the following rules about the collection of proxy information from knowledgeable respondents, such as relatives or university officials: (1) proxy information may be used to determine eligibility for sample members who are living outside the United States, institutionalized, or deceased during the reference week; (2) a very limited set of proxy responses have been permitted for individuals who have extremely high positions and are too busy to respond otherwise; and (3) for a small number of cases involving language problems, respondents may ask another person to act as an interpreter for the questionnaire or CATI interview, but this is not considered a proxy situation.

[3] In the 1997 SESTAT, 96% of the eligible population had received their most recent degree from a U.S. institution.

[4] This report had some methodological issues that could affect the conclusions. It indicated that significance testing was performed assuming a design effect of 1. It was later determined that the correct design effect was 1.6, but the tests were not performed again. The authors also indicated that some of the results originally found to be significant might not be significant. Therefore, it is possible that the social desirability issue was even smaller than the authors originally thought.

[5] CPS also fields the school enrollment supplement every October, in which it asks about grades or year of school "attending."

[6] The NSCG baseline survey for the 1990 decade was the 1993 cycle. The NSRCG is a baseline survey every cycle.

[7] The baseline survey for the SDR is the SED survey, which uses a different format than the one shown.

[8] For a copy of the 1997 NSCG mail survey, refer to http://sestat.nsf.gov/.

[9] Additional information on the LFSTAT variable is available at http://sestat.nsf.gov/docs/lfstat.html.

[10] For a copy of the 1997 NSCG mail survey, refer to http://sestat.nsf.gov/.

[11] In the SESTAT follow-up surveys, some cases are excluded from the best coding process. The exclusion rule on the 1999 NSCG is as follows: If the 1999 occupation self-code is not 055, 099, 254, 255, or 500, and the 1995 best code, 1997 self code, 1997 best code, and 1999 self-code all have the same code and are not blank, then set the 1999 best code = 1999 self-code and do not send the case to occupation best coding.

[12] CPS is scheduled to begin using the 2000 SOC system in 2003. SESTAT occupation coding is also expected to be based on the 2000 SOC codes starting in 2003.

[13] CPS caps the allowable range for age at 90—interviewers are instructed to code "90" for anyone age 90 or older. SESTAT excludes anyone age 76 or older from the survey system.

[14] The current Office of Management and Budget standards on questions about race and ethnicity were not in effect during 1997, the year on which this discussion is based. In evaluating the match between SESTAT and CPS data on race and ethnicity in future years, one would have to consider each survey system's schedule for incorporating the new standards.

[15] CPS Annual Demographic Survey Glossary of Subject Concepts, http://www.bls.census.gov/cps/ads/1996/sglosary.htm.

Comparison of the National Science Foundation's Scientists and Engineers Statistical Data System (SESTAT) with the Bureau of Labor Statistics' Current Population Survey (CPS)
Working Paper | SRS 07-205 | August 2007