

U.S. Academic Scientific Publishing
8.0 Key Variables Associated with Institution-Level Publication Counts
This section discusses model development where the basic unit of analysis is the institution-year (i.e., the number of publications for an institution during a given year).
Section 8.1 describes the analyses of fractional count publications in the expanding journal set. Because various groups of independent (explanatory) variables were highly correlated, a stepwise regression approach was used on each group of such variables to determine the few that accounted for most of the explanatory ability. For example, of eight different measures of research funding, only three were retained after the stepwise analysis. Variables were retained only if they increased the explanatory capability of the model (r-square) by at least 1% (0.01). Variables retained from the first round of stepwise regressions were then entered together into a final stepwise regression. Using similar criteria for retention, the final model contained only three variables: 1) total academic R&D expenditures, 2) number of S&E postdoctorates, and 3) number of S&E doctoral degrees granted. The coefficients in this regression were 3.31 fractional-count publications per $1M in academic R&D expenditures, 0.882 per S&E postdoctorate, and 1.18 per S&E Ph.D. recipient. Faculty counts were not found to improve the model's fit, although we believe this may reflect deficiencies in the various faculty measures that are available.
Section 8.2 describes a repeat of the regression analysis for whole count publications in the expanding journal set. The same three explanatory variables were retained. The coefficients in this regression were 4.78 whole-count publications per $1M in academic R&D expenditures, 1.66 per S&E postdoctorate, and 1.45 per S&E Ph.D. recipient.
Section 8.3 discusses a path analytic model that was developed under the assumption that academic R&D expenditures have both a direct effect on publication counts and an indirect effect (through funding of some S&E postdoctorates and Ph.D. graduate students). We found that each $1M in additional academic R&D funding was associated with an additional 2.53 postdoctoral researchers and an additional 1.39 Ph.D. recipients. The combined direct and indirect effect of adding $1M in funding was an additional 7.18 fractional publication counts and 11.0 whole publication counts.
Section 8.4 presents visual plots of the relationship of publication counts, academic R&D expenditures, postdoctoral counts, and S&E doctoral recipient counts. Harvard University shows up as an outlier in some of these graphs, and possible reasons for this anomaly are discussed.
8.1 Analyses of Fractional Count Publications in the Expanding Journal Set
The Publications Trends database contains sets of personnel and financial variables that are highly correlated. One of the initial steps in constructing a model was to screen each set of variables to determine which of them appeared to have the most explanatory power. An alternative approach would have been to conduct a factor analysis and extract one or two factors for inclusion in a model. However, NSF and SRI felt that it might be difficult to interpret factors clearly and unambiguously, and that it would be better to determine initially whether culling of independent variables would yield a parsimonious model with good explanatory ability.
Stepwise regression was used for screening purposes for each set of related variables to identify likely candidates for a more comprehensive model. Because of the known problems of stepwise regression in deflating p-values, a criterion of incremental r-square values was used. In addition, all coefficients for variables entered into the model were statistically significant at the p < 0.001 level.
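The screening rule described above (enter variables one at a time and keep only those that raise r-square by at least 0.01) can be sketched as follows. This is a minimal illustration, not the software or exact procedure used for this report: the function names `r_squared` and `forward_select`, the use of NumPy least squares, and the synthetic variables `x1`, `x2`, and `x3` are all assumptions made for the example. It also omits the p < 0.001 significance check and the judgment-based review of results.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an ordinary least squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def forward_select(X, y, names, min_gain=0.01):
    """Greedy forward selection: at each step enter the variable with the largest
    r-square gain, and stop when no remaining variable adds at least min_gain."""
    chosen, history, current = [], [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = {j: r_squared(X[:, chosen + [j]], y) - current for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        current += gains[best]
        history.append((names[best], round(current, 3)))  # cumulative r-square after entry
    return history

# Synthetic data (hypothetical, not from the Publications Trends database):
# x2 is a near-duplicate of x1, mimicking a group of highly correlated measures.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 3.0 * x1 + 1.5 * x3 + rng.normal(size=n)

history = forward_select(np.column_stack([x1, x2, x3]), y, ["x1", "x2", "x3"])
print(history)
```

In this synthetic example only one of the two near-duplicate variables enters, because once one is in the model the other adds almost nothing to r-square. This is the same behavior that led to most variables within each correlated group being dropped.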
The results of each stepwise regression were examined for reasonableness using the authors' experience and knowledge of the field, and for consistency across the various dependent variables and academic fields, before selecting the variables for the final regression models. Parsimony was an important criterion in model building, and out of the many dozens of variables considered, final models typically contained only 3 to 6 variables.
In the first set of analyses the dependent variable was publications as measured by fractional counts in the expanding journal set. The data set was aggregated to the institution-year level. Financial variables were deflated using the GDP implicit price deflator; personnel variables were lagged by one year and financial variables by two years. Analyses were conducted using stepwise linear regression. We typically retained any variable that, when entered into the regression, increased r-square by at least 0.01. All such variables were highly statistically significant when entered.
In the first stepwise regression the only independent variables were faculty counts. The faculty count variables (all lagged by 1 year), in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) full professors (0.363), 2) associate professors (0.452), and 3) total faculty (0.486). We retained these three variables for inclusion in later regressions. Instructors and assistant professors did not meet the 0.01 threshold. When only these three independent variables were included, the coefficients were 1.31 for full professors (i.e., 1.31 additional fractional counts per year per full professor), –3.70 for associate professors, and 0.99 for total faculty. Since adding an associate professor also increases total faculty, the net effect of adding an associate professor would be –2.71 (–3.70 + 0.99), and the net effect of adding a full professor would be 2.30 (1.31 + 0.99).
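The net effects quoted for the faculty-only regression follow from adding the total-faculty coefficient to each rank-specific coefficient, since one additional professor of any rank is also one additional faculty member. The short sketch below reproduces that arithmetic. It assumes the associate professor coefficient is negative (–3.70), which is what the stated net effect of –2.71 implies; the dictionary and function names are illustrative only.

```python
# Coefficients from the faculty-only regression (fractional counts per year per person).
# Assumption: the associate professor coefficient is -3.70, the sign implied by the
# reported net effect of -2.71.
coef = {"full_professor": 1.31, "associate_professor": -3.70, "total_faculty": 0.99}

def net_effect(rank):
    """Net change in fractional counts from adding one person of the given rank,
    who is counted both in the rank variable and in total faculty."""
    return round(coef[rank] + coef["total_faculty"], 2)

net_full = net_effect("full_professor")
net_assoc = net_effect("associate_professor")
print(net_full, net_assoc)
```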
Possible explanations for the negative coefficient for associate professors include associate professors being disproportionately located on campuses that do not have a primary research responsibility, and tenured associate professors who do not achieve full professor status due to a lack of publications.
In the second stepwise regression the only independent variables were the Carnegie classification indicators. The Carnegie classifications, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) R1 (0.487) and 2) Medical (0.498). We retained R1 and Medical. The other Carnegie classifications (including the indicator of whether the institution was publicly or privately financed) did not meet the 0.01 threshold.
In the third stepwise regression the only independent variables were academic R&D expenditure variables. The academic R&D expenditure variables, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) total academic R&D expenditures (0.835), 2) federally financed academic R&D expenditures (0.851), and 3) other funding sources of academic R&D expenditures (0.863). The remaining academic R&D expenditure variables did not meet the 0.01 threshold. These included various sources of funding for academic R&D expenditures (industry, the institution, or state/local government), academic basic research expenditures (total, federally financed, and nonfederally financed), research equipment expenditures (total, federally financed, and nonfederally financed), institution-financed organized research, and unreimbursed indirect costs.
In the fourth stepwise regression the only independent variables were enrollment variables. The enrollment variables, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) number of S&E graduate students (0.459) and 2) fall enrollment of undergraduates (0.519). The enrollment of graduate students and total fall enrollment did not meet the 0.01 threshold. All fall enrollment variables were institution-wide rather than S&E specific. The coefficient for undergraduates was negative.
In the fifth stepwise regression the only independent variables were postdoctoral count variables. The postdoctoral count variables, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) total S&E postdoctorates (0.702), 2) S&E postdoctorates supported by federal research grants (0.782), 3) S&E postdoctorates with M.D.s supported by federal research grants (0.803), and 4) S&E postdoctorates supported by federal traineeships (0.816). The coefficients for these variables were 0.11, 5.21, 8.25, and 4.9, respectively, indicating potentially widely varying impacts of different types of S&E postdoctorates. The remaining variables, which did not meet the 0.01 threshold, included S&E postdoctorates with M.D.s in total and by support type (federal fellowships, federal traineeships, and nonfederal sources), S&E postdoctorates without M.D.s in total and by support type (federal fellowships, federal traineeships, and nonfederal sources), and S&E postdoctorates by support type (federal fellowships, federal traineeships, and nonfederal sources).
In the sixth stepwise regression the only independent variables were nonfaculty doctoral research staff count variables. The only variable to enter the regression was nonfaculty doctoral research staff (cumulative r-square of 0.498). Nonfaculty doctoral research staff with M.D.s did not meet the 0.01 threshold.
In the seventh stepwise regression the only independent variables were degree award count variables. The degrees awarded variables, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) S&E doctorates from the NSF Survey of Earned Doctorates (0.703) and 2) S&E BA/BS degrees (0.715). S&E doctorates from the IPEDS Completions Survey and S&E Master's degrees awarded did not meet the 0.01 incremental threshold. The coefficient for BA/BS degrees was negative, consistent with the finding for number of undergraduate students. The two doctoral degree variables had a correlation of 0.973.
The complete set of retained variables identified above was entered into the eighth stepwise regression. The variables, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) total academic R&D expenditures (0.840) and 2) S&E postdoctorates supported by federal research grants (0.922). Adding all of the other variables increased r-square only to 0.945. We note that total S&E postdoctorates has a 0.955 correlation with S&E postdoctorates supported by federal research grants, and it appeared reasonable to substitute the former for the latter under the assumption that nonfederally supported S&E postdoctorates also made a contribution to publications. When total S&E postdoctorates was substituted for S&E postdoctorates supported by federal research grants, the cumulative r-square in the regression was slightly reduced, to 0.912. When the eighth stepwise regression was rerun with this substitution, S&E doctorates from the NSF Survey of Earned Doctorates was the only additional variable that satisfied the 0.01 threshold; it therefore entered the set of explanatory (independent) variables, increasing the r-square to 0.927.
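The retention rule can be checked against the cumulative r-square values reported for the individual screening regressions by converting each cumulative value into the increment contributed at that step. The sketch below does this for three of the variable groups. The r-square values are taken from the text; the dictionary layout, the shortened variable labels, and the function name `increments` are illustrative assumptions.

```python
THRESHOLD = 0.01  # minimum r-square gain required for a variable to be retained

# Cumulative r-square after each variable entered, as reported in this section.
cumulative = {
    "faculty counts": [
        ("full professors", 0.363),
        ("associate professors", 0.452),
        ("total faculty", 0.486),
    ],
    "academic R&D expenditures": [
        ("total", 0.835),
        ("federally financed", 0.851),
        ("other funding sources", 0.863),
    ],
    "degrees awarded": [
        ("S&E doctorates (SED)", 0.703),
        ("S&E BA/BS degrees", 0.715),
    ],
}

def increments(steps):
    """Convert cumulative r-square values into the gain contributed at each step."""
    gains, previous = [], 0.0
    for name, r2 in steps:
        gains.append((name, round(r2 - previous, 3)))
        previous = r2
    return gains

for group, steps in cumulative.items():
    print(group, increments(steps))
```

Every retained variable shows a gain of at least 0.01, although several of the later entries (for example, 0.012 for other funding sources and for BA/BS degrees) clear the threshold only narrowly.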
Finally, we ran a ninth stepwise regression including all independent variables, including a variable not previously included in regressions (number of patents issued in 1988 to 2001 for the institution). No change was found in the variables entering the regression equation. Thus, after screening all of the variables, we conclude that a model including only three variables (total academic R&D expenditures, S&E postdoctorates, and S&E Ph.D. recipients) captures most of the explanatory ability of the entire set of potential explanatory variables. These explanatory variables are also highly correlated with publications as measured by fractional counts, with correlation coefficients of 0.913, 0.840, and 0.838, respectively. The coefficients for this model were 3.31 publications per $1M in academic R&D expenditures, 0.882 per S&E postdoctorate, and 1.18 per S&E Ph.D. recipient, with corresponding standard errors of 0.110, 0.019, and 0.057, respectively.
We also conducted analyses to determine whether there were two-factor interactions between total academic R&D expenditures, S&E postdoctorates, and S&E Ph.D. recipients, by adding these interactions as independent variables into our model. A statistically significant interaction term between academic R&D expenditures and S&E postdoctorates fell just short of the 1% (0.01 r-square) threshold for model inclusion. Visual inspection of scatterplots of expected and observed average publications suggested that the improvement in the model's fit was minimal, and we decided to exclude the interaction term.
8.2 Analyses of Whole Count Publications in the Expanding Journal Set
We repeated this process for publications as measured by whole counts in the expanding journal set. The variables, in their order of entry into the regression and with the cumulative r-square values after entry, were as follows: 1) total academic R&D expenditures (0.816), 2) S&E postdoctorates (0.926), and 3) S&E Ph.D. recipients (0.936). These explanatory variables are also highly correlated with publications as measured by whole counts, with correlation coefficients of 0.903, 0.877, and 0.815, respectively. No other variable satisfied the 0.01 threshold. Cumulatively, all variables accounted for an r-square of 0.959. The coefficients for this model were 4.78 publications per $1M in academic R&D expenditures, 1.66 per S&E postdoctorate, and 1.45 per S&E Ph.D. recipient, with corresponding standard errors of 0.15, 0.03, and 0.08, respectively. Since publications as measured by both fractional and whole counts in the expanding journal set could be explained with the same three variables (total academic R&D expenditures, S&E postdoctorates, and S&E Ph.D. recipients), we retained these three variables for further analysis.[19]
8.3 Path Analytic Model for Publication Counts
We developed a path analytic model under the assumption that total academic R&D expenditures have both a direct and an indirect effect on publications, where the indirect effect is through funding of some S&E postdoctorates and S&E Ph.D. recipients. We also assumed that S&E postdoctorates and S&E Ph.D. recipients have a direct effect on the number of publications. A path analytic model allows total academic R&D expenditures to directly influence the number of S&E postdoctorates and S&E Ph.D. recipients, and allows total academic R&D expenditures, the number of S&E postdoctorates, and the number of S&E Ph.D. recipients to directly influence the number of publications. For publications as measured by fractional counts in the expanding journal set, the regression coefficient for total academic R&D expenditures was 3.31 (i.e., 3.31 publications per $1M in academic R&D expenditures), the coefficient for S&E postdoctorates was 0.882, and the coefficient for S&E Ph.D. recipients was 1.18. We then performed a regression where the dependent variable was the number of S&E postdoctorates and the independent variable was total academic R&D expenditures.
We found that each $1M in additional funding was associated with an increase of 2.53 S&E postdoctorates (with a standard error of estimate of 0.05). A similar regression showed that each $1M in academic R&D funding was associated with an increase of 1.39 S&E Ph.D. recipients (with a standard error of estimate of 0.02). Thus, adding together the direct and indirect effects, we found that each $1M in academic R&D expenditures was associated with 7.18 (i.e., 3.31 + 0.882 × 2.53 + 1.18 × 1.39) fractional publication counts (or, expressed slightly differently, R&D expenditure per fractional publication count is about $139K ($1M/7.18)).
For publications as measured by whole counts in the expanding journal set, the regression coefficient for total academic R&D expenditures was 4.78 per $1M, the coefficient for S&E postdoctorates was 1.66, and the coefficient for S&E Ph.D. recipients was 1.45. The coefficient from regressing the number of S&E postdoctorates on total academic R&D expenditures was 2.53 per $1M. The coefficient from regressing the number of S&E Ph.D. recipients on total academic R&D expenditures was 1.39 per $1M. Consequently, adding together the direct and indirect effects, we find that each $1M in additional funding is associated with an increase of 11.0 (i.e., 4.78 + 1.66 × 2.53 + 1.45 × 1.39) publications as measured by whole counts (or, expressed slightly differently, R&D expenditure per whole count publication is about $91K ($1M/11.0)).
8.4 Visual Relationship Among Variables in the Path Model
Given that academic R&D expenditures, the number of S&E postdoctorates, the number of S&E Ph.D. recipients, and publication counts are related, it is worthwhile to examine scatterplots of those relationships. Scatterplots were developed using variables averaged over the 1988 to 2001 time period at the institution level.
Figure 14 is a scatterplot of academic R&D expenditures versus number of S&E postdoctorates for the top 200 R&D-performing academic institutions. The number of S&E postdoctorates on staff decreases toward zero as academic R&D expenditures decrease. There are institutions that expend academic R&D funds while employing few, if any, S&E postdoctorates. The outlier in the plot is Harvard University, which reported an average of 2,584 S&E postdoctorates and academic R&D expenditures of $221.2M (i.e., a modest average of $85,600 of academic R&D expenditure per S&E postdoctorate). The lower expenditure per S&E postdoctorate may reflect the large number of affiliations between Harvard University and other medical institutions (including nonprofit hospitals), which is reflected in an unusually large proportion of Harvard S&E postdoctorates who have M.D. degrees or are in the medical sciences.[20] Many of these S&E postdoctorates may be supported through clinical revenues, by nonprofit hospitals (whose expenditures do not show up in the academic R&D expenditures database), or by R&D revenues that do not pass through Harvard and so are not captured in the academic R&D data. Another possible explanation is double counting of S&E postdoctorates who are affiliated with both a clinical and a nonclinical department. Excluding Harvard University from the model increases the r-square for explaining publications as measured by whole counts in the expanding journal set using academic R&D expenditures and number of S&E postdoctorates by about 0.008.
Figure 15 is a scatterplot of academic R&D expenditures versus the number of publications as measured by whole counts in the expanding journal set. With the exception of Harvard University, the data show a clear linear relationship between these two variables.
Figure 16 is a scatterplot of the number of S&E postdoctorates versus publications as measured by whole counts in the expanding journal set. This relationship appears to have a slight curvilinearity. Harvard University appears to be consistent with the other institutions (i.e., it obtains approximately the same number of publications per S&E postdoctorate). The rapid rise in the number of publications at a relatively small number of S&E postdoctorates may suggest that S&E postdoctorates are not necessary for a modest number of publications; however, all institutions with larger numbers of publications have a substantial number of S&E postdoctorates. An alternative explanation would be that postdoctorates are highly associated with increased production of articles.
Figure 17 is a scatterplot of the number of S&E Ph.D. recipients versus total academic R&D expenditures. There are a number of institutions with relatively low total academic R&D expenditures but a substantial number of S&E Ph.D. recipients, suggesting that academic R&D expenditures may not be a major source of funding for S&E Ph.D. recipients in some institutions. For example, some of the S&E Ph.D. recipients may be funded as teaching assistants.
Figure 18 is a scatterplot of the number of S&E Ph.D. recipients versus the number of publications as measured by whole counts in the expanding journal set. Some institutions also generate substantially more publications than could be expected by the model on the basis of S&E Ph.D. recipients alone.
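As a worked check on the path model in Section 8.3, the combined effect of $1M in academic R&D expenditures is the direct coefficient plus each personnel coefficient multiplied by the number of additional postdoctorates or Ph.D. recipients associated with that $1M. The sketch below reproduces the reported totals from the coefficients given in the text. The function and variable names are illustrative, and the calculation treats the regression coefficients as fixed point estimates, ignoring their standard errors.

```python
# Additional personnel associated with each $1M in academic R&D expenditures (Section 8.3).
POSTDOCS_PER_1M = 2.53
PHDS_PER_1M = 1.39

def total_effect(direct, per_postdoc, per_phd):
    """Direct effect of $1M plus the indirect effects through postdoctorates and Ph.D. recipients."""
    return direct + per_postdoc * POSTDOCS_PER_1M + per_phd * PHDS_PER_1M

fractional = total_effect(direct=3.31, per_postdoc=0.882, per_phd=1.18)
whole = total_effect(direct=4.78, per_postdoc=1.66, per_phd=1.45)

# R&D expenditure per publication, in thousands of dollars ($1M = 1000 thousand).
cost_per_fractional_k = 1000 / fractional
cost_per_whole_k = 1000 / whole

print(round(fractional, 2), round(whole, 1))
print(round(cost_per_fractional_k), round(cost_per_whole_k))
```

The results round to 7.18 fractional counts and 11.0 whole counts per $1M, or roughly $139K and $91K per publication, matching the figures reported in Section 8.3.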
Footnotes
[19] For comparison purposes, we also fit a Hierarchical Linear Model (with years nested within institutions) using total academic R&D expenditures, postdoctorates, and S&E Ph.D. recipients as the explanatory variables and both publications as measured by fractional and whole counts in the expanding journal set as dependent variables. These three explanatory variables were statistically significant at the 0.0001 level in the HLM model using fractional and whole counts as dependent variables.
[20] In 2000, approximately 60% of Harvard postdoctorates were in the medical sciences versus 26% for the remaining 199 institutions, and 33% of Harvard postdoctorates had M.D.s as compared to 16% of postdoctorates in the remaining institutions.
