National Science Foundation National Center for Science and Engineering Statistics
U.S. Academic Scientific Publishing

10.0 Observed and Expected Publication Trends Within Institutions

 

In this section we extend the previous model to examine explicitly how changes in resources over time affect changes in publication counts within institutions. The models accounted for a substantial portion (about two-thirds) of the within-institution variability over time in publications as measured by whole counts in the expanding journal set. Corresponding models for changes in fractional publication counts fit worse, suggesting that influential factors beyond those available in our database were at work.

We find that a change in resources within an institution over time results in a smaller change in publication counts than the difference in publication counts between two institutions whose resources differ by the same amount. For fractional counts, changes in resources within institutions result in changes in publication counts that are only 35% of the difference that would be expected between two institutions differing by the same amount of resources; for whole counts the change is 67% of that expected from different institutions. This suggests either that academic R&D expenditures, S&E postdoctorates, and S&E doctoral recipients are partial surrogates for other institutional characteristics that do not change with an increase in academic R&D funding, or that institutions are less efficient at using additional funds to expand publications (and perhaps less sensitive to reductions in funding) than would be expected on the basis of institution-to-institution differences in publications and resources.

The modeling conducted in this section also revealed that the within-institution effects were sensitive to the type of funding and the type of S&E postdoctorate. Increases in federally financed academic R&D expenditures were associated with larger within-institution changes in whole count publications than increases in non-federally financed academic R&D expenditures; changes in S&E postdoctorates without M.D.s were associated with increases in publication counts; and changes in S&E postdoctorates with M.D.s were associated with decreases in publication counts (perhaps by redirecting resources from research to other activities, such as clinical work).

Section 10.1 presents scatterplots of the expected and observed trends within institutions based on the previous model. Section 10.2 develops a hierarchical linear model (HLM) that incorporates both between- and within-institution variability in the independent variables. Section 10.3 develops a model where the dependent variable is the deviation of each institution's publications from its average publication output. Section 10.4 presents plots comparing the expected and observed annual percentage change in publications. An observed average annual percentage change is defined as the average annual change in observed counts for an institution divided by the average observed count for that institution over time; the expected average annual percentage change is defined analogously. The average annual percentage changes in expected fractional count publications are about 2.2% larger than those in observed fractional count publications, again reflecting the increased resources necessary per fractional count publication over time.

10.1 Scatterplot of Average Annual Changes of Expected and Observed Trends Within Institutions

Because differences in publications and resource use between institutions are much greater than differences within institutions across time, the coefficients of the regression model essentially quantify long-term relationships between resource use and publication output. To examine the extent to which the model captures shorter-term relationships between changes in resource use and publications within institutions, we performed two linear regressions for each institution. In the first regression the dependent variable was the observed publication count and the independent variable was year; in the second regression the dependent variable was the expected publication count[38] and the independent variable was year. The coefficient of year in the first regression is the average change per year in observed publications; the coefficient of year in the second regression is the average change per year in expected publications. We compared these two average annual changes to ascertain the extent to which the average change per year in expected publications matched the average change per year in observed publications.
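The per-institution trend comparison described above can be sketched as follows. The series below are invented for illustration (they are not report data), and a least-squares slope stands in for the coefficient of year in each regression:

```python
import numpy as np

def annual_change(years, counts):
    """Least-squares slope of counts on year: the average change per year."""
    slope, _intercept = np.polyfit(years, counts, 1)
    return slope

# Illustrative data for one hypothetical institution.
years = np.array([1988, 1989, 1990, 1991, 1992])
observed = np.array([100.0, 104.0, 109.0, 111.0, 116.0])
expected = np.array([100.0, 106.0, 112.0, 118.0, 124.0])

obs_trend = annual_change(years, observed)  # average observed change per year
exp_trend = annual_change(years, expected)  # average expected change per year
```

Running this for every institution and plotting the observed trend against the expected trend yields scatterplots of the kind shown in figures 27 and 28.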

Figure 27 is a scatterplot of the average annual change from 1988 to 2001 in observed and expected publications within institutions as measured using fractional counts in the expanding journal set. There is considerable scatter in this plot; the correlation of the average annual changes of the expected and observed S&E publications is only 0.56. In addition, almost all of the plotted points lie below the 45-degree line, indicating that for most institutions increases in resources are associated with smaller-than-expected increases in publication counts. The average annual change in observed publications is 3.7 (i.e., 3.7 additional publications per year) and the average annual change in expected publications is 17.6. The slope of the best-fit line is 0.345, so that on average changes in resource use within institutions result in observed changes in fractional count publications that are only 35% of the expected change.

Figure 28 is a scatterplot of the average annual change of observed and expected publications within institutions as measured by whole counts in the expanding journal set. The fit of a line through the plotted points is substantially improved relative to fractional counts; the correlation of the average annual changes of the expected and observed S&E publication counts is 0.824. However, there is still some bias, with expected increases in publication counts overestimating observed increases. The average annual change for observed publications is 19.8 (i.e., 19.8 additional publications per year) and the average annual change for expected publications is 27.3. The slope of the best-fit line is 0.67, so that on average changes in resource use within institutions result in 67% of the expected change in whole count publications. Since, from a previous analysis, we found that the ratio of expected to observed counts averaged over all institutions decreases by only about 1% per year, the vast majority of the difference between the slope of 0.67 in this figure and the ideal slope of 1.0 may be attributable to institutions being marginally less efficient at increasing publication counts when their resources increase.
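The best-fit slopes reported for these scatterplots can be obtained by regressing the per-institution observed annual changes on the expected annual changes. The four data points below are invented so that the slope comes out to exactly 0.65, roughly in the range of the 0.345 and 0.67 slopes reported:

```python
import numpy as np

# Hypothetical per-institution average annual changes (not report data).
expected_chg = np.array([10.0, 20.0, 30.0, 40.0])
observed_chg = np.array([7.0, 13.5, 20.0, 26.5])

# Least-squares line through the scatterplot of observed vs. expected changes.
slope, intercept = np.polyfit(expected_chg, observed_chg, 1)
```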

10.2 A Model for Both Between and Within Institution Variability

To further examine why changes in resources within institutions result in less change in observed publications than expected, we fit both a non-hierarchical model and a hierarchical linear model (HLM) to the data. Each model included an independent variable equal to the average institutional funding across years for each institution and an independent variable equal to the deviation of each year's funding from that average. Similarly, each model included "between" and "within" versions of the independent variables for S&E doctoral recipient counts and number of S&E postdoctorates. This parameterization allowed for separate between-institution and within-institution coefficients for resources.
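As a sketch of this parameterization, the snippet below splits a single resource variable into its "between" component (the institution's average across years) and its "within" component (each year's deviation from that average). The panel and column names are hypothetical, not those of the underlying database:

```python
import pandas as pd

# Illustrative institution-by-year panel with one resource variable.
df = pd.DataFrame({
    "institution": ["A", "A", "A", "B", "B", "B"],
    "year": [1988, 1989, 1990, 1988, 1989, 1990],
    "rd_expenditures": [50.0, 55.0, 60.0, 90.0, 95.0, 100.0],  # $M
})

# "Between" version: the institution's average across years.
df["rd_between"] = df.groupby("institution")["rd_expenditures"].transform("mean")
# "Within" version: each year's deviation from that average.
df["rd_within"] = df["rd_expenditures"] - df["rd_between"]
```

Regressing publication counts on both columns (and the analogous pairs for postdoctorates and doctoral recipients) yields separate between- and within-institution coefficients, as in table 3.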

Table 3 compares results obtained using: 1) a non-HLM model on whole counts in the expanding journal set,[39] 2) an HLM model on whole counts in the expanding journal set,[40] 3) a non-HLM model for fractional counts in the expanding journal set,[41] and 4) an HLM model for fractional counts in the expanding journal set.[42] In one pair of models the outcome variable was the number of publications as measured by fractional counts; in the other pair it was the number of publications as measured by whole counts. The explanatory (independent) variables are the within-institution averages across years for total academic R&D expenditures, number of S&E postdoctorates, and number of S&E doctoral recipients, and the differences between the yearly values of these three variables and their within-institution averages. In addition, the HLM model includes a random effects term for each institution (which approximately corresponds to separate intercepts for each institution). As seen in table 3, the coefficients are almost identical whether the HLM or non-HLM models are used. The r-square values for the two non-HLM regressions are 0.941 and 0.938; for the HLM models they are both 0.994. All regression coefficients are statistically significant at the 0.001 level in both models. Standard errors for the averaged variables tend to be about three times larger in the HLM models than in the non-HLM models, and standard errors for the within-institution variables tend to be about three times smaller.

The model results confirm that changes in funding, the number of S&E postdoctorates, or the number of S&E doctoral recipients within an institution are likely to result in smaller changes in publications than would be estimated from the association of these variables with the institutional average number of publications across years. Using the results from the HLM model, we find for academic R&D expenditures that a difference of $1M in average expenditures between institutions is associated with a difference of 5.01 expected whole counts, but a change of $1M in expenditures within an institution from year to year is associated with an expected change of 3.68 whole counts, so that within-institution increases in funding yield only 74% of the benefit that would have been expected from between-institution associations of funding and publications. The corresponding ratios for S&E postdoctorates and S&E doctoral recipients are 38% and 64%, respectively. For publications as measured by fractional counts, the within- to between-institution ratios for these three independent variables are 12%, 29%, and 49%, respectively.
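The 74% figure for academic R&D expenditures is simply the ratio of the within-institution coefficient to the between-institution coefficient. Recomputing it from the rounded coefficients quoted above gives approximately the same value (the quoted coefficients are themselves rounded, so the ratio lands just under 74%):

```python
# Within- to between-institution effect ratio for academic R&D expenditures,
# using the rounded HLM whole-count coefficients quoted in the text.
between_coef = 5.01  # expected whole counts per $1M difference between institutions
within_coef = 3.68   # expected whole counts per $1M year-to-year change within an institution

ratio = within_coef / between_coef  # roughly 0.73-0.74
```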

The results in table 3 can be used to calculate the effect of an increase in resources on expected publication output. The mean values across all institutions for average academic R&D funding, number of S&E postdoctorates, and number of S&E doctoral recipients are $81.0M, 170.1, and 120.8, respectively. The mean values of the change measures are zero. Suppose that institution A has those resources available to it in a given year. Substituting these values into the HLM regression model, and accounting for the intercepts, we would estimate that institution A will generate 956 publications as measured by whole counts and 632 publications as measured by fractional counts. If institution B has available 10% more resources than institution A, the model estimates that institution B will generate 1,041.8 publications as measured by whole counts and 689.2 publications as measured by fractional counts, an increase of 9.0% and 9.1%, respectively. Thus two different institutions, one at the mean value and one 10% above it, are expected to differ by about 9% in publication counts. If institution A's academic R&D funding, S&E postdoctorates, and S&E doctoral recipients are each increased by 10%, the model estimates that institution A will now generate 1,007 publications as measured by whole counts and 645 publications as measured by fractional counts. These are increases of 5.3% and 2.1%, respectively, which are substantially less than the 9% difference between institutions A and B at their original resource levels. It is not clear why the increases in whole and fractional counts are not equal. We would expect increases in funding to result in equal increases in whole and fractional counts, unless increased funding is differentially allocated to research that results in publications with fewer or more institutional authors than is typical for that institution, in which case the two publication measures would increase at different rates.
Perhaps the reason for the differential rate of increase in these two publication counts is that increases in resources result in greater amounts of collaborative research, which yields fewer additional fractional counts per publication.
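The percentage increases quoted above follow directly from the predicted counts; as an arithmetic check, using only the numbers given in the text:

```python
# Percent differences implied by the predicted publication counts in the text.
base_whole, base_frac = 956.0, 632.0  # institution A at mean resource levels
b_whole, b_frac = 1041.8, 689.2       # institution B, with 10% more resources
a10_whole, a10_frac = 1007.0, 645.0   # institution A after a 10% resource increase

def pct(new, old):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old

# Between-institution differences: about 9.0% (whole) and 9.1% (fractional).
# Within-institution increases: about 5.3% (whole) and 2.1% (fractional).
```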

The finding that changes in academic R&D expenditures, number of S&E postdoctorates, and number of S&E doctoral recipients are associated with smaller changes in publications within institutions than between institutions suggests that these variables may be surrogates for other important differences between institutions that do not change in response to relatively small changes in these three resource inputs. These might include differences between institutions in their degree of focus on research, the degree to which their staff are publication-oriented, and so on.

10.3 A Model for Within Institution Variability

The analyses in table 3 attempt to simultaneously model the association of academic R&D expenditures, number of S&E postdoctorates, and number of S&E Ph.D. recipients, along with the changes in these independent variables, with between- and within-institution variation in publication counts. However, since 98.9% and 97.5% of the variability in publication counts as measured by fractional and whole counts, respectively, in the expanding journal set is associated with between-institution variability, we felt that unless we modeled the within-institution changes in publications directly, we would not be able to assess how well we could explain within-institution changes in publications. Consequently we conducted regression analyses where the outcome variable was the change in publication counts and the independent variables were the changes in the explanatory variables, where change was calculated relative to the mean value within institution. Table 4 shows the model coefficients and the associated r-square values for regressions on within-institution change in publication whole counts[43] and fractional counts.[44] Since the HLM and non-HLM results were essentially identical and only about 2% of the variability in the change outcome lay between institutions, we present only the non-HLM results below. All coefficients are statistically significant at the 0.001 level using both HLM and non-HLM models.

The r-square for publications as measured by whole counts is reasonably high at 0.65, so that approximately two-thirds of the variability within institutions is accounted for. The intercept is non-zero as a result of missing values and lagging of the independent variables. Federally financed academic R&D expenditures have almost three times as large an effect on increasing publications as non-federally financed academic R&D expenditures (5.0 versus 1.7 per $1M). Each additional S&E Ph.D. recipient increases whole count publications by 0.81. In addition, not all types of S&E postdoctorates increase publication counts equally. Each additional S&E postdoctorate without an M.D. increases whole count publications by 0.74; each additional S&E postdoctorate with an M.D. supported by a federal traineeship (which primarily trains physicians for clinical practice) decreases publications by 3.26 (perhaps via funneling of faculty and other resources from research-related activities to clinical activities); and increasing the number of S&E postdoctorates supported by federal research grants does not affect whole count publications. The number of graduate students and postdoctorates supported by federal research grants was not statistically significant. The coefficients in this model are reasonably similar to those in table 3 (after adjusting for the fact that 58% of academic R&D expenditures are federally financed).

The r-square for publications as measured by fractional counts is considerably lower at 0.28. In addition, non-federally financed academic R&D expenditures, the number of S&E Ph.D. recipients, and the number of postdoctorates without M.D.s did not contribute at least 1% to r-square and so were not included in the regression. The regression shows that increases in the number of S&E postdoctorates supported by federal research grants and in the number of graduate students increase publication counts, but increases in the number of S&E postdoctorates with M.D.s supported by federal traineeships decrease publication counts. We also examined whether we could substitute the same three explanatory change variables as in table 3 into this regression. These yielded coefficients that were almost identical to those in table 3, but the r-square using those variables is considerably lower, at 0.23.

10.4 Scatterplots of Average Annual Percent Change in Expected and Observed Publication Counts Using the HLM Model

We also calculated the average annual percent change for both observed and expected publication counts, using the model in table 3 for the expected publication counts. The annual percent change for the observed publication counts for an institution is defined as the average yearly change in observed publication counts for that institution over time divided by the average observed publication count for that institution over time. That is, the annual percent change for observed counts is the percentage change per year in the observed counts from the least squares regression line where the dependent variable is the observed count and the independent variable is year. The average annual percent change for the expected counts is similarly defined.
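Under this definition, the average annual percent change is the least-squares slope of counts on year divided by the mean count. A minimal sketch with an invented series (not report data):

```python
import numpy as np

def avg_annual_pct_change(years, counts):
    """Least-squares slope of counts on year, divided by the mean count."""
    slope, _intercept = np.polyfit(years, counts, 1)
    return slope / np.mean(counts)

# Illustrative series for one hypothetical institution:
# slope = 5 per year, mean = 102.5, so about 4.9% per year.
years = np.array([1988, 1989, 1990, 1991])
counts = np.array([95.0, 100.0, 105.0, 110.0])
pct_change = avg_annual_pct_change(years, counts)
```

Computing this quantity for both the observed and the model-expected counts, institution by institution, gives the differences plotted in figures 29 and 30.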

Figure 29 is a scatterplot of the difference in the average annual percent change for observed and expected counts (i.e., the average annual percent change for observed publication counts minus the average annual percent change for expected publication counts) versus the average annual observed publications as measured by fractional counts in the expanding journal set. The average difference between the observed and expected average annual percent changes is -2.2% (i.e., expected increases in publication counts are about 2.2% greater per year than observed increases). Institutions with fewer publications show greater variability in this difference. The standard deviation of the differences is 3.0% for institutions with fewer than 500 publications per year as measured by fractional counts and 1.3% for institutions with 500 or more publications.

Figure 30 is a scatterplot of the difference in the average annual percent change for observed and expected counts versus the average annual number of observed publications as measured by whole counts in the expanding journal set. The average difference between the observed and expected average annual percent changes is -0.7% (i.e., expected increases in publication counts are about 0.7% greater per year than observed increases). Institutions with fewer publications show greater variability in this difference. The standard deviation of the differences is 3.0% for institutions with fewer than 700 publications and 1.5% for institutions with 700 or more publications.





Footnotes

[38] The expected publication counts were obtained from a model with three independent variables: academic R&D expenditures, number of S&E postdoctorates, and number of S&E Ph.D. recipients.

[39] See Exhibit I-1 for regression output.

[40] See Exhibit I-2 for regression output.

[41] See Exhibit I-3 for regression output.

[42] See Exhibit I-4 for regression output.

[43] See Exhibits I-5 and I-6 for regression output.

[44] See Exhibits I-7 and I-8 for regression output.


 
Working Paper | SRS 11-201 | November 2010