U.S. Academic Scientific Publishing
9.0 Adequacy of Model Fit: Time Trends in Resource Utilization per Publication Count and Model Fit by Institutional Characteristic
This section examines the adequacy of the model fit for fractional and whole counts in the expanding journal set. Section 9.1 examines the trend, over time, in the ratio of observed to expected publication counts. These ratios show a clear linear trend over time. The model's resources are academic R&D expenditures, S&E postdoctorates, and S&E doctoral recipients. Relative to these resources, the inputs required per fractional publication count increased by 29% from 1990 to 2001, and the inputs required per whole publication count increased by 10%. Possible reasons for these increases are discussed.
Sections 9.2 to 9.7 examine the fit of expected to observed publications: 1) by institution (to examine whether the model's fit differed across institutions); 2) for private Research I (R-1) institutions (to determine whether the model fit could be improved by separately fitting these institutions); 3) for institutions with substantial patenting activity; 4) for institutions with higher collaboration index values; 5) for institutions with higher relative citation index values; and 6) for institutions with higher National Research Council (NRC) quality ratings. None of these investigations resulted in modifications to the model.
We found that the correlation between the expected and observed publication counts for individual institutions is very high, with no obvious outliers. The fit to private R-1 institutions cannot be substantially improved by fitting a separate model. The three institutions with the highest patenting levels have smaller publication counts than expected. Even so, there is insufficient evidence to conclude that patenting activity has led to reductions in publication counts. Accounting for differences in the degree of collaboration among institutions did not substantially increase the explanatory ability of the model (suggesting that increases in collaboration have been fairly consistent across institutions). Institutions with higher relative citation indices tended to have slightly more publications than expected, but the evidence for including the relative citation index in the model is relatively weak. Finally, NRC quality ratings were only weakly associated with publication counts (which may be due to the presence of many different journals at varying quality levels). Cumulatively, these findings suggest the following: once we account for total academic R&D expenditures, the number of S&E postdoctorates, and the number of S&E Ph.D. recipients, factors not captured by these three variables add little further explanatory power for article production.
9.1 Trends in the Ratio of Expected and Observed Publication Counts
To examine the relationship between expected and observed publications, we modeled the number of publications as measured by fractional counts in the expanding journal set for the top 200 R&D-performing academic institutions. We found that three variables were strongly associated with publication counts: 1) total academic R&D expenditures, 2) S&E postdoctorates, and 3) S&E Ph.D. recipients. However, to achieve near comparability with results by field group (described later), we added four more variables to the model: 1) academic R&D expenditures funded by "other" sources (primarily foundations), 2) R&D basic research expenditures (a component of total academic R&D expenditures), 3) total S&E degrees awarded by field, and 4) 1994 Carnegie R-1 class. Adding these variables increased the r-square by slightly less than 0.01, indicating that they did not substantially improve the fit of the model.
The results indicate that the relationship between resource inputs and publication outputs has changed over time. Figure 19 shows a plot of the ratio of average yearly observed to expected publications as measured by fractional and whole counts. (The plot is missing the ratio for 2000 because data on S&E degrees awarded were not available for 1999. Ratios for 1988 and 1989 are missing because the financial variables were lagged by 2 years and were only available starting in 1988.) The ratio of average yearly observed to expected fractional publications followed a clear linear trend, decreasing from 1.141 in 1990 to 0.887 in 2001. In the earlier years there were more observed than expected publications; in the later years there were fewer. This trend suggests that it took more of the resource inputs included in the model to produce a given number of fractional publication counts in 2001 than in 1990. The resources in the model associated with a fractional count of 1.0 in 1990 were associated with a fractional count of 0.777 (i.e., 0.887/1.141) in 2001. Equivalently, the resources associated with one fractional count in 2001 were associated with a fractional count of 1.29 (i.e., 1.141/0.887) in 1990. This implies that the cost per fractional count in 2001 was 29% higher than in 1990.
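Converting observed-to-expected ratios into a change in cost per count is simple arithmetic. The sketch below uses a hypothetical function, `cost_change`, that is not part of the Publication Trends analysis. It reproduces the fractional count figures above, assuming the two ratios are the 1990 and 2001 values quoted in the text.

```python
def cost_change(ratio_start: float, ratio_end: float) -> dict[str, float]:
    """Translate observed/expected ratios at two dates into output and cost changes.

    A ratio above 1 means more publications were observed than the model
    expected from the same resource inputs.
    """
    # Publication counts the same resources yield at the end date, relative to the start.
    relative_output = ratio_end / ratio_start
    # Resources needed per publication count at the end date, relative to the start.
    relative_cost = ratio_start / ratio_end
    return {
        "relative_output": relative_output,
        "relative_cost": relative_cost,
        "cost_increase_pct": 100 * (relative_cost - 1),
    }


fractional = cost_change(1.141, 0.887)  # 1990 and 2001 fractional count ratios
print(f"{fractional['relative_output']:.3f}")    # 0.777
print(f"{fractional['relative_cost']:.2f}")      # 1.29
print(f"{fractional['cost_increase_pct']:.0f}")  # 29
```

The same function can be applied to the whole count best-fit values used in the next paragraph (1.047 and 0.952). It gives a relative cost of about 1.10.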
We repeated this analysis for publications as measured by whole counts in the expanding journal set. Figure 19 also shows, by year, the ratio of the average observed whole publication count to the average expected whole publication count, along with the least squares line of best fit. The ratio of average yearly observed to expected whole count publications also declined over time, but a linear trend fit these ratios less well than it fit the fractional count ratios. (The r-square value for the regression of the ratio of observed to expected fractional publication counts on year was 0.980. For the regression of the ratio of observed to expected whole count publications on year, it was 0.862.) More importantly, the range of ratios was narrower than for fractional counts. Using the values on the best fit line, observed publications were about 4.7% larger than expected in 1990 and 4.8% smaller in 2001. Equivalently, the resources associated with a whole count of 1.0 in 1990 were associated with a whole count of 0.909 (i.e., 0.952/1.047) in 2001. This implies that the cost per whole count increased by 10% from 1990 to 2001.
Increasing collaboration among institutions raises the cost per fractional count more rapidly than the cost per whole count (in approximately the same ratio as discussed in the preceding paragraph). A hypothetical illustration may be useful. Assume, for simplicity's sake, that the only resource required for publication is academic R&D funding. Suppose that in 1990 there are two institutional authors per paper, the cost of doing the research for that paper is $100,000, and those costs are split equally between the two institutions.
In this example, the cost per publication count in 1990 as measured by whole counts is $50,000 (i.e., each institution receives one whole count and spends $50,000). The cost per publication count as measured by fractional counts is $100,000 (i.e., each institution receives one-half of a fractional count and spends $50,000). Further suppose that in 2001 there are on average 2.25 institutional authors per paper, costs are split equally among the contributing institutions, and the total cost of the publication has risen to $122,625. Then the cost per publication count as measured by whole counts is $54,500 (i.e., each institution receives one whole count and spends on average $54,500, calculated as $122,625 divided by 2.25). The cost per publication count as measured by fractional counts is $122,625 (i.e., each institution receives on average 0.444 fractional counts at a cost of $54,500). Therefore, in this example the cost per publication as measured by fractional counts has increased from $100,000 to $122,625 (an increase of 22.6%). The cost per publication as measured by whole counts has increased from $50,000 to $54,500 (an increase of 9%).
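The arithmetic of this hypothetical example can be checked with a few lines of code. The function below is an illustrative sketch. Like the example, it assumes that each paper's cost is split equally among the contributing institutions and that R&D funding is the only resource.

```python
def cost_per_count(total_cost: float, institutions_per_paper: float) -> tuple[float, float]:
    """Return (cost per whole count, cost per fractional count) for one paper,
    assuming its cost is split equally among the contributing institutions."""
    spend_per_institution = total_cost / institutions_per_paper
    whole_credit = 1.0                                # each institution gets one whole count
    fractional_credit = 1.0 / institutions_per_paper  # each gets 1/n of a fractional count
    return spend_per_institution / whole_credit, spend_per_institution / fractional_credit


whole_1990, frac_1990 = cost_per_count(100_000, 2)
whole_2001, frac_2001 = cost_per_count(122_625, 2.25)
print(f"whole: ${whole_1990:,.0f} -> ${whole_2001:,.0f}")
print(f"fractional: ${frac_1990:,.0f} -> ${frac_2001:,.0f}")
print(f"growth: whole x{whole_2001 / whole_1990:.4f}, fractional x{frac_2001 / frac_1990:.5f}")
```

The cost per fractional count equals the total cost of the paper, whatever the number of institutions. The cost per whole count, by contrast, falls as more institutions share that cost.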
The fractional count data imply that the resource inputs required per article increased by about 29% from 1990 to 2001. In other words, the efficiency of article production (i.e., the relationship between resource inputs and outputs) declined. However, we cannot necessarily attribute these increased costs to collaboration. There are a variety of possible explanations:

- research leading to publications may have become more complex;
- costs for faculty and S&E postdoctorates (including tuition offsets) may have increased faster than the GDP implicit price deflator;
- larger research teams may have increased costs associated with collaboration;
- real costs for materials and equipment may have increased faster than the GDP implicit price deflator;
- there may have been a shift towards publications in fields with greater costs per publication (such as medical sciences);
- journal submissions may be moving towards more comprehensive articles.

As noted earlier, another reason might be that the trend towards producing more papers per faculty member requires greater resources per paper (i.e., a reduction in the marginal efficiency of production). A further possibility is that marginal research productivity in biomedical research fell. Universities may have been unable to increase research productivity rapidly in response to large increases in federal funding over the last decade. For example, they may have used some of the additional resources to purchase new equipment and facilities. They may also have hired new faculty or postdoctorates who required training or time to become productive, or pursued research in new areas or expanded existing high-risk research.
The decline in the ratio of observed to expected publications is substantially greater for fractional counts than for whole counts. From 1990 to 2001, the cost per fractional count increased by 29%, while the cost per whole count increased by only 10%. The difference reflects the increase in collaboration. If collaboration had not changed, we would have expected the cost per whole count to increase by 29% as well. However, collaboration did increase, and as a result more institutions receive whole credit for the same journal article. If institutions split the cost of each article equally, each receives the same whole credit at a reduced cost, which boosts apparent efficiency. This apparent boost offsets roughly 19 percentage points of the 29% cost increase seen with fractional counts, leaving the 10% increase seen with whole counts.
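The attenuation described above follows from the equal-split assumption of the hypothetical example. Write C_t for the total research cost of a paper in year t and n_t for the number of institutions per paper; this notation is introduced here for illustration only. The cost per whole count is then C_t/n_t. The cost per fractional count is (C_t/n_t)/(1/n_t) = C_t. Growth in cost per fractional count therefore factors into growth in cost per whole count times growth in institutions per paper. The second line checks this against the example's figures:

```latex
\frac{C_{2001}}{C_{1990}}
  = \frac{C_{2001}/n_{2001}}{C_{1990}/n_{1990}} \cdot \frac{n_{2001}}{n_{1990}},
\qquad
\frac{122{,}625}{100{,}000} = \frac{54{,}500}{50{,}000} \cdot \frac{2.25}{2}
  \;\Longrightarrow\; 1.22625 = 1.09 \times 1.125 .
```

Because the relationship is multiplicative, subtracting the two percentage changes (as in the 19-percentage-point comparison) is only an approximation.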
9.2 Scatterplot of Expected and Observed Publication Counts by Institution
To examine the fit of expected to observed publication counts, we calculated the average observed and expected publication counts across years for each institution, using only those years for which expected publication counts were available from the regression model described in section 9.1.
Figure 20 is a scatterplot of observed and expected publications as measured by fractional counts in the expanding journal set. Generally, the fit is quite good, with a correlation coefficient of 0.971. The model tends to overestimate slightly for institutions with fewer than about 200 observed publications.
Figure 21 is a scatterplot of observed and expected publications as measured by whole counts and the expanding journal set. The fit is very similar to that for publications as measured by fractional counts, and the correlation coefficient is 0.975.
9.3 Adequacy of Fit to Private R-1 Institutions
SRI generated a listing of institutions containing:

- their characteristics;
- the observed and expected values for publications as measured by whole and fractional counts for each institution, averaged over 1988 to 2001;
- the slopes obtained by regressing observed publication counts on year separately for each institution;
- the slopes obtained by regressing expected publication counts on year separately for each institution.

These lists were sorted in various ways (e.g., by the difference between observed and expected publications, the difference in slopes, and the ratio of the slopes). The aim was to identify any relationship between the adequacy of fit and institutional characteristics. For the private R-1 institutions, the difference between the slopes of observed and expected publications appears to be slightly larger than for other institutions. To determine whether the fit could be improved, we fit a separate model for the 30 private R-1 institutions and examined whether the fit improved.
Figure 22 contains two scatterplots for private R-1 institutions, with publications measured by whole counts in the expanding journal set. Each plots the slope obtained by regressing observed publication counts on year against the slope obtained by regressing expected publication counts on year. In the first plot, expected publications come from the regression of publication counts on resources using all institutions. In the second, they come from the same regression fit using only private R-1 institutions. There is very little visual difference between the goodness of fit in the two plots. We conclude that fitting a separate model for private R-1 institutions would not substantially improve the model fit.
9.4 Adequacy of Fit for Institutions with Substantial Patenting Activity
We analyzed the data to assess whether the reduction in publications in recent years might be attributable to greater amounts of patenting. Researchers involved in patenting activities might be reluctant to publish their findings. The Publications Trends database contains total patenting activity from 1988 to 2001 (i.e., number of patents granted). This variable did not enter the regression equations for fractional publication counts in the expanding journal set because adding it increased r-square by less than 0.01. (The incremental r-square was 0.0001 and the p-value was 0.053.) However, we were concerned that the low incremental r-square might reflect the small number of institutions with large amounts of patenting activity. (Half of the institutions had 38 or fewer patents, and 75% had 147 or fewer. The three institutions with the largest number of patents had an average of 1,170 patents.) Figure 23 shows a scatterplot of the difference between observed and expected publications versus patenting activity from 1988 to 2001. Publications are measured by whole counts in the expanding journal set. The difference is computed by institution and averaged over the years from 1988 to 2001 for which expected publications could be calculated. The figure shows three patterns:

- for institutions with fewer than 250 patents, there is no discernible bias in the relationship between expected and observed publications;
- for institutions with 250 to 750 patents, observed publications generally tend to exceed expected publications;
- for the top three institutions (with 900 or more patents), observed publications are less than expected publications.

From this graph we conclude that an extremely large amount of patenting may lead to reduced publications. However, the Publications Trends database does not provide sufficient evidence to demonstrate that patenting activity has led to reductions in publication counts.
However, because our patent counts were at the institution level, a more refined analysis that allocated patents to fields might find an effect. In addition, given the backlog of applications at the patent office, different results might have been observed using patent applications rather than the number of patents granted.
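The decision rule used in this section (add a variable only if it raises r-square by at least 0.01) can be sketched with an ordinary least squares fit. The report does not describe its estimation software, so the solver below and all names are assumptions for illustration. It solves the normal equations with partial pivoting, has no safeguards for nearly collinear predictors, and does not compute the p-value quoted above.

```python
def ols_r_square(X: list[list[float]], y: list[float]) -> float:
    """R-square of an OLS fit of y on the columns of X plus an intercept.
    Solves the normal equations by Gaussian elimination with partial pivoting."""
    rows = [[1.0] + list(row) for row in X]  # prepend intercept column
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]  # X'X
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]            # X'y
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for i in range(col + 1, k):
            factor = A[i][col] / A[col][col]
            for j in range(col, k):
                A[i][j] -= factor * A[col][j]
            c[i] -= factor * c[col]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (c[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    fitted = [sum(bj * xj for bj, xj in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r_square(X_base, X_extra, y) -> float:
    """Gain in r-square from adding the X_extra columns to the X_base columns."""
    X_full = [list(b) + list(e) for b, e in zip(X_base, X_extra)]
    return ols_r_square(X_full, y) - ols_r_square(X_base, y)


def enters_model(delta_r_square: float, threshold: float = 0.01) -> bool:
    """The report's customary rule: keep a variable only if r-square rises by >= 0.01."""
    return delta_r_square >= threshold


# Placeholder data (not report data): y is exactly x1 + x2, so adding x2 helps.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [a + b for a, b in zip(x1, x2)]
delta = incremental_r_square([[v] for v in x1], [[v] for v in x2], y)
print(f"incremental r-square = {delta:.3f}; enters model: {enters_model(delta)}")
```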
9.5 Adequacy of Fit for Institutions with Higher Collaboration Indices
As described earlier, collaboration has tended to increase over time, as measured by the increasing ratio of whole to fractional counts. We examined whether institution-specific estimates of the amount of collaboration could improve the fit of the model for publications as measured by fractional counts in the expanding journal set. The measure of collaboration used was the ratio of whole to fractional count publications in the expanding journal set (which we denote as the "W/F ratio"), calculated separately for each institution and year. Three new independent variables were calculated by multiplying the W/F ratio by total academic R&D expenditures, S&E postdoctorates, and number of S&E Ph.D. recipients, respectively. The original three (unmodified) variables explain 91.9% of the variation in fractional publication counts; the three W/F-modified variables explain 93.0%. Thus, adjusting for collaboration results in a slight increase in the explanatory ability of the model.
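Constructing the W/F-modified predictors is a row-by-row multiplication. In the sketch below the field names are hypothetical. The resulting three columns would replace the unmodified ones in whatever regression routine is used.

```python
from typing import TypedDict


class InstitutionYear(TypedDict):
    # One institution-year record (hypothetical field names).
    whole: float        # whole count publications, expanding journal set
    fractional: float   # fractional count publications, expanding journal set
    rd: float           # total academic R&D expenditures
    postdocs: float     # S&E postdoctorates
    phds: float         # S&E Ph.D. recipients


def wf_modified(rows: list[InstitutionYear]) -> list[tuple[float, float, float]]:
    """Multiply each resource input by that institution-year's W/F ratio."""
    out = []
    for r in rows:
        wf = r["whole"] / r["fractional"]  # collaboration measure (W/F ratio)
        out.append((wf * r["rd"], wf * r["postdocs"], wf * r["phds"]))
    return out


example: InstitutionYear = {"whole": 150.0, "fractional": 100.0, "rd": 200.0, "postdocs": 40.0, "phds": 10.0}
print(wf_modified([example]))  # [(300.0, 60.0, 15.0)]
```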
Figure 24 is a scatterplot of the difference between observed and expected S&E publications versus the ratio of whole to fractional publication counts, with publications measured by whole counts in the expanding journal set. Each point represents a single institution over the period from 1988 to 2001. Figure 25 shows the corresponding scatterplot for publications measured by fractional counts in the expanding journal set. Generally, no trend indicates over- or underestimation as a function of the amount of collaboration. Over the range of collaboration covered by the vast majority of institutions, the variability in the difference between observed and expected publications is comparable. There may be a small tendency toward a better fit for the few institutions with very high levels of collaboration (i.e., a W/F ratio of 1.7 or greater).
9.6 Adequacy of Fit for Institutions with Higher Relative Citation Indices
We sought to determine whether the number of publications at an institution is related to the quality of its publications. At the institutional level, one possible measure of quality of publications is the citation to publication ratio (i.e., the relative citation index). We calculated the ratio of all citations from 1992 to 2001 to all publications from 1988 to 1999 as a simplified measure of this index. Figure 26 shows a scatterplot of the difference between observed and expected publications (using whole counts in the expanding journal set) by institution on the vertical axis and the relative citation index on the horizontal axis. There is a modest tendency for institutions with higher relative citation indices to produce more publications than could be explained by resource inputs alone. When the relative citation index is less than 10.0, the average number of publications is 400 and the average number of expected publications is 465, for a net overestimation of 65 or 16%. When the relative citation index is 10 or greater, the average number of publications is 1,345 and the average number of expected publications is 1,307, for a net underestimation of 38 or 3%. The correlation between the difference in observed and expected publication counts and the relative citation index is small (0.12); the correlation between the ratio of observed to expected publication counts and the relative citation index is 0.35.
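The two-group comparison can be reproduced from institution-level averages. The function below is a sketch with hypothetical names. Applied to the group means quoted above, it shows that the 16% and 3% figures appear to express the net difference relative to mean observed publications (65/400 and 38/1,345). The relative citation index values in the usage line are placeholders that only assign each group to its side of the cutoff.

```python
def fit_by_citation_group(
    institutions: list[tuple[float, float, float]], cutoff: float = 10.0
) -> dict[str, tuple[float, float, float]]:
    """institutions: (mean observed, mean expected, relative citation index) per institution.
    Returns mean observed, mean expected, and net over(+)/under(-)estimation as a
    percentage of mean observed publications, for each side of the cutoff."""
    groups = {
        "below_cutoff": [(o, e) for o, e, rci in institutions if rci < cutoff],
        "at_or_above_cutoff": [(o, e) for o, e, rci in institutions if rci >= cutoff],
    }
    result = {}
    for label, members in groups.items():
        if not members:
            continue
        mean_obs = sum(o for o, _ in members) / len(members)
        mean_exp = sum(e for _, e in members) / len(members)
        result[label] = (mean_obs, mean_exp, 100 * (mean_exp - mean_obs) / mean_obs)
    return result


# The group means quoted in the text, entered as two single-member groups.
summary = fit_by_citation_group([(400, 465, 5.0), (1345, 1307, 12.0)])
for label, (obs, exp_, pct) in summary.items():
    print(f"{label}: observed {obs:,.0f}, expected {exp_:,.0f}, net {pct:+.0f}%")
```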
To examine whether the relative citation index could improve the regression equation, we defined a relative citation index for each year. The numerator is the number of citations occurring two years later. The denominator is the number of publications in the three-year period ending in that year. For example, the relative citation index for 1992 is the number of citations in 1994 divided by all publications in 1990, 1991, and 1992 (using whole counts in the expanding journal set). We added the relative citation index to a regression model with total academic R&D expenditures, number of S&E postdoctorates, and number of S&E Ph.D. recipients. The model also included the interactions of the index with each of these three variables. The r-square for explaining whole publication counts in the expanding journal set increased from 0.934 to 0.941. We also used each institution's average relative citation index across all years, which reduces variability in this variable and models the longer-term index. With that average, the r-square increased from 0.934 to 0.942. Including the relative citation index and all interactions reduced the root mean square error from 244 to 226. We conclude that including the relative citation index in the model results in a modest improvement, which does not meet our customary threshold of a 0.01 increase in r-square.
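The year-specific relative citation index defined above, and its average across years, can be sketched as follows. The dictionaries keyed by year and the function names are hypothetical. The default lead of two years and window of three years follow the definition in the text.

```python
from typing import Iterable, Optional


def yearly_rci(citations: dict[int, float], publications: dict[int, float],
               year: int, lead: int = 2, window: int = 3) -> Optional[float]:
    """Citations in (year + lead) divided by publications in the `window` years
    ending in `year`; None if any required year is missing."""
    pub_years = range(year - window + 1, year + 1)
    if year + lead not in citations or any(y not in publications for y in pub_years):
        return None
    return citations[year + lead] / sum(publications[y] for y in pub_years)


def average_rci(citations: dict[int, float], publications: dict[int, float],
                years: Iterable[int]) -> Optional[float]:
    """Mean of the available yearly indices (the longer-term average per institution)."""
    values = [v for v in (yearly_rci(citations, publications, y) for y in years) if v is not None]
    return sum(values) / len(values) if values else None


# Placeholder counts (not report data): the 1992 index uses 1994 citations
# and 1990-1992 publications.
pubs = {1990: 100, 1991: 100, 1992: 100, 1993: 120}
cites = {1994: 3000, 1995: 3300}
print(yearly_rci(cites, pubs, 1992), yearly_rci(cites, pubs, 1993), average_rci(cites, pubs, range(1990, 1994)))
```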
9.7 Adequacy of Fit for Fields with Higher NRC Quality Ratings
The only variable in the Publication Trends database that directly measures quality is the National Research Council's (NRC's) scholarly quality rating (SQR). NRC raters were asked to rate programs in 41 fields at various institutions for "scholarly quality of program faculty" on a 0 to 5 scale. A rating of 0 signified "not sufficient for doctoral education" and 5 signified "distinguished." From these responses, the committee calculated a mean rating for each program appearing in the study. For some disciplines, the program ratings are provided at a more detailed level than the WebCASPAR Academic Disciplines. For these disciplines, we computed a weighted average of the program ratings, using the institution's number of graduates in each program as the weight. These ratings are available at the field level rather than the institutional level. Ratings are available only for 1993; the same ratings were applied to every year.
For this analysis, regression was performed at the field-year level for all fields with an NRC rating (yielding a total of over 22,000 observations). The dependent variable was whole publication counts in the expanding journal set. The explanatory variables were:

- total academic R&D expenditures;
- number of postdoctoral researchers;
- number of S&E Ph.D. recipients;
- NRC SQR;
- the interactions of NRC SQR with the other three explanatory variables.

With only the first three explanatory variables, the r-square for publications as measured by whole counts in the expanding journal set was 0.830. Adding NRC SQR and all of the interaction terms increased the r-square only minimally, to 0.834. When the dependent variable was publications as measured by fractional counts in the expanding journal set, the first three explanatory variables gave an r-square of 0.829. Adding NRC SQR and all of the interaction terms increased the r-square to 0.838, a very slight improvement. We originally conjectured that the NRC quality rating might not substantially improve the fit because quality might be highly correlated with the three resource variables. However, the correlations of total academic R&D expenditures, number of S&E postdoctorates, and number of S&E Ph.D. recipients with the NRC SQR are only 0.33, 0.23, and 0.45, respectively. In addition, the correlations of the NRC SQR with publications as measured by whole and fractional counts are only 0.296 and 0.299, respectively. We conclude that the NRC SQR ratings are only weakly associated with publication counts. Whatever they add to the model's fit is largely captured by the other explanatory variables.
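The regression specification described above can be sketched as a design-row builder: three resource inputs, a quality measure, and the quality measure's interaction with each input. The function names are hypothetical. The r-square pairs in the usage lines are the values reported in this section and in section 9.6, checked against the report's 0.01 threshold.

```python
def design_row(rd: float, postdocs: float, phds: float, quality: float) -> list[float]:
    """Predictors for one observation: the three resource inputs, a quality
    measure (e.g., NRC SQR or a relative citation index), and the quality
    measure multiplied by each resource input."""
    base = [rd, postdocs, phds]
    return base + [quality] + [quality * x for x in base]


def meets_threshold(r2_base: float, r2_full: float, threshold: float = 0.01) -> bool:
    """True if adding the quality terms raises r-square by at least the threshold."""
    return r2_full - r2_base >= threshold


reported = {  # (r-square with three inputs, r-square with quality terms added)
    "NRC SQR, whole counts": (0.830, 0.834),
    "NRC SQR, fractional counts": (0.829, 0.838),
    "average RCI, whole counts": (0.934, 0.942),
}
for model, (base, full) in reported.items():
    print(f"{model}: +{full - base:.3f}, meets threshold: {meets_threshold(base, full)}")
```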
 However, as discussed in the appendix on relative citation counts, the NRC rating is highly associated with the ratio of citation counts to publication counts. The latter may be a measure of the influence or quality of the publication.
 These seven variables were those that provided a satisfactory fit for all of the field groups when the dependent variable was the first principal component of publications and citations. Only six of these variables were required when the dependent variable was either publications as measured by fractional or whole counts in the expanding journal set.
 See Exhibit H-1 for regression output.
 A simple example using only financial resources will clarify the calculations. Suppose that it costs $100K in resources to produce one fractional count in 1990 and $150K in 2001, and that the model estimates a cost of $125K per fractional count in both years. Then each $125K will produce 1.25 fractional counts in 1990 and 0.833 fractional counts in 2001, yielding ratios of observed to expected counts of 1.25 and 0.833, respectively. The ratio 1.25/0.833 = 1.5 indicates that the cost per fractional count increased by 50% from 1990 to 2001. The ratio 0.833/1.25 = 0.667 indicates that the cost per fractional count was 33.3% lower in 1990 than in 2001.
 See Exhibit H–2 for regression output.
 The change in the ratio of resource inputs to fractional count publication outputs depends on the particular resource being considered. The changes from 1988 to 2001 were:

- publications as measured by fractional counts: +17%;
- academic R&D funding (GDP deflated): +81%;
- postdoctorates: +66%;
- non-faculty doctoral research staff: +52%;
- S&E Ph.D. recipients: +23%;
- total faculty: about +6% (although S&E faculty may have increased by substantially more than this amount).

The 29% value is based on the change in expected resource inputs per publication from 1990 to 2001. Resource inputs per publication are derived from a linear combination of academic R&D funding, postdoctorate counts, and S&E Ph.D. recipients.
 For each institution, the slope of the regression of observed publication counts on year measures the average annual change in the number of publications (i.e., the average annual change in output). The slope of the regression of expected publications on year measures the expected average annual change in publications based on resource inputs. This expectation assumes that the private R-1 institutions have the same input/output relationship as a regression using all institutions. A scatterplot of these two slopes shows visually how well the model based on all institutions estimates the observed average annual change.
 See Exhibit H-3 for regression output.
 See Exhibit H-4 for regression output.
 See Exhibit H-5 for regression output.
 See Exhibit H-6 for regression output.
 See Exhibit H-7 for regression output.
 The r-square value using the same three independent variables differs slightly in these two regressions because of differences in the years for which the relative citation index and average relative citation index were available.
 See Exhibit H-9 for regression output.
 See Exhibit H-10 for regression output.
 See Exhibit H-11 for regression output.
 See Exhibit H-12 for regression output.