Prior Publication Productivity, Grant Percentile Ranking, and Topic-Normalized Citation Impact of NHLBI Cardiovascular R01 Grants
Rationale: We previously demonstrated absence of association between peer-review–derived percentile ranking and raw citation impact in a large cohort of National Heart, Lung, and Blood Institute cardiovascular R01 grants, but we did not consider pregrant investigator publication productivity. We also did not normalize citation counts for scientific field, type of article, and year of publication.
Objective: To determine whether measures of investigator prior productivity predict a grant’s subsequent scientific impact as measured by normalized citation metrics.
Methods and Results: We identified 1492 investigator-initiated de novo National Heart, Lung, and Blood Institute R01 grant applications funded between 2001 and 2008 and linked the publications from these grants to their InCites (Thomson Reuters) citation records. InCites provides a normalized citation count for each publication, stratifying by year of publication, type of publication, and field of science. The coprimary end points for this analysis were the normalized citation impact per million dollars allocated and the number of publications per grant with a normalized citation rate in the top decile per million dollars allocated (top 10% articles). Prior productivity measures included the number of National Heart, Lung, and Blood Institute–supported publications each principal investigator published in the 5 years before grant review and the corresponding prior normalized citation impact score. After accounting for potential confounders, there was no association between peer-review percentile ranking and bibliometric end points (all adjusted P>0.5). However, prior productivity was predictive (P<0.0001).
Conclusions: Even after normalizing citation counts, we confirmed a lack of association between peer-review grant percentile ranking and grant citation impact. However, prior investigator publication productivity was predictive of grant-specific citation impact.
The current approach to selecting grants for funding has recently come under criticism for lacking an evidence base.1,2 Scientific peer review, which provides a percentile-ranked score for grant applications, is the main determinant of funding decisions at the National Heart, Lung, and Blood Institute (NHLBI) and other Institutes at the National Institutes of Health (NIH). However, a systematic review of the peer-review process demonstrated a lack of studies evaluating the effect of peer review on the quality and scientific achievement of funded research.3 Identifying factors that predict the scientific impact of grants may help inform a more empirical approach to funding decisions.
Our previous work demonstrated a lack of correlation between peer-review–derived grant percentile ranking and scientific impact, as measured by citation rates, in a large cohort of NHLBI-funded cardiovascular R01 grants.4 Our analysis was limited by failure to account for prior investigator publication productivity and by failure to normalize citation outputs for subject category, article type, and year of publication.
Building on prior data modeling, the goal of this analysis was to test the hypothesis that measures of investigator prior performance correlate with scientific impact as measured by normalized citation metrics.
We extended the methods from our previous work.4 We considered 1492 investigator-initiated R01 grants that met the following inclusion criteria: (1) award on or after January 1, 2001, and before September 1, 2008; (2) duration of funding of ≥2 years; (3) assignment to a cardiovascular unit within NHLBI; and (4) receipt of a percentile ranking based on a priority score given by an NIH peer-review study section.
We obtained grant-specific and investigator-specific award and funding data from an internal NHLBI Tracking and Budget system, which includes information on investigator status (early stage or established), grantee institution, peer-review study section, percentile ranking, involvement of human subjects, project start and end dates, and total funding (direct and indirect). We used the Scientific Publication Information Retrieval and Evaluation System (http://era.nih.gov/nih_and_grantor_agencies/other/spires.cfm) to map publications to specific grants. Because many publications were supported by >1 grant, we adjusted the counts for publications and citations by dividing by the number of cited grants, as previously described.
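The fractional-credit adjustment described above (dividing publication and citation counts by the number of grants a publication acknowledges) can be sketched as follows. This is a minimal illustration with hypothetical field names (`grants`, `citations`); the actual Scientific Publication Information Retrieval and Evaluation System linkage is not reproduced here.

```python
from collections import defaultdict

def fractional_counts(publications):
    """Credit each publication to its supporting grants fractionally.

    `publications` is a list of dicts with a 'grants' list and a
    'citations' count (hypothetical fields for illustration). Each
    grant receives 1/n of the publication and 1/n of its citations,
    where n is the number of grants the publication acknowledges.
    """
    pub_credit = defaultdict(float)
    cite_credit = defaultdict(float)
    for pub in publications:
        n = len(pub["grants"])
        for g in pub["grants"]:
            pub_credit[g] += 1.0 / n
            cite_credit[g] += pub["citations"] / n
    return dict(pub_credit), dict(cite_credit)

pubs = [
    {"grants": ["R01-A"], "citations": 40},
    {"grants": ["R01-A", "R01-B"], "citations": 10},  # credit split 50/50
]
pub_credit, cite_credit = fractional_counts(pubs)
# R01-A is credited with 1.5 publications and 45 citations;
# R01-B with 0.5 publications and 5 citations.
```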
We linked publications to a Thomson Reuters InCites database that included 260 000 NHLBI-supported articles published between January 1981 and December 2013. InCites stratifies publications based on year of publication, type of publication (eg, research, review, or perspective), and subject category.5 The database includes a publication percentile indicating how often the article was cited compared with articles in the same stratum. The database also provides author-specific data. A publication percentile of 0 indicates an article with the greatest number of citations within its stratum, and a percentile of 100 indicates the lowest citation rate. We transformed the InCites publication percentile with the formula [(100−InCites percentile)/100] to give a normalized citation impact score per publication, where 1 indicates the highest citation impact within its stratum and 0 the lowest. The normalized impact score per grant is derived by summing the normalized impact scores of each of its publications.
The coprimary bibliometric end points for this analysis were the normalized citation impact score per million dollars allocated, and the number of top 10% publications per million dollars allocated (a top 10% publication has an InCites percentile of ≤10).
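The percentile transformation and the two coprimary end points above can be sketched as follows. This is an illustrative implementation only; the per-publication percentiles and the $2M funding figure in the usage example are invented for demonstration.

```python
def normalized_score(incites_percentile):
    """Map an InCites percentile (0 = most cited, 100 = least cited)
    to a normalized citation impact score in [0, 1], where 1 is the
    highest citation impact within the publication's stratum."""
    return (100.0 - incites_percentile) / 100.0

def grant_endpoints(percentiles, total_funding_dollars):
    """Coprimary end points for one grant: normalized citation impact
    score per $1M allocated, and number of top 10% articles
    (InCites percentile <= 10) per $1M allocated."""
    millions = total_funding_dollars / 1_000_000
    impact = sum(normalized_score(p) for p in percentiles)
    top10 = sum(1 for p in percentiles if p <= 10)
    return impact / millions, top10 / millions

# A hypothetical grant with three articles at percentiles 5, 50, and 90,
# and $2M total funding: impact = 0.95 + 0.50 + 0.10 = 1.55, so roughly
# 0.775 impact per $1M; one top 10% article, so 0.5 top 10% articles per $1M.
impact_per_m, top10_per_m = grant_endpoints([5, 50, 90], 2_000_000)
```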
The predictors for this analysis were investigator prior productivity and grant peer-review percentile score. Measures of prior productivity included number of NHLBI-supported publications in the 5 years before the grant review (obtained from the InCites database), prior normalized citation impact, number and funding amount of NIH grants received before the index grant, and number of NIH review study sections served before the index grant. The prior normalized citation impact is the sum of the normalized citation impact scores for each of the NHLBI-supported articles published in the 5 years before the grant review (obtained from the InCites database).
For descriptive purposes, we present baseline measures of investigator prior productivity, grant characteristics, and bibliometric outcomes with numbers and percentages for categorical variables and median and interquartile range for continuous variables, stratified by prior publication tertiles (≤3, 4–10, and >10 publications) and stratified by prior citation impact tertiles (<2.2, 2.2–6.9, >6.9). Differences between tertiles were assessed with χ2 and nonparametric tests as appropriate. To describe the association of bibliometric outcomes with measures of prior productivity and percentile, we computed and plotted nonparametric locally weighted scatterplot smoothing estimates (lowess fits). Multivariable regression analyses were performed according to the methods of Harrell6 to determine adjusted linear and nonlinear associations with bibliometric outcomes. Independent variables included measures of prior productivity, percentile score, grant duration, calendar year of first award, study type (human subjects or not), new investigator status, mean number of grants acknowledged per article, and total institutional funding within the portfolio of all grants included in the study sample. Because number of prior publications and bibliometric measures have right-skewed distributions, we performed natural logarithmic transformations.
To further evaluate the independent association of prior productivity measures with bibliometric outcomes, we constructed Breiman random forests, which are machine learning–based constructs that allow for robust, unbiased assessment of complex associations. We assessed the relative variable importance based on a variable importance value that reflected gain of discrimination by adding a variable as well as by average minimal depth.7
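The idea of variable importance can be illustrated with a model-agnostic permutation sketch: permute one predictor at a time and measure how much the model's error grows. Note this is not the paper's method (the authors used randomForestSRC's split-gain and minimal-depth importance); it is a simpler analogue on toy data, with all names and values hypothetical.

```python
import random

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Permute each feature column and report the average increase in
    mean squared error; larger increases indicate more important
    variables. A simple stand-in for forest-based importance measures."""
    rng = random.Random(seed)

    def mse(X_):
        return sum((model(row) - yi) ** 2 for row, yi in zip(X_, y)) / len(y)

    base = mse(X)
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += mse(X_perm) - base
        importances.append(total / n_repeats)
    return importances

# Toy data: the outcome depends strongly on feature 0, weakly on feature 1.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [3.0 * row[0] + 0.1 * row[1] for row in X]
model = lambda row: 3.0 * row[0] + 0.1 * row[1]  # stand-in for a fitted forest
imp = permutation_importance(model, X, y)
# imp[0] should greatly exceed imp[1], identifying feature 0 as dominant.
```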
Because the count of prior publications was limited to NHLBI-supported publications in the original analysis, we repeated the analysis on a random sample of 100 grants, using all prior publications, regardless of funding support, in the 5-year period before the grant review. This required more intensive manual name disambiguation. Publications were identified using Scopus. Prior normalized citation impact could not be determined for this subset because the InCites database included only NHLBI-supported publications.
Statistical analyses were conducted using R with the rms, Hmisc, and randomForestSRC packages.
The 1492 grants yielded 19 260 publications through December 2013; of these, 5534 (29%) were top 10% articles. Tables 1 and 2 summarize grant and applicant characteristics and bibliometric outcomes stratified by prior publication count and by prior normalized citation impact score, respectively. Measures of greater prior productivity, specifically more prior NHLBI publications and a higher prior normalized citation impact score, were significantly associated with a lower (better) percentile ranking (Tables 1 and 2).
After accounting for potential confounders, there was no association between peer-review percentile ranking and normalized citation impact score per million dollars allocated (adjusted P=0.53; Figure 1A, lowess fits without covariates) or number of top 10% articles per million dollars allocated (adjusted P=0.71; Figure 1C, lowess fits without covariates). Number of prior NHLBI-supported publications was predictive of citation impact score per million dollars allocated (adjusted P<0.0001; Figure 1A and 1B, lowess fits without covariates) and number of top 10% articles per million dollars allocated (adjusted P<0.0001; Figure 1C and 1D, lowess fits without covariates).
Prior normalized citation impact score was also predictive of the grant's citation impact score per million dollars allocated and the number of top 10% articles per million dollars allocated (adjusted P<0.0001 for both; Figure 2, lowess fits without covariates). Neither the number and funding amount of prior NIH grants nor the number of NIH review study sections served on was associated with the bibliometric end points.
In a machine-learning Breiman random forest model, which accounted for the same covariates as in Table 1, the strongest predictor of citation impact score per million dollars and of number of top 10% articles per million dollars was the average number of grants acknowledged per article. In both cases, the second strongest predictor was the number of prior NHLBI-supported publications; more prior NHLBI-supported publications predicted higher grant-derived citation impact (Figure 3A). Breiman random forest models also demonstrated that prior normalized citation impact was the first or second most important predictor of bibliometric outcome, with a higher prior normalized citation impact score predicting higher grant-derived citation impact (Figure 3B).
A repeat analysis, on a random subset of 100 grants in which all publications (not just NHLBI-supported publications) were counted, confirmed our findings. Percentile ranking was not associated with the bibliometric end points (Figure 4A and 4C, lowess fits without covariates). Number of prior publications was predictive of the citation impact score per million dollars (adjusted P=0.03; Figure 4A and 4B, lowess fits without covariates) and number of top 10% articles per million dollars (adjusted P=0.005; Figure 4C and 4D, lowess fits without covariates). In Breiman random forest models, the number of prior publications was the strongest predictor of both bibliometric end points, with more prior publications predicting greater grant-derived citation impact (Figure 3C).
This extended analysis of previous work confirmed a lack of association between peer-review grant percentile ranking and grant citation impact, this time even after considering scientific field, article type, and year of publication. Also, we demonstrated that prior investigator publication productivity was predictive of grant-specific citation impact.
An important limitation of using citation rate as an end point for research impact is that number of citations is dependent on time from publication, type of article, and field of study.5 The Thomson-Reuters InCites database improves on absolute citation numbers by stratifying publications based on year of publication, type of article, and field of science and then normalizing citation rates within strata. Using this substantially different bibliometric end point, we still saw no significant association between percentile ranking and citation impact.
In our previous work, we acknowledged that some potential predictors were not evaluated, such as detailed preapplication metrics of principal investigators.4 In this updated analysis, we included measures of investigators’ prior productivity. Prior number of publications and prior normalized citation impact score were associated with citation impact. These findings were durable across several different types of analyses. We should note that prior normalized citation impact was determined from an InCites database created in 2014 and based on citation accumulation through 2013. Therefore, the prior citation impact scores in this analysis would be different from the scores calculated at the time of peer review (the grants in our data set were reviewed in the years 2000 through 2008).
Others have attempted to use measures of academic performance as predictors of future scientific impact at the individual investigator level. Number of published articles was 1 of 5 parameters used by Acuna et al8 to predict future scientific success as measured by the h-index. The other parameters included the h-index at the time of prediction, years since first publication, number of publications in prestigious journals, and the number of distinct journals.9 Mazloumian10 demonstrated that the annual citation count at the time of prediction was the best forecaster of future citations, with other citation indicators, including h-index and number of publications, improving predictive power only minimally. Our analysis identified a robust association between certain measures of investigator prior productivity and citation impact when viewed at a grant level. We specifically did not consider the h-index because of the well-recognized limitations of this metric.11
There are limitations to our analysis. The use of citations as an end point provides an incomplete picture of scientific impact. Admittedly, bibliometric end points do not adequately or fully measure scientific quality, which comprises multiple factors such as the scientific importance of the work, the rigor of the methods used, and the elegance or esthetic qualities of the research design and findings.11 However, citation indicators are generally considered to be a direct measure of the usefulness of the data within the publication5 and reasonably capture the impact of research as determined by trends in publications.11 They are also used widely and increasingly accepted.12 The Council of Canadian Academies recently evaluated indicators for assessing research quality as part of a broader appraisal of current funding strategies.11 They noted that citation-based indicators may be considered valid if the indicator meets the following criteria: it is field normalized, it is based on a sufficiently long citation window (typically 3–5 years), and a sufficiently large percentage of research output is captured within the data source. The InCites database used in this analysis meets these criteria.
Despite extending our previous analysis, there remain confounders that we were unable to consider. Institutional environment, mentorship, and collaborators may also influence future scientific impact. It is also unclear whether the findings of this study generalize to disciplines other than cardiovascular research, as peer-review emphasis and citation dynamics may differ in other fields.
The federal research enterprise has come under significant criticism for not knowing the best approach(es) for distributing its funding.2 Analyses such as this one may identify factors, such as number of prior publications or prior citation impact, that more accurately predict the potential for future scientific impact. The results of such analyses may inform the peer-review process, improving its validity and effectiveness. Emphasizing rigorously determined predictors of scientific impact in current funding strategies or incorporating them into innovative approaches may help create a more evidence-based policy for research funding decisions.
All authors were full-time employees of the National Heart, Lung, and Blood Institute at the time they worked on this project.
The views expressed in this article are those of the authors and do not necessarily represent those of the National Heart, Lung, and Blood Institute, the National Institutes of Health, or the US Department of Health and Human Services.
- Nonstandard Abbreviations and Acronyms
- NHLBI: National Heart, Lung, and Blood Institute
- NIH: National Institutes of Health
- Received July 7, 2014.
- Revision received August 4, 2014.
- Accepted August 14, 2014.
- © 2014 American Heart Association, Inc.
- 11. The Expert Panel on Science Performance and Research Funding. Informing Research Choices: Indicators and Judgment. Ottawa: Council of Canadian Academies; 2012.