Thought Exercises on Accountability and Performance Measures at the National Heart, Lung, and Blood Institute (NHLBI): An Invited Commentary for Circulation Research
Recently, popular magazines and newspapers broadcast skeptical headlines, such as, “Desperately Seeking Cures”1; “A Decade Later, Genetic Map Yields Few Cures”2; “Faltering Cancer Trials”3; and “Grant System Leads Cancer Researchers to Play It Safe.”4 Some patients, physicians, advocacy groups, journalists, scholars, and policymakers openly wonder about the value of billions of dollars in government-supported research. During the past 10 years, the NIH budget has doubled, yet cardiovascular disease, although decreasing in incidence and severity, remains the leading cause of death, and cancer incidence and death rates have declined little. Along with “Where Are the Cures?” critics ask “Who Is Accountable?”5
Interest in Measuring the Performance of Science
There is a rapidly growing body of scholarship on the metrics of science, methods by which scientists, scientific organizations, policymakers, and funding agencies can gauge the value and impact of their work and investments.6 To assess the impact of the 2009 American Recovery and Reinvestment Act (ARRA), the National Science Foundation, the National Institutes of Health, and the White House Executive Office are engaged in an ambitious “STAR METRICS” project.7 Metrics often focus on publications and citations (“bibliometrics”) but also consider commercial products and quantitative impacts on practice or intellectual thought. Although scientists are typically eager to publish their work and see it cited, many worry about seeing their value to society, employers, and funders graded by black-box numbers.8 Some professional groups have harshly criticized bibliometrics, noting their inherent flaws and the risk that the act of measuring science will damage innovation, which is at the heart of the scientific enterprise.9
At NHLBI, we increasingly recognize that we need to be held accountable for our performance, which means that it is incumbent on us to assess the performance of the scientists and projects we choose to support. It is a genuine challenge, however, for any nonprofit organization to assess its performance, because we cannot look to readily available metrics, such as revenues, profits, and stock prices. As in any business, it is important to align investments with expected returns. U.S. taxpayers are investing more than $30 billion a year in the NIH and expect more concrete measures than advancement in biomedical knowledge and a fiscal accounting of funds allocated to research. Many scientists whose career trajectories are largely dependent on securing NIH funding may be frustrated by critics who hold them responsible for high population burdens of disease, along with the inability of a fragmented healthcare system to exploit prior research discoveries. How can we manage expectations and encourage investments appropriate to desired outcomes?
A Thought Exercise About Unemployment and Jobs Training
Imagine that instead of funding research in cardiovascular biology and disease, we at NHLBI were asked to establish and run a jobs-training program in an impoverished city, one that has seen its manufacturing base disappear and has not yet realized the benefits of the new service- and technology-oriented economy. Our ultimate “end product” would be a city bustling with full employment and high levels of personal income and socioeconomic well-being. As a community, we would need to bring together numerous partners and stakeholders, including legislators, government executives and agencies, employers, labor unions, and schools and universities. We would need to develop a multipronged strategy to deal with short- and long-term factors that contribute to unemployment; one part of that strategy might be to establish a jobs-training program, the program for which we now find ourselves responsible.
Should our jobs-training program be held solely responsible for the city's unemployment rate? Somehow, that does not seem right, as there are many factors contributing to unemployment that go way beyond what we can possibly address with the resources available to us. However, there are performance measures we could “accept” as reasonable reflections of the quality of our work. These might include the number of unemployed workers we train and the number of courses we offer. But these measures are descriptive only of the resources we have put in place and do not tell us whether we provided the right training to the right people. We could look for other measures of impact, like how many (and what proportion) of our trainees secured employment, how many (and what proportion) secured high-paying jobs, and how many employers choose to come to us to find future employees. If we focus on these measures, we can demonstrate our value to the community, and equally important, we can make informed decisions about changes we can make that will improve our efficiency and effectiveness.
We have just applied an approach called “Results-Based Accountability (RBA),” which is “a disciplined way of thinking and taking action that can be used to improve the quality of life in communities, cities, counties, states, and nations. Results-based accountability can also be used to improve the performance of programs, agencies, and service systems.”10 During the past few years, a number of NIH leaders have learned about RBA as part of a formal NIH senior leadership program run in conjunction with the University of Maryland School of Public Policy. I summarize how we applied RBA to unemployment and job training in Table 1.
In RBA, we start by thinking about population ends, in this case an economically successful city operating at full employment. We then move to the role of our jobs-training program and come to realize that we should focus on means, in this case training unemployed workers and enabling them to secure new jobs. We identify measurable and meaningful performance metrics, such as the proportion of our trainees who secure jobs. RBA is a helpful construct because it forces us to separate population ends from program performance.
In Table 1 and Figure 1, I summarize a similar thought exercise about how we, a Federal funding agency that supports biomedical researchers, can use the RBA construct to evaluate our performance. We proceed through a series of steps as follows.
1) Identify our customers and articulate the services we provide (eg, provide funds to applicants who submit highly meritorious proposals).
2) Identify a relatively small number of “headline” performance metrics. Some metrics focus on process (eg, accruing patients to trials, completing application reviews on time, answering queries from applicants and grantees), whereas others focus on impact (eg, publications, citations, ancillary studies, new commercial products, changes in clinical practice).
3) For each metric, plot changes over time and forecast what is likely to happen if we make no changes in our processes.
4) “Tell the story” behind the metric. We ask a series of questions to identify factors that have improved or worsened performance.
5) Based on our story, identify partners and possible strategies to “turn the curve,” that is, attempt to change adverse trends.
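Step 3's “plot and forecast” can be made concrete with a simple no-change baseline: fit a trend to recent values of a headline metric and project it forward, assuming we alter none of our processes. The following is a minimal sketch; the yearly numbers are hypothetical, illustrative values, not actual NHLBI data.

```python
# Hypothetical yearly values of one "headline" impact metric
# (e.g., citations per publication); illustrative numbers only.
years = [2005, 2006, 2007, 2008, 2009]
metric = [32.0, 31.0, 29.5, 28.5, 27.0]

# Least-squares linear trend: the "no-change baseline."
n = len(years)
x_mean = sum(years) / n
y_mean = sum(metric) / n
slope = sum((x - x_mean) * (y - y_mean)
            for x, y in zip(years, metric)) / sum((x - x_mean) ** 2 for x in years)
intercept = y_mean - slope * x_mean

def forecast(year):
    """Project the metric if we change nothing in our processes."""
    return slope * year + intercept
```

In RBA terms, a persistently adverse slope on an impact metric is exactly the signal that a curve needs “turning,” and the forecast gives the baseline against which any intervention can later be judged.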
Performance Metrics for Biomedical Research
To identify what performance metrics might be useful, we can do another thought experiment. Suppose we were an outright failure, that is, we funded universities and researchers who did absolutely nothing scientific with our money. How would we know? We would find no publications that could be linked to our grants. With no publications, there would be no citations. We might also find no evidence of new products being made ready for clinical testing and eventual commercialization (ie, no patents). We would see no trials going anywhere near completion, resulting in no publications, citations, or guidelines citing our research for recommendations on changes in practice.
This worst-case scenario is easy for all of us to assess. In the real world, it is more difficult to assess whether a given level of productivity is appropriate in quantity (numbers of publications or inventions), quality (impact on the field of science or practice of medicine), and timeliness (how long it takes for work to “pay off”). During the past few years, there has been extraordinary interest in bibliometrics, science performance metrics based on publications and citations. I summarize some commonly used measures in Table 2.6 Some scholars have suggested that bibliometrics are powerful predictors of future scientific success and recognition11 and that they can be used to assess teamwork in science.12 Others express “deep concern” about the misapplication of metrics for purposes beyond those for which they were created13 and about the misuse of citation streams to support false beliefs.14 Remarkably, an almost identical conversation is taking place in education, where there is increasing emphasis on the accountability of schools at all levels, along with concern that what can be measured may not reflect value and may actually give schools, including universities, an incentive to reward activities that improve test performance at the expense of developing critical thinking and creativity in students.15
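To make one such measure concrete: the h-index proposed by Hirsch11 is the largest number h such that an author has h papers with at least h citations each. A minimal sketch of the computation (the citation counts in the usage note are invented for illustration):

```python
def h_index(citation_counts):
    """Return the largest h such that h papers each have >= h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:   # the paper at this rank still "supports" index = rank
            h = rank
        else:
            break
    return h
```

For example, an author whose papers have citation counts [10, 8, 5, 4, 3] has an h-index of 4: four papers with at least four citations each, but not five papers with at least five.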
An Incomplete Example
Using an internal NIH-based search tool, I used the keyword “myocard*” to identify 1267 R01 grants that were awarded between 1990 and 2009 through the NHLBI cardiovascular research division and that were classified as “nonhuman.” To date, these grants and their successful competitive renewals account for $2.2 billion in funding. They have led to at least 15 656 publications that have garnered 448 830 citations (approximately 29 citations per publication). The average cost per publication is $143 276, and the average cost per citation is $4997. These figures compare favorably to national estimates of cost per publication for all academic-based research, which has increased from $186 567 in 1998 to $308 641 in 2008.16
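The per-unit figures above are simple ratios of total funding to output counts, and the arithmetic is easy to reproduce. Note that the quoted averages imply an unrounded total near $2.24 billion (reported in the text rounded to $2.2 billion); the 2.243e9 constant below is my back-calculation from those averages, not a figure taken from the article.

```python
# Counts from the text; total back-calculated from the quoted averages.
total_funding = 2.243e9   # dollars (assumed unrounded total)
publications = 15_656
citations = 448_830

cost_per_publication = total_funding / publications    # ~ $143,000
cost_per_citation = total_funding / citations          # ~ $5,000
citations_per_publication = citations / publications   # ~ 29
```

The same three ratios, computed per project or per program, are the raw material for the trend plots discussed below.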
Figure 2 shows the behavior of selected process and impact metrics over time. There has been a marked increase in the number of projects funded and total funding (even after accounting for inflation17), coincident in part with the NIH doubling in the late 1990s and early 2000s. To allow for fair comparisons, the output metrics related to publications and citations are shown for each project from its start until 5 years later. The number of publications increased commensurate with the increase in funding, whereas the cost per publication remained remarkably constant. However, the number of citations has not kept pace, with a decrease in the number of citations per publication and an increase in cost per citation.
Our next step, which is critical, is to ask a series of questions that will reveal the story behind these data. This is an exercise that is beyond the scope of this article. We might start by asking why the citation rate is going down. Is it because the field is losing luster? Or, perhaps, is it because we funded a number of relatively new investigators during the doubling, whose projects have not yet matured to the point at which they can garner many citations?
As stewards of taxpayer monies, we at NHLBI expect to be held accountable for our decisions and policies. No differently than scientists, hospital administrators, and for-profit company executives, we look to performance metrics as tools—nothing more, nothing less—to help us understand the stories behind the effectiveness of our work and to make the best informed decisions. We look to colleagues in NIH, in the extramural scientific community, in medical editing, and in professional societies to work with us as we develop greater sophistication in program evaluation, ultimately to reach the goal of a society free of the ravages of cardiovascular disease.
The opinions expressed in this NHLBI Page are not necessarily those of the editors or of the American Heart Association.
- © 2011 American Heart Association, Inc.
References

- Carmichael M, Begley S.
- Wade N.
- Editorial. Faltering cancer trials. New York Times. April 24, 2010. Available at: http://www.nytimes.com/2010/04/25/opinion/25sun1.html?_r=1&scp=1&sq=Faltering%20Cancer%20Trials&st=cse. Accessed December 27, 2010.
- Kolata G.
- Galani RJ.
- NSF. STAR METRICS: new way to measure the impact of federally funded research. June 1, 2010. Available at: http://www.nsf.gov/news/news_summ.jsp?cntn_id=117042. Accessed December 27, 2010.
- Abbott A, Cyranoski D, Jones N, Maher B, Schiermeier Q, Van Noorden R.
- Adler R, Ewing J, Taylor P.
- Friedman M.
- Hirsch JE.
- Greenberg SA.
- Rothstein J.
- National Science Foundation. Science and Engineering Indicators: 2010, 8–94. Available at: http://www.nsf.gov/statistics/seind10/pdf/c08.pdf. Accessed December 27, 2010.
- Biomedical Research and Development Price Index (BRDPI): Fiscal Year 2009 update and projections for FY 2010–FY 2015. February 1, 2010. Available at: http://officeofbudget.od.nih.gov/pdfs/FY11/BRDPI_Proj_Feb_2010.pdf. Accessed December 27, 2010.