Quantifying Scientific Merit
Is it Time to Transform the Impact Factor?
Thus ornament is but the guiled shore
To a most dangerous sea; the beauteous scarf
Veiling an Indian beauty; in a word,
The seeming truth which cunning times put on
To entrap the wisest.
—William Shakespeare, The Merchant of Venice, Act III, Scene 2, 1596–97
Use of the impact factor (IF) to quantify scientific merit is severely flawed. Three changes are recommended to strengthen the assessment of journals that primarily report basic research: (1) calculate IF based on original scientific contributions only; (2) use the 5-year IF; and (3) eliminate self-citation. For journals reporting clinical trials and population research, an index of readership, such as downloads, would better reflect true influence. For journals that report both basic and clinical research, a hybrid measure would assess both research quality and influence. The time has come for the scientific community to transform IF.
Quantification is the bedrock of science. Yet, when it comes to mathematically estimating the relative merit of scientific articles and the journals in which they are published, the scientific community has struggled with significant challenges. Surprisingly, despite more than a half-century of study and application of quantitative methods to rate scientific impact, consensus has still not been achieved on the best way to measure the true quality and influence of a scientific contribution or a journal. This conundrum does not stem from a lack of available measurement tools, for many have been developed. Rather, it stems from the reluctant acceptance by the scientific and publishing communities of one of them, IF, as the appropriate gauge of scientific quality. IF, a citation-based tool, was developed in 1955 by Eugene Garfield and has been released annually since 1975 for those journals indexed in Journal Citation Reports.1,2
IF is defined as the total number of citations received during the year of interest to articles published in a specific journal during the previous 2 years (the numerator) divided by the total number of citable articles published by the journal during those 2 years (the denominator); it is thus the mean number of citations per recent citable article. Citable articles include original research papers and reviews but also clinical practice guidelines, scientific statements, advisories, position papers, and similar materials. In addition, there is a gray zone that includes perspectives, commentaries, essays, highlights, and opinions, among other pieces.3
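Written out as a formula for a concrete census year (the worked numbers below are invented purely for illustration), the definition is:

```latex
\[
\mathrm{IF}_{2016} \;=\;
  \frac{\text{citations received in 2016 to items published in 2014--2015}}
       {\text{citable items published in 2014--2015}}
\]
% Hypothetical worked example: 1000 citations received in 2016 to a
% journal's 200 citable items from 2014--2015 give IF = 1000/200 = 5.0.
```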
The premise of citation-based metrics, such as IF, is the perception that citations offer a valid quantitative estimate of a discovery’s importance or a scientist’s stature within the scientific community. However, numerous criticisms have been directed toward IF.4–10 I have attempted to summarize them in rough order of importance as follows:

(1) Citation mingling: citations to original research articles are commingled with citations to guidelines, statements, advisories, and other nonoriginal material, as detailed earlier. These pieces often have nothing to do with the competence, expertise, vision, or efficiency of the editorial team or with the viewpoint or philosophy of the journal, but they are coveted by journals simply to inflate their IFs.

(2) Self-citation: IF can be influenced by authors citing their own work to promote self-recognition in science.

(3) Restricted evaluation period: IF is based on a narrow 2-year time frame, too brief to assess long-term scientific impact. The number of citations an article garners is heavily weighted by the article’s age; thus, citation-based comparisons favor older papers and seasoned investigators.

(4) Subject dependency: a less important article on a common disease is more likely to be cited than a consequential article on a rare disease.

(5) Publication venue influence: articles in journals with high IFs are more likely to be cited than similar papers in journals with lower IFs.

(6) Indiscriminate parity among authors: all coauthors of a multiauthor paper are credited identically, despite widely varying contributions to the work.

(7) Disproportionate significance: use of the mean, rather than the median, citation count conveys disproportionately high significance to a few highly cited articles (illustrated in the sketch after this list).

(8) Skewed citation distributions: different disciplines and subdisciplines have different citation patterns. As a result, most articles receive fewer citations than IF suggests, and citations per article can span 2 or 3 orders of magnitude, producing major overlap in the citation distributions of different journals.10 IF neither corrects for these large variations (eg, by applying field-normalized bibliometric indicators) nor addresses the spread of these discordant distributions.

In addition, editors can game IF by decreasing the number of original research articles, by increasing the number of highly citable reviews, guidelines, statements, commentaries, and other nonoriginal material, or even by commissioning reviews that limit literature coverage to works previously published in the same journal. Citation stacking can also occur, in which 2 (or more) journals work unethically together to cite each other’s work.9
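To make criticism (7) concrete, the following minimal Python sketch compares the mean and median of a hypothetical, heavily skewed set of citation counts; every number is invented for illustration and drawn from no real journal.

```python
import statistics

# Hypothetical citation counts for 20 articles in one journal's 2-year
# window: most are cited a handful of times, while 2 "blockbuster"
# papers dominate the tally.
citations = [0, 1, 1, 2, 2, 2, 3, 3, 3, 4,
             4, 5, 5, 6, 7, 8, 10, 12, 150, 220]

mean = statistics.mean(citations)      # what an IF-style average reports
median = statistics.median(citations)  # what the typical article receives

print(f"mean citations per article:   {mean}")    # 22.4
print(f"median citations per article: {median}")  # 4.0
```

Here the IF-style mean (22.4) is more than 5 times the median (4.0), so the headline figure describes almost none of the journal’s articles.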
The majority of research scientists seek to publish their findings in journals with the highest possible IF, despite widespread condemnation of its use in predicting the long-term importance of a new discovery.11,12 Indeed, IF continues to be inappropriately used in publication, employment, compensation, grant funding, and promotion and tenure decisions.11,12 This practice has persisted despite wide recognition that IF is only a journal-level metric, statistically unsuited for determining the impact or influence of a specific article or for evaluating individuals.4,13
Despite decades of criticism, IF persists as the coin of the realm in the scientific publishing world. Most major journals still consider IF the most important comparative metric in the publishing field, monitor it closely, and advertise it on their websites. Journals compete for the highest possible IF chiefly because IF is treated as a critical determinant of a journal’s scientific quality. Indeed, journals with high IFs are the preferred destinations for high-quality papers and vice versa; thus, maintaining a high IF is essential for journals to remain valued and relevant. Several alternative citation-based metrics have been introduced to correct for some of the identified IF deficiencies. For example, the Eigenfactor, introduced by Jevin West and Carl Bergstrom in 2007, weights citations from highly ranked journals to make a larger contribution to the score than those from poorly ranked journals, at least partially correcting for the influence of publication venue. The SCImago Journal Rank, a related prestige-weighted metric, is a size-independent indicator that rank-orders journals by their average prestige per article. The newer metrics all have limitations of their own, and none has gained meaningful traction.14–16
For the basic biomedical scientist, translational scientist, or even the clinical scientist focusing on discovery research, it can be reasoned that citation-based metrics might be reasonable benchmarks of research quality and impact, especially if combined with other criteria. The readership of basic scientific journals is composed largely of research scientists, who generally contribute to the literature as authors themselves. Although all of the aforementioned criticisms of IF may still apply, the major issue with citation-based metrics for discovery scientists may be the commingling of original research reports with other kinds of communications, which can overshadow a discovery’s true value. Discovery scientists want to be known for the influence of their original research findings on their field over time, a value that can arguably be estimated by the necessity of other scientists to cite their work. Thus, a separate IF calculated only for original research articles, independent of reviews, editorials, and other nonoriginal material, might help to answer this specific need.
For clinical and population scientists, however, especially those evaluating the outcomes of existing medical interventions on human health, citation of their work by others, although important, may be neither the only nor even the best measure of success. Indeed, because IF does not correct for differences in citation practices between medical fields, citation-based metrics cannot accurately estimate the impact of medical intervention research.17 Furthermore, the readership of clinical or mixed basic/clinical journals is usually diverse and composed largely of practicing clinicians, the vast majority of whom will never author or cite a paper. Clinicians in practice read journals for varying reasons: learning about new concepts that may eventually be translated into clinical practice, continuing their clinical education, keeping current with clinical practice guidelines, understanding the opposing sides of medical controversies, and staying up-to-date with changes in the medical community. Thus, IF provides little or nothing, aside from the assessment of original research quality, to rank-order journals for the interests of the practicing clinician.
Alternative methods that can assess how many times an article is read are urgently needed to determine the clinical impact of a journal. One such usage metric is the number of article downloads, which correlates only poorly with citation counts.18 For the busy clinician, a download may reflect an intention to assess the results in detail, to discuss the article later with others, or to preserve the article for future reference. Downloads may also represent the yield of specific and timely literature searches driven by a compelling need to handle particular patient problems or address immediate clinical questions. Like IF, download counts can be gamed by artificially inflating them to raise an article’s or journal’s relative rank; however, programs are now available to help detect mass downloading and record only legitimate downloads. Other possible altmetrics include views, articles discussed (using social media), and articles recommended (using, eg, the Faculty of 1000 [F1000Prime] website), among others.
Ultimately, IF in some form is likely here to stay for the foreseeable future. However, recognizing the value placed on original research findings by scientists and the public, a potentially useful modification of IF would be to weight it in favor of, or even to focus it exclusively on, original research reports, which constitute the primary output of research journals. Undiluted by nonoriginal material, such as reviews, guidelines, scientific statements, and advisories, original reports would provide a more meaningful index of a journal’s true scientific influence. Another modification, in light of the time needed for an article to become recognized within the scientific community and to accumulate citations, would be to use the 5-year IF, now available from Thomson Reuters; even then, an accurate measure of long-term impact might be expected to take at least 10 (or even more) years. A third modification, which might provide a less biased assessment of quality and influence, would be to eliminate self-citation. These 3 changes, sketched together below, would strengthen the assessment of journals that primarily report basic research, such as Circulation Research and Arteriosclerosis, Thrombosis, and Vascular Biology.
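A minimal sketch of how these 3 changes might be combined follows, assuming a hypothetical citation database that records, for each citation, the cited article’s type and year and the author lists of both papers; all names and data shapes here are my own invention, not an existing bibliometric API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    cited_type: str            # "original", "review", "guideline", ...
    cited_year: int            # year the cited article was published
    citing_authors: frozenset  # author names on the citing paper
    cited_authors: frozenset   # author names on the cited paper

def modified_impact_factor(citations, originals_by_year, census_year):
    """5-year IF over original research only, excluding self-citations.

    citations: Citation records received by the journal in census_year.
    originals_by_year: {year: original research articles published}.
    """
    window = range(census_year - 5, census_year)  # the previous 5 years

    def qualifies(c):
        return (c.cited_type == "original"                    # change 1: originals only
                and c.cited_year in window                     # change 2: 5-year window
                and not (c.citing_authors & c.cited_authors))  # change 3: no self-citation

    numerator = sum(1 for c in citations if qualifies(c))
    denominator = sum(originals_by_year.get(y, 0) for y in window)
    return numerator / denominator if denominator else 0.0
```

Under this sketch, a citation counts toward the score only if it points to an original research article published in the 5-year window and shares no authors with the citing paper.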
For journals that report primarily clinical trials and population research, such as Circulation and its family of subspecialty journals, along with a smaller proportion of basic research, I would argue for a hybrid measure that assesses both research quality and clinical readership. I suggest that the 5-year IF (limited to original research papers and excluding self-citations) could serve as the evaluative component for discovery science, and, although far from perfect, downloads might serve as the measure of readership. The 2 scores could then be averaged, or, alternatively, both could be reported separately by the journals, as in the sketch below.
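The combination rule is left open above, so the following sketch shows one possible (entirely hypothetical) implementation; the percentile-rank normalization is my own assumption, added because raw IF values and download counts are not on comparable scales.

```python
def hybrid_score(modified_if, downloads, peer_ifs, peer_downloads):
    """Report both components, and their average, as percentile ranks
    among a hypothetical set of peer journals."""
    def percentile(value, peers):
        return sum(1 for p in peers if p <= value) / len(peers)

    quality = percentile(modified_if, peer_ifs)         # discovery-science component
    readership = percentile(downloads, peer_downloads)  # clinical-readership component
    return {"quality": quality,
            "readership": readership,
            "hybrid": (quality + readership) / 2}
```

Reporting the 2 components alongside the averaged hybrid preserves the option, noted above, of publishing them separately.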
The medical and scientific community is perpetuating IF mania by continuing to acknowledge IF as the sole or predominant impact metric.19 The time has come for the scientific publishing community and its leaders to thoughtfully choose the right set of metrics, or to promulgate a hybrid metric or series of metrics, that can most accurately reflect both the scope of a journal’s readership and the long-term significance of its contributions to science.
There is a tide in the affairs of men,
Which, taken at the flood, leads on to fortune,
Omitted, all the voyage of their life
Is bound in shallows and in miseries.
On such a full sea we are now afloat,
And we must take the current when it serves,
Or lose our ventures.
—William Shakespeare, Julius Caesar, Act IV, Scene 3, 1599
I thank Heather Goodell and Gayle Whitman of the American Heart Association staff for providing materials and Dr Richard Santen for offering suggestions to improve the article.
Dr Carey is Chair of the Scientific Publishing Committee of the American Heart Association.
The ideas expressed herein are solely those of the author and do not necessarily represent the ideas or policies of the American Heart Association or its Scientific Publishing Committee.
© 2016 American Heart Association, Inc.
2. Anon. Journal Citation Reports. http://wokinfo.com/products_tools/analytical/jcr/. Accessed July 1, 2016.
3. Anon. Citable items: the contested impact factor denominator. http://scholarlykitchen.sspnet.org/2016/02/10/citable-items-the-contested-impact-factor-denominator/. Accessed July 1, 2016.
- Seglen PO
- Smith R
- Loscalzo J
- Nallamothu BK, Lüscher TF
- Heneberg P
- Cagan R
- Fersht A
- Falagas ME, Kouranos VD, Arencibia-Jorge R, Karageorgopoulos DE
- Wang D, Song C, Barabási AL