# Why the h-index is little use

I’m continually surprised how popular the h-index is becoming. This post is a reiteration of something I noted in an earlier post: it appears that for most scientists, the h-index can be computed to a good approximation from the formula:

h ~ sqrt(T) / 2,

where T is the total number of citations the scientist has received. Thus *the h-index contains little information beyond the total number of citations*, and is not properly regarded as a new measure of impact at all. The earlier post contains additional details, including some caveats, and comments on possible further improvements to this rule. It really needs a detailed study, but I don’t think the basic point is in doubt.
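To make the comparison concrete, here is a minimal Python sketch that computes the h-index directly from a list of per-paper citation counts and sets it beside the rule-of-thumb estimate. The function names and the citation record are invented for illustration; they are not data from any real scientist.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def estimated_h(citations):
    """Rule-of-thumb estimate h ~ sqrt(T) / 2, with T the total citation count."""
    return math.sqrt(sum(citations)) / 2


# Hypothetical citation record, made up purely for illustration.
example = [50, 30, 20, 12, 9, 7, 5, 3, 1, 0]

print(h_index(example))                # 6
print(round(estimated_h(example), 1))  # 5.9
```

On this toy record the two numbers happen to land close together. A single example proves nothing, of course; it only shows what is being compared.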


I’m not a huge fan of the h-index, but here is a minor technical objection to your argument: Using Google Scholar, you can easily calculate h by stepping through the author’s publications, sorted in order of number of citations, until you reach the h-th publication. As you step through the first h publications, you’ll probably see some noise; some publications that don’t belong to the given author. You can easily adjust for this by subtracting these publications from the h-index. Now, consider how you would calculate T using Google Scholar. Google gives you a total citation count, but can you trust it? There is probably some noise in it. But checking the count requires stepping through all of the hits, not merely the first h hits. Therefore the h-index should be preferred to your formula, because the h-index can be verified more readily.

Peter – Well, maybe with some tools the h-index is easier to compute. It doesn’t change my argument, namely, that h is not independent of another widely used measure.

I think it is ironic that we scientists spend our lives measuring and/or predicting measurements, yet we are so terribly bad at judging the value of new measures. Through familiarity with the measurements we use we forget that no measure is perfect. Almost all of them assign a single numeric value to a property whose realization in real life is substantially more complicated. Yet we routinely use measures with all their imperfections because they are powerful tools that allow us to reason about otherwise abstract objects.

The bar for a new measure is not perfection, but rather clear improvement over the previous ones. In that sense Peter’s comment is quite relevant. I would add that the h-index is harder to subvert than pure citation count. You see, a scientist can easily increase citation count by inserting a small constant number of self-citations, which is why they are often excluded in citation counts (removing self-citations is, by the way, yet another imperfect compromise). On the other hand, inflating your h-index is much harder: it takes about quadratically as many more papers, assuming again a constant number of self-citations (more than a constant would likely be vetoed by the reviewers and/or editor).

The h-index closely tracks the total number of citations, a measure that has long been in use, yet the h-index has been the subject of many more criticisms than total citation count. That brings us back to the improper evaluation of the suitability of a measure.

To be clear, I do share many of the concerns people have expressed about the h-index, and I cringe when committees use this coarse index as if it were a precise instrument. As a whole, its usefulness seems to be on par with a rule of thumb: good enough for informal quick-and-dirty evaluations but not good enough to make any actual decisions based on it.

You may be correct regarding this correlation in large groups, but it’s important to examine deviations from this correlation when deciding which is the more valuable measure. For example, I am a mid-career researcher who eschews high-volume publication. My T is about 60, but my h-index is 27. Your rule-of-thumb would estimate my h at about 4. I would argue that this reflects my attention to publication quality, but could also reflect my field or other factors. My point is that a general correlation does not tell us relative value.

Someone with an h-index of 27 has 27 publications with at least 27 citations each. That means a T of at least 27 × 27 = 729. It’s not possible to have a T of 60 and an h-index of 27.
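The arithmetic behind that bound is easy to check. A small Python sketch follows; the helper names are hypothetical, chosen only for this illustration, and the inputs are the figures quoted in the comments above.

```python
import math


def min_total_citations(h):
    """h papers with at least h citations each imply T >= h * h."""
    return h * h


def max_h_for_total(total):
    """Largest h compatible with T total citations, i.e. floor(sqrt(T))."""
    return math.isqrt(total)


def rule_of_thumb_h(total):
    """The post's estimate h ~ sqrt(T) / 2."""
    return math.sqrt(total) / 2


print(min_total_citations(27))        # 729
print(max_h_for_total(60))            # 7
print(round(rule_of_thumb_h(60), 1))  # 3.9
```

So with T = 60 the h-index can be at most 7, and the rule of thumb puts it near 4, matching the estimate quoted above.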

Michael, you are correct – I should have paid more attention to the definition of T. I have a little over 3000 citations, so your equation does fit my case quite well. Nevertheless, if I were evaluating these two measures I would still focus on deviations. I’m also somewhat concerned that these indices count first, second, and last authorship in the same way as middle authorship in a long list, but that’s a separate issue that affects both T and h.