Why the h-index is little use
In 2005 Jorge E. Hirsch published an article in the Proceedings of the National Academy of Sciences (link), proposing the “h-index”, a metric for the impact of an academic’s publications.
Your h-index is the largest number n such that you have n papers with n or more citations. So, for example, if you have 21 papers with 21 or more citations, but don’t yet have 22 papers with 22 or more citations, then your h-index is 21.
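To make the definition concrete, here is a minimal Python sketch of the computation. The function name h_index and the example record are my own illustrative choices, not anything taken from Hirsch's paper.

```python
def h_index(citations):
    """Return the largest n such that n papers have n or more citations."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only fall from here on, so no larger n can work
    return h


# Hypothetical record: 21 papers with 21 citations each, plus one with 5.
record = [21] * 21 + [5]
print(h_index(record))  # 21
```

The 22nd paper has only 5 citations, so there are not 22 papers with 22 or more citations, and the h-index stays at 21.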
Hirsch claims that this measure is a better (or at least different) measure of impact than standard measures such as the total number of citations. He gives a number of apparently persuasive reasons why this might be the case.
In my opinion, for nearly all practical purposes, this claim is incorrect. In particular, and as I’ll explain below, the h-index can be worked out, to a good approximation, as a simple function of the total number of citations, and so it contains very little information beyond this already standard citation statistic.
Why am I mentioning this? Well, to my surprise the h-index is being taken very seriously by many people. A Google search shows the h-index is spreading and becoming very influential, very quickly. Standard citation services like the Web of Science now let you compute h-indices automatically. Promotion and grant evaluation committees are making use of them. And, of course, the h-index has been extensively discussed in the blogosphere (e.g., here, here, here, here, here, here, and here).
Hirsch focuses his discussion on physicists, and I’ll limit myself to that group, too; I expect the main conclusions to hold for other groups, with some minor changes in the constants. For the great majority of physicists (I’ll get to the exceptions), the h-index can be computed to a good approximation from the total number of citations they have received, as follows. Suppose T is the total number of citations. For most physicists, the following relationship holds to a very good approximation:
(*) h ~ sqrt(T)/2.
Thus, if someone has 400 citations, then their h-index is likely to be about half the square root of 400, which is 10. If someone has 2500 citations, then their h-index is likely to be about half the square root of 2500, which is 25.
The relationship (*) actually follows from the data Hirsch analysed. He notes in passing that he found empirically that T = a h^2, where a is between 3 and 5. Inverting the relationship gives h = sqrt(T/a), which lies between about 0.45 sqrt(T) and 0.58 sqrt(T), so (*) holds to within an accuracy of about plus or minus 15%. That’s accurate enough – nobody cares whether your h-index is 20 or 23, particularly since citation statistics are already quite noisy. Provided a is in this range, h contains little additional information beyond T, which is already a commonly used citation statistic.
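The arithmetic behind (*) and its error bars can be sketched in a few lines of Python. This assumes only Hirsch's empirical relation T = a h^2 with a between 3 and 5; the function names are hypothetical, and a = 4 is simply the value that reproduces (*).

```python
import math


def estimate_h(total_citations, a=4.0):
    """Estimate h by inverting T = a * h**2; a = 4 gives h = sqrt(T) / 2."""
    return math.sqrt(total_citations / a)


def estimate_h_range(total_citations, a_low=3.0, a_high=5.0):
    """Return (low, high) estimates of h as a ranges over [a_low, a_high]."""
    # A larger a means fewer "h-units" per citation, hence a smaller h.
    return estimate_h(total_citations, a_high), estimate_h(total_citations, a_low)


for T in (400, 2500):
    low, high = estimate_h_range(T)
    print(T, round(estimate_h(T), 1), round(low, 1), round(high, 1))
# 400 10.0 8.9 11.5
# 2500 25.0 22.4 28.9
```

For both totals the range runs from roughly 11% below the central estimate to roughly 15% above it, which is where the "plus or minus 15%" figure comes from.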
What about the exceptions to this rule? I believe there are two main sources of exception.
The first class of exception is people with very few papers. Someone with 1-4 papers can easily evade the rule, simply because their distribution of citations across papers may be very unusual. In practice, though, this doesn’t much matter, since in such cases it’s possible to look at a person’s entire record, and measures of aggregate performance are not used so much in these cases, anyway.
The second class of exceptions is people who have one work which is vastly more cited than any other work. In that case the formula (*) tends to overstate the h-index. The effect is much smaller than you might think, though, since it seems that for the great majority of physicists the top-cited publication already has many more citations than the next-most-cited one, so a fair amount of this skew is already built into (*).
In any case, I hypothesize that this effect is mostly corrected by using the formula:
(**) h ~ b sqrt(T’)
where T’ is the total number of citations, less the citations to the most cited publication, and b is a constant which needs to be empirically determined. At a guess I’d believe that omitting the top two cited publications would work even better, but after that we’d hit the point of diminishing returns.
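As a rough sketch of how (**) might be evaluated, the Python below drops the most cited publication (or the top two) before taking the square root. The function name, the drop parameter, and the made-up record are all hypothetical, and b = 0.5 is only a placeholder borrowed from (*), since the real constant still has to be determined empirically.

```python
import math


def estimate_h_trimmed(citations, b, drop=1):
    """Evaluate b * sqrt(T'), where T' excludes the `drop` most cited papers."""
    ranked = sorted(citations, reverse=True)
    t_prime = sum(ranked[drop:])  # total citations without the top `drop` papers
    return b * math.sqrt(t_prime)


# Made-up record with one dominant paper, chosen only to show the mechanics.
record = [900, 60, 45, 30, 25, 20, 15, 10, 8, 5, 3, 1]

b = 0.5  # placeholder, NOT an empirically fitted value
print(0.5 * math.sqrt(sum(record)))              # plain (*) using all citations
print(estimate_h_trimmed(record, b))             # (**) without the top paper
print(estimate_h_trimmed(record, b, drop=2))     # variant without the top two
```

Nothing here says what b is for real physicists; fitting it would require actual citation records.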
Returning to the main point, my counter-claim to Hirsch is that the h-index contains very little additional information beyond the total number of citations. It’s not that the h-index is irrelevant; it’s just that in the great majority of cases it is not very useful, given that the total number of citations is likely to already be known.