Following on from yesterday’s post about the forthcoming Research Excellence Framework that plans to use citations as a measure of research quality, I thought I would have a little rant on the subject of bibliometrics.
Recently, one particular measure of scientific productivity has established itself as the norm for assessing job applications, grant proposals and other related tasks. This is called the h-index, named after the physicist Jorge Hirsch, who introduced it in a paper in 2005. It is quite a simple index to define and to calculate (given an appropriately accurate bibliographic database). The definition is that an individual has an h-index of h if that individual has published h papers with at least h citations each. If the author has published N papers in total, then the other N-h must each have no more than h citations. This is a bit like the Eddington number for cycling (the largest number E such that you have cycled at least E miles on E separate days). A citation, as if you didn’t know, is basically an occurrence of that paper in the reference list of another paper.
Calculating it is easy. You just go to the appropriate database – such as the NASA ADS system – search for all papers by a given author, and request the results sorted by decreasing citation count. You then scan down the list until the number of citations falls below the paper’s position in the ordered list.
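For the record, here is a minimal sketch in Python of that scan, assuming you have already exported a list of per-paper citation counts from ADS or a similar database; the numbers in the example are invented purely for illustration.

```python
# Minimal h-index calculation from a list of per-paper citation counts.
# (The counts below are invented; in practice you'd export them from ADS.)
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    cites = sorted(citations, reverse=True)  # rank papers by decreasing citations
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:   # this paper still has at least as many citations as its rank
            h = rank
        else:           # citations have fallen below the position in the list
            break
    return h

print(h_index([100, 50, 33, 21, 20, 4, 3, 1]))  # prints 5
```

The loop stops exactly where the citation count drops below the rank, which is the point in the ordered list described above.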
Incidentally, one of the issues here is whether to count only refereed journal publications or all articles (including books and conference proceedings). The argument in favour of the former is that the latter are often of lower quality. I think that is an illogical argument, because good papers will get cited wherever they are published. Related to this is the fact that some people would like to count “high-impact” journals only, but if you’ve chosen citations as your measure of quality then the choice of journal is irrelevant. Indeed, a paper that is highly cited despite being in a lesser journal should, if anything, be given a higher weight than one with the same number of citations published in, e.g., Nature. Of course it’s just a matter of time before the hideously overpriced academic journals run by the publishing mafia go out of business anyway, so before long this question will simply vanish.
The h-index has some advantages over more obvious measures, such as the average number of citations per paper, as it is not skewed by one or two publications with enormous numbers of hits. It also, at least to some extent, represents both quantity and quality in a single number. For whatever reason, in recent times h has undoubtedly become common currency (at least in physics and astronomy) as a quick and easy measure of a person’s scientific oomph.
Incidentally, it has been claimed that this index is well fitted by the formula h ≈ √T/2, where T is the total number of citations. This works in my case. If it works for everyone, doesn’t it mean that h is actually of no more use than T in assessing research productivity?
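To put in some purely illustrative round numbers: an author with, say, 2,900 citations in total would be predicted by that formula to have h ≈ √2900/2 ≈ 27, so the two quantities do track each other rather closely.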
Typical values of h vary enormously from field to field – even within each discipline – and differ a lot between observational and theoretical researchers. In extragalactic astronomy, for example, you might expect a good established observer to have an h-index around 40 or more, whereas some other branches of astronomy have much lower citation rates. The top dogs in the field of cosmology are all theorists, though. People like Carlos Frenk, George Efstathiou, and Martin Rees all have very high h-indices. At the extreme end of the scale, string theorist Ed Witten is in the citation stratosphere, with an h-index well over a hundred.
I was tempted to put up examples of individuals’ h-numbers but decided instead just to illustrate things with my own. That way the only person to get embarrassed is me. My own index value is modest – to say the least – at a meagre 27 (according to ADS). Does that mean Ed Witten is four times the scientist I am? Of course not. He’s much better than that. So how exactly should one use h as an actual metric, for allocating funds or prioritising job applications, and what are the likely pitfalls? I don’t know the answer to the first question, but I do have some suggestions for alternative metrics that avoid some of h’s shortcomings.
One of these addresses an obvious deficiency of h. Suppose we have an individual who writes one brilliant paper that gets 100 citations, and another who is one author amongst 100 on a paper with the same impact. In terms of each individual’s citation count, the two cases register the same value, but there’s no question in my mind that the first deserves more credit. One remedy is to normalise the citations of each paper by the number of authors, essentially sharing citations equally among all those who contributed to the paper. This is quite easy to do on ADS too, and in my case it gives a value of 19. Trying the same thing on various other astronomers, astrophysicists and cosmologists reveals that the h-index of an observer is likely to fall by a factor of 3-4 when calculated in this way, whereas theorists (who generally work in smaller groups) suffer less. I imagine Ed Witten’s index doesn’t change much when calculated on a normalised basis, although I haven’t worked it out myself.
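As a sketch of what that normalisation amounts to in practice, assuming you can get hold of the author count for each paper (the numbers here are invented again):

```python
# Author-normalised h-index: each paper's citations are shared equally
# among its authors before the usual h-style scan. Data are invented.
def normalised_h_index(papers):
    """papers is a list of (citations, n_authors) pairs."""
    shared = sorted((c / n for c, n in papers), reverse=True)
    h = 0
    for rank, c in enumerate(shared, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# A single-author paper keeps all 100 of its citations, while a 100-author
# paper contributes only one shared citation to each of its authors.
print(normalised_h_index([(100, 1), (100, 100), (60, 3), (40, 2)]))  # prints 3
```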
Observers complain that this normalised measure is unfair to them, but I’ve yet to hear a reasoned argument as to why this is so. I don’t see why 100 people should get the same credit for a single piece of work: it seems like obvious overcounting to me.
Another possibility – if you want to measure leadership too – is to calculate the h-index using only those papers on which the individual concerned is the first author. This is a bit more of a fiddle to do, but mine comes out as 20 when calculated in this way. That is considerably higher than the corresponding value for most of my professorial colleagues, even though my raw h value is smaller. Using first-author papers only is also probably a good way of identifying lurkers: people who add themselves to any paper they can get their hands on but never take the lead. Mentioning no names, of course. I propose using the ratio of unnormalised to normalised h-indices as an appropriate lurker detector…
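For what it’s worth, here is a toy version of both ideas, assuming each paper record carries an ordered author list; the names and citation counts are made up, and the ratio at the end simply uses my own two h values quoted above.

```python
# First-author-only h-index and the proposed "lurker detector" ratio.
# Each paper is (citations, [authors in order]); all data are invented.
def h_index(citations):
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

def first_author_h(papers, name):
    """h-index computed over papers led (first-authored) by `name`."""
    return h_index([c for c, authors in papers if authors[0] == name])

papers = [(80, ["Bloggs", "Smith"]), (50, ["Smith", "Bloggs", "Jones"]),
          (40, ["Bloggs"]), (10, ["Jones", "Bloggs"])]
print(first_author_h(papers, "Bloggs"))   # prints 2

lurker_ratio = 27 / 19                    # my raw h over my normalised h
print(round(lurker_ratio, 2))             # prints 1.42 (the bigger, the lurkier)
```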
Finally in this list of bibliometrica is the so-called g-index. This is defined in a slightly more complicated way than h: given a set of articles ranked in decreasing order of citation numbers, g is defined to be the largest number such that the top g articles altogether received at least g² citations. This is a bit like h but takes extra account of the average citations of the top papers. My own g-index is about 47. Obviously I like this one because my number looks bigger, but I’m pretty confident others go up even more than mine!
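Here is the same sort of sketch for g, using the same invented publication list as before; it shows how the big hitters at the top of the list pull g above h.

```python
# g-index: the largest g such that the top g papers between them have
# received at least g*g citations. Citation counts are invented.
def g_index(citations):
    cites = sorted(citations, reverse=True)
    running_total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        running_total += c
        if running_total >= rank * rank:  # top `rank` papers have >= rank^2 citations
            g = rank
    return g

# Same toy list as before: h = 5 but g = 8, because the 100-citation
# paper at the top drags the cumulative total up.
print(g_index([100, 50, 33, 21, 20, 4, 3, 1]))  # prints 8
```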
Of course you can play with these things to your heart’s content, combining ideas from each definition: the normalised g-factor, for example. The message is, though, that although h definitely contains some information, any attempt to condense such complicated information into a single number is never going to be entirely successful.
Comments, particularly with suggestions of alternative metrics, are welcome via the box. Even from lurkers.
