An Impact driver

Over the last year, I’ve been learning more and more about research metrics and evaluation. One common objection I’ve heard to using new metrics to evaluate research performance is that they may be driven, in part, by unsuccessful and disgruntled researchers who want to change the way research is assessed to better suit themselves. The unspoken corollary is that the outputs and efforts they seek credit for are less valid, or less rigorous. In other words: easier.

I have to tread a little carefully here: I left academia partly because I felt I wouldn’t have the career I wanted, given the way research is currently assessed and grant money awarded, particularly in the US. That said, I knew, and still know, many competent and active researchers whose contributions are significantly underrated. The need to publish in journals with high impact factors sometimes penalizes those doing work that is no less important but has a narrower audience, because it is more specialized or more challenging to understand.

You can’t get too far into a conversation about research assessment without somebody mentioning the Impact Factor (IF). I’ll spare you the potted history of the IF, and I’ll leave aside the pedestrian exercise of listing the objections to it. There’s a good post here that lists them and gives the arguments. Instead, I’d like to pick apart a couple of them, because I think that’s a more interesting exercise.
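For reference, the standard two-year Impact Factor for a journal in year $Y$ is a ratio of counts, which is what makes it a mean (citations per item):

$$
\mathrm{IF}_Y = \frac{\text{citations received in year } Y \text{ by items published in } Y-1 \text{ and } Y-2}{\text{number of citable items published in } Y-1 \text{ and } Y-2}
$$

Strictly speaking, it’s only approximately a per-paper mean, because the numerator counts citations to everything the journal published in those two years, while the denominator counts only "citable" items (mainly research articles and reviews).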

Impact Factor is a mean and it should be a median

Most people who haven’t taken a course in statistics, and many who have, would look at this objection and think it’s just fussy and pedantic. Means and medians usually turn out the same, don’t they? If, like me, you care a little too much about numbers, or have seen how using the wrong statistic can lead to the wrong conclusion, this objection is worth a closer look. So does it matter?

Yes… and no.

Let’s start with the ‘yes’ part. If you plot the distribution of citations to a journal’s papers, you’ll almost certainly see that it doesn’t look like a classic bell curve, which sciencey types call a Gaussian distribution. Instead, it’s strongly peaked at a low number and has a long tail stretching to the right. This wouldn’t surprise a statistician: citations are independent events; they’re discrete, not continuous (you can’t have half a citation); and they’re rare (statistically speaking). Even a high IF like 35 isn’t really a big number, mathematically speaking, and most papers don’t get cited much, if at all. This is why a small number of highly cited papers tend to have a disproportionate effect on the mean (the IF), why recruiting more review articles is such an effective tactic for raising the IF, and, in extreme cases, why a single paper can have a large effect. Many of the problems with the IF stem, in part, from using an arithmetic mean to summarize a heavily skewed, non-Gaussian data set. You should probably use the median (or, if you’re so inclined, figure out which distribution the data follow and use that to calculate the expected value, using clever maths that involve letters rather than numbers).
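To make the skew concrete, here’s a minimal Python sketch using made-up citation counts for 20 hypothetical papers (invented for illustration, not real data from any journal). It compares the mean, which is what the IF reports, with the median, with and without the single most-cited paper (say, a popular review).

```python
from statistics import mean, median

# Hypothetical citation counts for 20 papers in one journal (invented for
# illustration): most are cited rarely, one is cited very heavily.
citations = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 12, 25, 120]


def summarize(counts):
    """Return (mean, median) for a list of per-paper citation counts."""
    return mean(counts), median(counts)


with_outlier = summarize(citations)          # all 20 papers
without_outlier = summarize(citations[:-1])  # drop the single 120-citation paper

print(f"all papers:        mean={with_outlier[0]:.2f}  median={with_outlier[1]}")
print(f"without top paper: mean={without_outlier[0]:.2f}  median={without_outlier[1]}")
```

With these invented numbers, one 120-citation paper more than doubles the mean (from about 4.2 to 10) while the median barely moves (from 2 to 2.5), which is the ‘yes’ part of the argument in miniature.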

Now for the ‘no’ part. Arguably, it’s not all that important because, generally speaking, the IF correlates very well with the journal’s median 5-year citation rate. Perhaps that’s to be expected. When I spoke recently with Digital Science’s Jonathan Adams, he told me that the current thinking is that citation distributions are most likely negative binomial, because well-cited papers tend to go on to be even better cited in the future. (I had guessed a Poisson distribution, at least for the 2-year IF. Shows what I know.)
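The difference between the two candidate distributions can be sketched with a small simulation. This assumes a hypothetical journal whose papers average 3 citations, and it builds the negative binomial as a gamma-Poisson mixture, which is one standard construction: each paper gets its own underlying citation rate drawn from a gamma distribution, then a Poisson count at that rate. The parameter values (`MU`, `K`, `N`) and the seed are arbitrary choices for illustration, not figures from Jonathan Adams or from real citation data.

```python
import math
import random
from statistics import mean, median, pvariance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method.

    Simple and exact, but its cost grows with lam, so it is only suitable
    for the small rates typical of per-paper citation counts.
    """
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def sample_negative_binomial(mu, k, rng):
    """Draw one negative binomial count with mean mu and dispersion k.

    Built as a gamma-Poisson mixture: the paper's underlying citation rate
    is Gamma(shape=k, scale=mu/k), which has mean mu, and the observed count
    is Poisson at that rate. The resulting variance is mu + mu**2 / k.
    """
    rate = rng.gammavariate(k, mu / k)
    return sample_poisson(rate, rng)


rng = random.Random(42)        # fixed seed so the sketch is reproducible
MU, K, N = 3.0, 0.5, 20_000    # hypothetical mean, dispersion, number of papers

poisson = [sample_poisson(MU, rng) for _ in range(N)]
negbin = [sample_negative_binomial(MU, K, rng) for _ in range(N)]

for name, counts in (("Poisson", poisson), ("neg. binomial", negbin)):
    print(f"{name:<14} mean={mean(counts):.2f}  median={median(counts)}  "
          f"variance={pvariance(counts):.1f}")
```

Both samples have roughly the same mean, but the negative binomial one has a variance several times larger than its mean (overdispersion) and a median well below its mean, whereas for the Poisson sample the variance is close to the mean. Note that the gamma-Poisson mixture models papers that simply differ in how citable they are. A contagion-style process, in which each citation raises the chance of the next (closer to the mechanism described above), is another classic route to the same distribution, but this sketch doesn’t simulate it directly.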

Therein lies the rub. Once a paper has its perceived value raised by being cited, it’s likely to get cited again. That’s related to the next objection, which I’ll explore in part two of this post, so stay tuned for next week.