Citation metrics

There seems to be an increasing tendency to use metrics to make decisions. Essentially, people want some measurable quantity that not only allows them to judge the quality of something, but also allows them to justify the decision that is made. In science, the quantity that is often used is the number of citations that a person or scientific paper has. For those who are not familiar with this, it is essentially the number of times a particular piece of work is referred to in other pieces of scientific work. It is used when hiring people, when deciding if someone’s research proposal should be funded, and is likely to be used in the upcoming Research Excellence Framework (REF) that will, in a few years’ time, decide how much money each university should get from the Higher Education Funding councils.

Generally, citation numbers are not used in isolation and other factors are also considered. What is slightly worrying is the impression I am getting that citation counts are likely to become more and more important in the future. Why is this worrying? Although in some sense citations are a reasonable measure of quality, the measure is almost certainly a relative one. For example, it presumably must in some way depend on the size of the particular discipline or sub-discipline.

Let’s consider a field in which 100 papers are published every year and imagine each paper cites 10 other papers, none of which are more than 10 years old. This means that at any time there are 1000 papers that could be cited. Any given paper therefore has a 10 × 1/1000 = 1/100 chance of being cited by any one new paper. Since 100 papers are published every year, this means that on average a paper is cited about once a year. Over its 10 year lifetime it is therefore likely to be cited about 10 times. This number doesn’t actually depend on the size of the field. If I increase the number of papers published to 1000 every year but still assume that each paper cites only 10 others, there are now 10,000 papers that could be cited, and each paper should still only be cited about 10 times in 10 years.
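The arithmetic above can be written out as a few lines of Python. This is only a sketch of the toy model in the previous paragraph: the function name and its parameters are my own labels, and it assumes every citable paper is equally likely to be cited.

```python
def expected_lifetime_citations(papers_per_year, refs_per_paper=10, window_years=10):
    """Average citations a paper collects over its citable lifetime (toy model)."""
    # Papers young enough to be cited at any given time.
    citable_pool = papers_per_year * window_years
    # Citations handed out per year, shared equally across the citable pool.
    citations_per_year = papers_per_year * refs_per_paper / citable_pool
    return citations_per_year * window_years


for size in (100, 1000):
    print(size, expected_lifetime_citations(size))
```

The number of papers published per year cancels out, so under these assumptions the average is always just the number of references per paper.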

The problem is that this figure of about 10 citations is an average. Some papers will not get cited at all and others will be cited more often than the average. The maximum number of citations that a paper can receive clearly depends on the size of the field. We might expect good papers to have many more citations than the average, but this is certainly limited by the total number of papers published in a particular field. If we decide to make citation counts an important metric in determining the amount of money a particular field (or particular researcher maybe) gets, this suggests that the biggest fields (or the best researchers in the biggest fields) will get the most money. At first this may seem alright, especially if the initial size of each field has been determined by some other objective measure. Over time, however, the biggest fields will get bigger and the smallest will suffer as a result, especially if the amount of money available means that only those with significantly more than the average number of citations are likely to be funded.
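To see the gap between the average and the extremes, here is a rough simulation of the same toy field. Everything in it is an assumption for illustration: the function name, the fixed random seed, and above all the idea that papers are cited uniformly at random. Real citation patterns are far more skewed, so if anything this understates the spread.

```python
import random
from collections import Counter


def simulate_field(papers_per_year, refs_per_paper=10, window_years=10, seed=0):
    """Hand out citations uniformly at random and summarise the result."""
    rng = random.Random(seed)
    pool_size = papers_per_year * window_years      # papers that can be cited
    citing_papers = papers_per_year * window_years  # papers doing the citing
    counts = Counter()
    for _ in range(citing_papers):
        # Each citing paper references refs_per_paper distinct papers.
        for cited in rng.sample(range(pool_size), refs_per_paper):
            counts[cited] += 1
    per_paper = [counts[i] for i in range(pool_size)]
    return {
        "mean": sum(per_paper) / pool_size,
        "max": max(per_paper),
        "uncited": per_paper.count(0),
        # No paper can be cited more often than there are citing papers.
        "ceiling": citing_papers,
    }


for size in (100, 1000):
    print(size, simulate_field(size))
```

The mean comes out the same for both field sizes, but the ceiling on how many citations a single paper could possibly receive is ten times higher in the larger field.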

One could argue that the smaller fields weren’t very interesting and therefore deserved to be penalised, and that the biggest fields deserve to get the most money since their size indicates how interested people are in those areas. I would buy this argument if the potential of a field to grow didn’t depend strongly on the size of the field. The above also doesn’t consider different citation practices. I was talking recently to a reasonably eminent Cambridge professor who was arguing that we should all cite each other, since this is what happens in other research areas and we should do the same to make sure we aren’t disadvantaged.

Essentially, I am concerned that if citation numbers become the primary mechanism for determining research quality, we could do a lot of damage to very interesting areas that are not large enough to be competitive according to this rather simplistic metric. This isn’t to say that we shouldn’t use them at all, but we should be aware of the various selection effects. Of course, one problem will be that most researchers probably work in the largest research fields and at least half of these people have better than average citation counts. Since the people making some of the decisions may well fall into this category, it’s not really in their interest to be more objective about how research quality should be determined, since they will do perfectly well if citation counts become the primary metric for judging quality.

