REF again!

There have been some interesting posts recently about the forthcoming Research Excellence Framework (REF2014). One, by Dave Fernig, is called In Defence of REF. This post does make some valid points. REF, and previous RAEs, may well have encouraged more sensible hiring practices in which the quality of the applicant is taken more seriously than maybe it was in the distant past. Two comments I would make are that I still think teaching ability is not taken seriously enough and that, in my field at least, many places have adopted a very risky hiring strategy that – I hope – doesn’t come back to bite us in 5 years’ time. Dave Fernig also seems to feel that the panel, in his field, can distinguish between excellent, good and mediocre papers. This may well be true in his field, but I don’t think it is for mine (physics).

Peter Coles, who writes the Telescoper blog, has written a new post called Counting for the REF. I won’t say much about this post as you can read it for yourself, but I agree with much of what is said. Maybe the most concerning comment in the post was the suggestion that the weighting – when determining the funding distribution – would be 9 for 4* papers and 1 for 3* papers. Essentially, most of the funding would be determined by 4* papers and a very small amount would be associated with 3* papers. Fundamentally I think this is unfortunate as it gives very little credit to some very good papers and absolutely no credit to what might be quite good papers (there is no funding associated with 2*).
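To see just how heavily a 9:1 weighting tilts things towards 4* papers, here is a small sketch. The paper counts and the `quality_score` helper are invented for illustration; the real funding formula also involves staff volume and other factors.

```python
# Hypothetical illustration of the suggested 9:1:0 weighting.
# All paper counts are invented; only the weights come from the post.

def quality_score(counts, weights={"4*": 9, "3*": 1, "2*": 0, "1*": 0}):
    """Weighted sum of papers at each star rating."""
    return sum(weights[star] * n for star, n in counts.items())

# Two departments with the same number of 3*-or-better papers,
# differing only in how many of those were judged 4*:
dept_a = {"4*": 10, "3*": 40, "2*": 30, "1*": 5}
dept_b = {"4*": 5,  "3*": 45, "2*": 30, "1*": 5}

score_a = quality_score(dept_a)  # 9*10 + 40 = 130
score_b = quality_score(dept_b)  # 9*5 + 45 = 90

total = score_a + score_b
print(f"Dept A share: {score_a/total:.0%}")  # 59%
print(f"Dept B share: {score_b/total:.0%}")  # 41%
```

Moving just five papers across the 4*/3* boundary shifts the split from 50:50 to roughly 59:41, which is the sensitivity I am worried about.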

There is a more fundamental concern associated with what is discussed in Peter Coles’s post. In a recent post (Some more REF thoughts) I pointed out that in Physics fewer than 10% of all papers get more than 10 citations per year. The claim is that two members of the REF panel will read and assess each paper. However, as pointed out by others, this would require each panel member to read 2 papers per day for a year. Consequently, it is impossible for them to give these papers as much scrutiny as they would be given if they were being properly peer-reviewed. There is an expectation that metrics (citations for example) will play an important role in deciding how to rate the papers. How could you do this? You could set a threshold and say, for example, that since most papers get fewer than 10 citations a year, 4* papers will be those that receive more than 10 citations a year. The problem that I have (ignoring that citations are not necessarily a good indicator of quality) is that this would then be a very small fraction (about 5%) of all published papers. The distribution of REF funding would then be determined by a minority of the work published since 2008. This means that small variations can have a big impact on how the money is distributed. One could imagine that just a few papers being judged 3* instead of 4* could have a massive impact on how much money a department gets (I accept that the money doesn’t actually go to the department, but you probably know what I mean).

Alternatively, if you want to avoid small variations having a big impact, you would need 4* papers to make up a reasonable fraction of the assessed papers (maybe 10 – 20%). The problem here is that you’re now getting down to papers that are only collecting a few (5-10) citations per year, so where do you draw the boundary? Is 3 per year too few, but 5 a year okay? You could argue that the metrics are just being used to guide the assessment and that the panels’ reading of the papers will allow a distinction to be drawn between 4* and 3* papers. This doesn’t, however, change the fact that the panel members have to read a massive number of papers. It feels more like combining two completely flawed processes and hoping that what pops out the other side is okay.
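The boundary problem can be illustrated with a toy citation distribution. The lognormal here is just a convenient heavy-tailed choice with made-up parameters, not a fit to real physics data; the point is only that the fraction of papers above the cut changes sharply as the cut moves.

```python
# Invented heavy-tailed "citations per year" distribution, to show how
# the fraction of papers clearing a 4* threshold depends on where the
# threshold is drawn. Parameters are made up for illustration.
import random

random.seed(1)
papers = [random.lognormvariate(0.5, 1.2) for _ in range(10000)]

for cutoff in (3, 5, 10):
    frac = sum(c >= cutoff for c in papers) / len(papers)
    print(f"cutoff of {cutoff:>2} citations/year -> {frac:.1%} of papers clear it")
```

Wherever the line is drawn, a large number of papers sit just either side of it, so small judgement calls move a lot of money.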

I suggested in an earlier post (REF prediction) that, given the diverse nature of a typical academic department or university, this might be an appropriate time to simply consider using some kind of metric. I did a quick analysis of all 42 physics departments’ h-indices and saw a reasonable correlation between their h-index and how they did in RAE2008. I noticed today that Deevy Bishop, who writes a blog called BishopBlog, has made a similar suggestion and carried out the same kind of analysis for psychology. Her analysis seems quite similar to mine and suggested that this would be An alternative to REF2014.
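For anyone unfamiliar with the metric used in those analyses, an h-index can be computed from a list of citation counts in a few lines. The function below is the standard definition; the citation counts are invented and this is not the exact procedure used in either analysis.

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
    return h

# Invented citation counts for two hypothetical departments:
print(h_index([50, 30, 22, 15, 12, 9, 6, 4, 2, 1]))  # -> 6
print(h_index([12, 10, 9, 8, 3, 2, 2, 1, 1, 0]))     # -> 4
```

A department-level version simply pools all the papers attributed to that department before applying the same definition.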

Anyway, it’s quite good to see others writing about REF2014 (whether for or against). I think it is a very important issue and I’m just disappointed that it is probably too late to make any changes that would make the REF2014 process simpler and more likely to produce a reasonable ranking.


The negative impact of REF

The more I learn about the Research Excellence Framework (REF) the less convinced I am about the merits of this whole exercise. That universities are assessed to get some idea of how best to distribute a pot of money is fine. The way in which it is done, and the “games” that appear to be played by universities and university departments, is what concerns me. For starters, something like 300 senior people are involved in actually carrying out the assessments and numerous others are involved in preparing the submissions. The cost of doing this must be substantial (plus these are meant to be our leading researchers who are spending a large fraction of their time assessing everyone else). Some might argue that the amount being distributed (billions) makes it worth spending all this money carrying out the assessment.

An alternative argument might be that if ever there was an appropriate occasion on which to use metrics, it would be when assessing a large, diverse organisation like a university. The problem with metrics (like citations) is that comparing different fields (or even different areas within the same field) is difficult, because there might be different citation practices in different fields and the size of the field plays a role. A typical university, however, has so many different fields that these variations should – to a certain extent – cancel out, and one could probably get a pretty good idea of the quality of a university by considering citation statistics and other metrics (number of spin-out companies, patents, etc.). One could also be a bit cruder in the rankings. I don’t really believe that we can rank universities perfectly. Rather than first, second, third…, it could be top 3, next 5, next 5, etc.
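The cancellation argument can be made concrete with a toy simulation. The per-field citation norms below are invented; the point is only that an average taken over many fields varies far less than the field norms themselves do.

```python
# Toy illustration of the averaging argument. Per-field citation norms
# and the noise model are invented for illustration only.
import random

random.seed(2)

N_FIELDS = 40
field_norms = [random.uniform(2, 30) for _ in range(N_FIELDS)]  # mean cites per field

def university_mean_cites():
    """Average citations across one university's departments, where each
    department inherits its field's norm plus department-level noise."""
    return sum(norm + random.gauss(0, 3) for norm in field_norms) / N_FIELDS

means = [university_mean_cites() for _ in range(5)]
print(f"span of field norms:       {max(field_norms) - min(field_norms):.1f}")
print(f"span of university means:  {max(means) - min(means):.1f}")
```

Individual fields differ wildly, but the university-wide averages cluster tightly, which is why a crude institution-level metric is less unfair than a field-level one.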

What concerns me more are the implications of what universities and university departments seem to be willing to do to optimise their REF scores. You can include research fellows in REF submissions, and so there will be lots of carrots dangled to try to ensure that no Fellows leave before the REF census date in October 2013. Some of these research fellows may also be offered permanent positions that will start when their Fellowships end, either to keep them or to attract them away from another university. These will clearly be very good researchers, but I have an issue with a hiring practice in which holding a Fellowship plays a significant role in whether or not you will be hired. Getting a Fellowship is a bit of a lottery in the first place, and what about those whose Fellowships are just due to end? It becomes a bit of a career year lottery – if you have a number of years left on a Fellowship at the same time as a REF submission, you are more likely to get a permanent academic job than if you don’t.

There are also other issues. Departments will potentially be creating a number of new posts at a very uncertain time. What if things do not work out as expected? How do you pay these people once they come off their Fellowships? What about the stability of academic careers? A burst of hiring every 7 years to coincide with REF submissions doesn’t seem very sensible. I should add, however, that if anyone who actually reads this has managed to get a permanent job or a promise of a permanent job, well done to you. I should also add that my views are not really based on anything specific, just a sense that we are letting the REF dictate our behaviour in a way that may not be ideal and wouldn’t be how we’d behave if the REF wasn’t happening. You have to worry slightly about the validity of an assessment exercise that has such a potentially strong influence on the behaviour of the organisations it is trying to assess. It can’t really be regarded as independent.

Citation metrics

It seems like there is an increasing tendency to use metrics to make decisions. Essentially, people want some measurable quantity that not only allows them to judge the quality of something, but also allows them to justify the decision that is made. In science, the quantity that is often used is the number of citations that a person or scientific paper has. For those who are not familiar with this, it is essentially the number of times a particular piece of work is referred to in other pieces of scientific work. It is used when hiring people, when deciding if someone’s research proposal should be funded, and is likely to be used in the upcoming Research Excellence Framework (REF) that will, in a few years’ time, decide how much money each university should get from the Higher Education Funding councils.

Generally, citation numbers are not used in isolation and other factors are also considered. What is slightly worrying is the impression I am getting that they are likely to become more and more important in the future. Why is this worrying? Although in some sense citations are a reasonable measure of quality, the measure is almost certainly a relative one. For example, it presumably must in some way depend on the size of the particular discipline or sub-discipline.

Let’s consider a field in which 100 papers are published every year and imagine each paper cites 10 other papers, none of which are more than 10 years old. This means that at any time there are 1000 papers that could be cited. A particular paper therefore has a 10 x 1/1000 = 1/100 chance of being cited by any given new paper. Since 100 papers are published every year, this means that on average each paper is cited once a year. Over its 10-year lifetime it is therefore likely to be cited about 10 times. This number doesn’t actually depend on the size of the field. If I increase the number of papers published to 1000 every year, but still assume that 10 other papers are cited in each paper published, each paper should still only be cited about 10 times in 10 years.
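The arithmetic above can be written out explicitly. These are the same illustrative numbers used in the text, not real data, and the steady-state assumptions (fixed publication rate, 10-year citeable window) are baked in.

```python
def lifetime_citations(papers_per_year, refs_per_paper, window_years=10):
    """Average lifetime citations per paper in the steady-state model:
    citations handed out each year, divided by the citeable pool,
    accumulated over a paper's citeable lifetime."""
    pool = papers_per_year * window_years           # citeable papers at any time
    given_per_year = papers_per_year * refs_per_paper
    per_paper_per_year = given_per_year / pool      # = refs_per_paper / window
    return per_paper_per_year * window_years        # = refs_per_paper

print(lifetime_citations(100, 10))   # small field    -> 10.0
print(lifetime_citations(1000, 10))  # 10x the papers -> 10.0
```

Working through the algebra, the field size cancels entirely: the average lifetime citation count is just the number of references each paper makes.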

The problem is that the above number is an average. Some papers will not get cited at all and others will get cited more often than the average. The maximum number of citations that a paper can receive clearly depends on the size of the field. We might expect good papers to have many more citations than the average, but this is certainly limited by the total number of papers published in a particular field. If we decide to make citation counts an important metric in determining the amount of money a particular field (or particular researcher maybe) gets, this suggests that the biggest fields (or the best researchers in the biggest fields) will get the most money. At first this may seem alright, especially if the initial size of each field has been determined by some other objective measure. Over time, however, the biggest fields will get bigger and the smallest will suffer as a result, especially if the amount of money available means that only those with significantly more than the average number of citations are likely to be funded.
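A quick simulation makes the average-versus-maximum point explicit: holding the mean fixed, the tail of the distribution, and hence the best-cited paper, grows with the number of papers. The exponential distribution here is just a convenient skewed choice, not a model of real citation data.

```python
# Invented simulation: the *average* citation count is set by
# referencing practice, but the *maximum* grows with field size.
import random

random.seed(0)

def simulate_field(n_papers, mean_cites=10):
    """Skewed citation counts with a fixed mean (distribution invented)."""
    return [random.expovariate(1 / mean_cites) for _ in range(n_papers)]

for size in (100, 1000, 10000):
    cites = simulate_field(size)
    print(f"{size:>5} papers: mean {sum(cites)/len(cites):5.1f}, "
          f"max {max(cites):6.1f}")
```

So a researcher judged on raw citation counts is competing against a ceiling set partly by how big their field happens to be.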

One could argue that the smaller fields weren’t very interesting and therefore deserve to be penalised, and that the biggest fields deserve to get the most money since their size indicates how interested people are in the area. I would buy this argument if the potential of a field to grow didn’t depend so strongly on its current size. The above also doesn’t consider different citation practices. I was talking recently to a reasonably eminent Cambridge professor who was arguing that we should all cite each other, since this is what happens in other research areas, to make sure we aren’t disadvantaged.

Essentially, I am concerned that if citation numbers become the primary mechanism for determining research quality, we could do a lot of damage to very interesting areas that are not large enough to be competitive according to this rather simplistic metric. This isn’t to say that we shouldn’t use them at all, but we should be aware of the various selection effects. Of course, one problem will be that most researchers probably work in the largest research fields, and at least half of these people have better-than-average citation counts. Since the people making some of the decisions may well fall into this category, it’s not really in their interest to be more objective about how research quality should be determined, since they will do perfectly well if citation counts become the primary metric for judging quality.