A new REF algorithm

In a previous post (REF prediction) I looked up the h-indices and the citations per publication for all Physics and Astronomy departments included in RAE2008. I ranked them by their h-index, by their citations per publication, and by the average of these two rankings. It looked alright, but I did comment that one could produce something more sophisticated. At the time I worried that using just the h-index would disadvantage smaller departments, but I couldn’t really think of what else to do and it was just a very basic exercise.

Deevy Bishop has, however, suggested an alternative way of ranking the departments: essentially, relate the income they receive to their h-index. In RAE2008 each department was assessed according to what fraction of its outputs were 4*, 3*, 2*, 1* and U. The amount of funding it received (although I think it technically went to the university, rather than to the department) was then scaled according to N(0.1×2* + 0.3×3* + 0.7×4*), where N was the number of FTEs submitted. This data can all be downloaded from the RAE2008 website. Deevy Bishop did this analysis for psychology and discovered that the level of funding from RAE2008 correlated extremely well with the department’s h-index. What was slightly concerning was that the correlation was even stronger if one also included whether or not a department was represented on the RAE2008 panel.

I’ve now done the same analysis for Physics and Astronomy. I’ve added various figures and text to my REF prediction post, but thought it worth making it more prominent by putting it in a new post. The figure showing RAE2008 funding plotted against h-index is below. According to my quick calculation, the correlation is 0.9. I haven’t considered how this changes if you include whether or not a department was represented on the RAE2008 panel. The funding formula for REF2014 might possibly be N(0.1×3* + 0.9×4*). I’ve redone the figure to see what the impact would have been if this formula had been used instead of the RAE2008 formula. It’s very similar and – if you’re interested – it’s included at the bottom of my REF prediction post. It does seem that, if all we want to know is how to distribute the money, relating it to a department’s h-index works quite well (or at least it would have worked well if used for RAE2008). I’m not quite sure how easy it would be to produce an actual league table, though. Given that the REF2014 formula may depend almost entirely on the fraction of 4*, one could simply divide the h-index by the number of FTEs to get a league table ranking, but I haven’t had a chance to see whether this produces anything reasonable. Of course, no one really trusts league tables anyway, so it may be a good thing if we don’t bother producing one.

A plot of h-index against the RAE2008 funding formula – N(0.1×2* + 0.3×3* + 0.7×4*).
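
If it helps, here is a minimal sketch of the sort of calculation involved, assuming you already have each department’s h-index, FTE count and quality profile. The department names and numbers are made up purely for illustration, and the REF2014 formula is, as noted above, only a guess.

```python
# Minimal sketch: correlate departmental h-indices with funding weights
# computed from RAE2008-style quality profiles. All numbers are made up.
import numpy as np

# Each entry: (h-index, FTEs submitted, fraction of outputs at 2*, 3*, 4*)
departments = {
    "Dept A": (60, 40.0, 0.30, 0.40, 0.20),
    "Dept B": (45, 25.0, 0.35, 0.35, 0.10),
    "Dept C": (75, 55.0, 0.25, 0.45, 0.25),
    "Dept D": (30, 15.0, 0.40, 0.30, 0.05),
}

def rae2008_weight(fte, s2, s3, s4):
    """RAE2008-style funding weight: N(0.1x2* + 0.3x3* + 0.7x4*)."""
    return fte * (0.1 * s2 + 0.3 * s3 + 0.7 * s4)

def possible_ref2014_weight(fte, s2, s3, s4):
    """Possible REF2014 weight: N(0.1x3* + 0.9x4*); the 2* fraction is ignored."""
    return fte * (0.1 * s3 + 0.9 * s4)

h = np.array([d[0] for d in departments.values()], dtype=float)
w_rae = np.array([rae2008_weight(*d[1:]) for d in departments.values()])
w_ref = np.array([possible_ref2014_weight(*d[1:]) for d in departments.values()])

print("corr(h-index, RAE2008 weight):", np.corrcoef(h, w_rae)[0, 1])
print("corr(h-index, possible REF2014 weight):", np.corrcoef(h, w_ref)[0, 1])
```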

REF again!

There have been some interesting posts recently about the forthcoming Research Excellence Framework (REF2014). One, by Dave Fernig, is called In Defence of REF, and it makes some valid points. REF, and the previous RAEs, may well have encouraged more sensible hiring practices in which the quality of the applicant is taken more seriously than it perhaps was in the distant past. Two comments I would make are that teaching ability is still not taken seriously enough and that, in my field at least, many places have adopted a very risky hiring strategy that – I hope – doesn’t come back to bite us in 5 years’ time. Dave Fernig also seems to feel that the panel, in his field, can distinguish between excellent, good and mediocre papers. This may well be true in his field, but I don’t think it is in mine (physics).

Peter Coles, who writes the Telescoper blog, has written a new post called Counting for the REF. I won’t say much about it as you can read it for yourself, but I agree with much of what is said. Maybe the most concerning comment in the post was the suggestion that the weighting – when determining the funding distribution – would be 9 for 4* papers and 1 for 3* papers. Essentially, most of the funding would be determined by 4* papers and a very small amount would be associated with 3* papers. Fundamentally I think this is unfortunate, as it gives very little credit to some very good papers and absolutely no credit to what might be quite good papers (there is no funding associated with 2*).

There is a more fundamental concern associated with what is discussed in Peter Coles’s post. In a recent post (Some more REF thoughts) I pointed out that in Physics fewer than 10% of all papers get more than 10 citations per year. The claim is that two members of the REF panel will read and assess each paper. However, as pointed out by others, this would require each panel member to read 2 papers per day for a year. Consequently, it is impossible for them to give these papers as much scrutiny as they would receive if they were being properly peer-reviewed. There is an expectation that metrics (citations, for example) will play an important role in deciding how to rate the papers. How could you do this? You could set a threshold and say, for example, that since most papers get fewer than 10 citations a year, 4* papers will be those that receive more than 10 citations a year. The problem I have (ignoring that citations are not necessarily a good indicator of quality) is that this would then be a very small fraction (about 5%) of all published papers. The distribution of REF funding would then be determined by a minority of the work published since 2008. This means that small variations can have a big impact on how the money is distributed. One could imagine that just a few papers being judged 3* instead of 4* could have a massive impact on how much money a department gets (I accept that the money doesn’t actually go to the department, but you probably know what I mean).
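
As a quick, entirely hypothetical illustration of that sensitivity, suppose the funding-relevant score really were weighted 9:1 between 4* and 3* outputs; the numbers below are made up, but they show how much the score can move when only a few borderline papers change category.

```python
# Hypothetical illustration: how much a 9:1 weighted score moves when a
# handful of borderline papers are judged 3* rather than 4*.
def funding_score(n_4star, n_3star, w4=9, w3=1):
    """Weighted score under an assumed 9 (4*) : 1 (3*) weighting."""
    return w4 * n_4star + w3 * n_3star

# A made-up department submitting 200 outputs, 20 of them judged 4*.
before = funding_score(n_4star=20, n_3star=60)
# The same department if just 5 borderline papers drop from 4* to 3*.
after = funding_score(n_4star=15, n_3star=65)

change = (after - before) / before
print(f"score before: {before}, after: {after}, change: {change:.1%}")
# Moving 5 papers out of 200 (2.5% of the submission) shifts the score by ~17%.
```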

Alternatively, if you want to avoid small variations having a big impact, you would need 4* papers to make up a reasonable fraction of the assessed papers (maybe 10–20%). The problem here is that you’re now getting down to papers that are only collecting a few (5–10) citations per year, so where do you draw the boundary? Is 3 per year too few, but 5 a year okay? You could argue that the metrics are just being used to guide the assessment and that the panels’ reading of the papers will allow a distinction to be drawn between 4* and 3* papers. This doesn’t, however, change the fact that the panel members have to read a massive number of papers. It feels more like combining two completely flawed processes and hoping that what pops out the other side is okay.

I suggested in an earlier post (REF prediction) that, given the diverse nature of a typical academic department or university, this might be an appropriate time to simply consider using some kind of metric. I did a quick analysis of all 42 physics departments’ h-indices and saw a reasonable correlation between their h-index and how they did in RAE2008. I noticed today that Deevy Bishop, who writes a blog called BishopBlog, has made a similar suggestion and carried out the same kind of analysis for psychology. Her analysis seems quite similar to mine and suggests that this would be An alternative to REF2014.

Anyway, it’s quite good to see others writing about REF2014 (whether for or against). I think it is a very important issue, and I’m just disappointed that it is probably too late to make any changes that would make the REF2014 process simpler and more likely to produce a reasonable ranking.

REF prediction

I’ve come to feel more strongly that although the Research Excellence Framework (REF) is trying to do something reasonably decent, it is doing it in a ridiculous and counterproductive way. Not only does it take an awful lot of effort and time, it also has a big impact on how universities and university departments behave. As I’ve mentioned before, the amount of effort expended assessing the various university departments in order to give them a REF score seems excessive, and using metrics might be more appropriate. I don’t particularly like the use of metrics, but if ever there was an appropriate time, it would be when assessing a large, diverse organisation like a university.

To put my money where my mouth is, I decided to see if I could come up with a ranking for all 42 Physics and Astronomy departments that were included in RAE2008 (the precursor to REF). For REF2014, each department will submit 4 papers per submitted academic, and these papers must be published or in press between January 2008 and October 2013. What I did was go to Web of Science and find all the papers published in Physics and Astronomy for each of the 42 departments included in RAE2008. Since it is currently October 2011, I used papers published between January 2006 and October 2011. I also didn’t exclude reviews or conference papers. For each department I then determined the h-index of their publications and the number of citations per publication. I ranked the departments according to these two metrics and decided that the final ranking would be determined by the average of these two rankings. The final table is shown below. It is ordered by the average of the h-index and citations-per-publication rankings, but these individual rankings are also shown. I also show the ranking that each department achieved in RAE2008.
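
The procedure itself is simple enough to sketch in a few lines of Python. The citation counts below are invented (the real ones came from Web of Science), but the h-index calculation and the averaging of the two rankings follow what is described above.

```python
# Sketch of the ranking procedure: compute each department's h-index and
# citations per publication, rank by both, then average the two ranks.
# The citation counts are made up; the real data came from Web of Science.

citations = {
    "Dept A": [120, 45, 30, 12, 8, 3, 0],
    "Dept B": [60, 55, 40, 22, 15, 9, 4, 1],
    "Dept C": [15, 10, 7, 5, 2, 1],
}

def h_index(cites):
    """Largest h such that h papers each have at least h citations."""
    ordered = sorted(cites, reverse=True)
    return sum(1 for i, c in enumerate(ordered, start=1) if c >= i)

def rank(values):
    """Rank 1 = largest value (ties share the same rank)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

names = list(citations)
h = [h_index(citations[d]) for d in names]
cpp = [sum(citations[d]) / len(citations[d]) for d in names]

h_rank, cpp_rank = rank(h), rank(cpp)
avg_rank = [(a + b) / 2 for a, b in zip(h_rank, cpp_rank)]

for name, hr, cr, ar in sorted(zip(names, h_rank, cpp_rank, avg_rank),
                               key=lambda row: row[-1]):
    print(f"{name}: h-index rank {hr}, cites/paper rank {cr}, average {ar}")
```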

I don’t know if the above ranking has any merit, but it took a couple of hours and seems – at first glance at least – quite reasonable. The departments that one would expect to be strong are near the top and the ones that one might expect to be weaker are near the bottom. I’m sure a more sophisticated algorithm could be determined and other factors included but I predict (I’ll probably regret doing this) that the final rankings that will be reported sometime in 2015 will be reasonably similar to what I’ve produced in a rather unproductive afternoon. We’ll see.

Addendum – added 21/03/2013
Deevy Bishop, who writes a blog called BishopBlog, has carried out a similar exercise for Psychology. In her post she compares the h-index rank with the RAE2008 position and also works out the correlation. I thought I would do the same for my analysis. It is slightly different in that Deevy Bishop considered the h-index rank for the time period associated with RAE2008, while I’ve considered the h-index rank associated with a time period similar to that for REF2014, but it should still be instructive. If I plot the RAE2008 rank against h-index rank, I get the figure below. The correlation is 0.66, smaller than the 0.84 that Deevy Bishop got for Psychology, but not insignificant. There are some clear outliers and the scatter is quite large. Also, this was a very quick analysis, and something more sophisticated – but still simpler than what is happening for REF2014 – could certainly be developed.

h-index rank from this work plotted against RAE2008 rank for all Physics departments included in RAE2008.
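
The correlation here is just computed on the two lists of rank positions which, in the absence of ties, amounts to a Spearman rank correlation. A minimal sketch (with made-up rank positions rather than the real 42 departments):

```python
# Correlation between two rankings of the same departments.  The rank
# positions below are invented; with real data there would be 42 entries.
import numpy as np

rae2008_rank = np.array([1, 2, 3, 4, 5, 6, 7, 8])
h_index_rank = np.array([2, 1, 4, 3, 7, 5, 8, 6])

# Pearson correlation on rank positions (equivalent to Spearman when untied).
print(np.corrcoef(rae2008_rank, h_index_rank)[0, 1])
```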

Additional addendum
Deevy Bishop, through a comment on my most recent post, has described a sensible method for weighting the RAE2008 results to take into account the number of staff submitted. The weighting (which essentially ranks institutions by how much funding each received) is N(0.1×2* + 0.3×3* + 0.7×4*), where N is the number of staff submitted and 2*, 3*, 4* are the percentages of the submitted papers at each rating. If I then compare the h-index rank from above with this new weighted rank, I get the figure below which (as Deevy Bishop found for psychology) shows a much stronger correlation than my figure above. Deevy Bishop checked the correlation for physics and found a value of 0.8 using the basic data, and a value of 0.92 if one included whether or not an institution had a staff member on the panel. I did a quick correlation and found a value of 0.92 without taking panel membership into account. Either way, the correlation is remarkably strong and seems to suggest that one could use h-indices to get quite a good estimate of how to distribute the REF2014 funding.

Plot showing the h-index rank (x-axis) and a weighted RAE2008 ranking (y-axis) for all UK Physics institutions included in RAE2008.

Another addendum
I realised that in the figure above I had plotted RAE2008 funding level rank against h-index rank, rather than simply RAE2008 funding level against h-index. I’ve redone the plot and the new one is below. It still correlates well (correlation of 0.9 according to my calculation). I’ve also done a plot showing h-index (for the RAE2008 period, admittedly) against what might be the REF2014 formula, which is thought to be N(0.1×3* + 0.9×4*). It still correlates well but, compared to the RAE2008 plot, it seems to shift the bottom points to the right a little. This is presumably because the funding formula now depends strongly on the fraction of 4* papers, so the supposedly weaker institutions suffer a little compared to the more highly ranked ones. Having said that, the plot using the possible REF2014 funding formula does seem very similar to the RAE2008 figure, so I hope I haven’t made some kind of silly mistake. I don’t think so. Presumably it just means that, for RAE2008, (0.1×3* + 0.9×4*) behaves similarly to (0.1×2* + 0.3×3* + 0.7×4*).

A plot of h-index against the RAE2008 funding formula – N(0.1×2* + 0.3×3* + 0.7×4*).


A plot showing h-index (for RAE2008 period) plotted against a possible REF2014 formula – N(0.1×3* + 0.9×4*).