Declaration on Research Assessment

Just thought I would highlight the San Francisco Declaration on Research Assessment. It’s been a long day and I’m quite tired, so I don’t want to say too much. You can read most of this yourself, but basically a group of editors and publishers of scholarly journals met during the Annual Meeting of The American Society for Cell Biology (ASCB), and have come up with a set of recommendations about the use of journal-based metrics. The basic motivation was

  • the need to eliminate the use of journal-based metrics, such as Journal Impact Factors, in funding, appointment, and promotion considerations;

  • the need to assess research on its own merits rather than on the basis of the journal in which the research is published; and

  • the need to capitalize on the opportunities provided by online publication (such as relaxing unnecessary limits on the number of words, figures, and references in articles, and exploring new indicators of significance and impact).

    A lot of it seemed to focus on the use of Journal Impact Factors when assessing individual bits of research (which, as many have already pointed out, is horribly flawed) but there was some indication that this was attempting to go further than just that. Citation metrics can be useful but can also be problematic. There can be a huge range of different practices even within the same basic area, so using them alone to assess an individual can disadvantage some potentially excellent researchers. It can also be advantageous to some who aren’t particularly good but who just happen to work in an area that is currently popular and in which papers are collecting citations easily.

    Anyway, this is an encouraging step and I hope it has some impact and that it’s taken seriously by funding agencies, interview panels and promotion boards. I suspect it’s too late to have much effect on the REF2014 panels, but maybe there’s hope for REF2021.


    Climate change – statistical significance

    I’ve written before about the claim that there has been no warming for the last 16 years. This is often made by those who are skeptical of – or deny – the role of man in climate change. As I mentioned in my earlier post, the claim arises because the natural scatter in the temperature anomaly data is such that it is not possible to make a statistically significant statement about the warming trend if one considers too short a time interval.

    The data that is normally used for this is the temperature anomaly. The temperature anomaly is the difference between the global surface temperature and some long-term average (typically the average from 1950-1980). This data is typically analysed using Linear Regression. This involves choosing a time interval and then fitting a straight line to the data. The gradient of this line gives the trend (normally in °C per decade) and this analysis also determines the error using the scatter in the anomaly data. What is normally quoted is the 2σ error. The 2σ error tells us that the data suggests that there is a 95% chance that the actual trend lies between (trend – error) and (trend + error). The natural scatter in the temperature anomaly data means that if the time interval considered is too short, the 2σ error can be larger than the gradient of the best fit straight line. This means that we cannot rule out (at a 95% level) that surface temperatures could have decreased in this time interval.
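    As a rough illustration of the fitting procedure described above, here is a minimal Python sketch of a straight-line fit that returns the trend and its 2σ error. The anomaly values are made up (a steady trend of 0.15 °C per decade plus scatter of 0.1 °C that simply alternates in sign), and the function names are my own. It also assumes the scatter in each year is independent, which real temperature data does not satisfy, so real analyses tend to quote larger errors.

```python
import math

def linear_trend(times, values):
    """Ordinary least-squares straight-line fit.

    Returns (slope, two_sigma_error), both in value units per time unit.
    Assumes the scatter about the line is independent from point to point.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    residual_ss = sum((v - (intercept + slope * t)) ** 2
                      for t, v in zip(times, values))
    # Standard error of the slope, estimated from the scatter about the line
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, 2.0 * slope_se

def synthetic_anomalies(n_years, trend_per_year=0.015, scatter=0.1):
    """Made-up annual anomalies: a steady trend plus alternating scatter."""
    return [trend_per_year * year + (scatter if year % 2 == 0 else -scatter)
            for year in range(n_years)]

for n_years in (10, 30):
    years = list(range(n_years))
    slope, error = linear_trend(years, synthetic_anomalies(n_years))
    verdict = "significant" if abs(slope) > error else "not significant"
    # Multiply by 10 to convert from per year to per decade
    print(f"{n_years} years: {10 * slope:.3f} ± {10 * error:.3f} "
          f"°C per decade ({verdict})")
```

    With these made-up numbers the 10 year fit has a 2σ error larger than the trend, while the 30 year fit does not, even though the underlying trend is identical in both cases.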

    We are interested in knowing if the surface temperatures are rising at a rate of between 0.1 and 0.2 °C per decade. Typically we therefore need to consider time intervals of about 20 years or more if we wish to get a statistically significant result. The actual time interval required does vary with time and the figure below shows, for the last 30 years, the amount of time required for a statistically significant result. This was linked to from a comment on a Guardian article so I don’t know who to credit. Happy to give credit if someone can point out who deserves it. The figure shows that, in 1995, the previous 20 years were needed if a statistically significant result was to be obtained. In 2003, however, just over 10 years was required. Today, we need about 20 years to get a statistically significant result. The point is that at almost any time in the past 30 years, at least 10 years of data – often more – was needed to determine a statistically significant warming trend and yet no one would suggest that there has been no warming since 1980. That there has been no statistically significant warming since 1997 is therefore not really important. It’s perfectly normal and there are plenty of other periods in the last 30 years where 20 years of data was needed before a statistically significant signal was recovered. It just means we need more data. It doesn’t mean that there’s been no warming.

    Figure showing the number of years of data necessary for the warming trend to be statistically significant at the 2σ level.

    Extremes of the distribution

    This could be a very boring post (even more so than normal) and could be complete nonsense, but anyway. I was watching the final of the Australian Open tennis on Sunday and it struck me as quite amazing that many of these tournaments go to form. Typically those in the final and semi-finals are those who were ranked highest. This may seem obvious but naively one might imagine it to be quite different. These tournaments comprise 100 or so of the best tennis players in the world. Since they are all so much better than everyone else, you might imagine that their abilities would be similar and that a player ranked 30th could, quite often, beat a player in the top 10. You might expect the rankings to change quite regularly. It seems, however, that this isn’t really the case. When a player reaches the top 10 they can stay there for quite some time and, even though these are some of the best tennis players in the world, a player in the top 10 will typically beat a player ranked 30th.

    I wondered if this wasn’t quite a nice illustration of the properties of a normal distribution. Imagine we could plot a distribution of the abilities of all tennis players. We might expect it to have a normal distribution like that shown in the figure below. There will be a few really bad tennis players, most will be average, and there will be a few very good tennis players. I don’t know if this is really what it would be like, but if the sample is sufficiently large, the central limit theorem suggests that it is likely to settle to a normal distribution.

    Graph showing a typical Normal Distribution, taken from the Wikipedia article about the Standard Deviation.

    The form of the Normal Distribution is

    $$ f(x) = \frac{N}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

    where N is the size of the sample, μ is the mean, and σ is the standard deviation (essentially how variable the distribution is). If the distribution is very narrow (i.e., everyone’s abilities are very similar) the standard deviation is small. If the distribution is wide (i.e., the abilities are quite varied) the standard deviation would be big.

    What the figure above also shows is the percentage of the sample in each standard deviation interval. For example 34.1% of the sample lie between 0 and 1σ and 13.6% lie between 1σ and 2σ. The table below shows the percentage for each interval up to 6σ.
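    The percentage in each interval can also be recomputed directly. The following is a minimal Python sketch (the function name is my own) that uses the complementary error function from the standard library to get the fraction of a normal distribution lying between two multiples of σ above the mean.

```python
import math

def fraction_between(lower_sigma, upper_sigma):
    """Fraction of a normal distribution between two points above the mean,
    with both points given in units of the standard deviation."""
    def upper_tail(z):
        # Fraction of the distribution lying more than z sigma above the mean
        return 0.5 * math.erfc(z / math.sqrt(2.0))
    return upper_tail(lower_sigma) - upper_tail(upper_sigma)

for k in range(6):
    percent = 100.0 * fraction_between(k, k + 1)
    print(f"{k}σ to {k + 1}σ: {percent:.5f}%")
```

    Using erfc rather than 1 – erf keeps the tiny values in the outer intervals from being lost to rounding.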
    In the above I’ve only considered the intervals above the mean. If one was presenting some data analysis with errors, the error would normally be some number of standard deviations and would tell you how significant the result is. For example 1σ errors would tell you that there was a 68.2% chance that the result lies in the reported range (i.e., 34.1% times 2 since you’re considering the region on either side of the most likely value). If you report a 5σ error (as for the Higgs Boson result) this means that there is a 99.99994% chance of the actual value lying within the reported range (i.e., 100 – 0.00003×2) – although in this case it may actually be that there is a 99.99994% chance that the signal is real, rather than an extremely unlikely noise spike.

    Where am I going with this? There appear to be a few tennis players in the world who will typically beat almost anyone else almost all the time. If I assume that there are 10 million active tennis players in the world (I have no idea if this is a reasonable number or not) then the table above would suggest that only 0.00003% of them would have abilities 5-6σ better than the average. This means that there would only typically be 3 players who have this level of ability (i.e., 0.00003/100×10000000). Essentially, if tennis ability is normally distributed you would actually expect there to be only a few players in the world who are significantly better than the average. So, maybe this makes perfect sense. As you consider the extremes, there are fewer and fewer players and if you have a big enough sample (and given that many play tennis, the sample is probably quite large) you have a chance of finding a small number who are so much better than the rest that they would typically beat almost anyone else. Alternatively, this is all nonsense and I have no idea what I’m talking about.
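    The head count above is easy to check. This is a small Python sketch under the same assumptions as the post (abilities exactly normally distributed, and my guess of 10 million players); the function name is mine.

```python
import math

def expected_players_between(population, lower_sigma, upper_sigma):
    """Expected number of people whose ability lies between two points above
    the mean (in standard deviations), if ability is normally distributed."""
    def upper_tail(z):
        return 0.5 * math.erfc(z / math.sqrt(2.0))
    return population * (upper_tail(lower_sigma) - upper_tail(upper_sigma))

players = 10_000_000  # the guess used in the post, not a measured number

for k in range(3, 6):
    count = expected_players_between(players, k, k + 1)
    print(f"{k}σ to {k + 1}σ above average: about {count:.0f} players")
```

    For the 5-6σ interval this gives roughly 3 players, and the numbers grow by about two orders of magnitude with each step back towards the mean.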

    A little more Climate Change Analysis

    There was an item on BBC news last night about the Met Office’s revision of the temperature trend for the next few years. They were predicting that the period between 2012 and 2016 would be higher than the long-term mean by 0.54°C, with a range of 0.36-0.72°C. It has now been revised to 0.43°C with a range of 0.28-0.59°C. What was a little frustrating was a comment made by David Shukman on the BBC news last night that this would mean 20 years of no significant warming. This is, I imagine, based on the claim that there has been no significant warming for the last 16 years.

    As I pointed out in an earlier post, there is a difference between the trend not being statistically significant and there being no significant warming. Given that the trend appears to be between 0.1 and 0.2°C per decade and the scatter in the anomaly data is about 0.1°C, the error in the trend over a period of about 15 years is likely to be about 0.15°C per decade. It will be similar to, or bigger than, the trend and hence, if one only considers a 15 year period, the trend will not be statistically significant. This doesn’t mean that there isn’t a warming trend. It just means that we can’t yet measure it. We would probably need to consider 20 years or longer to get a statistically significant result. In 4 years’ time, if we were to consider the previous 20 years, the error will be smaller than it currently is when we only consider 16 years of data, and the trend will probably be statistically significant. We don’t know this for sure, but we do know that 16 years is too short a time to determine a statistically significant trend.
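    The size of the error can be estimated with a back-of-the-envelope calculation. The Python sketch below uses the standard formula for the uncertainty on a least-squares slope, and assumes one anomaly value per year with independent scatter. Both are simplifications of my own; real anomaly data is correlated from one year to the next, which makes the true error somewhat larger.

```python
import math

def trend_error_2sigma(n_years, scatter):
    """Approximate 2σ error (°C per decade) on a straight-line trend fitted
    to n_years annual values, each with independent scatter (°C)."""
    # Sum of squared deviations of the years 0..n-1 from their mean
    sxx = n_years * (n_years ** 2 - 1) / 12.0
    error_per_year = scatter / math.sqrt(sxx)
    return 2.0 * error_per_year * 10.0  # 2σ, converted to per decade

for n_years in (15, 16, 20):
    error = trend_error_2sigma(n_years, scatter=0.1)
    print(f"{n_years} years: 2σ error of about {error:.2f} °C per decade")
```

    With a scatter of 0.1°C this gives roughly 0.12°C per decade for 15 years, comparable to the trend itself, falling to about 0.08°C per decade for 20 years.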

    I thought I would show this by considering the data that exists for the last 132 years. The data I’m using is the GISS Surface Temperature Analysis data from NASA. This gives the monthly temperature anomaly since 1880. Although I’ve written my own code to analyse this data, for this quick test I’ve used the Skeptical Science Trend Calculator. I’ve considered a series of 15 year intervals, starting in 1885 and progressing through to today. The values are shown in the table below. With the exception of 1985-2000, the trend in every interval is statistically insignificant. These are 2σ errors which means that there is a 95% chance that the trend lies between the (mean value – error) and the (mean value + error). Given that the error is bigger than the trend, we can’t rule out – over each of these 15 year time periods – that the global surface temperature decreased. Even the 1985-2000 interval produces a trend that is only marginally significant.


    Therefore if, at any time, we were only to consider the previous 15 years (I know this isn’t 16, but it’s close enough) we would always conclude that there is no statistically significant warming trend. The bottom row of the above table, however, shows the warming trend from 1880-2012. It is 0.064°C per decade with an error of 0.007°C per decade. This is clearly statistically significant. Below I repeat the figure that I included in my previous post which shows the anomaly data from 1880-2012 together with the best-fit trend line. It’s very clear that there has been warming since 1880 and that the mean surface temperature is about 0.85°C warmer today than it was in 1880 (which is the same as 0.064°C per decade over a period of 13 decades).

    What I’m trying to get at here is firstly that just because something is not statistically significant, does not mean that it is not significant. It just means that we don’t yet know with any certainty. Secondly, if one is going to make claims about the significance of global warming one needs to use a time interval over which a statistically significant trend can be determined. If we only ever consider the past 16 years then the measured trend will always be statistically insignificant even if there is a real, long-term, warming trend. It’s a very important issue and I’m more than happy to discuss and debate this with others. I’m just not willing to do so with those who misuse, or don’t understand, the data or the analysis.

    The income distribution and probability

    I haven’t posted for some time so thought I would write something short on a subject about which I feel quite strongly: income inequality. There is lots of data to suggest that the UK and the USA have become more unequal in the last few decades. This is indicated by the Gini coefficient, which has been increasing steadily. The concern I have is that many governments (the UK in particular) are becoming more conservative. The coalition government in the UK at the moment seems to be making many changes that ultimately benefit the rich and disadvantage the poor.

    The conservative viewpoint, in a simple sense, seems to be that everyone should simply pull up their socks, work harder and therefore earn more. We shouldn’t penalise those who’ve done well. The problem that I have is that the income distribution (shown in the figure below) is essentially a probability distribution. The mean is about £25000, the mode is about £16000 and the median is about £21000. Assuming everyone has the same opportunities, you then have a 50% chance of earning below £21000, your most likely salary is £16000, you have a 1 in 10 chance of earning over £45000, and a 1 in 100 chance of earning over £150000. An individual could work extremely hard, be very creative and do very well, but the entire population cannot. Unless something is done to adjust the income distribution, 10% of the population will earn below £10000, 25% will earn below £12000, and 50% will earn below £21000.

    UK Income Distribution

    An approximate income distribution graph for the UK.
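    To see how the quoted numbers hang together, here is a rough Python sketch that treats the income distribution as log-normal. This is purely an assumption of mine for illustration; nothing above says the real distribution has this form. The two parameters are fixed using only the median (£21000) and mean (£25000) quoted above.

```python
import math

MEDIAN_INCOME = 21_000.0  # from the post
MEAN_INCOME = 25_000.0    # from the post

# Log-normal parameters (an assumed shape, not a fit to real data):
# median = exp(mu) and mean = exp(mu + sigma**2 / 2)
mu = math.log(MEDIAN_INCOME)
sigma = math.sqrt(2.0 * math.log(MEAN_INCOME / MEDIAN_INCOME))

def chance_of_earning_over(income):
    """Probability of an income above the given value under the assumed shape."""
    z = (math.log(income) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Peak of a log-normal distribution
mode_income = math.exp(mu - sigma ** 2)

print(f"mode: about £{mode_income:.0f}")
for income in (21_000, 45_000, 150_000):
    print(f"chance of earning over £{income}: {chance_of_earning_over(income):.4f}")
```

    Under this assumption the mode comes out a little under £15000 and the chance of earning over £45000 is close to 1 in 10, both broadly in line with the figures above. The chance of earning over £150000 comes out far below 1 in 100, which suggests that the top of the real distribution has a heavier tail than a log-normal shape allows.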

    Of course, opportunities aren’t all the same. Some are disadvantaged the moment they are born and others have an immediate advantage. A person born into a poor household probably has a much smaller than 1 in 100 chance of ever earning the equivalent of £150000 or more. I guess the point I’m trying to make is that simply suggesting that everyone should simply work harder and earn more is, in itself, nonsense. If the coalition government wants to reduce public spending, they need to look at how to adjust the income distribution so that the lowest earners can still afford the basics. I don’t quite know how to do this. One way is to use taxation. Tax high salaries to discourage paying these high salaries and encourage increasing lower salaries. Alternatively, use taxation to redistribute income. The disadvantage here is that it then doesn’t encourage businesses to pay higher salaries (in a sense it is subsidising businesses and allowing them to pay salaries lower than they probably should). One could argue that it spreads the risk a little and might help to stabilise employment (i.e., we probably don’t want companies continually making people redundant when they can’t afford to pay salaries). Or, as seems to be happening, remove some of the support for lower earners and consequently encourage business to increase the lower salaries. The latter may be fine in principle, but I can’t, however, see it happening if there isn’t some support for doing so. At the moment it seems like the government wants to remove the support for the lower earners while keeping their salaries low so as to make us internationally competitive.

    Essentially, what I’m suggesting is that if people could understand that the income distribution is really a probability distribution it may encourage the government to put more effort into influencing the income distribution. If they want the lowest earners to be self-sufficient, they need to act to narrow the income distribution. If they don’t care, they can let it continue to widen and then deal with the resulting poverty.

    Distribution functions

    A discussion at work made me decide to try and finish this post about distribution functions. Essentially I think we should put more effort into teaching people about distribution functions. The reason – in my view at least – is that distribution functions provide much more information about something than any single number can.

    An example of a distribution function is shown in the image below (I can’t remember where I got this so can’t give it an appropriate citation – apologies if it is yours). The figure could be showing the distribution of anything, but let’s imagine that it is the distribution of income. The x-axis would then be income (in pounds for example) and the y-axis would be number of people per income interval (per pound for example). It can’t simply be “number of people” on the y-axis as that doesn’t really make sense. Asking how many people have annual incomes of £16000 isn’t really a logical question (i.e., it’s possible that no one earns exactly £16000 per year). What would make more sense would be asking how many people have annual incomes between (for example) £16000 and £16500. If the y-axis is the number of people per pound interval, then to determine how many people have annual incomes between £16000 and £16001 you could simply read that off the distribution function graph. Determining how many earn between £16000 and £17000 would require determining the area under the graph in the interval between £16000 and £17000.
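    The difference between reading a value off the graph and finding the area under it can be sketched in a few lines of Python. The density used here is entirely hypothetical (it falls in a straight line from 2000 people per pound at £16000 to 1800 people per pound at £17000), and the function names are my own.

```python
def people_between(density, low, high, steps=1000):
    """Area under a density (people per pound) between two incomes,
    estimated with the trapezoid rule."""
    width = (high - low) / steps
    total = 0.5 * (density(low) + density(high))
    for i in range(1, steps):
        total += density(low + i * width)
    return total * width

def example_density(income):
    """Hypothetical density in people per pound, for illustration only."""
    return 2000.0 - 0.2 * (income - 16_000.0)

# Over a one pound interval the area is almost the value read off the graph
print(people_between(example_density, 16_000, 16_001))

# Over a thousand pound interval the area has to be added up
print(people_between(example_density, 16_000, 17_000))
```

    For the one pound interval the answer is essentially the height of the curve, while for the £16000 to £17000 interval it is the average height multiplied by the width of the interval.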

    There are 3 things labelled on the graph: the mean, the median, and the mode. The mean is simply the average. Add up the total amount of money paid out to people in the form of income and divide by the number of people. The median is the salary which divides the population into two equal portions; i.e., half the population earn above the median and half earn below. The mode is the peak of the distribution and is essentially (since we’re talking about income distributions) the most likely income. The mean doesn’t really tell us anything about the income distribution, since it is largely independent of the distribution. It’s the same if one person earns all the money, or if it is divided equally between everyone. It might tell us something about how wealthy we are compared to another country, but doesn’t really tell us about how income is distributed in this country. The median is more useful because it tells us the middle salary. The larger the difference between the mean and the median, the more unequally the income is distributed. Having written some of this post, I’ve partly lost track of what I was trying to say but to a certain extent it’s a sense that it’s difficult to understand how unequally income, for example, is distributed without some understanding of distribution functions.
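    The point about the mean is easy to demonstrate. This short Python sketch compares two made-up populations of ten people sharing the same total income, one where it is divided equally and one where a single person earns all of it.

```python
from statistics import mean, median

# Two made-up populations with the same total income of £250000
equal_incomes = [25_000] * 10
one_earner = [0] * 9 + [250_000]

for label, incomes in (("equal", equal_incomes), ("one earner", one_earner)):
    print(f"{label}: mean £{mean(incomes)}, median £{median(incomes)}")
```

    The mean is £25000 in both cases, while the median drops from £25000 to zero, so it is the gap between the two that hints at how unequal the distribution is.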

    Something else that really bugs me is the apparent grade inflation that’s taking place in the UK school system. Every year more people get As compared with the previous year and this is presented as a success. The problem I have is that it would seem that there must be a range of abilities for almost everything. If everyone in the world ran 100 metres as fast as they could, there would be a wide range of different times which you could plot as a distribution function. Similarly, if we consider mathematics, the abilities of those leaving school in a given year will vary. What the exam results are meant to do is reflect this variation in ability and, in a sense, should reflect reality. Of course, one could set an exam that most students would do very well at, by making it fairly easy, but what’s the point of that? We should, in my opinion at least, be setting an exam which produces results that reflect relative abilities. How can we tell who is particularly good at mathematics if more and more people get top grades? One might ask how this is done, but as far as I can tell, it’s fairly easy. I regularly set exams and generally the results are distributed in a fairly sensible way. Very few get above 90%. A reasonable number (10% of students for example) get between 70% and 90%. A large group get between 55% and 70%. A smaller number get between 40% and 55%, and some students get below 40% and are regarded as having failed. I can’t see that it would be difficult to set A-level (for example) exams that allowed one to determine who was truly brilliant at something (grades above 90%), in the top 10% (grades between 75% and 90%), about average (grades between 60% and 75%), below average but competent (grades between 40% and 60%), and largely incapable of carrying out such work (grades below 40%). This grade inflation is extremely damaging, in my opinion, as it reduces the amount of information that the exam results can tell us and suggests that these results no longer reflect reality.

    Well, this post has become somewhat rambling and doesn’t really have a particularly clear message or motivation. But what’s the point of blogging if you can’t simply write whatever you want, even if it doesn’t really make much sense to others?

    Abdel Baset al-Megrahi

    The recent appearance of Abdel Baset al-Megrahi at a pro-Gadaffi rally in Libya has reinvigorated the debate about whether or not he should have been released from jail on compassionate grounds. Firstly I should state that the 1988 bombing of the Pan Am flight that crashed in Lockerbie, Scotland was a truly atrocious act. All 259 people on the flight died, as did 11 residents of Lockerbie.

    Abdel Baset al-Megrahi was convicted of the bombing and sentenced to life imprisonment. He was, however, released in August 2009 on compassionate grounds as he was suffering from terminal cancer and was given only 3 months to live. This was a highly controversial decision and was widely criticised. That Abdel Baset al-Megrahi is still alive (in 2011) has also caused some (including William Hague, the Foreign Secretary) to claim that the medical evidence was flawed (in fact William Hague apparently suggested it was worthless).

    I have no knowledge of whether or not the medical evidence was flawed, but this does remind me of a fantastic book by Stephen Jay Gould, called Full House, in which he discusses his own cancer diagnosis and how doctors misinterpreted the statistical data. I read the book a long time ago, so don’t remember it exactly, but my memory is as follows. Stephen Jay Gould was diagnosed with cancer and the medical staff seemed very reluctant to discuss his prognosis. He discovered that the cancer was very aggressive and that often patients would die within 8 months. However, he discovered that this was actually the median lifetime and that the distribution was very skewed. Half of patients diagnosed would die within 8 months but the other half had a reasonable chance of living for many years.

    Stephen Jay Gould was diagnosed in 1982 and died in 2002, so in fact lived for 20 years after being diagnosed despite the median lifetime being 8 months. I don’t know if this is relevant to the al-Megrahi case, but it does suggest that him still being alive does not necessarily mean that the medical diagnosis was flawed. It could well be that he has an aggressive cancer and that typically patients would die within a few months. If a few months is the median lifetime, then him still being alive after 2 years may not be that unlikely. Although the diagnosis may be correct, it could well be that the prognosis was flawed. If the medical evidence had suggested that he had a reasonable chance of living for a number of years, the decision as to whether or not to release him may have been different. I should add that I don’t really have a view as to whether or not he should have been released. I’m simply suggesting that him still being alive does not mean that there was anything wrong with the medical diagnosis.