The Bottom Line

Apologies to the few of you who actually read my blog. I’ve been rather ignoring it lately. Anyway, I came across a YouTube video recently that I thought I would post here. I initially saw it here. It might have a bit more hyperbole than I would normally use myself, but I thought it quite compelling.

One of the reasons I found it interesting is that I was recently at a meeting where I met and spoke with another physicist who was working (in academia) as a financial modeller. It was an interesting conversation, but what struck me was that they appeared to think only in terms of the stability of the market. The idea, I think, is that if the market is stable and optimal, then that is best for the economy. The issue that I could see is that there was no obvious link between the market and the real economy on which it was based. Or rather, there seemed to be nothing in the modelling that essentially said: this market is associated with an economy that ideally should be providing employment, products and services for a particular country. Maybe I’m wrong about this and maybe there are good reasons why they do model the markets in this way. It does seem as though, typically, we do ignore many important things when deciding on the value of our markets, which is essentially the case being made in the video below.

Climate change – statistical significance

I’ve written before about the claim that there has been no warming for the last 16 years. This claim is often made by those who are skeptical of, or deny, the role of man in climate change. As I mentioned in my earlier post, the apparent lack of warming is actually because the natural scatter in the temperature anomaly data is such that it is not possible to make a statistically significant statement about the warming trend if one considers too short a time interval.

The data that is normally used for this is the temperature anomaly. The temperature anomaly is the difference between the global surface temperature and some long-term average (typically the average from 1950-1980). This data is typically analysed using Linear Regression. This involves choosing a time interval and then fitting a straight line to the data. The gradient of this line gives the trend (normally in °C per decade) and this analysis also determines the error using the scatter in the anomaly data. What is normally quoted is the 2σ error. The 2σ error tells us that the data suggests that there is a 95% chance that the actual trend lies between (trend – error) and (trend + error). The natural scatter in the temperature anomaly data means that if the time interval considered is too short, the 2σ error can be larger than the gradient of the best fit straight line. This means that we cannot rule out (at a 95% level) that surface temperatures could have decreased in this time interval.
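To make the regression step a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the anomaly values are synthetic numbers I made up (a 0.15 °C per decade trend plus alternating ±0.1 °C scatter), the function name `linear_trend` is my own, and it uses plain least squares, which ignores the autocorrelation that a careful analysis of real temperature data would need to account for.

```python
import math

def linear_trend(times, values):
    """Ordinary least-squares straight-line fit.

    Returns (slope, two_sigma), where two_sigma is twice the standard
    error of the slope estimated from the scatter about the fitted line.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = s_tv / s_tt
    intercept = v_mean - slope * t_mean
    # Sum of squared residuals about the best-fit line.
    resid_sq = sum((v - (intercept + slope * t)) ** 2
                   for t, v in zip(times, values))
    # Standard error of the slope, using n - 2 degrees of freedom.
    slope_se = math.sqrt(resid_sq / (n - 2) / s_tt)
    return slope, 2.0 * slope_se

# Synthetic (made-up) anomalies in °C, one value per year for 30 years.
years = list(range(30))
anomaly = [0.015 * t + 0.1 * (-1) ** t for t in years]

for window in (6, 30):
    slope, err = linear_trend(years[-window:], anomaly[-window:])
    verdict = "significant" if abs(slope) > err else "not significant"
    # Slopes are per year, so multiply by 10 to quote them per decade.
    print(f"last {window} years: {10 * slope:+.3f} ± {10 * err:.3f} "
          f"°C per decade ({verdict})")
```

With these made-up numbers the short window gives a 2σ error larger than the trend itself, while the full 30 years recovers the underlying trend, which is the behaviour described above.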

We are interested in knowing if the surface temperatures are rising at a rate of between 0.1 and 0.2 °C per decade. Typically we therefore need to consider time intervals of about 20 years or more if we wish to get a statistically significant result. The actual time interval required does vary with time and the figure below shows, for the last 30 years, the amount of time required for a statistically significant result. This was linked to from a comment on a Guardian article so I don’t know who to credit. Happy to give credit if someone can point out who deserves it. The figure shows that, in 1995, the previous 20 years were needed if a statistically significant result was to be obtained. In 2003, however, just over 10 years was required. Today, we need about 20 years to get a statistically significant result. The point is that at almost any time in the past 30 years, at least 10 years of data – often more – was needed to determine a statistically significant warming trend and yet no one would suggest that there has been no warming since 1980. That there has been no statistically significant warming since 1997 is therefore not really important. It’s perfectly normal and there are plenty of other periods in the last 30 years where 20 years of data was needed before a statistically significant signal was recovered. It just means we need more data. It doesn’t mean that there’s been no warming.

Figure showing the number of years of data necessary for the warming trend to be statistically significant at the 2σ level.


I’ve been working on a new research topic for a few months now. It’s not something that I’m particularly familiar with and I’ve quite enjoyed learning something new. What I’ve found interesting is – in a sense – how much I’ve learned and how much I’ve relied on things that I thought I’d forgotten. When I first started working on this problem, I stared at a set of equations without any real sense of how I would work through to get to the point where I could use them to solve the problem I was trying to solve. I then recognised something and saw how to get started. I didn’t get it right first time, but this small spark of recognition is what got me going. What struck me was that something that I learned a long time ago and had largely forgotten came back to me very quickly and now I feel completely comfortable using it.

It’s now been quite a few months that I’ve been working on this problem, and I haven’t quite finished but I’ve really enjoyed persevering through it. Even if it doesn’t lead to any publications, that’s fine. I now understand something about this research area that I didn’t really understand before. It’s also been nice, in a sense, going back to basics. My office is now littered with bits of paper with algebraic calculations on them. It’s not something that I’ve done for quite some time.

So why am I writing this? Well, partly to simply write something a little more positive than is the norm for me. There were two thoughts I had when working on this problem. One was that as someone who teaches, it is really good to work on something basic and fundamental and to use some of the tools I learned as a student. In a sense this is why I think it is good for active researchers to teach at this level. We use the tools we’re teaching to solve real problems and hence understand their significance. The other thing I realised was just how quickly you remember how to use the mathematical/scientific tools that you learn as a student. Students often think they learn things that they’ll never use, but you never know what tools you may need in the future and it seems remarkable how quickly you become familiar, again, with what you learned many years before. Anyway, it’s been fun and enjoyable working through something both basic and complex at the same time. I hope I can use what I’ve learned to do some interesting research but even if nothing comes of it, it’s still been a very interesting and useful experience.

Extremes of the distribution

This could be a very boring post (even more so than normal) and could be complete nonsense, but anyway. I was watching the final of the Australian Open tennis on Sunday and it struck me that it was quite amazing that many of these tournaments quite often go to form. Typically those in the final and semi-finals are those who were ranked highest. This may seem obvious but naively one might imagine it to be quite different. These tournaments comprise 100 or so of the best tennis players in the world. Being so much better than everyone else, you might imagine that they would be similar and that a player ranked 30th could, quite often, beat a player in the top 10. You might expect the rankings to change quite regularly. It seems, however, that this isn’t really the case. When a player reaches the top 10 they can stay there for quite some time and, even though these are all some of the best tennis players in the world, a player in the top 10 will typically beat a player ranked 30th.

I wondered if this wasn’t quite a nice illustration of the properties of a normal distribution. Imagine we could plot a distribution of the abilities of all tennis players. We might expect it to have a normal distribution like that shown in the figure below. There will be a few really bad tennis players, most will be average, and there will be a few very good tennis players. I don’t know if this is really what it would be like, but if the sample is sufficiently large, the central limit theorem suggests that it is likely to settle to a normal distribution.

Graph showing a typical Normal Distribution, taken from the Wikipedia article about the Standard Deviation.

The form of the Normal Distribution is

$$ f(x) = \frac{N}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

where N is the size of the sample, μ is the mean and σ is the standard deviation (essentially how variable the distribution is). If the distribution is very narrow (i.e., everyone’s abilities are very similar) the standard deviation is small. If the distribution is wide (i.e., the abilities are quite varied) the standard deviation would be big.

What the figure above also shows is the percentage of the sample in each standard deviation interval. For example, 34.1% of the sample lies between 0 and 1σ and 13.6% lies between 1σ and 2σ. The table below shows the percentage for each interval up to 6σ.
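The percentages for each interval can also be recomputed directly from the complementary error function. This is just a sketch, assuming a perfectly normal distribution; the function name `interval_percent` is my own.

```python
import math

def interval_percent(lo, hi):
    """Percentage of a normally distributed sample lying between
    lo and hi standard deviations above the mean."""
    def upper_tail(k):
        # Fraction of the sample more than k standard deviations above
        # the mean. erfc keeps precision far out in the tail, where
        # 1 - erf(x) would lose digits.
        return 0.5 * math.erfc(k / math.sqrt(2.0))
    return 100.0 * (upper_tail(lo) - upper_tail(hi))

# One row per interval: 0-1σ, 1-2σ, ..., 5-6σ.
table = [(k, k + 1, interval_percent(k, k + 1)) for k in range(6)]
for lo, hi, pct in table:
    print(f"{lo}σ to {hi}σ: {pct:.5f}%")
```

The first two rows reproduce the 34.1% and 13.6% quoted from the figure, and the last row is the roughly 0.00003% for the 5σ to 6σ interval.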
In the above I’ve only considered the intervals above the mean. If one was presenting some data analysis with errors, the error would normally be some number of standard deviations and would tell you how significant the result is. For example 1σ errors would tell you that there was a 68.2% chance that the result lies in the reported range (i.e., 34.1% times 2 since you’re considering the region on either side of the most likely value). If you report a 5σ error (as for the Higgs Boson result) this means that there is a 99.99994% chance of the actual value lying within the reported range (i.e., 100 – 0.00003×2) – although in this case it may actually be that there is a 99.99994% chance that the signal is real, rather than an extremely unlikely noise spike.

Where am I going with this? There appear to be a few tennis players in the world who will typically beat almost anyone else almost all the time. If I assume that there are 10 million active tennis players in the world (I have no idea if this is a reasonable number or not) then the table above would suggest that only 0.00003% of them would have abilities 5-6σ better than the average. This means that there would only typically be 3 players who have this level of ability (i.e., 0.00003/100×10000000). Essentially, if tennis ability is normally distributed you would actually expect there to be only a few players in the world who are significantly better than the average. So, maybe this makes perfect sense. As you consider the extremes, there are fewer and fewer players and if you have a big enough sample (and given that many play tennis, the sample is probably quite large) you have a chance of finding a small number who are so much better than the rest that they would typically beat almost anyone else. Alternatively, this is all nonsense and I have no idea what I’m talking about.
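The head count in this argument is easy to check. A rough sketch, assuming abilities really are normally distributed and taking my guess of 10 million players at face value (the function name `expected_above` is just for illustration). It counts everyone more than a given number of standard deviations above the mean, which for 5σ is essentially the same as the 5-6σ interval since almost nobody lies beyond 6σ.

```python
import math

def expected_above(n_players, k_sigma):
    """Expected number of players more than k_sigma standard deviations
    above the mean, assuming a normal distribution of ability."""
    tail_fraction = 0.5 * math.erfc(k_sigma / math.sqrt(2.0))
    return n_players * tail_fraction

n_players = 10_000_000  # my guess, not a measured number
for k in (3, 4, 5):
    print(f"more than {k}σ above average: "
          f"{expected_above(n_players, k):.1f} players")
```

Going from 4σ to 5σ takes you from a few hundred players to about three, which is the sense in which the extremes thin out very quickly.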

Distribution functions

A discussion at work made me decide to try and finish this post about distribution functions. Essentially I think we should put more effort into teaching people about distribution functions. The reason, in my view at least, is that distribution functions provide much more information about something than any single number can.

An example of a distribution function is shown in the image below (I can’t remember where I got this so can’t give it an appropriate citation – apologies if it is yours). The figure could be showing the distribution of anything, but let’s imagine that it is the distribution of income. The x-axis would then be income (in pounds for example) and the y-axis would be number of people per income interval (per pound for example). It can’t simply be “number of people” on the y-axis as that doesn’t really make sense. Asking how many people have annual incomes of £16000 isn’t really a logical question (i.e., it’s possible that no one earns exactly £16000 per year). What would make more sense would be asking how many people have annual incomes between (for example) £16000 and £16500. If the y-axis is the number of people per pound interval, then to determine how many people have annual incomes between £16000 and £16001 you could simply read that off the distribution function graph. Determining how many earn between £16000 and £17000 would require determining the area under the graph in the interval between £16000 and £17000.

There are 3 things labelled on the graph: the mean, the median, and the mode. The mean is simply the average. Add up the total amount of money paid out to people in the form of income and divide by the number of people. The median is the salary which divides the population into two equal portions; i.e., half the population earn above the median and half earn below. The mode is the peak of the distribution and is essentially (since we’re talking about income distributions) the most likely income. The mean doesn’t really tell us anything about the income distribution, since it is largely independent of the distribution. It’s the same if one person earns all the money, or if it is divided equally between everyone. It might tell us something about how wealthy we are compared to another country, but doesn’t really tell us about how income is distributed in this country. The median is more useful because it tells us the middle salary. The larger the difference between the mean and the median, the more unequally income is distributed. Having written some of this post, I’ve partly lost track of what I was trying to say but to a certain extent it’s a sense that it’s difficult to understand how unequally income, for example, is distributed without some understanding of distribution functions.
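A tiny example makes the point about the mean. The incomes below are made up purely for illustration: five people share the same total income, once divided equally and once with one person earning everything.

```python
from statistics import mean, median

# Two made-up populations with the same total income (£100,000).
equal = [20_000, 20_000, 20_000, 20_000, 20_000]
unequal = [0, 0, 0, 0, 100_000]

for label, incomes in (("equal", equal), ("unequal", unequal)):
    print(f"{label}: mean = £{mean(incomes)}, median = £{median(incomes)}")
```

The mean is £20,000 in both cases, while the median drops from £20,000 to £0, so it is the gap between the two that hints at how unequally the income is shared.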

Something else that really bugs me is the apparent grade inflation that’s taking place in the UK school system. Every year more people get As compared with the previous year and this is presented as a success. The problem I have is that it would seem that there must be a range of abilities for almost everything. If everyone in the world ran 100 metres as fast as they could, there would be a wide range of different times which you could plot as a distribution function. Similarly, if we consider mathematics, the abilities of those leaving school in a given year will vary. What the exam results are meant to do is reflect this variation in ability and, in a sense, should reflect reality. Of course, one could set an exam that most students would do very well at, by making it fairly easy, but what’s the point of that? We should, in my opinion at least, be setting an exam which produces results that reflect relative abilities. How can we tell who is particularly good at mathematics if more and more people get top grades? One might ask how this is done, but as far as I can tell, it’s fairly easy. I regularly set exams and generally the results are distributed in a fairly sensible way. Very few get above 90%. A reasonable number (10% of students for example) get between 70% and 90%. A large group get between 55% and 70%. A smaller number get between 40% and 55%, and some students get below 40% and are regarded as having failed. I can’t see that it would be difficult to set A-level (for example) exams that allowed one to determine who was truly brilliant at something (grades above 90%), in the top 10% (grades between 75% and 90%), about average (grades between 60% and 75%), below average but competent (grades between 40% and 60%), and largely incapable of carrying out such work (grades below 40%).

This grade inflation is extremely damaging, in my opinion, as it reduces the amount of information that the exam results can tell us and suggests that these results no longer reflect reality.

Well, this post has become somewhat rambling and doesn’t really have a particularly clear message or motivation. But what’s the point of blogging if you can’t simply write whatever you want, even if it doesn’t really make much sense to others?

A probability question!

After writing yesterday’s post about probabilities and statistics I remembered a particular question that I would often ask people to illustrate how easy it is to get probabilities wrong. Since I’ve never done a poll before, I thought I would try one and post it here. Very few people are reading this, so I may well get no responses. This question also happens to actually be true for me, but that isn’t really relevant.

False positives

I have always been quite interested in situations in which people (myself included) misunderstand probabilities. The classic is the Monty Hall problem. There are 3 doors, behind one of which is a prize. You choose a door and the quiz master then opens one of the other two doors, but always opens one that does not have a prize. You’re then given the option of changing from your first choice to the remaining closed door. Should you switch? The answer (in terms of probability) is yes. The reason is that if you have chosen a door without a prize, the quiz master then opens the other door without a prize and the remaining door has the prize. Therefore if you switch you win. The only time you don’t win is if your initial choice had the prize. However, since two doors don’t have prizes you would expect that 2 times out of 3, your first choice door wouldn’t have a prize and hence if you switch you win 2 times out of 3.
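If the argument still feels slippery, the cases can simply be counted. The sketch below (function name `win_probability` is my own) enumerates every combination of prize door and first choice, assuming each is equally likely and that the quiz master always opens a door that is neither your choice nor the prize.

```python
from fractions import Fraction
from itertools import product

def win_probability(switch):
    """Exact chance of winning, found by enumerating all nine equally
    likely (prize door, first choice) combinations."""
    wins = 0
    cases = 0
    for prize, choice in product(range(3), repeat=2):
        # The quiz master opens a door that is neither the contestant's
        # choice nor the prize. If the first choice is the prize there are
        # two such doors; which one is opened does not change the outcome.
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            final = next(d for d in range(3) if d != choice and d != opened)
        else:
            final = choice
        wins += (final == prize)
        cases += 1
    return Fraction(wins, cases)

print("stay:  ", win_probability(switch=False))
print("switch:", win_probability(switch=True))
```

Switching wins in exactly the six cases where the first choice was wrong, which is where the 2 out of 3 comes from.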

What has taken me quite some time to understand is the issue of false positives. I think I first encountered this in Simon Singh’s book, Fermat’s Last Theorem, but also read about it in an article yesterday. The basic issue is, if you have a test for a disease (for example) that is 90% accurate, what should you tell those who test positive?

Imagine a situation in which you’re testing for a disease and that you expect (or know) that 50% of those tested actually have the disease. Consider a sample of 100 people. The test is 90% accurate. Of the 50 who actually have the disease, 45 test positive and 5 test negative. Of the 50 who do not, 45 test negative and 5 test positive. Overall, the test has resulted in 50 positives and 50 negatives. Of those who tested positive 45 do have the disease and 5 don’t. Of the 50 who tested negative 45 don’t and 5 do. You would therefore tell those who tested positive that there is a 90% chance that the result is correct, and to those who tested negative you might say there is a 10% chance that they have the disease (or 90% chance that they don’t).

Seems fairly straightforward. It becomes less so if you’re testing for something that is rare. Imagine you’re testing for a disease that only 1% of those tested will have. If 1000 people are tested then 990 do not have the disease and 10 do. If the test is 90% accurate, then 891 of those who don’t have the disease test negative and 99 test positive. Of the 10 that do have the disease 9 test positive and 1 tests negative. The test has then resulted in 892 negatives and 108 positives. Of the 108 positives, only 9 actually have the disease so those who tested positive actually have a less than 10% chance of actually having the disease. Of the 892 who tested negative only 1 actually has the disease. As I write this, I worry that I have it slightly wrong (I was expecting 900 negatives and 100 positives) but I think it is essentially correct. The bottom line is that a test for something that is rare is unlikely to be 100% accurate and so consequently the negative results may be very accurate (i.e., only 1 wrong out of 892 above) but there will be a large number of false positives (99 out of 108). You know that most (90% in this case) of those who actually have the disease have tested positive, but you also know that a large fraction of those who tested positive are actually negative. This is the reason why they are referred to as false positives.
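The bookkeeping for both examples can be checked with a few lines of Python. This is a sketch under the same simplifying assumption as above, namely that "90% accurate" means the test is right 90% of the time both for people with the disease and for people without it. The function name `screening_outcomes` is my own, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def screening_outcomes(n_tested, prevalence, accuracy):
    """Split n_tested people into true/false positives and negatives,
    assuming the test is equally accurate for those with and without
    the disease."""
    diseased = n_tested * prevalence
    healthy = n_tested - diseased
    true_pos = diseased * accuracy
    false_neg = diseased - true_pos
    true_neg = healthy * accuracy
    false_pos = healthy - true_neg
    return {
        "positives": true_pos + false_pos,
        "negatives": true_neg + false_neg,
        "true_pos": true_pos,
        "false_pos": false_pos,
        # Chance that someone who tested positive really has the disease.
        "chance_if_positive": true_pos / (true_pos + false_pos),
    }

common = screening_outcomes(100, Fraction(1, 2), Fraction(9, 10))
rare = screening_outcomes(1000, Fraction(1, 100), Fraction(9, 10))
print("common disease:", float(common["chance_if_positive"]))
print("rare disease:  ", float(rare["chance_if_positive"]))
```

For the rare disease this gives 108 positives and 892 negatives, with 9 of the 108 positives (1 in 12) actually having the disease, so the totals of 892 and 108 rather than 900 and 100 are right.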

I partly wrote this because I just find this interesting (and I hope my explanation is correct) but it also illustrates how important it is to have a reasonable understanding of probability and statistics, as many decisions can be made based on an incorrect understanding of what the “numbers” are actually telling you.