Distribution functions

A discussion at work made me decide to try to finish this post about distribution functions. Essentially, I think we should put more effort into teaching people about them. The reason, in my view at least, is that a distribution function provides much more information about something than any single number can.

An example of a distribution function is shown in the image below (I can’t remember where I got this so can’t give it an appropriate citation – apologies if it is yours). The figure could be showing the distribution of anything, but let’s imagine that it is the distribution of income. The x-axis would then be income (in pounds for example) and the y-axis would be number of people per income interval (per pound for example). It can’t simply be “number of people” on the y-axis as that doesn’t really make sense. Asking how many people have annual incomes of £16000 isn’t really a logical question (i.e., it’s possible that no one earns exactly £16000 per year). What would make more sense would be asking how many people have annual incomes between (for example) £16000 and £16500. If the y-axis is the number of people per pound interval, then to determine how many people have annual incomes between £16000 and £16001 you could simply read that off the distribution function graph. Determining how many earn between £16000 and £17000 would require determining the area under the graph in the interval between £16000 and £17000.
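
As a rough sketch of the difference between reading a value off the graph and working out an area, here is a small example. It assumes a made-up log-normal income curve; the population size and the two shape parameters are invented purely for illustration and are not taken from the figure.

```python
import math

# Hypothetical values, chosen only to make the example concrete.
POPULATION = 1_000_000              # number of earners
MU, SIGMA = math.log(25_000), 0.6   # log-normal shape parameters

def people_per_pound(income):
    """Height of the curve: number of people per pound of income interval."""
    if income <= 0:
        return 0.0
    z = (math.log(income) - MU) / SIGMA
    pdf = math.exp(-0.5 * z * z) / (income * SIGMA * math.sqrt(2 * math.pi))
    return POPULATION * pdf

def people_between(low, high, steps=1000):
    """Area under the curve between low and high (trapezoidal rule)."""
    width = (high - low) / steps
    total = 0.5 * (people_per_pound(low) + people_per_pound(high))
    for i in range(1, steps):
        total += people_per_pound(low + i * width)
    return total * width

print(round(people_per_pound(16_000)))        # read straight off the y-axis
print(round(people_between(16_000, 16_001)))  # a one-pound interval: nearly the same
print(round(people_between(16_000, 17_000)))  # a wider interval needs the area
```

For a one-pound interval the area is essentially just the height of the curve, which is why that number can be read directly off the graph; for a £1000 interval the curve changes across the interval, so the area has to be added up.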

There are three things labelled on the graph: the mean, the median, and the mode. The mean is simply the average: add up the total amount of money paid out to people in the form of income and divide by the number of people. The median is the salary which divides the population into two equal portions; i.e., half the population earn above the median and half earn below. The mode is the peak of the distribution and is essentially (since we’re talking about income distributions) the most likely income. The mean doesn’t really tell us anything about how income is distributed, since it depends only on the total income and the number of people. It’s the same if one person earns all the money, or if it is divided equally between everyone. It might tell us something about how wealthy we are compared to another country, but doesn’t really tell us about how income is distributed in this country. The median is more useful because it tells us the middle salary. Broadly speaking, the larger the difference between the mean and the median, the more unequally the income is distributed. Having written some of this post, I’ve partly lost track of what I was trying to say, but to a certain extent it’s a sense that it’s difficult to understand how unequally income, for example, is distributed without some understanding of distribution functions.
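
To make the point about the mean concrete, here is a minimal sketch using three invented sets of five incomes that all add up to the same total. The numbers and the helper name `mean_median_gap` are made up for illustration only.

```python
from statistics import mean, median

# Three hypothetical populations, each with a total income of 100,000.
equal_split = [20_000, 20_000, 20_000, 20_000, 20_000]
skewed = [5_000, 10_000, 15_000, 20_000, 50_000]
winner_takes_all = [0, 0, 0, 0, 100_000]

def mean_median_gap(incomes):
    """Return (mean, median, mean - median) for a list of incomes."""
    m, med = mean(incomes), median(incomes)
    return m, med, m - med

for incomes in (equal_split, skewed, winner_takes_all):
    print(mean_median_gap(incomes))
```

The mean is the same in all three cases, so on its own it can't distinguish them, while the median drops and the gap between mean and median grows as the income becomes more concentrated.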

Something else that really bugs me is the apparent grade inflation that’s taking place in the UK school system. Every year more people get As compared with the previous year and this is presented as a success. The problem I have is that it would seem that there must be a range of abilities for almost everything. If everyone in the world ran 100 metres as fast as they could, there would be a wide range of different times which you could plot as a distribution function. Similarly, if we consider mathematics, the abilities of those leaving school in a given year will vary. What the exam results are meant to do is reflect this variation in ability and, in a sense, should reflect reality. Of course, one could set an exam that most students would do very well at, by making it fairly easy, but what’s the point of that? We should, in my opinion at least, be setting an exam which produces results that reflect relative abilities. How can we tell who is particularly good at mathematics if more and more people get top grades? One might ask how this is done, but as far as I can tell, it’s fairly easy. I regularly set exams and generally the results are distributed in a fairly sensible way. Very few get above 90%. A reasonable number (10% of students for example) get between 70% and 90%. A large group get between 55% and 70%. A smaller number get between 40% and 55%, and some students get below 40% and are regarded as having failed. I can’t see that it would be difficult to set A-level (for example) exams that allowed one to determine who was truly brilliant at something (grades above 90%), in the top 10% (grades between 75% and 90%), about average (grades between 60% and 75%), below average but competent (grades between 40% and 60%), or largely incapable of carrying out such work (grades below 40%).

This grade inflation is extremely damaging, in my opinion, as it reduces the amount of information that the exam results can tell us and suggests that these results no longer reflect reality.

Well, this post has become somewhat rambling and doesn’t really have a particularly clear message or motivation. But what’s the point of blogging if you can’t simply write whatever you want, even if it doesn’t really make much sense to others?

Rutherford Fellowship reviewing

I’ve just finished reviewing a couple of Rutherford Fellowships. Obviously I can’t give any details. They were both good, but I suspect that the current round is going to be extremely competitive. One of the changes this year was that rather than having many reviewers each reviewing one proposal, there was a smaller pool of reviewers each reviewing a number of proposals. This seems quite a sensible change, partly because the reviewers can maybe get a better sense of the standard, but also because STFC contacted reviewers a while ago to be part of this pool, and so it feels a bit more like you’re formally part of the process and, hopefully, encourages reviewers to take this seriously.

What I found a little unfortunate about the review process (and this doesn’t just apply to the Rutherford Fellowships) is how much emphasis appears to be put on the “quality” and “suitability” of the candidate. I suspect I’m in a minority here, but my personal view is that one should review the proposal and its merits without necessarily considering the applicant. Once the quality of the proposal has been assessed, one can then assess whether or not the applicant is likely to be able to do the work and, if so, whether they also have a record of timely publications and papers that have been well received. Although I have never sat on a UK panel, my feeling is that the perceived quality of the candidate trumps any deficiencies in the proposal. I’m not suggesting that a poor proposal would get funded, but that a candidate perceived to be strong would get funded with an okay proposal over someone who wrote a fantastic proposal but is perceived to be of a lower calibre.

Many would probably argue that funding strong candidates is ultimately what it is all about, and there is a lot of merit in this view. Funding those with a proven track record is pretty safe and it typically will produce good science. It does, however, feel a bit risk-free. We should, in my view at least, be aiming to fund excellent and potentially exciting science, and that will generally require taking some risks. That means funding, for example, the person with the excellent proposal who hasn’t yet proven themselves over the person who has a good track record but didn’t write a particularly good proposal. I should acknowledge that I do believe the process tries very hard to get a decent balance, and I don’t think that the panels simply choose those with the strongest track record. It is simply a sense (that without having sat on a panel I can’t really confirm) that the applicant’s track record plays a much more significant role in their likely success than I feel is appropriate.