A discussion at work made me decide to try and finish this post about distribution functions. Essentially, I think we should put more effort into teaching people about distribution functions. The reason, in my view at least, is that a distribution function provides much more information about something than any single number can.

An example of a distribution function is shown in the image below (I can’t remember where I got this so can’t give it an appropriate citation – apologies if it is yours). The figure could be showing the distribution of anything, but let’s imagine that it is the distribution of income. The x-axis would then be income (in pounds for example) and the y-axis would be number of people per income interval (per pound for example). It can’t simply be “number of people” on the y-axis as that doesn’t really make sense. Asking how many people have annual incomes of £16000 isn’t really a logical question (i.e., it’s possible that no one earns exactly £16000 per year). What would make more sense would be asking how many people have annual incomes between (for example) £16000 and £16500. If the y-axis is the number of people per pound interval, then to determine how many people have annual incomes between £16000 and £16001 you could simply read that off the distribution function graph. Determining how many earn between £16000 and £17000 would require determining the area under the graph in the interval between £16000 and £17000.
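To make the "people per pound" idea concrete, here is a minimal sketch in Python. The incomes are made up (drawn from a log-normal shape, which is purely my assumption), and the names `density` and `people_between` are just illustrative. It checks that adding up the area under the binned distribution between £16000 and £17000 gives the same answer as counting the people directly.

```python
import random

random.seed(0)
# Hypothetical annual incomes in pounds; the log-normal shape is an assumption.
incomes = [random.lognormvariate(10.0, 0.5) for _ in range(10_000)]

BIN_WIDTH = 500  # width of each income interval, in pounds


def density(incomes, lo, width):
    """Number of people per pound in the interval [lo, lo + width)."""
    count = sum(1 for x in incomes if lo <= x < lo + width)
    return count / width


def people_between(incomes, lo, hi, width=BIN_WIDTH):
    """Area under the binned distribution between lo and hi.

    Assumes (hi - lo) is a whole number of bins.
    """
    area = 0.0
    edge = lo
    while edge < hi:
        # height (people per pound) times width (pounds) gives people
        area += density(incomes, edge, width) * width
        edge += width
    return round(area)


direct_count = sum(1 for x in incomes if 16000 <= x < 17000)
area_count = people_between(incomes, 16000, 17000)
print(direct_count, area_count)  # the two numbers agree
```

The y-axis value on its own is not a number of people; it only becomes one once it is multiplied by the width of an income interval.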

Three things are labelled on the graph: the mean, the median, and the mode. The mean is simply the average: add up the total amount of money paid out to people in the form of income and divide by the number of people. The median is the salary which divides the population into two equal portions; i.e., half the population earn above the median and half earn below. The mode is the peak of the distribution and is essentially (since we’re talking about income distributions) the most likely income. The mean doesn’t really tell us anything about how income is distributed, since it depends only on the total income and the number of people. It’s the same if one person earns all the money, or if it is divided equally between everyone. It might tell us something about how wealthy we are compared to another country, but doesn’t really tell us about how income is distributed in this country. The median is more useful because it tells us the middle salary. In general, the larger the difference between the mean and the median, the more unequally income is distributed. Having written some of this post, I’ve partly lost track of what I was trying to say, but to a certain extent it’s a sense that it’s difficult to understand how unequally income, for example, is distributed without some understanding of distribution functions.
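A small sketch of the point about the mean, using Python’s standard `statistics` module. The three populations of ten people are invented for illustration; each shares out the same £200000 in a different way.

```python
from statistics import mean, median

# Three hypothetical populations of ten people, each with £200000 in total.
equal = [20_000] * 10                                # everyone earns the same
one_earner = [0] * 9 + [200_000]                     # one person earns it all
skewed = [10_000] * 6 + [20_000] * 3 + [80_000]      # a long tail of high earners

for name, incomes in [("equal", equal), ("one_earner", one_earner), ("skewed", skewed)]:
    # The mean is £20000 in every case; the median is not.
    print(name, mean(incomes), median(incomes))
```

The mean is £20000 for all three, while the median is £20000, £0, and £10000 respectively, so the gap between the mean and the median grows as the income becomes more unevenly shared.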

Something else that really bugs me is the apparent grade inflation that’s taking place in the UK school system. Every year more people get As compared with the previous year and this is presented as a success. The problem I have is that it would seem that there must be a range of abilities for almost everything. If everyone in the world ran 100 metres as fast as they could, there would be a wide range of different times which you could plot as a distribution function. Similarly, if we consider mathematics, the abilities of those leaving school in a given year will vary. What the exam results are meant to do is reflect this variation in ability and, in a sense, should reflect reality. Of course, one could set an exam that most students would do very well at, by making it fairly easy, but what’s the point of that? We should, in my opinion at least, be setting an exam which produces results that reflect relative abilities. How can we tell who is particularly good at mathematics if more and more people get top grades? One might ask how this is done, but as far as I can tell, it’s fairly easy. I regularly set exams and generally the results are distributed in a fairly sensible way. Very few get above 90%. A reasonable number (10% of students for example) get between 70% and 90%. A large group get between 55% and 70%. A smaller number get between 40% and 55%, and some students get below 40% and are regarded as having failed. I can’t see that it would be difficult to set A-level (for example) exams that allowed one to determine who was truly brilliant at something (grades above 90%), in the top 10% (grades between 75% and 90%), about average (grades between 60% and 75%), below average but competent (grades between 40% and 60%), and largely incapable of carrying out such work (grades below 40%).
This grade inflation is extremely damaging, in my opinion, as it reduces the amount of information that the exam results can tell us and suggests that these results no longer reflect reality.
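As a rough illustration of the bands described above, here is a sketch that sorts a set of marks into them. The marks are invented, the lower edge of each band is treated as inclusive (my assumption, since the text doesn’t say), and the "inflated" marks are simply the same marks with 20 points added and capped at 100, which is only a toy model of grade inflation.

```python
from collections import Counter

# (lower bound, label) pairs, highest band first; lower bounds are inclusive
# here, which is an assumption.
BANDS = [
    (90, "above 90%"),
    (75, "75-90%"),
    (60, "60-75%"),
    (40, "40-60%"),
    (0, "below 40%"),
]


def band(mark):
    """Return the label of the band that a percentage mark falls into."""
    for lower, label in BANDS:
        if mark >= lower:
            return label
    raise ValueError("mark must not be negative")


# Hypothetical exam marks (percentages).
marks = [95, 82, 78, 71, 68, 66, 63, 61, 58, 52, 47, 44, 38, 31]
# Toy model of inflation: everyone gains 20 points, capped at 100.
inflated = [min(m + 20, 100) for m in marks]

counts = Counter(band(m) for m in marks)
inflated_counts = Counter(band(m) for m in inflated)

for _, label in BANDS:
    print(label, counts[label], inflated_counts[label])
```

With these made-up numbers the top band grows from 1 student to 4, and two students who were clearly separated (95% and 82%) end up with the same mark of 100%, so the results can no longer tell them apart.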

Well, this post has become somewhat rambling and doesn’t really have a particularly clear message or motivation. But what’s the point of blogging if you can’t simply write whatever you want, even if it doesn’t really make much sense to others?