I have always been quite interested in situations in which people (myself included) misunderstand probabilities. The classic is the Monty Hall problem. There are 3 doors, behind one of which is a prize. You choose a door and the quiz master then opens one of the other two doors, but always opens one that does not have a prize. You’re then given the option of changing from your first choice to the remaining closed door. Should you switch? The answer (in terms of probability) is yes. The reason is that if you have chosen a door without a prize, the quiz master then opens the other door without a prize and the remaining door has the prize. Therefore if you switch you win. The only time you don’t win is if your initial choice had the prize. However, since two doors don’t have prizes you would expect that 2 times out of 3, your first choice door wouldn’t have a prize and hence if you switch you win 2 times out of 3.
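If the argument still feels slippery, it can be checked by simply playing the game many times. Below is a minimal Python sketch of such a simulation. The function names `play` and `win_rate`, the number of trials, and the random seed are my own choices for illustration, and I've assumed the quiz master picks at random when both unchosen doors are empty.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # The quiz master opens a door that is neither the contestant's
    # choice nor the one hiding the prize.
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        # Move to the one remaining closed door.
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won when always switching (or always staying)."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("always switch:", win_rate(True))
    print("always stay:  ", win_rate(False))
```

With enough trials, the switching strategy should win close to 2 times out of 3 and the staying strategy close to 1 time out of 3.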
What has taken me quite some time to understand is the issue of false positives. I think I first encountered this in Simon Singh’s book, Fermat’s Last Theorem, but I also read about it in an article yesterday. The basic issue is this: if you have a test for a disease (for example) that is 90% accurate, what should you tell those who test positive?
Imagine a situation in which you’re testing for a disease and you expect (or know) that 50% of those tested actually have the disease. Consider a sample of 100 people. The test is 90% accurate, by which I mean that it gives the correct result 90% of the time, both for those who have the disease and for those who don’t. Of the 50 who actually have the disease, 45 test positive and 5 test negative. Of the 50 who do not, 45 test negative and 5 test positive. Overall, the test has resulted in 50 positives and 50 negatives. Of the 50 who tested positive, 45 do have the disease and 5 don’t. Of the 50 who tested negative, 45 don’t and 5 do. You would therefore tell those who tested positive that there is a 90% chance that the result is correct, and to those who tested negative you might say there is a 10% chance that they have the disease (or a 90% chance that they don’t).
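The bookkeeping above is easy to get wrong by hand, so here is a small Python sketch of it. It assumes, as I have done, that "90% accurate" applies equally to those with and without the disease. The names `screening_counts` and `chance_sick_given_positive` are hypothetical helpers made up for this post, and the rates are passed as strings so that `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def screening_counts(n_tested, prevalence, accuracy):
    """Split n_tested people into the four possible test outcomes.

    Assumes the test is right with probability `accuracy` both for
    people who have the disease and for people who do not.
    """
    prevalence = Fraction(prevalence)
    accuracy = Fraction(accuracy)
    sick = n_tested * prevalence
    healthy = n_tested - sick
    return {
        "true_positives": sick * accuracy,
        "false_negatives": sick * (1 - accuracy),
        "true_negatives": healthy * accuracy,
        "false_positives": healthy * (1 - accuracy),
    }


def chance_sick_given_positive(counts):
    """Of everyone who tested positive, the fraction who have the disease."""
    positives = counts["true_positives"] + counts["false_positives"]
    return counts["true_positives"] / positives


if __name__ == "__main__":
    counts = screening_counts(100, "0.5", "0.9")
    for name, value in counts.items():
        print(name, value)
    print("chance a positive is real:", chance_sick_given_positive(counts))
```

Changing the first two arguments lets you try other sample sizes and other fractions of people who have the disease.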
Seems fairly straightforward. It becomes less so if you’re testing for something that is rare. Imagine you’re testing for a disease that only 1% of those tested will have. If 1000 people are tested then 990 do not have the disease and 10 do. If the test is 90% accurate, then 891 of those who don’t have the disease test negative and 99 test positive. Of the 10 who do have the disease, 9 test positive and 1 tests negative. The test has then resulted in 892 negatives and 108 positives. Of the 108 positives, only 9 actually have the disease, so those who tested positive have a less than 10% chance (9 in 108, or about 8%) of actually having the disease. Of the 892 who tested negative, only 1 actually has the disease. As I write this, I worry that I have it slightly wrong (I was expecting 900 negatives and 100 positives), but the numbers do add up: the 10% of wrong results in the large group without the disease (99) far outnumber the 10% of wrong results in the small group with it (1), so the errors don’t cancel and the positives come to more than 100. The bottom line is that no test is likely to be 100% accurate, and when you’re testing for something rare the negative results may be very accurate (only 1 wrong out of 892 above) but there will be a large number of false positives (99 out of 108). You know that most (90% in this case) of those who have the disease will test positive, but you also know that a large fraction of those who tested positive don’t actually have it. This is the reason why they are referred to as false positives.
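The same result drops out of Bayes' theorem, which I find a useful cross-check on the head count. The notation here is my own: D means "has the disease", + means "tests positive", and I'm assuming the test is right 90% of the time for both groups.

```latex
P(D \mid +)
  = \frac{P(+ \mid D)\,P(D)}
         {P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
  = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99}
  = \frac{0.009}{0.108}
  = \frac{1}{12}
  \approx 0.083
```

That is the same 9 out of 108 as in the count of 1000 people.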
I partly wrote this because I just find this interesting (and I hope my explanation is correct), but it also illustrates how important it is to have a reasonable understanding of probability and statistics, as many decisions can be made based on an incorrect understanding of what the “numbers” are actually telling you.