The base rate is the rate at which a phenomenon occurs without any special intervention. The failure to consider base rates is at the root of many logical fallacies and failures of intuitive statistics. Why might this matter? Consider:
Our new insect repellent makes 99.99% of all houses insect-free!
Our new zombie repellent makes 99.99% of all houses zombie-free!
Which of these products is more effective? We have to look at the base rate. Quite a few houses -- let's say 20% -- have insect infestations. But only about 0.1% have zombies. So it's clear that while the share of pest-free houses is equal, the insect repellent does a lot more to actually fix things. The zombie repellent has less work to do because more houses are already zombie-free. (We won't argue about which one is more important.)
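To make the comparison concrete, here is a minimal Python sketch. It assumes the ad claims mean "99.99% of houses are pest-free after treatment" and that houses which were pest-free to begin with stay that way; the function name houses_fixed and the sample of 10,000 houses are illustrative choices, not part of the original claims.

```python
def houses_fixed(base_rate, pest_free_after, houses=10_000):
    """Estimate how many houses the product actually cleared.

    base_rate       -- fraction of houses infested before treatment
    pest_free_after -- fraction of houses pest-free after treatment
    houses          -- size of the (hypothetical) sample of houses
    """
    infested_before = base_rate * houses
    infested_after = (1 - pest_free_after) * houses
    return infested_before - infested_after


# Base rates taken from the text: 20% insects, 0.1% zombies.
insects = houses_fixed(0.20, 0.9999)
zombies = houses_fixed(0.001, 0.9999)

print("houses cleared of insects:", round(insects))
print("houses cleared of zombies:", round(zombies))
```

Under these assumptions, the insect repellent clears about 1,999 houses out of 10,000, while the zombie repellent clears about 9.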
The base rate is also significant when we're trying to figure out whether a condition or a situation exists -- say, when we're testing for an infection. To look at the issue more completely, let's divide cases into four groups, based on this handy matrix:
            I             NI
      ___________________________
     |             |             |
     |   "hit"     |   "false    |     I  = Infected
  +  |             |    alarm"   |     NI = Not Infected
     |             |             |     +  = Test is positive (says there's an infection)
     |-------------|-------------|     -  = Test is negative (says there's no infection)
     |             |             |
     |   "miss"    |  "correct   |
  -  |             |  rejection" |
     |_____________|_____________|
A good test will be very accurate (lots of hits and correct rejections) and avoid incorrect diagnoses (few misses and false alarms). But the lower the base rate of the infection, the higher the test's accuracy has to be. Imagine that the test is 98% accurate (sounds pretty good), and that 50% of people have the infection (the base rate). Out of a thousand people, you'd expect to get:
490 hits (infected people diagnosed with the infection)
490 correct rejections (uninfected people not diagnosed with the infection)
10 misses (sick people not diagnosed)
10 false alarms (uninfected people diagnosed)
But what if only 0.1% of people have the infection? Out of the same thousand people, rounded to whole people, you'd expect:
1 hit
979 correct rejections
0 misses
20 false alarms
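The two tallies above can be reproduced with a short Python sketch. It assumes that "98% accurate" means the test gives the right answer 98% of the time for infected and uninfected people alike; the function name expected_counts is hypothetical.

```python
def expected_counts(base_rate, accuracy, population=1000):
    """Expected size of each cell of the hit/miss matrix.

    Assumes the test is equally accurate on infected and
    uninfected people (an assumption, not stated in the text).
    """
    infected = base_rate * population
    uninfected = population - infected
    return {
        "hits": accuracy * infected,
        "misses": (1 - accuracy) * infected,
        "correct rejections": accuracy * uninfected,
        "false alarms": (1 - accuracy) * uninfected,
    }


for base_rate in (0.5, 0.001):
    counts = expected_counts(base_rate, accuracy=0.98)
    print(f"base rate {base_rate:.1%}:")
    for cell, count in counts.items():
        # Round to whole people, as the lists above do.
        print(f"  {round(count)} {cell}")
```

Rounded to whole people, this gives 490 / 10 / 490 / 10 at a 50% base rate and 1 / 0 / 979 / 20 at a 0.1% base rate, matching the figures listed.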
Our 98% accurate test is generating far more false alarms than correct diagnoses! The accuracy didn't change, but now we're testing it on many more well people than sick people. Thus, the base rate problem becomes very relevant when your doctor calls you to say that you've tested positive for a rare form of cancer.
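To put a number on that phone call, here is a rough Python sketch of the chance that a positive result is a real one: hits divided by all positives (hits plus false alarms). As before, it assumes "98% accurate" applies equally to sick and well people, and the function name chance_really_infected is made up for illustration.

```python
def chance_really_infected(base_rate, accuracy):
    """Probability of infection given a positive test.

    Computed as hits / (hits + false alarms), with everything
    expressed as a fraction of the whole population.
    """
    hits = base_rate * accuracy
    false_alarms = (1 - base_rate) * (1 - accuracy)
    return hits / (hits + false_alarms)


common = chance_really_infected(base_rate=0.5, accuracy=0.98)
rare = chance_really_infected(base_rate=0.001, accuracy=0.98)

print(f"50% base rate:  {common:.1%} of positives are real")
print(f"0.1% base rate: {rare:.1%} of positives are real")
```

With a 50% base rate, a positive result is right 98% of the time. With a 0.1% base rate, the same test's positive result is right only about 4.7% of the time.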
Can we convict a person on the basis of a lie detector? Can we deny someone a job because a personality test indicates that they are prone to stealing? Can we take a student out of the running for scholarships because they fail an academic test? In all these cases, we have to consider not just the test's ability to catch bad things (liars, thieves, slackers) but also how many innocent people it will catch in the process.