base rate (thing) by enkidu

The base rate is the rate at which a phenomenon occurs without any special intervention. The failure to consider base rates is at the root of many logical fallacies and failures of intuitive statistics. Why might this matter? Consider:

Our new insect repellent makes 99.99% of all houses insect-free!

Our new zombie repellent makes 99.99% of all houses zombie-free!

Which of these products is more effective? We have to look at the base rate. Quite a few houses -- let's say 20% -- have insect infestations. But only about 0.1% have zombies. So while the share of pest-free houses after treatment is equal, the insect repellent does a lot more to actually fix things: it takes infestations from 20% of houses down to 0.01%, while the zombie repellent only takes them from 0.1% down to 0.01%. The zombie repellent has less work to do because more houses are already zombie-free. (We won't argue about which one is more important.)
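To put rough numbers on that comparison, here is a small Python sketch. It assumes the advertising claim means "99.99% of houses are pest-free after treatment" and uses the made-up base rates from above (20% and 0.1%); the function name and the 10,000-house town are just for illustration.

```python
def houses_fixed(base_rate, pest_free_after, houses=10_000):
    """Expected number of houses that go from infested to pest-free.

    base_rate: fraction of houses infested with no repellent (assumed).
    pest_free_after: fraction of houses pest-free after treatment (the ad's claim).
    """
    infested_before = base_rate * houses
    infested_after = (1 - pest_free_after) * houses
    return infested_before - infested_after


insect_fixed = houses_fixed(base_rate=0.20, pest_free_after=0.9999)
zombie_fixed = houses_fixed(base_rate=0.001, pest_free_after=0.9999)

print(f"insect repellent: about {insect_fixed:.0f} houses fixed per 10,000")
print(f"zombie repellent: about {zombie_fixed:.0f} houses fixed per 10,000")
```

Under these assumptions the insect repellent clears roughly 1,999 houses out of every 10,000, and the zombie repellent clears about 9, even though both ads quote the same 99.99%.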

The base rate is also significant when we're trying to figure out whether a condition or a situation exists -- say, when we're testing for an infection. To look at the issue more completely, let's divide cases into four groups, based on this handy matrix:


       I          NI     
   _____________________
  |         |           |
  |  "hit"  |  "false   |  I = Infected
+ |         |  alarm"   |  NI = Not Infected
  |         |           |  + = Test is positive (says there's an infection)
  -----------------------  - = Test is negative (says there's no infection)
  |         |           |
  | "miss"  | "correct  |
- |         | rejection"|
  |         |           |
   ---------------------

A good test will be very accurate (lots of hits and correct rejections) and avoid incorrect diagnoses (few misses and false alarms). But the lower the base rate of the infection, the higher the test's accuracy has to be. Imagine that the test is 98% accurate (sounds pretty good), meaning it gives the right answer for 98% of infected people and for 98% of uninfected people, and that 50% of people have the infection (the base rate). Out of a thousand people, you'd expect to get:

490 hits (infected people diagnosed with the infection)
490 correct rejections (uninfected people not diagnosed with the infection)
10 misses (infected people not diagnosed)
10 false alarms (uninfected people diagnosed)

But what if only 0.1% of people have the infection?

1 hit
979 correct rejections
0 misses
20 false alarms

Our 98% accurate test is generating far more false alarms than correct diagnoses! The accuracy didn't change, but now we're using the test on many more well people than sick people. Thus, the base rate problem becomes very relevant when your doctor calls you to say that you've tested positive for a rare form of cancer.
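The two tallies above can be reproduced with a short Python sketch. It assumes that "98% accurate" means the test is right for 98% of infected people and for 98% of uninfected people, which is the reading that matches the numbers in this writeup; the function names are made up for illustration, and the counts are expected values, so they come out fractional before rounding.

```python
def expected_outcomes(base_rate, accuracy, people=1000):
    """Expected count in each cell of the hit / miss / false alarm matrix."""
    infected = base_rate * people
    uninfected = people - infected
    hits = accuracy * infected                  # infected, test positive
    misses = infected - hits                    # infected, test negative
    correct_rejections = accuracy * uninfected  # uninfected, test negative
    false_alarms = uninfected - correct_rejections  # uninfected, test positive
    return {
        "hits": hits,
        "misses": misses,
        "correct_rejections": correct_rejections,
        "false_alarms": false_alarms,
    }


def chance_positive_is_real(outcomes):
    """Of all positive results, the fraction that are genuine hits."""
    positives = outcomes["hits"] + outcomes["false_alarms"]
    return outcomes["hits"] / positives


common = expected_outcomes(base_rate=0.50, accuracy=0.98)
rare = expected_outcomes(base_rate=0.001, accuracy=0.98)

for label, outcomes in (("50% base rate", common), ("0.1% base rate", rare)):
    rounded = {cell: round(count) for cell, count in outcomes.items()}
    share = chance_positive_is_real(outcomes)
    print(f"{label}: {rounded}, positives that are real: {share:.1%}")
```

With the common infection, a positive result is real 98% of the time. With the rare one, the same test gives a positive result that is real less than 5% of the time, because the roughly 20 false alarms swamp the single expected hit.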

Can we convict a person on the basis of a lie detector? Can we deny someone a job because a personality test indicates that they are prone to stealing? Can we take a student out of the running for scholarships because they fail an academic test? In all these cases, we have to consider not just the test's ability to catch bad things (liars, thieves, slackers) but also how many innocent people it will catch in the process.
