Benford's Law

The law governing the distribution of the first (most-significant, leftmost) digit of numbers arising from an unbounded distribution (e.g. lengths of rivers, but not phone numbers). Contrary to initial expectations, the leftmost digit is a `1' in more than 30% of the numbers, and a `9' in less than 4.6%! The actual fraction of numbers starting with digit d is log10(d+1)-log10(d) (note that no numbers start with a `0'). For different bases, change the base of the logarithm. Bases like 1000 are relevant to the decimal case, too, since the leading base-1000 digit of a number is its leftmost group of one to three decimal digits (the 12 in 12,345).

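What's going on? Well, suppose a distribution actually exists for numbers like the lengths of rivers. Clearly it can't depend on the units used to measure the length (kilometres or miles), so multiplying every number by a constant can't change the frequencies of the first digits. The only way for those frequencies to survive every such rescaling is for the numbers to be spread evenly on a logarithmic scale. So it has to be logarithmic.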

If you know any probability, you realise that the argument above is meaningless! There is no uniform distribution over an unbounded range, which is what the argument seems to need. And you cannot prove anything about the distribution if it doesn't exist! Nonetheless, people have managed to prove some versions of Benford's Law by making "reasonable" assumptions. And empirically, it works!
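No unbounded uniform distribution exists, but the unit-independence idea can still be checked on a proper distribution. The Python sketch below assumes hypothetical "river lengths" whose base-10 logarithms are uniform over a whole number of decades (1 km to 1,000,000 km); the distribution, sample size and seed are illustrative choices, not real data. Converting the same sample to miles should leave the first-digit frequencies unchanged, and both should agree with log10(1 + 1/d) to within sampling noise.

```python
import math
import random
from collections import Counter

KM_PER_MILE = 1.609344


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    m = x / 10 ** math.floor(math.log10(x))
    # Guard against floating-point error right at a power of ten.
    while m >= 10:
        m /= 10
    while m < 1:
        m *= 10
    return int(m)


def digit_frequencies(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


# Hypothetical river lengths: log10(length) uniform on [0, 6], i.e. 1 km to 1,000,000 km.
rng = random.Random(42)
lengths_km = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
lengths_miles = [x / KM_PER_MILE for x in lengths_km]  # same rivers, different unit

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
km_freq = digit_frequencies(lengths_km)
mile_freq = digit_frequencies(lengths_miles)
for d in range(1, 10):
    print(f"{d}: km {km_freq[d - 1]:.3f}  miles {mile_freq[d - 1]:.3f}  "
          f"Benford {benford[d - 1]:.3f}")
```

Because the assumed sample covers a whole number of decades, dividing by 1.609344 only shifts every logarithm by the same amount modulo 1, so the first-digit frequencies cannot change.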

This is the law:
1---------------2---------3-------4-----5----6---7--8--9
Or, the probability that a number in statistical data has first digit d is
    P(d) = log_10(1 + 1/d)
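The formula is easy to tabulate. A minimal Python sketch (the function name `benford_probability` is illustrative, not from any library); passing a different `base` applies the same formula to other bases, and the probabilities for d = 1 to base-1 always sum to 1 because the logarithms telescope to log_base(base) = 1.

```python
import math


def benford_probability(d: int, base: int = 10) -> float:
    """Probability that the first digit (in the given base) is d: log_base(1 + 1/d)."""
    if not 1 <= d < base:
        raise ValueError(f"leading digit must be between 1 and {base - 1}")
    return math.log(1 + 1 / d, base)


for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

For base 10 this prints 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6% for d = 1 to 9.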
The law was first discovered by a mathematician named Simon Newcomb in 1881, when he noticed that in books of logarithmic tables, the pages with lower numbers were more worn than the pages with higher numbers. But it was not until 1938 that Frank Benford published the results of an analysis of more than 20,000 numbers from various sources, such as price lists, electricity bills and street addresses.

In 1996 Professor Ted Hill suggested an explanation of this phenomenon. The law applies to large collections of numbers gathered from many different sources, not to numbers produced by a single random process. If we take several sources of random numbers (each following a normal distribution or some other distribution) and combine their numbers, the result is a distribution of distributions. Under fairly general conditions, this combined distribution follows Benford's law, with the low leading digits the most common.

Benford's law is commonly used by accounting firms and others who work with large quantities of numbers. If the numbers haven't been tampered with, the first digit should be 1 in about 30.1% of the numbers and 9 in only about 4.6% of them. If the digits are spread more evenly, then something is probably wrong... Of course, some amounts are by nature more common than others, such as $24, which happens to be the largest amount you can put on an expense report in America without a receipt.
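One way to turn this into a check is a chi-square goodness-of-fit test on the first digits. The Python sketch below is an illustration under stated assumptions, not an auditing standard: the function names and both data sets are hypothetical, zero amounts are skipped, and 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom (nine digit categories minus one). A large statistic only says the digits don't look like Benford's law; as the $24 example shows, there can be innocent reasons for that.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRITICAL = 15.507  # chi-square, 8 degrees of freedom, 5% significance level


def leading_digit(amount: float) -> int:
    """First significant digit of a nonzero amount (sign ignored)."""
    x = abs(amount)
    m = x / 10 ** math.floor(math.log10(x))
    while m >= 10:  # guard against floating-point error at powers of ten
        m /= 10
    while m < 1:
        m *= 10
    return int(m)


def first_digit_chi2(amounts) -> float:
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def looks_suspicious(amounts) -> bool:
    """Flag data whose first digits deviate from Benford's law at the 5% level."""
    return first_digit_chi2(amounts) > CHI2_CRITICAL


# Hypothetical data: amounts spread evenly over 100..999 (every first digit equally common)
even_amounts = list(range(100, 1000))
# Hypothetical data: a balance growing 10% a year for 1000 years
growing_balances = [100 * 1.1 ** k for k in range(1000)]
print(looks_suspicious(even_amounts), looks_suspicious(growing_balances))
```

The evenly spread amounts are flagged and the exponentially growing balances are not, which is the pattern described above.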

Another slightly counterintuitive fact from probability and statistics is that seemingly improbable results occur more often than you'd think. Theodore Hill used to give his students the following homework: flip a coin 200 times and write down the results. Many of the students got tired after 20 flips and just made up results that they thought would look probable. The thing is, in a series of 200 coin flips, it is highly probable that there will be a run of at least 6 heads (or 6 tails) in a row. The students who cheated rarely had more than 3 of the same in a row.
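The claim about runs can be checked exactly instead of by flipping coins. This Python sketch (the function name is hypothetical) assumes a fair coin and tracks, flip by flip, the probability of each current run length, counting runs of heads and runs of tails alike.

```python
def prob_run_at_least(n_flips: int, k: int) -> float:
    """Probability that n_flips fair coin tosses contain k or more identical results in a row."""
    if n_flips < 1:
        return 0.0
    if k <= 1:
        return 1.0
    # state[j]: probability that no run of length k has appeared yet and the
    # current run (of whichever face came up last) has length j, for 1 <= j < k.
    state = [0.0] * k
    state[1] = 1.0  # after the first flip the current run has length 1
    hit = 0.0       # probability that a run of length k has already appeared
    for _ in range(n_flips - 1):
        new_state = [0.0] * k
        for j in range(1, k):
            half = state[j] / 2
            new_state[1] += half          # next flip differs: a new run starts
            if j + 1 == k:
                hit += half               # next flip matches and completes the run
            else:
                new_state[j + 1] += half  # next flip matches and extends the run
        state = new_state
    return hit


print(f"{prob_run_at_least(200, 6):.3f}")      # run of 6 or more in 200 flips
print(f"{1 - prob_run_at_least(200, 4):.2e}")  # no run longer than 3 in 200 flips
```

For 200 flips the chance of a run of six or more comes out comfortably above 0.9, while the chance of never seeing four in a row is below one in a thousand, which is why made-up sequences with runs of at most three stand out.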


Graphic by Kevin Brown, http://www.seanet.com/~ksbrown/index.htm

Benford's Law makes wonderful sense in situations of exponential growth.

Imagine you put $100.00 in a savings account that earns you 10% interest a year. At the end of the first year you have $110.00, at the end of the second you'll have $121.00, and the third will leave you with $133.10. The leading digit will remain a one until the eighth year (at which point you'll have $214.36). Two will be the leading digit for the next four years (at the end of which you'll have $313.84). Three more years will get you into the four hundreds (with $417.72), but you'll reach the five hundreds only two years after that. The more money you have in your account, the less time you'll spend with any particular leading digit. That is, until you have more than $1000.00 in your account. At this point it will again take you eight (or so) years to get to $2000.00, four to get to $3000.00, three to $4000.00, et cetera. The proportions of time spent with each leading digit stay roughly the same regardless of how high the interest rate is.
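The year-by-year arithmetic above is easy to reproduce. A Python sketch (function names are illustrative) that compounds $100 at 10% once a year and counts how many year-end balances, including the starting $100, begin with each digit:

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    m = x / 10 ** math.floor(math.log10(x))
    while m >= 10:  # guard against floating-point error at powers of ten
        m /= 10
    while m < 1:
        m *= 10
    return int(m)


def years_per_leading_digit(principal: float, rate: float, years: int) -> Counter:
    """Count the balances at years 0 .. years-1 that start with each digit."""
    counts = Counter()
    balance = principal
    for _ in range(years):
        counts[leading_digit(balance)] += 1
        balance *= 1 + rate  # compound once a year
    return counts


first_25 = years_per_leading_digit(100.0, 0.10, 25)  # $100 up to about $985
print([first_25[d] for d in range(1, 10)])  # → [8, 4, 3, 2, 2, 2, 1, 2, 1]

long_run = years_per_leading_digit(100.0, 0.10, 2000)
for d in range(1, 10):
    print(d, f"{long_run[d] / 2000:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

The first list matches the story above (eight years with a one, four with a two, three with a three), and over 2000 years the shares settle close to the Benford fractions.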

It thus makes perfect sense that a "random" sampling of savings account balances will have about twice as many ones as twos as their leading digits, since the average account will spend almost twice as much time with a one as its leading digit as with a two. If we calculated the above for continuously compounded interest, the numbers would match those predicted by Benford's law even more closely.
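For continuously compounded interest at rate r, the balance grows from d·10^k to (d+1)·10^k in ln((d+1)/d)/r years, and a full decade takes ln(10)/r years, so the share of time spent with leading digit d is ln(1 + 1/d)/ln(10) = log10(1 + 1/d), whatever the rate. A short Python check of that arithmetic (the 10% rate and the function name are illustrative assumptions):

```python
import math


def years_with_leading_digit(d: int, rate: float) -> float:
    """Years a continuously compounded balance spends growing from d*10**k to (d+1)*10**k."""
    return math.log((d + 1) / d) / rate


rate = 0.10
decade = math.log(10) / rate  # years to grow tenfold, e.g. from $100 to $1000
for d in range(1, 10):
    t = years_with_leading_digit(d, rate)
    print(f"{d}: {t:5.2f} years, share {t / decade:.3f}, Benford {math.log10(1 + 1 / d):.3f}")

# Time with a leading 1 versus a leading 2: ln 2 / ln 1.5, about 1.71
print(years_with_leading_digit(1, rate) / years_with_leading_digit(2, rate))
```

The ratio of about 1.71 is the "almost twice as much time" with a leading one as with a leading two.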

Thus we expect things like the size of cities and the price of stocks to follow Benford's Law, since both also grow exponentially. What's freaky is how many unexpected things also follow the law; apparently logarithmic scales are more popular in nature and society than we might think.
