Discover how a peculiar concept called Benford’s Law helps detect accounting and business fraud, and get an Excel spreadsheet to help with your financial sleuthing.
Take a data set – say, the lengths of rivers in England, or the value of expense claims from companies in the Dow Jones Index.
You’d expect the first digit of any value in that data set to be random. That is, there’s an equal chance of the first digit being any number between 1 and 9.
But that isn’t necessarily the case.
Although it’s counterintuitive, some numbers appear more frequently than others in many data sets. In fact, the digit 1 is the leading digit 30% of the time, 2 is the leading digit 18% of the time, 3 is the leading digit 12% of the time, and larger numbers decreasingly so.
This is known as Benford’s Law.
Many different types of data sets follow this rather strange rule, including the
- number of Twitter users by followers,
- population of cities in France,
- expense claims,
- length of rivers in South America,
- height of buildings in North America,
- and file sizes on your hard disk.
Forensic accountants and other sleuths often use Benford’s Law to detect fraud. When people “massage” numbers in an attempt to defraud, they tend to use more 8’s and 9’s as the first digit than expected from the Law. This disparity is a red flag, and is often used as a reason for more scrutiny.
Why Does Benford’s Law Work?
There’s a certain logic to Benford’s Law
- A number that begins with 1 needs to increase by 100% to become a 2
- A number that begins with 5 needs to increase by 20% to become a 6
- A number that begins with 9 needs to increase by 11% to become a 0
History of Benford’s Law
In 1881, Simon Newcomb, an astronomer and mathematician, noticed something peculiar. Whilst leafing through much-used log tables at the library, he found that earlier pages were more worn than later pages. He concluded that his fellow scientists looked up numbers beginning with the digit one more often than numbers beginning with the digit two, and so on.
He concluded that the probably of the first digit being d was
This table gives the first digit frequencies given by this equation.
In 1938 a physicist called Frank Benford came to the same conclusion independently, also by noticing the earlier pages of a book of log tables were more worn than the later pages.
He extended the analysis to the 2nd, 3rd and higher digits with the following equation, where d is any digit from 0 to 9, and k is the position of the digit.
The digital frequencies predicted by this equation are given in the table below.
Benford validated his hypothesis with many different data sets, such as the population of cities, river lengths and areas, atomic weights and much more. Again, lower numbers appeared more frequently as the leading digits than higher numbers.
Diaconis and Freedman (1979) suspected that Benford massaged his data set to better fit his conclusions. However, many validated data sets follow Benford’s first digit law.
In the 1990s, a rather inquisitive accountant, Mark Nigrini used Benford’s Law to detect accounting fraud. For example, in a court case (State of Arizona vs Wayne James Nelson, CV92-18841), a state employee wrote fraudulent checks, with the initial digits of the values having an unusually large number of 7’s, 8”s and 9’s.
Limitations and Dissenting Voices
Benford’s Law cannot be applied to
- small sample sizes and/or small variations (the data has to span several orders of magnitude)
- truly random numbers (such as lottery tickets)
- data sets with a defined minimum or maximum (e.g. the number of passengers in a plan has an obvious maximum) or inherent constraints (e.g. human heights)
- artificially generated numbers, such as telephone numbers or social security numbers
- data sets influenced by human psychology (like prices that end in .99)
Benford’s Law in Excel
Now to the Excel spreadsheet. It’s a simple enough tool to use. Simply copy and paste your data set into the indicated column.
Excel will then extract the leading digit of every number, and calculate the frequency at which it appears. You’ll also get a graph that plots the actual frequency and predicted frequency of the first digit.