In 1938, Frank Benford compiled over 20,000 observations of random empirical data, ranging from areas of rivers to molecular weights of chemical compounds, cost data, address numbers, population sizes and physical constants. All the various datasets followed an exponentially diminishing distribution, where the leading significant digit was more likely to be small.
Benford’s Law, also referred to as the Law of Anomalous Numbers, holds a prominent place in statistical folklore as an observation about the frequency distribution of leading digits in many naturally occurring numerical datasets. It describes a theoretical probability distribution in which the digit 1 appears first with a frequency of about 30%, while the digit 9 appears first less than 5% of the time. The leading significant digits are not uniformly distributed, but instead follow a particular logarithmic distribution.
A set of numbers is said to satisfy Benford’s Law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability P(d):
P(d) = log10(d + 1) − log10(d) = log10((d + 1)/d) = log10(1 + 1/d)
The leading significant digits in such a dataset are distributed as in the figure above.
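As a quick check of the formula, the expected frequencies can be tabulated directly. The following is a minimal Python sketch; the function name `benford_probabilities` is chosen here for illustration and does not come from the original review.

```python
import math


def benford_probabilities():
    """Return {d: P(d)} for leading digits d = 1..9 under Benford's Law."""
    # P(d) = log10(1 + 1/d), the last form of the formula given above.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_probabilities()
for digit, p in probs.items():
    print(f"{digit}: {p:.3f}")
```

The nine probabilities sum to 1, because the terms log10((d + 1)/d) telescope to log10(10). The digit 1 comes out at about 0.301 and the digit 9 at about 0.046, matching the "about 30%" and "less than 5%" figures quoted earlier.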
Benford’s Law can apply in many empirical contexts. This review investigates whether geological assay data follow Benford’s Law and assesses the potential to detect data quality issues, or even fabricated data, from observing patterns that deviate from the Benford curve.
Assay data from eight metallic ore deposits were randomly selected for review. To evaluate the data, the first significant digit of each assay value was extracted and the digit frequencies were plotted as histograms in Microsoft Excel. Most of the reviewed assay datasets conform to Benford’s Law with good consistency. Failure to follow the Benford first-digit trend would not necessarily indicate poor or fabricated data, as not all data may behave as forecast. For example, many ore deposits are drilled and sampled selectively, which may skew the results of the assay population.
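The review carried out this step in Microsoft Excel. For readers who prefer a script, the first-digit tabulation might be sketched in Python as follows. The function names and the sample values are hypothetical, the inputs are assumed to be numeric, and non-positive values (for example, below-detection placeholders stored as 0 or negative numbers) are assumed to be skipped.

```python
from collections import Counter
from decimal import Decimal


def leading_digit(value):
    """Return the first significant digit of a positive number, else None."""
    if value is None:
        return None
    # str() gives the shortest decimal form of a float, and Decimal does not
    # store leading zeros, so 0.035 yields the digits (3, 5).
    d = Decimal(str(value))
    if not d.is_finite() or d <= 0:
        return None
    return d.as_tuple().digits[0]


def first_digit_frequencies(values):
    """Return {digit: proportion} for digits 1..9, ignoring unusable values."""
    digits = [leading_digit(v) for v in values]
    counts = Counter(d for d in digits if d is not None)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


# Hypothetical assay values (illustrative only, not data from the review).
sample = [0.035, 1.2, 12.5, 0.9, 3.4, 150.0, 0.0, -1.0]
freqs = first_digit_frequencies(sample)
print(freqs)
```

The resulting proportions can then be plotted as a histogram next to the Benford curve for visual comparison.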
To investigate the potential to detect data quality issues and possible fraudulent data, several series of random numbers were generated with a uniform distribution. When the random numbers were added to real data, the Benford trend was affected; however, it takes a substantial amount of artificial data, upwards of 20%, before an obvious anomaly is easily discernible. For investigation of potentially fraudulent geological data, though, a Benford analysis is likely insufficient to identify falsified assay results with any surety. However, it might be possible to identify manipulation in datasets over time in populations that previously followed the Benford trend but have now deviated.
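The contamination experiment can be imitated with a small simulation. The sketch below rests on several assumptions that are not from the review: the "real" data are a synthetic stand-in drawn log-uniformly over three decades (which conforms to Benford’s Law by construction), the artificial values are uniform on [1, 1000], and the departure from the Benford curve is summarised by the mean absolute deviation (MAD) of the nine digit proportions. The review itself judged the anomaly visually from histograms.

```python
import math
import random
from decimal import Decimal

# Expected Benford proportions for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """First significant digit of a positive number."""
    return Decimal(str(x)).as_tuple().digits[0]


def digit_proportions(values):
    """Observed proportion of each leading digit 1..9."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return [c / len(values) for c in counts]


def mean_abs_deviation(values):
    """Mean absolute difference between observed and Benford proportions."""
    observed = digit_proportions(values)
    return sum(abs(o - b) for o, b in zip(observed, BENFORD)) / 9


def contaminate(real, fraction, rng):
    """Append uniform values so that they form `fraction` of the result."""
    n_fake = round(len(real) * fraction / (1 - fraction))
    return real + [rng.uniform(1, 1000) for _ in range(n_fake)]


rng = random.Random(42)  # fixed seed so the run is repeatable
# Synthetic stand-in for assay data (assumption): log-uniform over 3 decades.
real = [10 ** rng.uniform(0, 3) for _ in range(20000)]

for fraction in (0.0, 0.05, 0.10, 0.20, 0.40):
    mixed = contaminate(real, fraction, rng)
    print(f"{fraction:.0%} artificial: MAD = {mean_abs_deviation(mixed):.4f}")
```

Under these assumptions the deviation grows roughly in proportion to the contaminated share, which is consistent with the observation that a small proportion of artificial values is hard to distinguish from ordinary sampling noise.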
Assay data analysis utilising Benford’s Law may be considered for investigating data quality concerns, including finding possible repeating data transcription errors or recognising historical values that occur more frequently than anticipated by the logarithmic distribution of significant digits, such as laboratory detection limits varying over time or data variance between past project operators. While Benford’s Law should not be used in isolation, it may be a useful screening tool to indicate that a deeper QA/QC analysis is required.