A simple and quick general test to screen for numerical anomalies is presented. It can be applied, for example, to electoral processes, both electronic and manual. It uses vote counts in officially published voting units, which are typically widely available and institutionally backed. The test examines the frequencies of digits in vote counts and rests on the First (NBL1) and Second Digit Newcomb-Benford Law (NBL2), and on a novel generalization of the law under restrictions on the maximum number of voters per unit (RNBL2). We apply the test to the 2004 USA presidential elections, the Puerto Rico (1996, 2000 and 2004) governor elections, the 2004 Venezuelan presidential recall referendum (RRP) and the previous 2000 Venezuelan Presidential election. The NBL2 is compellingly rejected only in the Venezuelan referendum and only for electronic voting units. Our original suggestion on the RRP (Pericchi and Torres, 2004) was criticized by The Carter Center report (2005). Acknowledging this, Mebane (2006) and The Economist (US) (2007) presented voting models and case studies in favor of NBL2. Further evidence is presented here. Moreover, under the RNBL2, Mebane's voting models are valid under wider conditions. The adequacy of the law is assessed through Bayes Factors (and corrections of p-values) instead of significance testing, since for large sample sizes and fixed α levels the null hypothesis is over-rejected. Our tests are extremely simple and can become a standard screening that a fair electoral process should pass.
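As a rough illustration of the second-digit screening idea, the sketch below computes the NBL2 probabilities P(d) = Σ_{k=1}^{9} log10(1 + 1/(10k + d)) and tallies second digits of vote counts. The vote counts are hypothetical, and the Pearson chi-square statistic is only a simple stand-in: the abstract states that the authors assess adequacy through Bayes Factors, and the restricted law (RNBL2) is not implemented here.

```python
import math

def second_digit_probs():
    """NBL2: probability of each second significant digit d = 0..9."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]

def second_digit_counts(vote_counts):
    """Tally second digits of integer counts >= 10 (smaller counts have none)."""
    tally = [0] * 10
    for v in vote_counts:
        if v >= 10:
            tally[int(str(v)[1])] += 1
    return tally

def chi_square(observed, probs):
    """Pearson statistic of observed tallies against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

probs = second_digit_probs()
# Hypothetical vote counts per voting unit, for illustration only.
votes = [132, 208, 97, 415, 1010, 76, 350, 289, 12, 7]
tally = second_digit_counts(votes)
print(tally, round(chi_square(tally, probs), 2))
```

With real data, each entry of `votes` would be one candidate's count in one officially published voting unit; a sample this small is far too short to support any conclusion.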
How Many Digits are Needed? Herbst, Ira W.; Møller, Jesper; Svane, Anne Marie
Methodology and computing in applied probability, 03/2024, Volume 26, Issue 1
Journal Article
Peer reviewed
Open access
Let $X_1, X_2, \ldots$ be the digits in the base-$q$ expansion of a random variable $X$ defined on $[0, 1)$, where $q \ge 2$ is an integer. For $n = 1, 2, \ldots$, we study the probability distribution $P_n$ of the (scaled) remainder $T_n(X) = \sum_{k=n+1}^{\infty} X_k q^{n-k}$: if $X$ has an absolutely continuous CDF, then $P_n$ converges in the total variation metric to the Lebesgue measure $\mu$ on the unit interval. Under weak smoothness conditions we establish, first, a coupling between $X$ and a non-negative integer-valued random variable $N$ so that $T_N(X)$ follows $\mu$ and is independent of $(X_1, \ldots, X_N)$, and, second, exponentially fast convergence of $P_n$ and its PDF $f_n$. We discuss how many digits are needed and show examples of our results.
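The scaled remainder can be explored numerically, since T_n(X) is the fractional part of q^n X. The sketch below is only an illustration under assumptions not taken from the paper: it draws X as the maximum of two uniforms (density 2x on [0, 1)) and tracks how the share of remainders below 1/2 changes with n. Under the uniform limit that share should approach 0.5.

```python
import math
import random

def digits(x, q, n):
    """First n base-q digits X_1..X_n of x in [0, 1)."""
    out = []
    for _ in range(n):
        x *= q
        d = int(x)
        out.append(d)
        x -= d
    return out

def scaled_remainder(x, q, n):
    """T_n(x) = sum_{k>n} X_k q^(n-k), i.e. the fractional part of q^n * x."""
    y = x * q ** n
    return y - math.floor(y)

# Assumed example distribution (not from the paper): X = max(U1, U2), density 2x.
rng = random.Random(0)
xs = [max(rng.random(), rng.random()) for _ in range(100_000)]
for n in (1, 2, 4, 8):
    share = sum(scaled_remainder(x, 2, n) < 0.5 for x in xs) / len(xs)
    print(n, round(share, 3))
```

Floating-point arithmetic limits this approach to moderate n, because a double carries only about 53 binary digits.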
Tourism data is crucial for effective tourism management since it enables national and local authorities to shape public policies in tourism and also enables the tourism industry to make appropriate business decisions. In 2016 a new tourism data information system, called eVisitor, was introduced in Croatia, and this new system significantly eased data collection and processing. The aim of this paper is to assess the quality of statistical data on tourist traffic and to determine whether the technical improvement of the data collection system, which made reporting on tourist traffic easier for information providers, contributed to the quality of collected data. This is done by applying Benford’s distribution of first digits, i.e. Benford’s law, to the collected data. Benford’s law is based on the thesis that the first digits in numbers are not uniformly distributed and gives an expected pattern of numbers in the tabular data. Data that is not manipulated, accidentally or intentionally, should follow Benford’s distribution of first digits, and deviations from Benford’s distribution indicate that the data is compromised in some way. The conducted analysis has shown that the introduction of a new user-friendly data system did not affect the quality of collected data, but that the origin of the tourists was more important: data for domestic tourists have shown a statistically significant deviation from the expected Benford’s distribution, so it can be concluded that their quality is lower than the data for foreign tourists.
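The expected first-digit pattern mentioned above is P(d) = log10(1 + 1/d) for d = 1, ..., 9. A minimal sketch of comparing observed leading-digit frequencies with it follows; the sample (powers of 2) is a stand-in chosen because it is known to track Benford's law closely, not tourism data from the paper.

```python
import math
from collections import Counter

# Benford's expected probability for each leading digit 1..9.
BENFORD_FIRST = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_frequencies(values):
    """Relative frequency of each leading digit among the positive integers given."""
    leading = [int(str(v)[0]) for v in values if v > 0]
    if not leading:
        raise ValueError("need at least one positive value")
    counts = Counter(leading)
    return {d: counts.get(d, 0) / len(leading) for d in range(1, 10)}

# Stand-in sample for illustration: the first 200 powers of 2.
sample = [2 ** k for k in range(1, 201)]
observed = first_digit_frequencies(sample)
for d in range(1, 10):
    print(d, round(observed[d], 3), round(BENFORD_FIRST[d], 3))
```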
First Digits' Shannon Entropy Kreiner, Welf Alfred
Entropy (Basel, Switzerland), 10/2022, Volume 24, Issue 10
Journal Article
Peer reviewed
Open access
Related to the letters of an alphabet, entropy means the average number of binary digits required for the transmission of one character. Checking tables of statistical data, one finds that, in the first position of the numbers, the digits 1 to 9 occur with different frequencies. Correspondingly, from these probabilities, a value for the Shannon entropy H can be determined as well. Although in many cases the Newcomb-Benford Law applies, distributions have been found where the 1 in the first position occurs up to more than 40 times as frequently as the 9. In this case, the probability of the occurrence of a particular first digit can be derived from a power function with a negative exponent whose magnitude is greater than 1. While the entropy of the first digits following an NB distribution amounts to H = 2.88 bits per digit, for other data distributions (diameters of craters on Venus or the weight of fragments of crushed minerals), entropy values of 2.76 and 2.04 bits per digit have been found.
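The quoted value H = 2.88 for the Newcomb-Benford distribution can be checked directly from H = -Σ p_d log2 p_d with p_d = log10(1 + 1/d). The short sketch below does that and, for comparison, computes the entropy of nine equally likely digits; the function name is illustrative.

```python
import math

def shannon_entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
h_benford = shannon_entropy_bits(benford)
h_uniform = shannon_entropy_bits([1 / 9] * 9)  # upper bound, log2(9)

print(round(h_benford, 2))  # → 2.88
print(round(h_uniform, 2))  # → 3.17
```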
Claims of inconsistency in epidemiological data have emerged for both developed and developing countries during the COVID-19 pandemic.
In this paper, we apply first-digit Newcomb-Benford Law (NBL) and Kullback-Leibler Divergence (KLD) to evaluate the reliability of COVID-19 records in all 20 Latin American countries. We replicate country-level aggregate information from Our World in Data.
We find that official reports do not follow NBL's theoretical expectations (n = 978; chi-square = 78.95; KS = 4.33, MD = 2.18; mantissa = .54; MAD = .02; DF = 12.75). KLD estimates indicate high divergence among countries, including some outliers.
This paper provides evidence that recorded COVID-19 cases in Latin America do not conform overall to NBL, which is a useful tool for detecting data manipulation. Our study suggests that further investigations should be made into surveillance systems that exhibit higher deviation from the theoretical distribution and divergence from other similar countries.
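For readers unfamiliar with KLD, the sketch below shows one way to measure how far an observed first-digit distribution lies from the NBL expectation, using D(p || q) = Σ p_i ln(p_i / q_i). The digit counts are hypothetical and the paper's exact estimation procedure is not given in the abstract, so this is an assumption-laden illustration only.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats. Terms with p_i = 0 contribute 0; q_i must be > 0 where p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def normalize(counts):
    """Turn raw digit counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Hypothetical first-digit counts (digits 1..9) for one country's daily case reports.
counts = [120, 75, 60, 40, 35, 30, 25, 20, 15]
observed = normalize(counts)

print(round(kl_divergence(observed, benford), 4))
```

The same function can compare two countries' observed distributions with each other, which is closer to the cross-country divergence the abstract describes. KLD is not symmetric, so the order of arguments matters.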
The Benford law applied within complex networks is an interesting area of research. This paper proposes a new algorithm for the generation of a Benford network based on priority rank, and further specifies the formal definition. The condition to be taken into account is the probability density of the node degree. In addition to this first algorithm, an iterative algorithm is proposed based on rewiring. Its development requires the introduction of an ad hoc measure for understanding how far an arbitrary network is from a Benford network. The definition is a semi-distance and does not lead to a distance in mathematical terms, instead serving to identify the Benford network as a class. The semi-distance is a function of the network; it is computationally less expensive than the degree of conformity and serves to set a descent condition for the rewiring. The algorithm stops when it meets the condition that either the network is Benford or the maximum number of iterations is reached. The second condition is needed because only a limited set of densities allow for a Benford network. Another important topic is assortativity and the extremes which can be achieved by constraining the network topology; for this reason, we ran simulations on artificial networks and explored further theoretical settings as preliminary work on models of preferential attachment. Based on our extensive analysis, the first proposed algorithm remains the best one from a computational point of view.
Purpose: Lebanon has faced one of the most severe financial and economic crises since the end of 2019. The practices of the Lebanese banks are blamed for dangerously exposing economic agents and precipitating the current financial collapse. This paper examines the patterns of manipulation of the 10 biggest banks before and after implementing the financial engineering mechanism.
Design/methodology/approach: The authors apply Benford law for the first and second positions of the reports of condition and income and four out of the six aspects of the CAMELS rating system (Capital Adequacy, Assets Quality, Management expertise, Earnings Strength, Liquidity and Sensitivity to the market) by excluding Management and Sensitivity. The deviations from BL frequencies are tested using Z-statistic and Chi-square tests.
Findings: Banks seem to have manipulated their Capital Adequacy, Liquidity and Assets Quality in the pre-financial engineering and considerably in the post-financial engineering periods. Fraudulent manipulations in the banking sector can mislead depositors, shareholders and regulating authorities.
Research limitations/implications: This study has many implications for governmental authorities, commercial banks, depositors, businesses, accounting and auditing firms, and policymakers. The Lebanese government needs to implement corrective fiscal and monetary policies and apply amendments to the bank secrecy and capital control law. The central bank should revamp its organizational structure, improve its disclosure practices and significantly reduce its ties to the government and the political elite.
Practical implications: The study findings suggest that the central bank should revamp its organizational structure, improve its disclosure practices and significantly reduce its ties to the government and the political elite.
Originality/value: The study is the first to examine the patterns of fraudulent manipulation in the Lebanese banking industry using Benford Law (BL).
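The per-digit Z-statistic mentioned in the methodology can be sketched as follows. The formula is the form commonly used in Benford analyses, with a continuity correction of 1/(2n) applied only when it is smaller than the absolute deviation; whether the authors used exactly this variant is an assumption, and the counts are hypothetical.

```python
import math

def benford_z(observed_count, n, expected_p):
    """Z-statistic for one digit: deviation of the observed proportion from expected_p."""
    diff = abs(observed_count / n - expected_p)
    correction = 1 / (2 * n)
    if correction < diff:  # continuity correction, applied only when it does not flip the sign
        diff -= correction
    std_err = math.sqrt(expected_p * (1 - expected_p) / n)
    return diff / std_err

# Hypothetical example: 350 of 1000 reported figures start with the digit 1.
p1 = math.log10(2)  # Benford probability of a leading 1
z = benford_z(350, 1000, p1)
print(round(z, 2), "exceeds 1.96" if z > 1.96 else "within 1.96")
```

A value above 1.96 corresponds to a two-sided 5% significance level for that single digit; testing all digits at once calls for an aggregate test such as chi-square or a multiple-comparison adjustment.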
Denial of Service (DoS) attacks and their distributed variant (DDoS) are easy to start but hard to stop, especially in the DDoS case. What makes this type of attack significant is that attackers use a large number of packets, usually generated by programs and scripts that craft packets for specific attacks such as SYN flood, ICMP smurf, etc. These packets have similar or identical attributes such as packet length, interval time, destination port, TCP flags, etc. Skilled engineers and researchers use these packet attributes as indicators to detect anomalous packets in network traffic. For fast detection of anomalous packets in legitimate traffic, we propose Interactive Data Extraction and Analysis with the Newcomb-Benford power law, which can detect matching first occurrences of leading digits (here, of each packet's size) that indicate the use of automated scripts for attack purposes. The power law can also be used to detect identical first two or three digits, second digits, or last one or two digits in a data set. We used our own data set and real devices.
Internet research on search engine quality and the validity of results demands much attention. Thus, the focus of our study has been to measure the impact of quotation mark usage on internet search outputs in terms of the distributions of Google search outcomes, through Benford Law. The current paper applies a Benford Law analysis to two related types of internet searches distinguished by the usage or absence of quotation marks. Both sets of search result values are treated as variables. We found that the first digit of outcomes does not follow the Benford Law for first digits in the case of searching text without quotation marks. Unexpectedly, the Benford Law is obeyed when quotation marks are used, even if the variability of search outcomes is considerably reduced. By studying outputs demonstrating the influence of (apparently at first) “details” in using a search engine, the authors are able to further warn users concerning the validity of such outputs.