[Kemal Ergenekon] Mythbusting with Math - Conspiracy Debunked

Day 2,420, 02:19 Published in USA by Kemal Ergenekon



Greetings readers of Ekonomi Politik,

Today we are going to shelve our tinfoil hats and learn some statistics. We will start with an observation that puzzled John Largo and his conspiracy-theorist friends:

"For those of you who didn’t know, in the eUS, the person who wins the first six hours is almost certainly the one who wins the Presidency. Why is this? Why is it that the votes in the first 6 hours mean so much? Are votes after 6:00eRep worth less than those before that time? 6:00eRep is 9AM eastern, and 6AM Pacific. Why would an eAmerican election be decided before half the country got up? There is still roughly 2/3 of a day after that point. Why is that point in time crucial?"

Basically, he is asking: why does the first third of the votes almost always predict the final result? His conclusion: conspiracy!

Of course, we don't want to put on the tinfoil hat straight away, so we will use math to see whether his claim is credible.

Our question: to what extent do the first 1/3 of the votes correctly predict the final result if there is no rigging? Mathematically, "no rigging" = "votes are independent and identically distributed random variables". Cool.

Since you would probably find the related statistics articles too tedious, we will go with "seeing is believing": instead of using theory, we are going to run some simulations. For simplicity's sake, I will simulate a race with only 2 candidates, so each voting decision is exactly a Bernoulli random variable: it takes the value 1 with probability p and 0 with probability 1-p.

For example, let 1 indicate a vote for DMJ and 0 a vote for JL, so p is the probability that a random voter votes for DMJ. We will assume a value for this true parameter and run simulations. Each vote is then like the toss of a biased coin: DMJ gets the vote with probability p and JL gets it with probability 1-p.
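In symbols, this is the standard Bernoulli model. Writing X_i for the i-th vote counted (notation introduced here for illustration, not used in the original post), the running share of DMJ votes after n votes is the sample mean, whose expected value is p and whose spread shrinks as n grows:

```latex
X_i = \begin{cases}
  1 & \text{(vote for DMJ) with probability } p,\\
  0 & \text{(vote for JL) with probability } 1-p,
\end{cases}
\qquad X_1, X_2, \dots \text{ independent}

\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}\bigl[\bar{X}_n\bigr] = p, \qquad
\operatorname{Var}\bigl(\bar{X}_n\bigr) = \frac{p(1-p)}{n}
```

DMJ wins the election when \bar{X}_{1000} > 1/2.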

Let p = 3/5. I will simulate 1000 votes, and plot the sample mean:



The horizontal axis is the number of votes counted. For instance, 300 marks the point where only 300 of the 1000 votes have been tallied.

The blue line is the sample mean so far: the number of votes for DMJ counted so far divided by the total number of votes counted so far.

The magenta line is the true mean of the random variable, which we assumed to be 3/5. Notice how fast the blue line converges to the magenta one.

The red line is the 1/2 line. It is the winning threshold, since the candidate with more votes wins. If the final point of the blue curve is above red, DMJ wins. Below red, JL wins.

Well, that was just a single simulation. Let's run a few more:







In each case, notice that once one third of the votes are known (the 333rd vote), the sample mean so far is already extremely close to the final sample mean at the 1000th vote, and both are close to the true mean p. This phenomenon, the convergence of the sample mean to the true mean, is called the Law of Large Numbers. Interested readers can learn more here: http://en.wikipedia.org/wiki/Law_of_large_numbers
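How close is "extremely close"? Under the same coin-toss model, the sample mean after n votes has a standard deviation (standard error) of sqrt(p(1-p)/n); this is a textbook result, not something measured from the plots above. A minimal MATLAB sketch, assuming p = 3/5 as in the plots; the variable names std_err and z_margin are illustrative:

```matlab
% Spread of the running vote share under the i.i.d. Bernoulli model
% (assumption: every vote goes to DMJ with probability p, independently).
p = 3/5;
n = [333 1000];                    % votes counted: one third vs. all of them

std_err = sqrt(p * (1 - p) ./ n);  % standard deviation of the sample mean

% How many standard errors the true share sits above the 1/2 threshold
z_margin = (p - 0.5) ./ std_err;

% Columns of [n; std_err; z_margin] are printed one per line
fprintf('n = %4d: std. error = %.4f, margin = %.1f std. errors\n', ...
        [n; std_err; z_margin]);
```

With p = 3/5 the standard error at the 333rd vote is about 0.027, so the true share of 0.6 already sits more than three standard errors above the 1/2 line; by the 1000th vote the margin is larger still.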

Let's come back to our initial question: to what extent do the first 1/3 of the votes approximate the final result if there is no rigging?

I will run many simulations with the same setup and count how often the leader at the 333rd vote is also the winner at the 1000th vote.

p = 3/5
number of simulations = 1000
winners coincide = 999/1000

That's right: I ran the simulation 1000 times, and the leader at the 333rd vote turned out to be the final winner in every single run save one.
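The simulated frequencies can also be cross-checked without any randomness. Under the same model, DMJ's votes among the first 333 (X) and among the remaining 667 (Y) are independent binomial counts, so the probability that the early leader also wins can be summed exactly over all values of X. A minimal sketch, assuming the same tie rule as the simulation code (a 500-500 tie counts as a JL win); the names binom_pmf and probs are hypothetical, and the binomial probabilities are built from gammaln so no toolbox is needed:

```matlab
% Exact probability that the leader after n_early counted votes is also the
% winner after n_total votes. Assumptions: votes are i.i.d. Bernoulli(p)
% (1 = DMJ, 0 = JL) and a final tie counts as a JL win ("> 1/2" rule).
binom_pmf = @(k, n, p) exp(gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1) ...
                           + k .* log(p) + (n - k) .* log(1 - p));

n_early = 333;
n_total = 1000;
n_late = n_total - n_early;
p_values = [0.60 0.55 0.50];
probs = zeros(size(p_values));

for i = 1:numel(p_values)
    p = p_values(i);
    x = 0:n_early;                          % possible DMJ votes among the first 333
    px = binom_pmf(x, n_early, p);          % P(X = x)
    py = binom_pmf(0:n_late, n_late, p);    % P(Y = y) for the remaining 667 votes
    cdf_lt = [0, cumsum(py)];               % cdf_lt(m + 1) = P(Y < m), m = 0..n_late+1

    % Extra DMJ votes needed to finish strictly above 1/2, clamped to 0..n_late+1
    need = floor(n_total / 2) + 1 - x;
    need = min(max(need, 0), n_late + 1);
    dmj_wins_final = 1 - cdf_lt(need + 1);  % P(Y >= need) for each x

    dmj_leads_early = x > n_early / 2;      % n_early is odd, so no early ties
    p_same = dmj_leads_early .* dmj_wins_final ...
             + (~dmj_leads_early) .* (1 - dmj_wins_final);

    probs(i) = sum(px .* p_same);
    fprintf('p = %.2f: P(leader at vote %d wins) = %.4f\n', p, n_early, probs(i));
end
```

Up to simulation noise, these exact values should sit close to the 999/1000, 962/1000 and 709/1000 frequencies reported in this article; changing n_early shows how quickly an early lead becomes decisive for other cut-offs.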

For a closer race, say p = 0.55, things change:

p = 0.55
number of simulations = 1000
winners coincide = 962/1000

In this closer race, where 55% of voters back the "1" candidate, the first 1/3 of the votes correctly predicted the winner in 962 of 1000 simulations.

Let's make it a dead-even race, i.e. p = 0.50:

p = 0.50
number of simulations = 1000
winners coincide = 709/1000

Even when both candidates have exactly the same support, the leader after 1/3 of the votes goes on to win 70.9% of the time.
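For the dead-even case there is also a quick back-of-the-envelope check. This is a standard normal-approximation argument, not part of the original simulations. With p = 1/2, let S_n be DMJ's lead in votes after n votes are counted (each vote adds +1 or -1; the notation is introduced here). S_{333} and S_{1000} share their first 333 steps, so for large n they are approximately jointly normal with mean 0, and Sheppard's formula gives the probability that two such variables have the same sign:

```latex
\rho = \frac{\operatorname{Cov}(S_{333}, S_{1000})}
            {\sqrt{\operatorname{Var}(S_{333})\,\operatorname{Var}(S_{1000})}}
     = \frac{333}{\sqrt{333 \cdot 1000}} = \sqrt{0.333} \approx 0.577

P\bigl(\operatorname{sign} S_{333} = \operatorname{sign} S_{1000}\bigr)
  \approx \frac{1}{2} + \frac{\arcsin \rho}{\pi}
  \approx \frac{1}{2} + \frac{0.615}{\pi} \approx 0.696
```

The approximation ignores exact ties. A 1000-run estimate near 0.7 has a standard error of about 0.015, so the simulated 709/1000 is consistent with this value.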

I hope our future POTUS candidates and conspiracy theorists will first try to learn some math and statistics before seeing demons behind every corner. Congratulations, you just rediscovered the law of large numbers! (exactly 301 years after it was first discovered by Bernoulli: http://www.math.ethz.ch/~wueth/Positions/2013_Bernoulli.pdf )

MATLAB code:

For a single simulation:

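p = 3/5;                        % true probability that a voter picks DMJ
n_votes = 1000;
draws = rand(1, n_votes);       % uniform draws on (0, 1)
votes = (draws < p);            % 1 = vote for DMJ (prob. p), 0 = vote for JL

% Share of DMJ votes after each counted vote (running sample mean)
sample_mean_thus_far = cumsum(votes) ./ (1:n_votes);
true_mean = p * ones(1, n_votes);            % assumed true mean
winning_threshold = 0.5 * ones(1, n_votes);  % a candidate needs more than 1/2

figure(1)
clf
hold on
plot(sample_mean_thus_far, 'blue')
plot(true_mean, 'magenta')
plot(winning_threshold, 'red')
hold off
ylim([0 1])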

For repeated simulations:

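p = 0.50;                       % true probability that a voter picks DMJ
n_votes = 1000;
n_sims = 1000;
winners_coincide = false(1, n_sims);

for sim_no = 1:n_sims

    draws = rand(1, n_votes);
    votes = (draws < p);        % 1 = DMJ, 0 = JL

    % Running share of DMJ votes after each counted vote
    sample_mean_thus_far = cumsum(votes) ./ (1:n_votes);

    % true = DMJ ahead (strictly more than half), false = JL ahead or tied
    winner_at_333 = sample_mean_thus_far(333) > 1/2;
    winner_at_1000 = sample_mean_thus_far(1000) > 1/2;
    winners_coincide(sim_no) = (winner_at_333 == winner_at_1000);

end

mean(winners_coincide)          % fraction of runs where the early leader won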