[GBM] Paul Proteus Predicts the Future, and the Election

Day 3,210, 00:02 Published in USA USA by Paul Proteus
Shout out to Penguin4512 who helped collect the data and create this model, and who is smarter than Paul, and thus, by the transitive property, also smarter than the average bear.



Goodbye Blue Monday: Now with images almost as old as Cromstar

Good Morning eMerica. The beginning of the month is always a volatile time. Presidential elections remain one of the most exciting aspects of eRepublik. So much uncertainty! So much competition! Well, we at Goodbye Blue Monday are here to suck the fun right out of elections with the sweet, cold embrace of soulless math.


And I will be your guide

So, the goal of this article will be to correct for the flaws in last month's model and present a reasonable, analysis-based forecast for this month's competition. Alright, let's begin.

The Elephant in the Room: Voter Turnout

Last month my model, while not wildly wrong, had several glaring flaws that most obviously led to inflated estimates for both candidates. I said as much at the time: it was working with a small sample size, and I essentially built it in an hour. This month, however, I gave myself considerably more time and collected data from the past two years, which roughly coincides with the advent of the current party system. The first step was finding a way to model voter turnout.

Voter Turnout over Time

So....not great

In modeling voter turnout I ran into one fairly uncomfortable truth: the ravages of time do an incredibly good job explaining voter turnout, and nothing else really does. As you can see in the graph above, a simple variable measuring time explains about 76% of the variation in voter turnout. I tested whether a close election would have an effect, and, after gathering significantly more data going further back, tried to test for a seasonal effect. Neither was statistically significant. On the one hand, that makes integrating turnout into our analysis pretty easy. On the other hand, it's depressing as shit. More hopefully, graphing turnout over a longer period does show the decline slowing, so we can perhaps hope to avoid the complete collapse of the eUS for a few more months. Ajay's votes explained by time, in contrast, have an R² of 0.13 over the past year, and are remarkably consistent around an average of 42, so at some point turnout will presumably fall below 42 votes and Ajay can finally become President. Until then, however, we have models to build. Over the past two years the decline in turnout has been relatively linear, and for ease we'll assume that remains the case for our purposes.
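For the curious, the time-only turnout regression above is just a one-variable least-squares fit. Here's a minimal sketch of how you'd run one; the turnout numbers below are invented for illustration, since the article's actual two-year dataset isn't reproduced here:

```python
import numpy as np

# Hypothetical monthly turnout figures (votes cast per election);
# the real dataset from the article is not reproduced here.
turnout = np.array([620, 601, 590, 575, 570, 548, 541, 530, 512, 505, 498, 480])
t = np.arange(1, len(turnout) + 1)  # election index: 1, 2, 3, ...

# Ordinary least squares fit of turnout on time alone.
slope, intercept = np.polyfit(t, turnout, 1)
predicted = intercept + slope * t

# R^2: the share of turnout variance explained by the time trend alone.
ss_res = np.sum((turnout - predicted) ** 2)
ss_tot = np.sum((turnout - np.mean(turnout)) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope = {slope:.2f} votes/election, R^2 = {r_squared:.3f}")
```

With the real data, the slope is negative and R² lands around 0.76; the made-up series above is just cleaner so the mechanics are visible.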

Model Building 2: This Time It's Personal


This image represents a more productive and less time intensive hobby than writing this article

So, last month I used a general model which valued all party endorsements equally, because I simply lacked the data to measure them independently. Well, that's no longer the case, so the model becomes more complicated. At its core, however, it's essentially the same. I began with a model that included each party endorsement, a variable for name recognition (still proxied by status as a former eUS CP), a variable for incumbency, MU membership, and several other similar variables. What should be fairly apparent is that a lot of this modeling consists of finding ways to capture an intangible lurking variable: recognition, support, elitism, whatever you want to call it. From what I've found, former CP status does a decent job of this (remove it as a variable, for example, and the BSP nomination suddenly becomes an extremely significant negative on a candidate). Obviously, however, this spills over into our other variables as well.

This leads us to some fundamentally interesting aspects of the model. The Black Sheep endorsement, as the model is constructed, really isn't significant at all. This isn't entirely nonsensical: the Black Sheep Party has only endorsed the winner 16% of the time over the past two years. If party endorsements indicate some broader community support, it makes sense that the endorsement of the self-described black sheep of the eUS would be less indicative of that. That is not to say the BSP endorsement does not draw votes; it certainly does. It is, however, less useful in our forecast. That the Fed and USWP endorsements are significant is not surprising: both parties are powerhouses, highly connected to the meta of politics, and frequently endorse or run winning candidates. That the SFP's endorsement is significant suggests a slightly more causal effect, as they have only actually endorsed a winning candidate in 28% of elections over the past two years. On the other hand, the most significant variable in our model is perhaps the party with the least voting power, if Congress results are to be believed. WTP is like Ohio. Nobody really knows what's there, nobody really cares, but damnit, election season rolls around and they're voting for the winner. 76% of the time, to be exact. Unlike Ohio, this really isn't a question of causation. Luckily for us, it doesn't matter: we're just trying to forecast the winner, so while interesting, we don't really need to pick apart the actual power of any of these endorsements.


The obscene amounts of text obscure the general shoddiness

There are of course other lurking variables that our model likely cannot accurately capture, and certainly cannot easily measure. For example, it'd be naive to believe multiple accounts have no influence on these elections (again, look at Ajay's voter base, which appears impervious both to time and to their choice of candidate). But by their very illicit nature, there's really no way to incorporate their existence into the model. Another frequently cited variable, a candidate's effort in media, messaging, etc., can't really be measured as a whole either, since such activity leaves no record we can look back on years later. Additionally, this model treats endorsements as final. Perhaps they are, if one believes endorsements have an outsized effect on two-clickers and players less engaged in the meta; however, it's logical (if not inevitable) to imagine that a close primary, like this month's Federalist AND WTP primaries, each of which Orik won by only one vote, may carry less power than usual. While it's probably true that close primaries indicate more split party votes on the 5th, I have nowhere near the data nor manpower to measure that accurately, so it cannot be included in the model. We hit a lot of these limitations with the limited data available. Essentially, we have to accept that this model is not perfect. Most models aren't. They can still, however, be illuminating.

And while most of my attempts to introduce new variables were unsuccessful (total candidates on the ballot does not appear significant; incumbency remains insignificant; etcetera), there are a few other changes to the model. While margin of victory does not appear to significantly affect turnout, there does appear to be a relation between the closeness of the election and the proportion of votes split between the top two candidates versus spoiler candidates down the ballot. So, somewhat arbitrarily, after analyzing margins of victory over the past two years, I introduced a dummy designating a "close" election as one where the margin of victory is under 15%, to hopefully make our forecast slightly more accurate.
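In code, that close-election dummy is a one-liner. A sketch, with one caveat: the 15% threshold comes from the article, but the article doesn't specify the denominator for the margin, so treating it as a share of the top-two vote is my assumption:

```python
def close_election_dummy(winner_votes: int, runner_up_votes: int,
                         threshold: float = 0.15) -> int:
    """Return 1 if the margin of victory is under `threshold`, else 0.

    Margin is taken here as a share of the combined top-two vote;
    the article leaves the exact base unspecified.
    """
    margin = (winner_votes - runner_up_votes) / (winner_votes + runner_up_votes)
    return 1 if margin < threshold else 0

print(close_election_dummy(230, 200))  # margin ~7%  -> close -> 1
print(close_election_dummy(300, 150))  # margin ~33% -> not close -> 0
```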

Forecast Time


GBM artist real time illustration of Paul Proteus as of the writing of this article

Where is all of this leading? Well, to a model, of course. And not the hot kind. This article is long enough as it is, so if you want to know more about my methodology, tell me easy ways to make this better, get a glimpse into the infinite Excel spreadsheets, F-tests, and R output that led to this, or just want to tell me how wrong every aspect of this is, shoot me a PM; I'd be happy to talk. And again, this is eRepublik: there are certain assumptions in the model designed to make it significantly easier to run that also make it less accurate than it could be. For example, I'm assuming the power of each endorsement stays constant over time. A couple of caveats so you can test this model yourself: the data goes back 25 elections, so to model this election, t = 26. Also, this model is only designed to model the first- and second-place candidates. It has absolutely nothing to say about candidates in third, fourth, and fifth place. That's another improvement over last month's model in terms of accuracy, but it will surely disappoint those trying to figure out how close to 48 votes Ajay will be in every election from now to our deaths. Also, perhaps most disappointing, this model is designed for fun, so when it's slightly wrong, please don't murder me. I will say, however, that over the past two years, applied retroactively, this model would have selected the winner correctly 84% of the time. So, you know, good enough for government work. Definitely good enough for fake government work. For those curious, our adjusted R² is 0.7323.

So, let's assume a model of:
Votes = B0 + B1(t) + B2(D_EZC) + B3(D_FmrCP) + B4(D_Feds) + B5(D_USWP) + B6(D_SFP) + B7(D_WTP) + B8(D_Close)

We then find:
B0 = 197.533, B1 = -7.791, B2 = 45.384, B3 = 41.941, B4 = 61.133, B5 = 64.497, B6 = 66.716, B7 = 86.001, B8 = 24.859.
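With the coefficients in hand, the whole model reduces to adding up whichever terms apply to a candidate. A sketch of the predictor; the example endorsement combination at the end is purely hypothetical, not a claim about either actual candidate's slate:

```python
# Fitted coefficients, copied from the article. Each dummy flag is 1
# if the candidate has that endorsement/attribute, else 0.
COEFS = {
    "intercept": 197.533, "t": -7.791,
    "EZC": 45.384, "FmrCP": 41.941, "Feds": 61.133, "USWP": 64.497,
    "SFP": 66.716, "WTP": 86.001, "Close": 24.859,
}

def predict_votes(t: int, **dummies: int) -> float:
    """Votes = B0 + B1*t + the sum of whichever dummy terms apply."""
    votes = COEFS["intercept"] + COEFS["t"] * t
    for name, flag in dummies.items():
        votes += COEFS[name] * flag
    return votes

# Hypothetical candidate at t = 26 (this election) with the EZC and
# USWP dummies, former-CP name recognition, and a close election.
print(predict_votes(26, EZC=1, FmrCP=1, USWP=1, Close=1))
```

Note how hard the time term bites: at t = 26 the trend alone subtracts over 200 votes from the intercept, which is exactly the turnout decay discussed earlier.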

Prediction


The level of accuracy we're going for

Now we get to the juicy part. Though of course, using the information I've already given you, you could do this at home pretty much every month, but that'd involve addition and multiplication, which, let's be honest, is never fun. So, I've taken the liberty of doing it for you.

Forecast for the Presidential Election:
Edited to reflect that the election will likely end with a margin of victory under 15%.

Pfeiffer (predicted): 171.6549
95% CI: Lower: 24.86084, Upper: 318.4489
Orikfricai (predicted): 257.6557
95% CI: Lower: 110.9989, Upper: 404.3124


Giving the people what they want: Graphs!


How Wrong is this Prediction?

I've taken the liberty of quickly calculating how wrong this model is when applied retroactively over the past two years. As I mentioned, the New Model™ correctly predicts the winner 84% of the time. As for forecasting the exact number of votes, on average it's off by around 4.85%. Which isn't nothing, but, as that's an average, it's particularly affected by outliers (and December 2015 was a particularly brutal outlier for this model), and honestly that's as good as we're going to get without significantly more rigorous analysis. For comparison, retroactively applying the model I created last month, the Old Model™ would also be correct 84% of the time, but it's off by an average of 24%. So, through a significant amount of effort, this month's model is marginally better. But we also have the comfort of knowing it's significantly better designed. Also, given our prediction intervals, there's significant overlap, so it shouldn't be particularly surprising should Pfeiffer beat Orik. In that sense, this really is unable to tell us anything concrete, but it does give us some indication of what the election will look like. Statistics!
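For those curious what "off by around 4.85% on average" means mechanically, it's a mean absolute percentage error, and the outlier sensitivity mentioned above falls straight out of the mean. A sketch with invented residuals (the article's actual per-election errors aren't published), where one big miss stands in for a December-2015-style outlier:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error: average of |error| / actual, in %."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual))) * 100

# Invented vote totals; the fourth election is the big outlier miss.
actual    = [400, 380, 350, 330, 300]
predicted = [410, 372, 360, 250, 305]

ape = np.abs((np.asarray(actual, float) - np.asarray(predicted, float))
             / np.asarray(actual, float)) * 100
print(f"mean APE:   {mape(actual, predicted):.2f}%")   # dragged up by the outlier
print(f"median APE: {float(np.median(ape)):.2f}%")     # robust to it
```

Four of the five invented errors are under 3%, yet the single 24% miss pulls the mean well above the median, which is exactly how one brutal month inflates an otherwise decent average.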


Xlympics, Poetry, and More~


Goodbye Blue Monday suggests investing in gooooooooold

Make sure to keep checking out Hadrian's Xlympic Results. Spoiler: Broforce is kicking ass~

https://xfilesbyhx.wordpress.com/xlympic-games-i-xli/



If you adjust for MU/Party size, I'm pretty confident we're going to win this thing 😉 Which, you know, obviously you'd do that, right? right?

On the subject of Xlympic Poetry...

Also, for the preliminary round I made the mistake of publishing my poem before it was to be judged. So, learning from that, I didn't publish my final poem early, but now that all the judging has occurred, I'm going to take advantage of this opportunity to publish something that has absolutely nothing to do with Excel. And luckily for me, people seemed to at least tolerate my entry.

In fact, it earned such plaudits as:

Uncompelling - Gnilraps
-Inwegen
and
This is ... work - Fingerguns

So, without further ado, here it is:

Do Fireflies Really Burn?

When streaks of lightning weave through blurry nights
are embers left to blanket nascent fields,
and do they burn to ash and disappear?
Should I not try to capture one mid-flight
for fear of burning flesh that never heals?
Does fire fly and singe the summer air?
And in cupped hands does life just fade away,
a spark without the air it needs to grow,
or can you fill a jar, and make it glow
and light a room from its unearthly gray?
Is fire born intended to decay,
as soon as there’s no audience to show
its trick, no space that’s left for it to know?
Can fire truly fly from day to day?


Not as good as Tenshibo's, but oh well, we can't all have that level of swag. We can only hope to accept the swag we're born with. Anyway, that's all for now.


Until next time, see you around,
Paul Proteus~


Again, image creds to Fionia. Best Xlympics ever