A Question of Polling

9/5/17 UPDATE

The results of the second round of the French election are in, and Macron has been comfortably elected President. Perhaps surprisingly, given their first-round performance, the public polls were not particularly accurate in the second round: they underestimated Macron's eventual vote share, although this made little difference to the outcome given his large winning margin. We hope to explain some of our own results, and the difference in the public polls' performance between the two rounds, in a future blog post.

In the run-up to the second round of the French election, it wasn't only on the continent that people were eagerly awaiting the result. The French people obviously care deeply about who leads them next, but the result also had wider implications for the rest of the EU, the UK and the world.

In our professional capacity as data scientists, we were also very interested to see what the election would reveal about the accuracy of the public polls. Unlike in other recent elections, the public polls for the first round of the French election predicted the results quite closely. But would they maintain that performance?

ASI specialises in applying complicated maths to solve real-world problems. Since polling is interesting and difficult, and probably needs some new technology to work consistently, we've spent the last two years exploring what value we could add in this area.

Why polls fail is reasonably well understood. Most pollsters depend on drawing a random sample from a large population. Fifty years ago, randomly dialling telephone numbers was a good way to gather this sample. As communications technology developed, however, fewer and fewer people bothered to use landlines, and today, even when pollsters do get through, many people simply hang up on them. In 1980 the response rate to a phone poll was about 72%; it is now around 0.9%.

But this methodology was invented in the 1930s, and many things have changed since then. In particular, we have more powerful computers, more powerful algorithms and easier ways of reaching people. Is it possible to reimagine polling given all the advantages of the 21st century?

If you spend any time in Silicon Valley, it won't be long before you hear someone say "Software is eating the world". This is a quote from Marc Andreessen, an investor and co-founder of Netscape. It is a pithy summary of a wider argument that all businesses will become software businesses. According to this theory, because software is cheaper than people and quick to iterate on, software companies will outcompete those that are not.

So how does this apply to polling? In current polling methods, the complexity lies in collecting a representative sample. In the methods we've been developing, we move this complexity into software, reducing the need for a good sample: we try to correct for the sample's biases in the maths, rather than in how the sample is collected. The trade-off is that this requires a lot of data. The technique is still in development, but we have been able to test our predictions on two votes: the UK referendum and the French election. (Unfortunately, the short run-in to this year's UK General Election leaves us too little time for a third test.) It's too soon to discuss how well our models did, since we're still learning lessons from these recent results, but watch this space for future updates.

The basis of this technique was developed in the US by Professor Andrew Gelman and colleagues, who gathered their data through polling questions on the Xbox. In their demonstration of the technique, the non-representative poll outperformed the average of the public polls for the 2012 US presidential election, which Obama won.
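Gelman's Xbox study adjusted its raw responses using multilevel regression and poststratification (MRP). To give a flavour of what "correcting in the maths" can mean, here is a much-simplified sketch of the poststratification step alone, under stated assumptions: the age groups, response counts and population shares are all invented for illustration, and this is neither our model nor the full MRP method, which fits a multilevel regression to stabilise estimates in sparse cells rather than using raw cell averages.

```python
from collections import defaultdict

# HYPOTHETICAL survey responses: (age_group, supports_candidate_A).
# The sample is deliberately skewed towards younger respondents,
# loosely mimicking an unrepresentative opt-in sample.
responses = (
    [("18-34", True)] * 60 + [("18-34", False)] * 20
    + [("35-54", True)] * 6 + [("35-54", False)] * 6
    + [("55+", True)] * 2 + [("55+", False)] * 6
)

# HYPOTHETICAL population share of each cell (e.g. from census data).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}


def raw_estimate(data):
    """Naive estimate: the share of all respondents who support A."""
    return sum(support for _, support in data) / len(data)


def poststratified_estimate(data, shares):
    """Estimate support within each cell, then weight each cell by its
    share of the population rather than its share of the sample."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [supporters, total]
    for cell, support in data:
        counts[cell][0] += int(support)
        counts[cell][1] += 1
    estimate = 0.0
    for cell, share in shares.items():
        supporters, total = counts[cell]
        if total == 0:
            # Plain poststratification cannot fill empty cells; MRP's
            # regression step is what handles this in practice.
            raise ValueError(f"no respondents in cell {cell!r}")
        estimate += share * supporters / total
    return estimate


print(f"raw: {raw_estimate(responses):.3f}")  # raw: 0.680
print(f"poststratified: {poststratified_estimate(responses, population_share):.4f}")  # poststratified: 0.4875
```

The naive average is dominated by the over-sampled young group, while reweighting each group to its (assumed) population share pulls the estimate down substantially, even though no new respondents were collected.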
