This week, Google DeepMind’s AlphaGo is challenging Lee Sedol, one of the greatest Go players of all time, in a five-game series. As Go is one of the few remaining games in which humans can still claim superiority over machines, a win for AlphaGo would be an extremely significant milestone.
However, this is not the first time that DeepMind has challenged a professional Go player. Six months ago, the original implementation of AlphaGo played the European champion, Fan Hui, and beat him 5-0. Given ASI’s specialism in data science consultancy (essentially, applying machine learning and artificial intelligence to commercial problems), and our interests in rational prediction and old-fashioned gambling, this is very much in our wheelhouse. We only spent a few hours figuring out a prediction, so this is a little rough and ready, but here is how we broke it down:
The gap between Lee and Fan
It is clear that, while Fan Hui is an extremely accomplished Go player (the European champion!), Lee Sedol is in a different class. Go players are traditionally rated on a scale from 1st Dan to 9th Dan (interestingly, this seems to be the root of Karate rankings beyond black belt).
Fan Hui is a 2nd Dan professional, and Lee Sedol is a 9th Dan. This certainly sounds like a big difference, but what does it translate to in winning percentage, so that we can use it in our forecasts? If you take their relative Elo ratings - Lee is at 3500 and Fan is at 3000 - Lee would have a 95% chance of beating Fan, possibly more. Lee is a very, very talented player.
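The 95% figure can be reproduced with the standard logistic Elo formula, in which a 400-point gap corresponds to 10-to-1 odds. This is a minimal sketch, assuming that formula applies to the Go ratings quoted here; the function name `elo_win_probability` is ours, purely for illustration.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the text: Lee at 3500, Fan at 3000.
lee_vs_fan = elo_win_probability(3500, 3000)
print(f"Lee beats Fan with probability {lee_vs_fan:.1%}")
```

Under these assumptions the result lands just below 95%, in line with the estimate above.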
As all good forecasters know, an accurate prediction starts with a good base rate. A base rate is a historical comparison to anchor your forecast. Unfortunately, it is not always obvious what comparison makes a good base rate. We thought that there might be three relevant numbers.
Taking the prevailing expert opinion is a good place to start forecasting in an uncertain domain. But this does come with complications. Not all experts are equal, particularly given human psychology. Philip Tetlock (the inventor of the methods for forecasting that we use here) suggests that foxes (experts with a great breadth of knowledge) beat hedgehogs (experts with a great depth of narrow expertise).
In our context, we want experts who know something about Go and something about AI, not just very deep Go experts. Our survey of online commentary suggests that these foxy experts rate AlphaGo’s chances of winning at 50-55%. Conveniently, this was also the estimate from prediction markets.
For full disclosure, I’m friends with Shane Legg and know Demis Hassabis a little. Both rate in my “off-the-charts” category, so when I read that they had decided to challenge the world champion, I was willing to wager my life savings on AlphaGo (don’t be too impressed, that amounts to a pretty small bet). Just to note, though, I hadn’t talked to anyone from DeepMind about this work before writing this post, so this is entirely based on openly available information.
However, in making a forecast, it is important to step outside the parochial view and take a more balanced approach. How often do companies challenge humans too early? I think it is clear from the result of the match against Fan Hui that it is a question of when DeepMind will beat the world champion, not if. But other factors out of DeepMind’s control could have, at least in principle, forced the game to happen before they were completely ready.
To assess the risk of this, we looked back at previous challenges, taking draughts, chess, Scrabble and Jeopardy! into account. In many of these cases, the humans were challenged before the algorithms were fully ready, and the algorithms lost. We placed AlphaGo’s chances at 25% from these historical comparisons.
AlphaGo FH vs Lee Sedol
A final calibration could be sought by asking about how Lee Sedol would fare against the original AlphaGo algorithm that played Fan Hui, which I’m calling ‘AlphaGo FH’ (for AlphaGo Fan Hui, in case that wasn’t obvious).
In this case, it seems clear that Lee would have conclusively won. Lee’s Elo rating is said to be 3500, and DeepMind’s assessment of AlphaGo’s Elo rating is about 3200. This suggests that the chance of AlphaGo FH winning 3 games is about 1 in 150, i.e. very small. Advantage, Lee.
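One way to turn a rating gap into a series forecast is to convert it to a per-game win probability and then sum the binomial terms for winning at least three of five games. The sketch below assumes the standard Elo formula and independent games, and its helper names are illustrative. The answer is sensitive to those assumptions: under them, the sketch gives AlphaGo FH a series chance of a few percent, higher than the rough figure quoted above but still firmly in Lee’s favour.

```python
from math import comb


def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def series_win_probability(p_game: float, games: int = 5) -> float:
    """Probability of winning a majority of `games` independent games."""
    needed = games // 2 + 1
    return sum(
        comb(games, wins) * p_game**wins * (1 - p_game) ** (games - wins)
        for wins in range(needed, games + 1)
    )


# Ratings quoted in the text: AlphaGo FH at about 3200, Lee at 3500.
p_game = elo_win_probability(3200, 3500)
p_series = series_win_probability(p_game)
print(f"AlphaGo FH per-game win probability: {p_game:.1%}")
print(f"AlphaGo FH wins at least 3 of 5:     {p_series:.1%}")
```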
Base rate Conclusions
While the base rate does not conclusively favour either party, I would suggest that we learnt a few important things. Firstly, experts think AlphaGo stands a chance. Secondly, companies do sometimes commit too early. Thirdly, Lee would thrash AlphaGo FH. I’d put our base rate at about 60-70% to Lee.
But we know that DeepMind will not have been sitting on their hands for the last 6 months. Improvements will have been made, and probably at an incredible rate. After all, DeepMind has some of the best people in the field, and given the potential publicity, I’m sure Larry was willing to open his wallet for some extra people and fancy hardware.
The new version of AlphaGo, which I’m going to call AlphaGo LS, is going to be considerably more powerful than the original. Estimating how much more is tricky – this is certainly the part that I feel least comfortable about.
To try to get a feel for progress, we can look back at the improvements in DQN on Atari games (the other DeepMind Nature paper) after it reached human level performance. Within a year, it reached superhuman performance in considerably more games than in the original paper. This suggests that human-level performance is not a particularly important metric, at least for the uncaring universe, and we shouldn’t anticipate further improvements beyond AlphaGo FH to be significantly harder for DeepMind.
Given the performance increase seen over the 18 months of the project, from the best previous algorithm, Crazy Stone (Elo ~2000), to AlphaGo FH (Elo ~3200), we can estimate the rate of progress at about 400 Elo points per 6 months.
If we speculate on where this might come from, I’d guess that larger volumes of training data, more self-play and further algorithmic optimization combined could give something like this improvement.
Drawing on a thoughtful discussion by Miles Brundage, it seems clear that there are diminishing returns from continuing to increase the available hardware power. That being said, it still seems that some improvement is possible by cranking up the number of computations.
Looking at the matches against Fan Hui, it seems that AlphaGo did better when given more time. In the games against Lee, the rules allow for more time. I’m also imagining that the hardware will be twice as powerful (an arbitrary guess). I’d peg the combination of these two factors to give AlphaGo LS an extra 100 Elo points over AlphaGo FH.
Given all of this, where do the stones fall? Lee Sedol has an Elo rating of 3500, and AlphaGo FH had an Elo rating of about 3200. From our reasoning, we conclude that AlphaGo LS is likely to have an Elo rating 500 points higher than AlphaGo FH (400 from six months of algorithmic progress, plus 100 from extra time and hardware), resulting in a rating of 3700 and giving it roughly an 80% chance of winning each game against Lee. Over five games, this would lead to an expected 4-1 outcome for AlphaGo.
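The arithmetic behind this projection can be laid out step by step. The sketch below assumes linear progress in Elo points over the project, the arbitrary 100-point bonus for time and hardware guessed at earlier, and the standard Elo formula. The constant and function names are ours.

```python
CRAZY_STONE_ELO = 2000   # best previous algorithm, from the text
ALPHAGO_FH_ELO = 3200    # version that played Fan Hui
LEE_SEDOL_ELO = 3500
PROJECT_MONTHS = 18      # time taken to go from Crazy Stone level to AlphaGo FH
MONTHS_SINCE_FAN_HUI = 6
TIME_AND_HARDWARE_BONUS = 100  # the post's rough guess, not a measured value


def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Assume progress continues at the same average rate as over the whole project.
algorithmic_gain = (ALPHAGO_FH_ELO - CRAZY_STONE_ELO) * MONTHS_SINCE_FAN_HUI / PROJECT_MONTHS
alphago_ls_elo = ALPHAGO_FH_ELO + algorithmic_gain + TIME_AND_HARDWARE_BONUS

p_game = elo_win_probability(alphago_ls_elo, LEE_SEDOL_ELO)
expected_wins = 5 * p_game

print(f"Projected AlphaGo LS rating: {alphago_ls_elo:.0f}")
print(f"Per-game win probability:    {p_game:.1%}")
print(f"Expected wins in five games: {expected_wins:.1f}")
```

With these inputs the formula gives a per-game probability in the high 70s, which rounds to the 80% used here and to an expected score close to 4-1.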
Given our uncertainty in algorithmic progress and in the hardware that DeepMind will actually use, we might anticipate this moving even further in AlphaGo’s favour. But, because of our base rate estimates, we’d like to hedge down somewhat. I think that the most likely outcome is for AlphaGo to win 4-1, but 3-2 is more likely than 5-0. Let’s see what happens.
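To see how the ranking of scorelines shifts as the per-game probability is hedged down, one can tabulate the binomial distribution over five games. This is a sketch assuming all five games are played and that each game is independent with the same win probability. The function name `scoreline_distribution` and the two probabilities tried are illustrative choices.

```python
from math import comb


def scoreline_distribution(p_game: float, games: int = 5) -> dict[str, float]:
    """Map each scoreline 'wins-losses' to its probability, assuming independent games."""
    return {
        f"{wins}-{games - wins}": comb(games, wins) * p_game**wins * (1 - p_game) ** (games - wins)
        for wins in range(games, -1, -1)
    }


# 0.80 is the figure from the text; 0.75 is an illustrative hedged-down value.
for p_game in (0.80, 0.75):
    dist = scoreline_distribution(p_game)
    summary = ", ".join(f"{score}: {prob:.1%}" for score, prob in dist.items())
    print(f"p = {p_game:.2f} -> {summary}")
```

At 80% per game, 4-1 is the single most likely scoreline and 5-0 is more likely than 3-2. Once the per-game probability drops below roughly 0.76, 3-2 overtakes 5-0 while 4-1 stays on top, which is the direction the hedge pushes.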
Thanks to Nick Robinson, Alessandra Staglianò, Jess Riedel, Andrew Brookes and Aida Mehonic for discussion and editing.
Update: AlphaGo wins the series 4-1.