Adjusting your set: Predicting rare TV-watching behaviour with machine learning


It's been over half a century since a record audience of 32.3 million people watched England's triumph over West Germany in the 1966 World Cup final, and Britain remains a nation of avid TV watchers. According to Ofcom’s most recent statistics, 95% of households contain a working television set and Britons spend an average of over three and a half hours per day parked in front of the box.

In an age when broadcast TV and on-demand programming are colliding on the same device, how can providers like Sky strike a balance between these two very different types of content?

Maybe Saturday night is movie night, but on weekday afternoons your children simply want endless on-demand repeats of Peppa Pig. Wouldn't it be great if someone could work out these patterns and offer you the right content at the right time, without forcing you to trawl through the same menus every week?

That’s exactly the problem I was tasked with solving during my ASI Fellowship project with Sky. It also proved to be a great example of a much more general question: how do you identify rare events when your data is heavily unbalanced?

How unbalanced? Well, despite the recent explosion in on-demand content, it turns out that people still spend about two thirds of their time watching ordinary live TV. Compare that to the less than 2% of time spent carrying on with a boxset they’re already part-way through.

The simplest recommender identifies the most-popular options and suggests them every time. The catch is that a system which answers every question with “Why not watch BBC1?” is a bit like hiring the Ninja Turtles as wedding caterers: sure, everyone likes pizza, but nobody’s going to be impressed when they’ve turned up expecting a banquet.
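Just to make that baseline concrete, here is a minimal sketch in Python; the pandas DataFrame and its content_id column are illustrative assumptions rather than the actual Sky data:

```python
import pandas as pd

def most_popular_baseline(events: pd.DataFrame, n: int = 5) -> list:
    """Recommend the n most-watched items to everyone, regardless of their history.

    `events` is a hypothetical table of viewing events with a 'content_id' column.
    """
    return events["content_id"].value_counts().head(n).index.tolist()
```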

Instead I chose a Naive Bayes classifier. This type of model is built from the probabilities of observing certain features in the data, given that the user took a specific action. The challenge then becomes one of feature engineering: can we identify a set of variables from which separations between the classes naturally emerge?
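As a hedged sketch of the mechanics (one possible implementation, using scikit-learn’s CategoricalNB; the feature and label names here are invented for illustration, the real features come next):

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Toy training data: what the viewer was doing an hour, a day and a week ago,
# and the action we want to predict next. Values and column names are made up.
train = pd.DataFrame({
    "action_1h_ago": ["live", "off", "recorded", "live", "off"],
    "action_1d_ago": ["live", "recorded", "recorded", "off", "boxset"],
    "action_1w_ago": ["live", "recorded", "live", "off", "boxset"],
    "next_action":   ["live", "boxset", "recorded", "live", "boxset"],
})

features = ["action_1h_ago", "action_1d_ago", "action_1w_ago"]
encoder = OrdinalEncoder()
X = encoder.fit_transform(train[features])
y = train["next_action"]

# Naive Bayes learns P(feature value | action) for each feature independently,
# then combines them with the class priors to score every candidate action.
model = CategoricalNB()
model.fit(X, y)

# Score a new viewer history.
history = pd.DataFrame([["recorded", "recorded", "live"]], columns=features)
print(dict(zip(model.classes_, model.predict_proba(encoder.transform(history))[0])))
```

Because the class priors are learned from the same skewed data, the rare classes start at a disadvantage, which is exactly why the features have to do the heavy lifting.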

In the case of television, I found the trick was to exploit timing patterns in people’s viewing habits. Live viewers are obviously tied to the broadcast schedule, but so are on-demand viewers, who fit their Breaking Bad boxset binges in around work and other commitments.

Because of this, the features I chose were time-based, focusing on the viewer’s actions at a number of previous time points ranging from an hour to a week ago. In this way, someone who records a programme while they’re at work and watches it when they get home every night at 9pm will have a history which reflects that behaviour. Separating out actions with this specific history therefore filters out a large fraction of the live TV watches that might otherwise overwhelm the prediction.
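Here is a rough sketch of how such time-lagged features might be built with pandas; the column names, the particular lags and the “off” fill value are my own assumptions about the shape of the data:

```python
import pandas as pd

# Hypothetical lags, mirroring "from an hour to a week ago".
LAGS = {"1h_ago": pd.Timedelta(hours=1),
        "1d_ago": pd.Timedelta(days=1),
        "1w_ago": pd.Timedelta(weeks=1)}

def add_lag_features(events: pd.DataFrame) -> pd.DataFrame:
    """For each viewing event, record what the same viewer was doing at earlier times.

    `events` is assumed to have 'user_id', 'timestamp' and 'action' columns.
    """
    events = events.sort_values("timestamp")
    out = events.copy()
    for name, lag in LAGS.items():
        history = events[["user_id", "timestamp", "action"]].rename(
            columns={"timestamp": "lookup_time", "action": f"action_{name}"})
        # For every event, find the viewer's most recent action at or before
        # (timestamp - lag) using an as-of merge.
        out = pd.merge_asof(
            out.assign(lookup_time=out["timestamp"] - lag).sort_values("lookup_time"),
            history,
            on="lookup_time", by="user_id", direction="backward",
        )
    # Viewers with no earlier activity get a hypothetical "off" placeholder.
    fills = {f"action_{name}": "off" for name in LAGS}
    return out.drop(columns="lookup_time").fillna(fills).sort_values("timestamp")
```

In practice, lag columns like these would then feed straight into a classifier like the one sketched above.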

By reflecting on the structure of the data and engineering features to match these patterns, I was able to cut a path through the jungle of noise and spot the rare beasts lurking within.

Andy Perch (Fellowship VI)

Andy's data science career began with a PhD in particle physics at UCL, which involved using neural networks to identify rare electron neutrino interactions against a large background of other particles.

For his Fellowship project, Andy worked at Sky to develop a model that predicts users' viewing intentions based on their past behaviour, allowing for more personalised programme recommendations to be made.
