How to Do Great Data Science


At ASI Data Science, we run a Data Science Fellowship, do our own data science consulting, and are building a data science platform. Collectively, this means we spend a fair amount of our waking hours trying to figure out how to do really great data science.

Data science is hard. You need talented people, deep technical knowledge, a good team ethic and general operational excellence. For what it’s worth, we’ll explain our view on all of these things in future blog posts (spoiler alert: you can solve the talent problem today by working with our fellowship!).

But for now, I’d like to talk about a framework that we’ve invented to help companies define a great data science question, which is a necessary, but not sufficient, part of great data science. We’ve worked with organisations ranging from some of the largest in the world (like the NHS) to some of the smallest start-ups that are still in stealth mode, and found the framework helpful at all levels. Recently, it has even been used by NESTA and the Greater London Authority to help figure out their first steps into the world of data science.

The framework is simple: for any great data science project, you should be able to answer “yes” to these three questions: Is it impactful? Is it tractable? Is it actionable? If you know anything about effective altruism, you may recognise the connection to its framework. If you don’t, it’s really cool and well worth reading up on.

Impactful

We want to answer questions that are important to the organisation. Effectively, this means that we should privilege questions where the answers will have large effects on the main revenue or cost centres of the organisation.

Tractable

Determining whether a question is tractable is slightly more complex. First, we ask whether there is, in principle, an algorithmic solution to the problem. Then we have to understand whether, in practice, the necessary data exists. There isn’t a great deal of leeway here: if the answer to either question is no, your project is doomed to failure.

Actionable

This is the question most often forgotten or ignored. It’s so easy to focus on the huge and solvable problems that people often forget to ask “what will I change when I know the answer to this question?” Surprisingly often, you’ll find that when you really think about it, your grand idea will actually lead to solutions that are very hard to implement in the real world. If the result of your analysis is that you need to completely change the culture of a several-thousand-person team, at the very least, you’re going to struggle to get buy-in.
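For readers who like to see things written down, here is a minimal sketch of the three tests as a checklist in Python. It is only an illustration, not a tool we use: the `ProjectIdea` class, its field names and the example question are all hypothetical, and in practice each answer comes out of a discussion rather than a boolean flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectIdea:
    """A candidate data science question (hypothetical structure, for illustration)."""
    question: str
    impactful: bool         # would the answer move a main revenue or cost centre?
    algorithm_exists: bool  # is there, in principle, an algorithmic solution?
    data_exists: bool       # does the necessary data exist in practice?
    actionable: bool        # would knowing the answer change what we do?

    @property
    def tractable(self) -> bool:
        # Tractability has two halves, and failing either one sinks the project.
        return self.algorithm_exists and self.data_exists

    def failed_tests(self) -> List[str]:
        """Return the names of the tests this idea does not pass."""
        checks = {
            "impactful": self.impactful,
            "tractable": self.tractable,
            "actionable": self.actionable,
        }
        return [name for name, passed in checks.items() if not passed]

    def is_great(self) -> bool:
        # A great question answers "yes" to all three tests.
        return not self.failed_tests()


# Made-up example: an important, actionable question with no usable data.
idea = ProjectIdea(
    question="Which customers are about to leave?",
    impactful=True,
    algorithm_exists=True,
    data_exists=False,
    actionable=True,
)
print(idea.is_great(), idea.failed_tests())
```

Listing the failed tests, rather than returning a single yes or no, makes it obvious which conversation needs to happen next.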

So, how does one go about answering these questions? Well, as you’ll notice, they span technical data science (tractable), business strategy (impactful) and frontline operations (actionable). This is by design, since we believe that great data science cannot be done in any other manner. It still represents a significant challenge: getting all these disparate people together and holding a thoughtful discussion that bridges the different languages used by the different groups. We’ve developed a series of workshops to help us in this process, which we’ll be blogging about next.
