I’m making the move from particle physics to data science because I want my skills to have a more direct, positive impact on people. My project with ASI Data Science and Lambeth Council gave me that chance.
Rogue landlords in the UK illegally make £4.2 billion per year by forcing two million people to live in homes with severe hazards. These homes are often overcrowded and damp, with mould, faulty electrics, cockroach and rat infestations, and a high risk of carbon monoxide poisoning. Although rogue landlords are well publicised in the media, there were fewer than five hundred convictions last year. This is due, in part, to the complex and lengthy process of gathering enough evidence for a prosecution. Rogue landlords build up a complex web of lies: they have many different properties, registered in many different names, associated with many different companies.
Over a hundred hours of evidence collection by expert fraud officers goes into investigating a single suspected rogue landlord. Starting with the name or address of a suspected rogue landlord, investigators spend hour after hour manually searching databases to build up a profile. The results of their searches are written on reams of paper, which makes it difficult to spot the complex connections between people, companies, and properties.
If investigators could visualise all the information in one place, thousands of hours per year could be saved, which would save taxpayers money and lead to more convictions. I realised investigators needed a tool that was easy to use, quick to run, and accessible to anyone, anywhere. They needed a way to see the complex network, and paper lists are not the solution.
In six weeks, I designed and built a web application that replaces a manual process that used to take 20 hours of expert time with an automated search. Within two minutes, it returns the results as an interactive network graph for the investigators to explore.
The biggest hurdle was accessing the data. I used publicly available APIs provided by Companies House and the Land Registry, but both required further authorisation. Let’s say I learned a great deal about bureaucracy - a skill which will no doubt be invaluable during my career as a data scientist!
After persevering and being granted access to the data, I could put my technical skills to work. I wrote recursive algorithms to repeatedly query the APIs and used natural language processing to match records between databases. I got to grips with NoSQL to store the recursion results in a Neo4j graph database, which means all the information can be easily accessed when a search is repeated. For me, the biggest challenge was making a web app that looked as good as its results. In particle physics we don’t care too much about making things that look good, just things that work.
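The recursive search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the tool: `FAKE_REGISTRY` and `lookup` are hypothetical stand-ins for the Companies House and Land Registry queries, `difflib` string similarity stands in for the natural language processing used to match names, the 0.85 threshold is arbitrary, and the graph is kept in memory rather than in Neo4j.

```python
import re
from difflib import SequenceMatcher

# Hypothetical data standing in for the real API responses (assumption).
FAKE_REGISTRY = {
    "John Smith": ["Smith Lettings Ltd", "12 Example Road"],
    "Smith Lettings Ltd": ["Jon Smith", "34 Sample Street"],
}


def lookup(name):
    """Hypothetical stand-in for an API query: entities linked to `name`."""
    return FAKE_REGISTRY.get(name, [])


def normalise(name):
    """Lowercase and strip punctuation so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def canonical(name, known, threshold=0.85):
    """Return an already-known node that closely matches `name`, if any."""
    norm = normalise(name)
    for existing in known:
        if SequenceMatcher(None, norm, existing).ratio() >= threshold:
            return existing
    return norm


def crawl(start, lookup, max_depth=3):
    """Recursively follow links from `start`, merging near-duplicate names."""
    visited = []   # nodes in the order they were first seen
    edges = set()  # directed (source, target) pairs

    def visit(name, depth):
        node = canonical(name, visited)
        if node in visited:
            return node  # already explored, or a spelling variant of a known node
        visited.append(node)
        if depth < max_depth:
            for related in lookup(name):
                child = visit(related, depth + 1)
                if child != node:
                    edges.add((node, child))
        return node

    visit(start, 0)
    return visited, edges


nodes, edges = crawl("John Smith", lookup)
for source, target in sorted(edges):
    print(source, "->", target)
```

Here the misspelt "Jon Smith" is merged into the existing "john smith" node, so the company links back to the original person rather than creating a new one. The depth limit keeps the recursion bounded when the network of names and companies is large.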
Lambeth Council love the product and are now using it in all fraud investigations. I’m looking forward to finding my next project as a data scientist.
Sam Short took part in the ASI Fellowship September 2016.