By André Richter, ASI Fellow, May 2017
The emergency responses of the London Fire Brigade (LFB) are complex operations involving a variety of specialised skills.
Not all skills are shared by all firefighters, so unanticipated disruptions to team composition (for instance, due to sickness) frequently leave teams unable to respond to emergencies unless appropriate substitutes for the absent colleagues can be found.
This makes it necessary for the LFB to reallocate a significant number of firefighters across London every day on an ad hoc basis: a resource-intensive process for the LFB and a major inconvenience for firefighters.
My Fellowship project was to work together with the Insights team of the LFB, led by Apollo Gerolymbos and Andrew Mobbs, to understand how their rich dataset and Machine Learning could help improve this process.
The ability to anticipate disruptive staff shortages would enable the LFB to prevent them or to prepare better, and hence my challenge was to establish how well Machine Learning could predict the unavailability of fire engines due to understaffing.
To get under the skin of the project, I met with many senior analysts, senior emergency operations staff, various members of the human resources department as well as actual firefighters.
We identified a variety of ways in which Machine Learning could either improve the processes or inform stakeholders within the organisation to make better-informed decisions. We decided to focus on predicting disruptive staff absences.
So-called "off-the-run events" (when a fire engine cannot operate because of understaffing) occur in about 11% of all day-shift-station cases. This can be formulated as a supervised learning problem that requires adjustments for imbalanced classes. The overall procedure I used is as follows:
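To make the formulation concrete, here is a minimal sketch of how such a binary target might be constructed. The column names and staffing thresholds are purely illustrative assumptions, not the actual LFB data schema:

```python
# Hypothetical sketch: framing off-the-run prediction as binary
# classification. Column names and values are illustrative only.
import pandas as pd

shifts = pd.DataFrame({
    "station": ["A21", "A21", "B30", "B30"],
    "crew_scheduled": [5, 4, 5, 3],
    "crew_required": [4, 4, 4, 4],
})
# Label: 1 if the engine cannot run (understaffed), else 0.
shifts["off_the_run"] = (
    shifts["crew_scheduled"] < shifts["crew_required"]
).astype(int)
```

With roughly 11% positive labels, a classifier trained naively would be rewarded for predicting "on the run" almost everywhere, which is why the imbalance needs explicit handling.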
First, the data was randomly split into training, validation and test sets. The splitting was stratified so as to keep the fraction of off-the-run events similar in each split. On the training data, I addressed the class imbalance by randomly oversampling the minority class. I then fit classification algorithms to the resulting training data. I tested a variety of classifiers, with the main contenders being Random Forest and Gradient Boosted Decision Trees. I used a grid-search approach to tune the estimators' hyperparameters.
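The steps above can be sketched in scikit-learn. This is a minimal illustration on synthetic data (the real LFB features are not public), using a stratified split, random oversampling of the minority class on the training data only, and a grid search over Gradient Boosted Decision Tree hyperparameters; the parameter grid shown is an assumption for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.utils import resample

# Synthetic stand-in with ~11% positives, mimicking the class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.89], random_state=0)

# Stratified split keeps the off-the-run fraction similar in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Randomly oversample the minority class -- on the training data only,
# so the test set keeps its natural class balance.
minority = np.where(y_train == 1)[0]
majority = np.where(y_train == 0)[0]
upsampled = resample(minority, replace=True,
                     n_samples=len(majority), random_state=0)
idx = np.concatenate([majority, upsampled])
X_bal, y_bal = X_train[idx], y_train[idx]

# Grid search over Gradient Boosted Decision Tree hyperparameters.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [2, 3]},
    scoring="recall", cv=3)
grid.fit(X_bal, y_bal)
```

Oversampling before the split, by contrast, would leak duplicated minority examples into the test set and inflate the measured performance.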
Gradient Boosted Decision Trees outperformed all other estimators.
The recall of the model on the test data is 72%: it correctly flags 72% of actual off-the-run events. This highlights the potential of Data Science and Machine Learning to predict even seemingly random or hard-to-predict outcomes.
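For readers unfamiliar with the metric, recall measures the fraction of actual positive cases the model catches. A small self-contained illustration (the labels below are made up for the example and are not LFB results):

```python
# Recall for the positive (off-the-run) class:
# true positives / actual positives.
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0]
# 3 of the 4 actual positives are caught, so recall = 3/4 = 0.75.
recall = recall_score(y_true, y_pred)
```

A recall of 72% thus means roughly seven in ten understaffing events would be flagged in advance.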
André Richter took part in the ASI Fellowship May 2017. Prior to the Fellowship he completed a PhD in Economics from Stockholm University.