Setrak Balian, ASI Data Science, London

Imagine a virtual world in which policymakers can try out different policies without having to suffer the real-world consequences.

ASI Data Science teamed up with HM Department for Education (DfE) to build a simulated world of teachers and schools in order to investigate the effects of different policies on the long-standing issues of teacher recruitment and retention.

Basic anatomy of virtual reality

For any complex simulation, the ingredients can be boiled down to three. First, there must be actors (or variables) in the simulation. For the teacher world, there were two types: teachers and schools.

Second, there must be some interaction between the actors so that they don’t just act independently or at random. This ingredient drives collective changes over time (dynamics) and was modelled as teachers moving between different schools as well as in and out of the school system as a whole. Policies can then be applied either to certain teachers or to the system as a whole. The effectiveness of a policy may then be assessed by observing the behaviour of teachers over time: for example, how many of them move to a more attractive school, and how many leave the school system altogether?
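The dynamics described above can be illustrated with a minimal sketch. It is not the ASI-DfE model: the two schools, the "outside" state, every probability and the helper names (step, simulate, apply_retention_policy) are hypothetical, chosen only to show how a policy can be expressed as a change to movement rates and then compared against a baseline run.

```python
# Hypothetical transition probabilities per time step (made-up numbers, not DfE data).
# Each row says where teachers currently in that state are expected to be one step later.
BASELINE = {
    "school_a": {"school_a": 0.85, "school_b": 0.05, "outside": 0.10},
    "school_b": {"school_a": 0.10, "school_b": 0.80, "outside": 0.10},
    "outside": {"school_a": 0.02, "school_b": 0.03, "outside": 0.95},
}


def step(population, transitions):
    """Advance the expected head-count in each state by one time step."""
    new_population = {state: 0.0 for state in population}
    for origin, count in population.items():
        for destination, probability in transitions[origin].items():
            new_population[destination] += count * probability
    return new_population


def simulate(population, transitions, n_steps):
    """Return the head-counts at the start and after each of n_steps steps."""
    history = [dict(population)]
    for _ in range(n_steps):
        population = step(population, transitions)
        history.append(population)
    return history


def apply_retention_policy(transitions, school, reduction):
    """Copy the transitions, lowering the chance of leaving `school` by `reduction`.

    The freed probability is moved to staying in the same school,
    so each row still sums to one.
    """
    updated = {origin: dict(row) for origin, row in transitions.items()}
    updated[school]["outside"] -= reduction
    updated[school][school] += reduction
    return updated


start = {"school_a": 1000.0, "school_b": 1000.0, "outside": 0.0}
baseline_run = simulate(start, BASELINE, n_steps=5)
policy_run = simulate(
    start, apply_retention_policy(BASELINE, "school_b", 0.05), n_steps=5
)

# Compare how many teachers have left the system under each scenario.
print("Outside after 5 steps, baseline:", round(baseline_run[-1]["outside"]))
print("Outside after 5 steps, policy:  ", round(policy_run[-1]["outside"]))
```

Tracking expected head-counts rather than individual random moves keeps the sketch deterministic, which makes the baseline and policy runs directly comparable.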

Third, and most important of all, is data. Data provides the crucial link to the real world and gives credibility to any simulation. The data used was the DfE’s extensive teacher database [1], which includes teacher-level information such as subjects and hours taught, age and gender, the Ofsted scores of their schools and much more.

Artificial intelligence to the rescue

Suppose every individual teacher were included in the simulation. It would be extremely difficult to keep track of them all. Unsupervised machine learning, a type of artificial intelligence, can help by reducing the entire teacher population to a small number of representative groups based on their historical characteristics in the database. The machine learning algorithm used by the ASI-DfE team reduced the number of secondary teachers from about 200,000 to 8 representative actors.

Eight actors are much easier to track than hundreds of thousands, but that’s not the only advantage. This so-called clustering is also useful in creating a narrative: it uses data to segment teachers into distinct types. These types can then be used to target policies effectively.
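To make the idea of clustering concrete, here is a minimal sketch. The article does not say which algorithm the ASI-DfE team used, so k-means is an assumption made purely for illustration, and the nine "teachers" below (age in years, hours taught per week) are invented values rather than records from the DfE database.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )


def k_means(points, k, n_iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Synthetic, made-up teachers: (age in years, hours taught per week).
teachers = [
    (24, 22), (26, 21), (25, 23),
    (45, 18), (47, 17), (44, 19),
    (58, 8), (60, 10), (59, 9),
]

centroids, labels = k_means(teachers, k=3)
for centroid, size in zip(centroids, (labels.count(i) for i in range(3))):
    print("representative teacher:", centroid, "group size:", size)
```

Each centroid plays the role of a representative actor standing in for all teachers assigned to it. With real data the features would normally be rescaled first so that no single characteristic dominates the distance, and the number of groups would be chosen by inspecting the data rather than fixed in advance.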

What’s next?

The outcome was a prototype “teacher world” whose inhabitants were representative teachers identified using unsupervised machine learning on the DfE teacher database. Different policies can now be applied in the simulation and their effects on teacher behaviour compared. The clustering into eight groups may also prove valuable in targeting policies.

  1. The Department for Education teacher database was anonymised prior to analysis and the analysis was performed in a highly secure computing environment.