Data and Modelling in the Work of the Goalkeepers


Last Updated: August 19th, 2022


I developed two models. The pre-shot model (Model 0) was meant to be comparable to Paul’s model and used location (shot x-y co-ordinates) and contextual features. The post-shot model (Model 1) used location, contextual and placement features (ball height and distance from goal centre).

The features in each model are shown below:

Feature | Notes | Pre-shot model features | Post-shot model features


I used nine seasons of EPL data (2010 to 2018) with a 70-30 train-test split to develop the pre-shot model. I used an XGBoost classifier and optimised the parameters using a grid search. On the test set, the pre-shot model had an MSE of 0.16 and an AUC of 78%. For the post-shot model, the MSE was 0.13 and the AUC was 87%, a considerable improvement.
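As a rough illustration of the two test-set metrics quoted above, the sketch below computes the MSE of predicted goal probabilities (for a binary outcome this is the same quantity as the Brier score) and the AUC from first principles. The function names and the five example shots are made up for illustration; they are not taken from the models or data described here.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between outcomes (1 = goal, 0 = no goal)
    and the predicted goal probabilities."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a randomly chosen goal received a higher predicted
    probability than a randomly chosen non-goal (ties count as half)."""
    goals = [p for y, p in zip(y_true, y_prob) if y == 1]
    misses = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not goals or not misses:
        raise ValueError("need at least one goal and one non-goal")
    wins = 0.0
    for g in goals:
        for m in misses:
            if g > m:
                wins += 1.0
            elif g == m:
                wins += 0.5
    return wins / (len(goals) * len(misses))


# Hypothetical example: five shots on target and their predicted xG values.
example_outcomes = [1, 0, 0, 1, 0]
example_xg = [0.8, 0.3, 0.1, 0.4, 0.5]

example_mse = mean_squared_error(example_outcomes, example_xg)
example_auc = roc_auc(example_outcomes, example_xg)
```

The pairwise AUC loop is quadratic in the number of shots, which is fine for a toy example; for nine seasons of shots a library implementation would be the more practical choice.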

Keeper performance was assessed on the 2017 and 2018 seasons. This was slightly different from Paul’s approach, which included data from the 2019 season-to-date, but as I did not have that data to hand, I did not use it. The table below shows the results. The columns labelled “Rating” are the ratios of expected to actual goals conceded; a keeper with a ratio above one concedes fewer goals than predicted by the relevant xG model, and a keeper with a ratio below one concedes more.

Rank Preshot Model | Rank Postshot Model | Rank PR | Keeper | Shots OT Faced | Goals Conceded | xG Conceded Preshot Model | xG Conceded Postshot Model | Rating Preshot Model | Rating Postshot Model | Rating PR
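To make the rating columns concrete, here is a minimal sketch of the calculation described above: xG conceded divided by goals conceded, with keepers then ordered from highest to lowest rating. The `KeeperRecord` class, the function names and the example figures are hypothetical, not values from the table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeeperRecord:
    keeper: str
    goals_conceded: int
    xg_conceded: float  # sum of xG over the shots on target the keeper faced


def rating(record: KeeperRecord) -> float:
    """Ratio of expected to actual goals conceded.
    Above 1: conceded fewer than the xG model predicted."""
    if record.goals_conceded <= 0:
        raise ValueError("rating is undefined when no goals were conceded")
    return record.xg_conceded / record.goals_conceded


def rank_keepers(records: List[KeeperRecord]) -> List[Tuple[str, float]]:
    """Return (keeper, rating) pairs sorted from best to worst rating."""
    rated = [(r.keeper, rating(r)) for r in records]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


# Hypothetical keepers, for illustration only.
example_records = [
    KeeperRecord("Keeper B", goals_conceded=50, xg_conceded=45.0),
    KeeperRecord("Keeper A", goals_conceded=40, xg_conceded=44.0),
]
example_ranking = rank_keepers(example_records)
```

Running the same function once with pre-shot xG totals and once with post-shot xG totals would give the two rankings being compared.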

The correlation between Paul’s ratings and my pre-shot ratings is 0.92, which is quite high. The differences seem largely due to differences in the data sample and to small sample sizes. For example, Nick Pope turns out to be high in my rankings and only middling in Paul’s. Pope didn’t play in the EPL in 2018 due to a serious injury, so my ranking is based only on the 2017 season, which isn’t really enough data. In addition, since returning from injury, he hasn’t yet been able to replicate his 2017 performance.
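For readers who want to reproduce a comparison like this, the sketch below computes a correlation between two sets of keeper ratings. It assumes the figure quoted above is a Pearson correlation, which the text does not state, and the function name is my own.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of ratings,
    where xs[i] and ys[i] belong to the same keeper."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Only keepers who appear in both sets of ratings should be passed in, in the same order in both lists.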

However, that is not really important for our present purposes. The crucial point is how much my pre-shot and post-shot rankings differ. 

We can see that when post-shot information is included, some keepers rise quite a lot in the rankings and some fall. Kepa for example looks a much better shot-stopper when shot placement is taken into account, while Petr Cech looks significantly worse.

In conclusion, when evaluating goalkeepers, I would always include post-shot information.

Author: Willie Price