
World Cup: How Germany's 7-1 destruction of Brazil broke statistical models | Central Winger

On Tuesday night, in the 2014 World Cup semifinal, Brazil out-shot Germany 18 to 14. Given that Brazil and Germany are the two greatest footballing nations of all time, that they were the two favorites heading into this tournament, and that the game was played in Brazil itself, it would be natural to assume it was a fairly even encounter. Context, right?

Maybe it even tilted in the Brazilians' favor, given their edge in the shots column. This is where one of the most popular metrics in the soccer analytics community becomes both important and confusing.

"Total Shots Ratio," or TSR, expresses what percentage of a match's total shots each team took. Brazil had 56 percent of the game's shots to Germany's 44 percent. TSR is highly predictive and comes in very handy for forecasting results, but it clearly fell down last night. Why?
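As a minimal sketch, TSR is simply one team's shots divided by both teams' combined shots. The function name below is illustrative rather than taken from any analytics package; the inputs are the semifinal shot counts quoted above.

```python
def total_shots_ratio(team_shots: int, opponent_shots: int) -> float:
    """Share of the match's total shots taken by one team (0.0 to 1.0)."""
    total = team_shots + opponent_shots
    if total == 0:
        raise ValueError("no shots were taken in the match")
    return team_shots / total


# Shot counts from the Brazil-Germany semifinal.
brazil_tsr = total_shots_ratio(18, 14)
germany_tsr = total_shots_ratio(14, 18)
print(f"Brazil {brazil_tsr:.0%}, Germany {germany_tsr:.0%}")
```

Note that nothing in the calculation knows where a shot came from: an 18-yard speculative effort counts exactly as much as a tap-in.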

One of the assumptions implicit in TSR is that all shots are created equal, largely because of its roots in hockey analytics, where chance quality varies much less.

Not so in our game. Of the 1,677 shots taken so far in the 2014 World Cup, approximately 11 percent have resulted in goals. If each of Germany's 14 shots last night were truly created equal (at about an 11 percent goal probability each), Germany would be expected to score seven or more goals only about once every 3,000 attempts. A statistical outlier like that is just about as model-breaking as it gets, and such a high-profile failure just might put the final nail in the coffin of the nearly-dead-anyway TSR.
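The 1-in-3,000 figure can be checked with a binomial model. This sketch assumes, as the equal-shots view implicitly does, that Germany's 14 shots were independent and each had the same 11 percent chance of going in; `prob_at_least` is a hypothetical helper name.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k goals from n identical shots."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Germany's 14 shots, each treated as an identical 11 percent chance.
tail = prob_at_least(7, 14, 0.11)
print(f"P(7+ goals) = {tail:.6f}, roughly 1 in {1 / tail:,.0f}")
```

Run as written, this lands at roughly 1 in 3,000, in line with the estimate above.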

The soccer analytics community is smart, and it has long recognized that raw shot volume, however readily available, is of limited use on its own. Out of this frustration grew the concept of "Expected Goals," or ExG.

As discussed on a handful of occasions in this series, different attributes of a shot can be weighted to determine just how likely that shot is to result in a goal. As you've probably guessed, one of the major predictors in such a model is the distance from goal at which the shot was taken.
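To make the idea concrete, here is a toy ExG model: a logistic function of distance alone. The coefficients are hypothetical, picked only so that closer shots score more often; they are not Opta's values, and real models weigh many more attributes (angle, body part, type of assist and so on).

```python
import math

# HYPOTHETICAL coefficients for illustration only; a real ExG model is fitted
# to thousands of shots and uses many more features than distance.
INTERCEPT = 0.5
DISTANCE_COEF = -0.15  # change in log-odds of scoring per metre from goal


def expected_goal(distance_m: float) -> float:
    """Toy ExG: logistic function of the shot's distance from goal."""
    log_odds = INTERCEPT + DISTANCE_COEF * distance_m
    return 1.0 / (1.0 + math.exp(-log_odds))


for d in (6, 11, 18, 25):
    print(f"{d:>2} m: {expected_goal(d):.2f}")
```

Logistic regression is a common choice for this kind of model because it maps any weighted sum of shot attributes onto a probability between 0 and 1.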

Germany took the four closest shots to goal and buried them all. Who needs shot volume when you have shot quality (and Miroslav Klose)?

When Opta's estimated ExG is attached to each of the 32 shots taken in the match, Germany would normally have been expected to score 3.1 goals given the quality of its opportunities, compared with Brazil's not-so-paltry 1.7.

While ExG illustrates the thrashing in Belo Horizonte far better than TSR, it doesn't do the record-smashing 7-1 scoreline justice. There are a few reasons for this, the most important being that ExG doesn't capture how well the opportunities were taken, and the Germans were phenomenally clinical with their chances.

Even accounting for that superior chance quality, though, Germany would have been expected to score seven or more goals only about once every 100 attempts. While not quite as model-breaking as the 1-in-3,000 probability attached to the raw shot-volume estimate, this is still an incredible outlying performance, and yet one that jibes with conventional wisdom. After all, wouldn't you expect a 7-1 semifinal to be an outlier in the first place?
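The article doesn't spell out how the 1-in-100 figure was derived. One standard approach, assumed here, is to treat each shot as an independent event with its own ExG probability and build the resulting distribution of goals (a Poisson-binomial distribution) with a short dynamic program. Opta's per-shot values aren't reproduced in this piece, so the usage lines fall back on the flat 11 percent case, where the result should match the binomial 1-in-3,000 estimate. Both function names are hypothetical.

```python
def goal_distribution(shot_probs: list[float]) -> list[float]:
    """Distribution of total goals from independent shots (Poisson-binomial).

    dist[k] is the probability of scoring exactly k goals.
    """
    dist = [1.0]  # before any shots: zero goals with certainty
    for p in shot_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot is missed or saved
            new[k + 1] += prob * p    # this shot goes in
        dist = new
    return dist


def prob_at_least_goals(k: int, shot_probs: list[float]) -> float:
    """P(at least k goals), assuming shots are independent."""
    return sum(goal_distribution(shot_probs)[k:])


# Usage: 14 identical 11 percent shots (no per-shot Opta values are assumed).
flat_shots = [0.11] * 14
print(f"Expected goals: {sum(flat_shots):.2f}")
print(f"P(7+ goals): about 1 in {1 / prob_at_least_goals(7, flat_shots):,.0f}")
```

Feeding in a list of per-shot ExG values instead would give a chance-quality-adjusted probability, and the sum of those same values is the team's expected-goals total.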

The second reason is that most ExG models are trained on large swaths of data across thousands of games and players. ExG therefore represents the probability of an "average" player scoring a particular opportunity. While it should be no surprise that a German strike force featuring the likes of Muller and Klose performed better than the average player, the comparison also sheds light on just how poor the Brazilian finishing was last night: it was quite literally sub-par.
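Comparing actual goals with ExG gives a rough gauge of finishing relative to that "average" player. This sketch uses only the match totals quoted above (7 goals from 3.1 ExG for Germany, 1 goal from 1.7 ExG for Brazil); the helper name is illustrative.

```python
# Goals and Opta ExG totals for the semifinal, as quoted in this article.
match = {
    "Germany": {"goals": 7, "exg": 3.1},
    "Brazil": {"goals": 1, "exg": 1.7},
}


def goals_above_expected(goals: int, exg: float) -> float:
    """Positive: finished better than an 'average' player would; negative: worse."""
    return goals - exg


for team, stats in match.items():
    diff = goals_above_expected(stats["goals"], stats["exg"])
    print(f"{team}: {diff:+.1f} goals versus expectation")
```

By this simple measure Germany finished about 3.9 goals above expectation, while Brazil fell roughly 0.7 short.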

The story out of Belo Horizonte shouldn't be the raw scoreline but the otherworldly finishing the Germans put on display. Germany has broken TSR; long live Expected Goals.