As soccer analytics continues its slow growth from infancy to childhood, it's natural for the movement to look at strides made in other sports and try to emulate them. Bill James’ Pythagorean Expectation is no exception.
A quick synopsis of the baseball Pythagorean: it is a formula that fairly accurately estimates the number of games a baseball team should have won based on the number of runs it scored and the number of runs it conceded. The formula's name, of course, comes from its similarity to the Pythagorean Theorem that most of us surely grappled with sometime during high school.
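For readers who have not seen it written out, here is a minimal sketch of the baseball version in Python. It assumes the classic form with an exponent of 2 (later refinements use other exponents); the function names and the example team's run totals are made up for illustration.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimated winning percentage from runs scored and runs allowed.

    Assumes the classic exponent of 2; other exponents are sometimes used.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    if scored + allowed == 0:
        raise ValueError("at least one run total must be positive")
    return scored / (scored + allowed)


def expected_wins(runs_scored: float, runs_allowed: float, games: int,
                  exponent: float = 2.0) -> float:
    """Estimated number of wins over a schedule of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed, 162-game season.
    print(f"{expected_wins(800, 700, 162):.1f} expected wins")
```

Note that the formula only ever splits games into wins and losses, which is exactly why it cannot be carried over to soccer unchanged.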
Soccer's equivalent, however, has proven to be significantly more difficult to derive than a² + b² = c² (no offense to Pythagoras). Since soccer games can end in draws, the baseball approximation, which only knows wins and losses, simply does not work – and there is no easy way around it. Luckily, Dr. Howard Hamilton (@soccermetrics) of Soccermetrics Research (a thought leader in soccer analytics) has put in the hard work to unravel this problem.
Building off the extensive theoretical work on the Pythagorean Win-Loss formula in baseball by Steven Miller, Howard derived an elegant methodology for estimating not just the number of wins and losses a soccer team would accrue over the course of a season, but also the number of draws (and, by extension, points). With the recent conclusion of the 2012 MLS season, Howard has published his end-of-year results. His table is replicated here:
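| Team | GP | W | D | L | GF | GA | GD | Pts | Pyth W | Pyth D | Pyth L | Pyth Pts | Delta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| San Jose Earthquakes | 34 | 19 | 9 | 6 | 72 | 43 | +29 | 66 | 19 | 7 | 8 | 64 | +2 |
| Sporting Kansas City | 34 | 18 | 9 | 7 | 42 | 27 | +15 | 63 | 16 | 10 | 8 | 58 | +5 |
| Red Bull New York | 34 | 16 | 9 | 9 | 57 | 46 | +11 | 57 | 15 | 8 | 11 | 53 | +4 |
| Real Salt Lake | 34 | 17 | 6 | 11 | 46 | 35 | +11 | 57 | 15 | 9 | 10 | 54 | +3 |
| Seattle Sounders FC | 34 | 15 | 11 | 8 | 51 | 33 | +18 | 56 | 17 | 9 | 8 | 60 | -4 |
| Impact de Montréal | 34 | 12 | 6 | 16 | 45 | 51 | -6 | 42 | 11 | 9 | 14 | 42 | +0 |
| New England Revolution | 34 | 9 | 8 | 17 | 39 | 44 | -5 | 35 | 11 | 9 | 14 | 42 | -7 |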
On the left side of the table are the final results for the 2012 MLS Supporters’ Shield race. Appended on the right is each team's Pythagorean expectation, including the “delta” (or residual), which denotes how far the team's actual points total landed from the estimate; a positive delta means the team earned more points than expected.
In other words, this is Howard fact-checking his own system. The estimates compare remarkably well to the actual results of the regular season, placing only a handful of teams out of order. Howard mentioned to me that the average residual for the soccer Pythagorean is between three and four points, and that residuals beyond that should garner the most attention.
So, that's where we will investigate.
According to the soccer Pythagorean, the most over-performing teams in the 2012 MLS regular season were the Columbus Crew (+7) and Chivas USA (+7). On the other end, the New England Revolution (-7) and Toronto FC (-9) appear to be the most under-performing squads.
Now, remember what this table is estimating (points) and what it is based on (goals for and against). Essentially, it is suggesting that the New England Revolution – after scoring 39 goals and conceding 44 – would usually have been expected to earn 42 points this season instead of 35.
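The points and delta columns follow from simple arithmetic, which the short Python sketch below reproduces for the rows shown in the table above. It does not implement Howard's model; it only takes his expected wins and draws as given, applies the standard three points for a win and one for a draw, and subtracts. The variable and function names are my own.

```python
# Rows copied from the table above:
# (team, actual W, actual D, actual L, expected W, expected D, expected L)
ROWS = [
    ("San Jose Earthquakes", 19, 9, 6, 19, 7, 8),
    ("Sporting Kansas City", 18, 9, 7, 16, 10, 8),
    ("Red Bull New York", 16, 9, 9, 15, 8, 11),
    ("Real Salt Lake", 17, 6, 11, 15, 9, 10),
    ("Seattle Sounders FC", 15, 11, 8, 17, 9, 8),
    ("Impact de Montréal", 12, 6, 16, 11, 9, 14),
    ("New England Revolution", 9, 8, 17, 11, 9, 14),
]


def points(wins: int, draws: int) -> int:
    """League points: three for a win, one for a draw, none for a loss."""
    return 3 * wins + draws


def residuals(rows):
    """Map each team to (actual points, expected points, delta).

    Delta is actual minus expected, so a negative value means the team
    earned fewer points than its goal record suggested.
    """
    result = {}
    for team, w, d, _l, exp_w, exp_d, _exp_l in rows:
        actual = points(w, d)
        expected = points(exp_w, exp_d)
        result[team] = (actual, expected, actual - expected)
    return result


if __name__ == "__main__":
    for team, (actual, expected, delta) in residuals(ROWS).items():
        print(f"{team:<24} actual={actual:>2} expected={expected:>2} delta={delta:+d}")
```

For New England this gives 3 × 9 + 8 = 35 actual points against 3 × 11 + 9 = 42 expected, hence the delta of -7.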
Having ventured to Gillette Stadium a few times this year, I think I can provide decent commentary for this result. Of the 17 games that the Revolution lost this season, only two were by more than a single goal. Understandably, this method (and a normal human being) would expect that of the 15 single-goal losses suffered by the Revolution this season, at least a few would have sneaked into a draw. I expect that some of the other outlying teams are suffering (or benefiting) from a similar circumstance.
The question here ultimately boils down to “what causes these outlying residuals?” Some people would quickly pin this on “luck” – and they might be correct.