Central Winger: Pythagorean Expectation tested in MLS

As soccer analytics continues its slow growth from infancy to childhood, it's natural for the movement to look at strides made in other sports and try to emulate them. Bill James’ Pythagorean Expectation is no exception.

A quick synopsis of the baseball Pythagorean: it is a formula that estimates, fairly accurately, the number of games a baseball team should have won based on the number of runs it scored and the number of runs it conceded. The formula's name, of course, comes from its similarity to the Pythagorean Theorem that most of us surely grappled with sometime during high school.
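For readers who want to see the baseball version concretely, here is a minimal Python sketch. It uses the classic exponent of 2 that gives the formula its name; the function name and the 800/700 run totals are made up for illustration and do not describe any real team.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage from runs scored and runs allowed.

    With exponent=2 this is the classic form:
        RS^2 / (RS^2 + RA^2)
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team (illustrative numbers only): 800 runs scored,
# 700 runs allowed, over a 162-game season.
win_pct = pythagorean_win_pct(800, 700)
expected_wins = 162 * win_pct
print(f"expected win pct: {win_pct:.3f}, expected wins: {expected_wins:.1f}")
```

Note that the formula only ever splits games into wins and losses, which is exactly why it cannot be carried over to soccer unchanged.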

Soccer's equivalent, however, has proven significantly more difficult to derive than a² + b² = c² (no offense to Pythagoras). Since soccer games can end in draws, the baseball approximation simply does not work, and there is no easy way around it. Luckily, Dr. Howard Hamilton (@soccermetrics) of Soccermetrics Research, a thought leader in soccer analytics, has put in the hard work to unravel this problem.

Building on the extensive theoretical work on the Pythagorean Win-Loss formula in baseball by Steven Miller, Howard derived an elegant methodology for estimating not just the number of wins and losses a soccer team would accrue over the course of a season, but also the number of draws (and, by extension, points). With the recent conclusion of the 2012 MLS season, Howard has published his end-of-year results. His table is reproduced here:

                          League Table (actual)             |  Pythagorean      |
Team                      GP   W   D   L  GF  GA   GD  Pts  |   W   D   L  Pts  |   Δ
San Jose Earthquakes      34  19   9   6  72  43  +29   66  |  19   7   8   64  |  +2
Sporting Kansas City      34  18   9   7  42  27  +15   63  |  16  10   8   58  |  +5
D.C. United               34  17   7  10  53  43  +10   58  |  15   9  10   54  |  +4
Red Bull New York         34  16   9   9  57  46  +11   57  |  15   8  11   53  |  +4
Real Salt Lake            34  17   6  11  46  35  +11   57  |  15   9  10   54  |  +3
Chicago Fire              34  17   6  11  46  41   +5   57  |  14   9  11   51  |  +6
Seattle Sounders FC       34  15  11   8  51  33  +18   56  |  17   9   8   60  |  -4
LA Galaxy                 34  16   6  12  59  47  +12   54  |  15   8  11   53  |  +1
Houston Dynamo            34  14  11   9  48  41   +7   53  |  14   9  11   51  |  +2
Columbus Crew             34  15   7  12  44  44    0   52  |  12   9  13   45  |  +7
Vancouver Whitecaps       34  11  10  13  35  41   -6   43  |  11  10  13   43  |   0
Impact de Montréal        34  12   6  16  45  51   -6   42  |  11   9  14   42  |   0
FC Dallas                 34   9  12  13  42  47   -5   39  |  11   9  14   42  |  -3
Colorado Rapids           34  11   4  19  44  50   -6   37  |  11   9  14   42  |  -5
Philadelphia Union        34  10   6  18  37  45   -8   36  |  10   9  15   39  |  -3
New England Revolution    34   9   8  17  39  44   -5   35  |  11   9  14   42  |  -7
Portland Timbers          34   8  10  16  34  56  -22   34  |   8   9  17   33  |  +1
Chivas USA                34   7   9  18  24  58  -34   30  |   5   8  21   23  |  +7
Toronto FC                34   5   8  21  36  62  -26   23  |   8   8  18   32  |  -9

The left side of the table shows the final results of the 2012 MLS Supporters’ Shield race. Appended on the right is each team's Pythagorean expectation, along with the “delta” (or residual): actual points minus expected points, which denotes just how far off the estimate is from reality.

In other words, this is Howard fact-checking his own system. And the estimates compare remarkably well to the actual results of the regular season, placing only a handful of teams out of order. Howard mentioned to me that the average residual for the soccer Pythagorean is between three and four points, and that residuals beyond that should garner the most attention.

So, that's where we will investigate.

According to the soccer Pythagorean, the most over-performing teams in the 2012 MLS regular season were the Columbus Crew (+7) and Chivas USA (+7). On the other end, the New England Revolution (-7) and Toronto FC (-9) appear to be the most under-performing squads.
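As a quick sanity check on those outliers, the residuals can be recomputed from the two points columns of the table. The sketch below is only an illustration: the variable and function names are my own, and treating the "average residual" as a mean of absolute values is my assumption about what Howard meant.

```python
# (team, actual points, Pythagorean points), copied from the table above
TABLE = [
    ("San Jose Earthquakes", 66, 64),
    ("Sporting Kansas City", 63, 58),
    ("D.C. United", 58, 54),
    ("Red Bull New York", 57, 53),
    ("Real Salt Lake", 57, 54),
    ("Chicago Fire", 57, 51),
    ("Seattle Sounders FC", 56, 60),
    ("LA Galaxy", 54, 53),
    ("Houston Dynamo", 53, 51),
    ("Columbus Crew", 52, 45),
    ("Vancouver Whitecaps", 43, 43),
    ("Impact de Montréal", 42, 42),
    ("FC Dallas", 39, 42),
    ("Colorado Rapids", 37, 42),
    ("Philadelphia Union", 36, 39),
    ("New England Revolution", 35, 42),
    ("Portland Timbers", 34, 33),
    ("Chivas USA", 30, 23),
    ("Toronto FC", 23, 32),
]


def residuals(rows):
    """Map each team to its residual: actual points minus expected points."""
    return {team: actual - expected for team, actual, expected in rows}


deltas = residuals(TABLE)

# Assumption: "average residual" means the mean of the absolute residuals.
mean_abs = sum(abs(d) for d in deltas.values()) / len(deltas)

# Rank teams from most under-performing to most over-performing.
ranked = sorted(deltas, key=deltas.get)
under = ranked[:2]   # two most negative residuals
over = ranked[-2:]   # two most positive residuals

print(f"mean absolute residual: {mean_abs:.2f}")
print("under-performers:", [(t, deltas[t]) for t in under])
print("over-performers:", [(t, deltas[t]) for t in over])
```

For this table the mean absolute residual works out to about 3.8 points, which sits inside the three-to-four range Howard quoted.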

Now, remember what this table is estimating (points) and what it is based on (goals for and against). Essentially, it suggests that the New England Revolution, after scoring 39 goals and conceding 44, would usually be expected to earn 42 points this season instead of 35.
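The points totals follow directly from the win and draw columns under the standard three-points-for-a-win scoring. A tiny sketch for the Revolution's row (the function name is mine, chosen for illustration):

```python
def league_points(wins, draws):
    """Three points for a win, one for a draw, none for a loss."""
    return 3 * wins + draws


# New England Revolution, 2012: actual record 9-8-17 (W-D-L),
# Pythagorean record 11-9-14, both taken from the table.
actual_pts = league_points(9, 8)
expected_pts = league_points(11, 9)
residual = actual_pts - expected_pts

print(actual_pts, expected_pts, residual)  # 35 42 -7
```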

Having ventured to Gillette Stadium a few times this year, I think I can provide decent commentary for this result. Of the 17 games that the Revolution lost this season, only two were by more than a single goal. Understandably, this method (and a normal human being) would expect that, of the 15 single-goal losses suffered by the Revolution this season, at least a few would have ended as draws instead. I expect that some of the other outlying teams are suffering (or benefiting) from a similar circumstance.
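To get a feel for why a team with the Revolution's goal totals "should" have drawn more often, here is a rough back-of-the-envelope model. To be clear, this is not Howard's derivation. It is a simpler stand-in that assumes each side's goals in a match are independent Poisson counts, with rates equal to season goals divided by games played. All function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals arrive at the given rate."""
    return rate ** k * exp(-rate) / factorial(k)


def outcome_probs(goals_for, goals_against, games, max_goals=15):
    """Per-match (win, draw, loss) probabilities under independent Poisson scoring.

    Assumption: both scoring rates are constant across the season.
    Scorelines above max_goals per side are ignored as negligible.
    """
    rate_for = goals_for / games
    rate_against = goals_against / games
    win = draw = loss = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, rate_for) * poisson_pmf(conceded, rate_against)
            if scored > conceded:
                win += p
            elif scored == conceded:
                draw += p
            else:
                loss += p
    return win, draw, loss


# New England Revolution, 2012: 39 goals for, 44 against, 34 games.
win, draw, loss = outcome_probs(39, 44, 34)
print(f"expected W-D-L over 34 games: "
      f"{34 * win:.1f}-{34 * draw:.1f}-{34 * loss:.1f}")
```

Even this crude model expects noticeably more draws than the eight the Revolution actually recorded, which fits the pile of one-goal losses described above.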

The question here ultimately boils down to “what causes these outlying residuals?” Some people would quickly pin this to “luck” – and they might be correct.