Numerology: MLS stats gurus leading revolution (Pt. 1)
Welcome to the rabbit hole. No one is really sure just how deep the soccer analytics field goes, but more than a few people are spending a lot of time trying to find out.
The following is the first of a four-part interview our own stats guru, Devin Pleuler of Central Winger fame, conducted with four analysts employed by MLS teams: Timothy Crawford (New England), David Lee (New York), Sean Rubio (San Jose), and Rui Xu (Sporting KC).
They have degrees in statistics, economics or analysis. They multi-task, preparing spreadsheets, video sessions and more. They interact with the coaches and players. ("I'm also the one fixing the players' computers and iPads when they go wrong!" Lee says.) And they are all much, much more than just the geek in the corner.
In Part 1, we'll tackle two questions: What exactly is performance analysis, and what are the main hurdles in doing it properly?
Pleuler: What is performance analysis? What are your roles and responsibilities to your club and/or first team?
Timothy Crawford: Primarily I am working on a lot of long-term projects rather than match-to-match analysis. We are trying to gather a lot of data in spreadsheet and database formats and will hopefully find some great answers about how the league, teams and players function. I haven’t done much from the video analysis aspect to this point. Instead I focus my attention on numbers to try and analyze how the MLS game is working.
David Lee: My main role is to support the head coach and coaching staff in their weekly preparations for each match. I produce opposition analysis reports for the staff and all players, accompanied by a preview video highlighting the key areas of each team to help prepare our game plan. This preview video is then presented to the players just before the match to identify for each of them the key areas we need to focus on during the match.
Sean Rubio: The International Society of Performance Analysis of Sport has a pretty concise definition: “Performance Analysis is an objective way of recording performance so that key elements of that performance can be quantified in a valid and consistent manner.” I’ll go with that.
Rui Xu: At its heart, performance analysis is simple: It is using data to make informed decisions. How the job manifests varies greatly between teams. Most performance analysts around the world are focused on video analysis (which, of course, is a form of data), whereas with my baseball and economics background, I’m more concerned with the statistical analysis of the sport.
Pleuler: What do you feel are the main roadblocks in modern soccer analysis? What strides need to be made for these roadblocks to be overcome?
Crawford: Right now, the biggest issue we have is getting the data we want. The fact that fans of other sports can look at box scores in the newspapers, or go to thorough statistical websites like fangraphs.com for free, means that the only challenge lies in the analysis. For soccer, the data isn’t as readily available. It exists, but it’s mostly maintained by companies whose economic model is built on selling or licensing the data.
Lee: Simply put, the biggest problem, in my opinion, is the relatively early stage of development of soccer analytics and the lack of useful data that the average fan can access. The significant development in baseball sabermetrics, and ultimately Moneyball, was the fact that the "average" fan was able to devote their time and knowledge to the statistical analysis of the sport and create ideas and theories about how it should be viewed, which could then be tested and improved. The bulk of the work wasn't originally done by people in front offices of baseball organizations.
Of course, the other problem, which is mentioned ad nauseam with regard to soccer analytics (and it is a very valuable point), is how difficult the fluid nature of the game is to break down into statistics and individual events to record. Even with the level of detail we have now, with each pass or on-the-ball action recorded, it's not a complete statistical picture: Is an unsuccessful pass the fault of the passer or the receiver? Is it 50-50? If not, how do you decide who gets credit for each action?
Rubio: While it’s not a roadblock that I see being overcome any time soon, allowing more advanced metrics to exist in the public sphere would be a huge step forward. As a long-time baseball fan, I’ve come to appreciate the work in analytics that has been based in open-source data, not necessarily third party companies and/or corporations. So much of the work in soccer analysis is being done behind closed doors — and good work, let it be said — so the pool of people able to make strides in the industry is a very small one.
Xu: Data, data, data. To get at usable soccer data, you would need to know the location (X, Y and possibly Z coordinates) of the ball and every single player during every event, which is incredibly difficult to keep track of. Creating usable soccer data to objectively evaluate a player isn’t just a matter of sifting through box scores; it’s a systemic change in how game events are collected and saved. Collecting XYZ data of every player and every event for thousands of games is basically impossible with human input, but with improving technology, it is slowly becoming more possible.
The reason we need the location data is context. If you look at the box score and you see Player R has completed X number of passes, that number means nothing other than Player R completed X number of passes. It is the equivalent (to borrow a line from early-1900s baseball writer F.C. Lane, who decried the use of batting average) of having three quarters, two dimes and a nickel, and telling everybody you have six coins rather than a dollar. A cross in from the wing to a heavily defended penalty box has a much different value than a through ball that creates a one-on-one situation with the goalie. Each different pass raises or lowers your probability of scoring a goal to varying degrees, and to lump them all together and try to derive value from that is just foolish.
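For readers who think in code, Xu's coins-versus-dollar point can be sketched in a few lines of Python. This is only an illustration, not a tool any of the clubs described: the `Pass` record, the player labels and every probability value below are invented for the example, and a real model would have to estimate those values from location data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pass:
    """One completed pass. All field names here are hypothetical."""
    player: str
    kind: str
    # Assumed change in the team's chance of scoring caused by this pass.
    # These numbers are made up for illustration, not measured.
    delta_goal_prob: float


# Invented sample data: both players complete three passes.
PASSES = [
    Pass("Player R", "cross into crowded box", 0.02),
    Pass("Player R", "through ball, one-on-one", 0.30),
    Pass("Player R", "sideways pass in midfield", 0.01),
    Pass("Player S", "sideways pass in midfield", 0.01),
    Pass("Player S", "back pass to keeper", 0.01),
    Pass("Player S", "sideways pass in midfield", 0.01),
]


def completed_passes(passes: list[Pass], player: str) -> int:
    """Box-score view: count the coins."""
    return sum(1 for p in passes if p.player == player)


def pass_value(passes: list[Pass], player: str) -> float:
    """Context view: add up what the coins are worth."""
    return sum(p.delta_goal_prob for p in passes if p.player == player)


if __name__ == "__main__":
    for name in ("Player R", "Player S"):
        count = completed_passes(PASSES, name)
        value = pass_value(PASSES, name)
        print(f"{name}: {count} completed passes, value {value:.2f}")
```

In the box-score view the two players look identical, with three completed passes each. Only the second function, which depends on knowing what kind of pass was played and where, separates them.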
That ends Part 1 of the interview. Check back on Wednesday for some MLS-specific discussions, including working within the salary cap and some hints at team-specific tools.