
As part of Thursday's Soccer Analytics series, we asked Stats Perform Senior Data Analyst Jonny Whitmore to answer some of your burning questions on the topic. You can find his answers below.

This always surprises people, but event data is collected by people watching the matches. At Stats Perform, we have one analyst who codes the actions of the home team, another analyst who codes the away team and a third person, the match checker, who oversees this process. This is followed by an extensive post-match review, which takes place within 24 hours of the full-time whistle to further ensure Opta data is of the highest accuracy. More details in this video.

"How do you get started on that profession/specialty and what software apps and computer hardware and such are most commonly used?" -- George Rafael

The ability to code (the most popular languages are R and Python) is an essential skill in my role as a data analyst, and there are lots of football-specific resources available right now, so there is no better time to learn. Check out the analytics handbook by Devin Pleuler (Director of Analytics for Toronto FC) for some great tips and resources to do this!

There are already great metrics out there that credit positive and negative contributions of a player for all of their on-the-ball actions (e.g. Possession Value). Metrics like these can reward players who were typically undervalued by traditional metrics such as goals and assists. While the advancement of tracking data will see off-the-ball player movements get more recognition too, I still believe there are too many intricacies and roles in soccer to accurately reduce all of these actions to one number.

This will be different at each club, but I believe that set pieces are still hugely underutilized in soccer. A set piece gives a team the opportunity to make an uncontested pass from a controlled situation, and teams should be far more efficient at retaining the ball or creating chances from these. While these may only be marginal gains in a single match, they can be the difference at the top level, and I wouldn’t be surprised to see more teams follow the example of Liverpool in hiring set-piece specialists.

While the expertise of the staff will have a huge influence on how teams use the data that is available to them, these benefits will only be realized if they have the buy-in of key decision makers. Whether this is the manager, a director of football or even the owner, the teams that work most effectively with data are those where the hierarchy trusts in its use as a tool and uses it to supplement their existing processes.

Soccer is a low-scoring game that is decided by narrow margins (and often a decent amount of luck), so it is almost impossible to attribute an improvement in results directly to data analytics. Data is not a silver bullet but should be seen as a tool that supports many departments within a club to improve efficiencies and reduce risks in their decisions.

What basic box score stat is most useful when analyzing a game?

It depends on the aspect of a game you’re looking to analyze but in isolation, none of them. An individual stat can only give you an indication of what could’ve happened, but you need the additional context to give a true reflection. A team may have conceded 10 shots, but what was the quality of these chances? Did they concede these shots because they were already winning and so were defending deep (Atletico Madrid conceded 34 shots in their second-leg victory over Liverpool in the Champions League this season)? Does a team’s game plan involve deliberately having less possession (Leicester City won the Premier League in 2015/16 but averaged only 43% of possession in games)?

Which player position is most short-changed by what soccer analytics offers today?

The emphasis in analytics has always been on the more glamorous attacking contributions rather than the actions of defenders. Traditional defensive metrics such as tackles or interceptions only really act as indicators of defensive style (active or passive) rather than being reflective of a defender’s quality. Legendary Italian defender Paolo Maldini once said that “if I have to make a tackle, then I have already made a mistake”, an opinion that has also been echoed by Spanish midfielder Xabi Alonso.

Metrics do exist that weight these actions by the quality of the chances they prevent, and tracking metrics are available that quantify a defender’s ability to dominate the areas around them. However, I believe that there is still work to be done in this area of analytics before there is a widely accepted measure for defensive quality.

What's one piece of analysis you've done in your career that completely floored you?

Early on in my time at Opta, I worked on a project with a Premier League club to analyze team styles and suggest suitable loan destinations for one of their young players. While circumstances didn’t lead to an immediate transfer, I woke up one morning during the following transfer window to the headline of this player moving to my top recommended destination. This was a very satisfying moment for me and was a demonstration of how data can actively assist with very traditional practices in soccer (even if the decision had nothing to do with my analysis!).

What are the pros and cons of the expected goals stat?

Expected goals (xG) has appeared far more frequently in mainstream media over recent seasons and is a great metric for measuring the underlying performance of a team or player over a period of time. Expected goals is a pre-shot model and so naturally only measures the quality of the chance before a shot is taken. It can give us an indication that a team has been performing better than their results suggest or that a striker has been particularly unfortunate with the chances that they have had.

The biggest downside of xG is people interpreting and using it incorrectly. Expected goals is used most effectively by people who understand its limitations. Traditional xG models don’t account for player information, so the likelihood of a goal being scored from a given set of parameters is based on the ability of an "average" player. As a result, some elite players may always overperform their respective xG values. The metric is also subject to the same randomness and luck associated with goals being scored, so it is more effectively used over larger samples rather than on a game-by-game basis.
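To make the comparison between goals and xG concrete, here is a minimal Python sketch. It assumes each shot already carries an xG value from some pre-shot model; the `Shot` class, the `xg_summary` helper and the shot values are all invented for illustration and are not Opta data or a Stats Perform method.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    xg: float     # pre-shot probability of scoring, between 0 and 1 (invented values)
    scored: bool  # whether the shot actually resulted in a goal


def xg_summary(shots):
    """Return (goals, total xG, goals minus total xG) for a collection of shots."""
    goals = sum(1 for s in shots if s.scored)
    total_xg = sum(s.xg for s in shots)
    return goals, total_xg, goals - total_xg


# Hypothetical shots for a single player; not real data.
shots = [
    Shot(0.05, False),
    Shot(0.30, True),
    Shot(0.10, False),
    Shot(0.75, True),
    Shot(0.20, False),
]

goals, total_xg, diff = xg_summary(shots)
print(f"goals={goals}, xG={total_xg:.2f}, goals-xG={diff:+.2f}")
```

A positive difference means more goals than the chances would suggest for an "average" player. With only five shots the gap says very little, which is why the comparison is more informative over a larger sample.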

What are game states and why do they matter?

Game states in soccer describe whether a team is winning, drawing or losing at a given point in a match. This context is essential in analysis, as teams and players will behave differently and apply different tactics depending on the current scoreline. Do you need to push for an equalizer/winner? Are you happy to allow the opposition to have more possession given that you’re already winning?

Analyzing your opponent’s behavior at these different game states enables you to prepare in advance for how you may have to react during a game. If your opponents score an early goal, they may sit deep and defend. Does this change your own game plan in order to create chances more effectively?
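As a rough illustration of how game states can be derived from event data, the Python sketch below splits a match into the minutes one team spent winning, drawing and losing. The function name, the `(minute, scoring_team)` input format and the fixed 90-minute match length (stoppage time ignored) are assumptions made for this example, and the goal times are invented.

```python
def state_from_diff(diff):
    """Map a goal difference (from one team's point of view) to a game state."""
    if diff > 0:
        return "winning"
    if diff < 0:
        return "losing"
    return "drawing"


def game_state_minutes(goal_events, team, match_length=90):
    """Minutes `team` spent in each game state.

    goal_events: list of (minute, scoring_team) tuples.
    match_length: assumed match duration in minutes (stoppage time ignored).
    """
    minutes = {"winning": 0, "drawing": 0, "losing": 0}
    diff = 0         # goal difference from `team`'s point of view
    last_minute = 0  # minute at which the current state began
    for minute, scorer in sorted(goal_events):
        # Credit the time elapsed since the last goal to the state that held.
        minutes[state_from_diff(diff)] += minute - last_minute
        diff += 1 if scorer == team else -1
        last_minute = minute
    # Credit the time from the final goal to the end of the match.
    minutes[state_from_diff(diff)] += match_length - last_minute
    return minutes


# Hypothetical match: Home score on 20 and 85 minutes, Away on 60.
goals = [(20, "Home"), (60, "Away"), (85, "Home")]
home_states = game_state_minutes(goals, "Home")
away_states = game_state_minutes(goals, "Away")
print(home_states)
print(away_states)
```

Once each minute (or each event) is labelled with a state like this, any other stat such as shots conceded or possession can be grouped by game state rather than reported as a single match total.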

Question from editorial staff: Which player position/type of player has benefited most from soccer analytics in terms of understanding their value to a team?

Forwards. For obvious reasons, everyone wants to focus on the players who are scoring or creating goals for their teams. The contributions of these players are a lot easier to quantify as you can directly associate their actions with the outcome of a goal being scored. The emergence and mainstream acceptance of metrics such as expected goals are a testament to this!

The average shot distance in soccer is consistently falling (see the graphic below for the Premier League changes over the last 10 years) and many people believe this change stems from the emergence of data analytics in soccer. Teams are now encouraging their players to keep possession in the hope of creating goal-scoring opportunities that are statistically more likely to be converted.
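For readers who want to compute an average shot distance themselves, here is a minimal Python sketch. It assumes shot locations are given in metres on a 105 x 68 pitch with the goal being attacked centred at (105, 34); that coordinate system, the function names and the shot locations are assumptions for illustration, so real event data would need converting from its own coordinates first.

```python
import math

# Assumed pitch model: 105 m long, 68 m wide, attacking the goal at x = 105.
GOAL_CENTRE = (105.0, 34.0)


def shot_distance(x, y, goal=GOAL_CENTRE):
    """Straight-line distance in metres from a shot location to the goal centre."""
    return math.hypot(goal[0] - x, goal[1] - y)


def average_shot_distance(shots):
    """Mean distance to goal over a list of (x, y) shot locations."""
    if not shots:
        raise ValueError("at least one shot is required")
    return sum(shot_distance(x, y) for x, y in shots) / len(shots)


# Hypothetical shot locations; not real data.
shots = [(94.0, 34.0), (93.0, 29.0), (85.0, 19.0)]
avg = average_shot_distance(shots)
print(f"average shot distance: {avg:.1f} m")
```

Running the same calculation per season would give a trend line of the kind described above.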