Unless you've been living under a rock for the last decade, you know that soccer is now rarely played in banks of four with two strikers hovering ahead of them. One major tactical byproduct of the modern game has been the emergence of players who "play between the lines."
Naturally, we adapted our previous naming convention to include more "banks" of players. Instead of just the 4-4-2 or the 4-3-3, we began to recognize derivatives such as the 4-2-3-1 or the 4-3-1-2 as formations in their own right.
But this remains slightly problematic. Adding extra "banks" of players doesn't describe a tactical system much more accurately, especially with the relatively recent arrival of lopsided or asymmetrical formations. At some point, it became obvious that what mattered was not just where players were positioned, but also what they were doing once they got there.
Therefore, in our impossible quest to generalize tactical systems, we have begun naming these different roles. And there are already far too many: the "holding" midfielder, the "box-to-box" midfielder, the "false nine" or even the "central winger." All of these terms are nebulous, and a player can often be slotted into several of them at once, or into none at all.
Today, we are going to take our first shot at quantifying some of these player roles. For this analysis, I looked at three metrics: a player's average position up and down the field, his average horizontal distance from the middle of the field, and the quotient of the standard deviations of those two measures (that is, the ratio of the vertical standard deviation to the horizontal one).
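To make the three metrics concrete, here is a minimal sketch of how they might be computed for a single player. It assumes touch locations on a 0 to 100 grid, with x running from a team's own goal line to the opponent's and y running across the pitch so that the middle of the field is y = 50; neither the coordinate system nor the function name `role_features` comes from the original analysis, and both are hypothetical. Following the description literally, the horizontal spread is the standard deviation of the distance from the middle, not of the raw sideways position.

```python
from statistics import mean, pstdev

# Assumed (hypothetical) pitch coordinates: x runs 0 (own goal line) to 100
# (opponent's goal line); y runs 0 to 100 across the pitch, so the middle is y = 50.
PITCH_MIDLINE_Y = 50.0


def role_features(touches):
    """Return (avg_vertical, avg_horizontal_dist, std_ratio) for one player.

    `touches` is a list of (x, y) touch locations. The name and input format
    are illustrative assumptions, not the original author's code.
    """
    if len(touches) < 2:
        raise ValueError("need at least two touches to measure spread")
    vertical = [x for x, _ in touches]                             # up/down the field
    horizontal = [abs(y - PITCH_MIDLINE_Y) for _, y in touches]    # distance from the middle
    v_sd = pstdev(vertical)
    h_sd = pstdev(horizontal)
    # Ratio of vertical to horizontal spread; a player with no horizontal
    # spread at all would divide by zero, so report infinity instead.
    ratio = v_sd / h_sd if h_sd > 0 else float("inf")
    return mean(vertical), mean(horizontal), ratio
```

For example, four touches at (20, 10), (40, 5), (60, 15) and (80, 10), a player running up and down one flank, give an average vertical position of 50, an average distance of 40 from the middle, and a ratio of about 6.3: far more movement up and down the field than across it.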
After calculating these metrics, I used k-means clustering to partition 226 MLS players into six groups. Here are the players, plotted and partitioned. The results are encouraging.
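For readers unfamiliar with k-means, the clustering step can be sketched as below, assuming each player has already been reduced to a row of the three features. This hand-rolls Lloyd's algorithm (alternate between assigning each player to the nearest centroid and moving each centroid to the mean of its players) in plain Python to stay self-contained; the article does not say which implementation was used. It also does not say whether the features were rescaled, so the z-score `standardize` step is an assumption: without it, whichever feature has the largest numeric range would dominate the distances. Both function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each feature column to zero mean and unit variance (z-scores).

    Assumption for this sketch: the original analysis may not have rescaled.
    """
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # avoid dividing by zero
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: return a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
        if new_labels == labels:  # no reassignments: converged
            break
        labels = new_labels
    return labels
```

With real data, the call would look like `kmeans(standardize(features), k=6)` on the 226 feature rows. Because k-means depends on its starting centroids, it is common to run it from several seeds and keep the run with the lowest within-cluster sum of squares; scikit-learn's `KMeans` does this through its `n_init` parameter.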
To put it simply, the plot is a mathematical representation of where MLS players get their touches. At a broad level, it suggests one thing: How teams use their fullbacks may be the defining tactical difference among MLS sides.
The smallest group, in green in the bottom-right corner, is the goalkeepers (partition No. 6). Every goalkeeper in my data set was correctly identified. But this isn't much of an accomplishment; they're rather trivial to pick out.
At the top, partition No. 2 includes strikers such as Alan Gordon, Brian Ching, Eddie Johnson, Kenny Cooper, Robbie Keane and Chris Wondolowski. Advanced midfielders were also lumped into this partition, including players such as Brad Davis, Graham Zusi, Nick LaBrocca and Darlington Nagbe.
The other pink section, marked with "+" signs (partition No. 3), contains our holding midfielders, such as Dax McCarty, Clyde Simms, Jeff Larentowicz, Osvaldo Alonso and Rafa Márquez.
Below them, in the blue zone (partition No. 4), are the "static" central defenders: the likes of A.J. Soares, Geoff Cameron, Heath Pearce, Matt Besler, Jay DeMerit and Austin Berry. Everybody fits; everybody plays roughly the same way.
However, the two red sections are harder to define. Partition No. 5 mostly contains defensive-minded players who like to join the attack, such as Steven Beitashour, Seth Sinovic, Chance Myers, Young-Pyo Lee and Chris Tierney. Partition No. 1, on the other hand, includes players such as Tony Beltran, Chris Wingert and Kevin Alston.
And that's the tactical point. Analytically, there's a difference between "outside defenders" and "wing backs," or at least a bigger difference than there is between other commonly defined positions.
This process of player classification can't yet evaluate a player more accurately than a well-trained eye, but the metrics we are using to partition these players are incredibly rudimentary. Beyond simple positional stats, we could also look at interceptions, shots or even pass completion rate. With added metrics, we may be able to identify types of players more precisely, and perhaps even uncover subcategories we hadn't originally imagined.