Central Winger: Sifting through Opta data to sequence the soccer genome

This network is a visual representation of something I have jokingly begun calling the MLS Player Genome Project.

Using statistics from Opta's MLS chalkboards and a great deal of number crunching, each player (having played at least the minimum minutes required by the high lord “sample size”) is compared to every other player in MLS. Their positional tendencies and statistical profiles are carefully compared and contrasted.
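The comparison step can be sketched roughly as follows. This is only an illustration, not the method behind the network: the article does not publish its formula, so the sketch assumes each player is described by per-90-minute rates of a few statistics, standardized into z-scores and compared with cosine similarity (a measure common in recommendation systems). The `MIN_MINUTES` cutoff of 900, the three-stat layout, and players "A" through "D" are all hypothetical.

```python
import math

# Hypothetical minimum-minutes cutoff standing in for the "sample size" rule.
MIN_MINUTES = 900


def per90(stats, minutes):
    """Convert raw season totals to per-90-minute rates."""
    return [value * 90.0 / minutes for value in stats]


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and standard deviation 1."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        out_cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*out_cols)]


def cosine(u, v):
    """Cosine similarity between two feature vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(players):
    """players maps name -> (minutes, [raw stat totals]).

    Drops players under the cutoff, then returns (names, matrix) where
    matrix[i][j] is the similarity between names[i] and names[j].
    """
    eligible = {k: v for k, v in players.items() if v[0] >= MIN_MINUTES}
    names = sorted(eligible)
    rates = [per90(eligible[n][1], eligible[n][0]) for n in names]
    z = zscore_columns(rates)
    matrix = [[cosine(z[i], z[j]) for j in range(len(names))]
              for i in range(len(names))]
    return names, matrix


# Hypothetical players: (minutes, [shots, tackles, passes]).
players = {
    "A": (1800, [60, 20, 600]),
    "B": (2000, [70, 25, 700]),
    "C": (1500, [5, 80, 900]),
    "D": (400, [10, 10, 100]),  # below MIN_MINUTES, so excluded
}
names, sim = similarity_matrix(players)
```

Standardizing first keeps high-volume stats such as passes from swamping low-volume ones such as shots, so two players count as similar when their statistical *profiles* point the same way. Here the shot-heavy players "A" and "B" score close to 1, while the tackle-heavy "C" scores negative against both.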

During this process, each pair of players is assigned a similarity score. If this score is above a certain threshold, the nodes representing the two players are connected. Then, using several visual clustering techniques, this enormous matrix of player comparisons is untangled into the visualization seen above.
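A minimal sketch of the thresholding step, assuming a symmetric similarity matrix is already in hand. The 0.8 threshold, the player labels, and the matrix values are hypothetical. Connected components is used here only as a simple stand-in for the visual clustering the article mentions, whose exact technique is not stated.

```python
from collections import deque


def threshold_edges(names, sim, threshold):
    """Connect each pair of players whose similarity is at or above threshold."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i][j] >= threshold:
                edges.append((names[i], names[j]))
    return edges


def connected_clusters(names, edges):
    """Group players into the connected components of the thresholded graph."""
    adj = {n: set() for n in names}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in names:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in sorted(adj[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(comp))
    return clusters


# Toy symmetric similarity matrix for four hypothetical players.
names = ["FW1", "FW2", "GK1", "GK2"]
sim = [
    [1.00, 0.85, -0.40, -0.35],
    [0.85, 1.00, -0.30, -0.45],
    [-0.40, -0.30, 1.00, 0.92],
    [-0.35, -0.45, 0.92, 1.00],
]
edges = threshold_edges(names, sim, threshold=0.8)
clusters = connected_clusters(names, edges)
```

Connected components only separates groups that share no edges at all. In a network like the one described, where forwards link to attacking midfielders, a force-directed layout or a community-detection method (for example, modularity optimization) would be needed to pull apart clusters that touch.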


The results are impressive. Forwards are clustered together in blue on the bottom left. Connected to the forwards are attack-minded midfielders in green. The left side of the green cluster holds more flank players, while the right side leans toward central players.

Moving on from the green midfielders, we reach the red cluster, which seems to be home to more conservative midfielders. Predictably, this conservatism grows as you move from left to right, ending at the extreme with a handful of prototypical defensive midfielders such as Dax McCarty and Kyle Beckerman. The fullbacks form their own cluster of players who almost exclusively play wing back. And the goalkeepers, unsurprisingly, were easy to pluck out of the crowd statistically.

While these player connections are far from perfect (they rest roughly on the same mathematical ideas that online dating sites use to match people and that Pandora uses to decipher your musical taste), much can be gleaned from looking at MLS from this perspective. Strikers and fullbacks, for example, clearly still have distinct, specific roles. The modern midfield, on the other hand, is becoming ever more nebulous, to the point that flank midfielders have no separate cluster of their own.

Is that a surprise? Not really, since the traditional left and right midfielders of a prototypical 4-4-2 are being replaced by the hybrid attacking wingers more commonly seen in a modern 4-3-3.

As North American soccer continues to grow and improve, I expect this network to become even more illuminating. As the tactical game within the game grows more sophisticated, the roles it demands will become even more distinct. A soccer game, after all, is never decided by players holding their positions on a whiteboard; it’s a series of actions and reactions.

The end result? Each positional role has a generally accepted set of guidelines, but players take the risk of stepping outside them when they see fit.

In my experience coaching youth players, we spend the first part of a player's development teaching them how to play their position. From there, coaches face the harder task of teaching players how to bend those roles into something greater: something more dynamic and dangerous.

That’s the DNA of the game. Mapping it, sequencing soccer’s genome, is a project that can help anyone who cares to look understand the game a little better.
