I have a lot of questions about the world of football. Do clubs respect powerful opponents too much? Does this respect lead to a reinforcing pattern of Powerful club → play defensively → club looks more powerful → play more defensively, etc.? Is there a way to data-mine (like, really data-mine-apply-machine-learning-style, baby) the data recorded during games to create prescriptive insights for future matches on a granular basis?
This series of articles intends to dig into a few of these questions.
Quantifying Startelf Selection against Strong Opposition
We can pose the trivial-to-observe, yet difficult-to-quantify question of “to what degree do managers adjust their tactics based on the strength of their opposition?” In the Bundesliga context, this question is usually asked about how clubs line up against Bayern Munich. Many commentators and viewers accuse clubs of playing “too defensively” against Bayern and hence, in their minds, of selecting a weaker Startelf (i.e. starting lineup or “starting eleven”).
Is this observation true?
Using analytics, one down-and-dirty way to answer this question is by comparing the different abilities of a manager’s chosen Startelf. Thus, our explicit question is “do managers adjust their Startelf against stronger clubs, and if so, how so, and to what degree?”
In an ideal world, event-level data (i.e. every touch of the ball classified and described) would be readily accessible, but in reality it is hard to come by (without shelling out the kind of money that a club has and that we poor hobby-analysts, alas, do not). This event-level data would be ideal for analyzing tactics, but we do not have access to it. Luckily for us, there is *some* data available that lets us hack our way to proxies for tactical approaches (such as the selection of the starting eleven).
To this end, I analyzed data from the global behemoth EA Sports FIFA video game series, together with match data from this (now defunct?) website, which includes matches from the 2008/2009 through 2015/2016 seasons. The EA Sports FIFA data set tracks 38 different abilities for players, such as crossing, finishing, short passes, volleys, free kicks, positioning, marking, standing tackling, etc. This is super useful data because FIFA has already translated real-world statistics and abilities into many quantified player abilities, with the aim of being as realistic as possible in the game. Whether or not the ratings are “perfectly accurate” is a debate I’m not too interested in having at this moment, but the data is useful enough for our analysis this time around.
The Analytical Approach:
To explore the question of player selection dynamics, we’ll look at the two consistently strong clubs in the Bundesliga over the last decade: Bayern Munich and Borussia Dortmund. Bayern, omnipresent at the top of the Bundesliga table, give us a good opportunity to look at years’ worth of match data to see if we find consistencies in the characteristics of the opposition’s starting eleven. Borussia Dortmund, the second team we’ll look at, also gives us a great opportunity to see if other teams adjusted the characteristics of their lineups vs. BVB once BVB ascended to be a top team during the 2010/2011 Bundesliga-conquering season.
Step 1: Extracting Data from FIFA (the Video Game)
In order to make the EA Sports FIFA data set easier for analysis visually (which is not strictly necessary, but makes for a much more fun article to read 😉), we can summarize the abilities into different ability-groupings (Defensiveness, Offensiveness, Speediness, Goal keeping, etc.) using a standard statistical technique called Principal Components Analysis – we won’t get into any math details here, but try to envision a process something along the lines of …
- We feed the EA Sports FIFA data set to a computer.
- The computer spits back out which abilities are grouped together (without us needing to tell the computer which ones should be grouped together).
- The computer also spits out a rating for each ability-group for each player.
Got it? Three basic steps. Think of this process as magically grouping key abilities, which are essentially averaged and calculated for every player.
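For the curious, the three steps above can be sketched in a few lines. This is only a minimal illustration, assuming a tiny made-up ratings table: the five players, the four ability columns, and every number are hypothetical, not the real FIFA data. It runs the PCA through NumPy's SVD, which is my assumption for the sketch and not necessarily the software behind the results in this article.

```python
import numpy as np

# Hypothetical ratings: one row per player, one column per ability.
# All values are invented for illustration only.
abilities = ["finishing", "volleys", "marking", "standing_tackle"]
ratings = np.array([
    [85.0, 80.0, 30.0, 35.0],   # striker-like player
    [80.0, 78.0, 35.0, 30.0],   # striker-like player
    [40.0, 35.0, 82.0, 85.0],   # defender-like player
    [35.0, 30.0, 88.0, 84.0],   # defender-like player
    [60.0, 58.0, 62.0, 60.0],   # midfielder-like player
])

def pca(X, n_components):
    """Principal Components Analysis via SVD of the mean-centered data."""
    centered = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    # Each row of `loadings` says how strongly every ability feeds a component,
    # i.e. which abilities the computer has grouped together.
    loadings = Vt[:n_components]
    # Each row of `scores` is one player's rating on every component.
    scores = centered @ loadings.T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return loadings, scores, explained[:n_components]

loadings, scores, explained = pca(ratings, n_components=2)

for name, weight in zip(abilities, loadings[0]):
    print(f"{name:16s} {weight:+.2f}")
```

In this toy table, the attacking abilities and the defending abilities load on the first component with opposite signs, so a single score separates attack-minded from defense-minded players. The sign of a component is arbitrary, so which end counts as "defensive" is a labeling choice.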
This process results in three relevant ability-groupings for our question at hand: 1) Defensiveness, 2) Offensiveness, and 3) Speediness of the players on the pitch. Once this step is completed, we have a rating in each one of these three groupings for each player each time FIFA updated their stats (sometimes a few times a year).
(Note: We know players’ abilities change over time, so to account for possible fluctuations we use the players’ abilities as calculated in that season by EA Sports FIFA. When ratings are updated within a single season, a simple mean of the ratings is used.)
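The within-season averaging in the note is simple enough to show directly. Here is a small sketch, assuming the rating updates come as (player, season, score) rows; the player names and scores are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rating updates: (player, season, Defensiveness score).
updates = [
    ("Player A", "2014/2015", 610.0),
    ("Player A", "2014/2015", 630.0),   # second update in the same season
    ("Player A", "2015/2016", 650.0),
    ("Player B", "2014/2015", 480.0),
]

def season_ratings(rows):
    """Collapse all in-season updates into one mean rating per player and season."""
    grouped = defaultdict(list)
    for player, season, score in rows:
        grouped[(player, season)].append(score)
    return {key: mean(values) for key, values in grouped.items()}

per_season = season_ratings(updates)
print(per_season[("Player A", "2014/2015")])
```

A player with a single update in a season simply keeps that rating.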
With this data in place, we can move to the next step by posing this question: do managers adjust the overall player selection dynamics (Defensiveness, Offensiveness, and Speediness) when facing stronger clubs—and if so, how?
Step 2: Comparing Startelf Ability-Groupings (vs. Bayern, vs. BVB, or vs. Other Clubs)
The main idea behind step 2 is to compare how what I’ll call a club’s “Startelf dynamics” (i.e. Defensiveness, Offensiveness, and Speediness) change when facing Bayern Munich or Borussia Dortmund, compared to facing all other Bundesliga clubs. To control for the impact of roster changes (injuries, transfers, sales, loans), we compare the ability-groupings for each team when they face Bayern or BVB with the groupings when they play anyone else, and we do so at the season level, which keeps roster changes from distorting the comparison. Yes, some players moved in or out at the January window, but those players *more often than not* still played proportionally against Bayern, BVB, and all other clubs due to the symmetrical scheduling by half-seasons in Germany.
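To make the comparison concrete, here is a rough sketch of the grouping logic, assuming one row per starter per match with that player's Defensiveness score. "Team X", "Opponent Y", and all numbers are hypothetical; the real analysis compares whole distributions, while this sketch only compares means to keep things short.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical starters: (season, team, opponent, player's Defensiveness score).
starters = [
    ("2015/2016", "Team X", "Bayern Munich", 700.0),
    ("2015/2016", "Team X", "Bayern Munich", 720.0),
    ("2015/2016", "Team X", "Opponent Y", 600.0),
    ("2015/2016", "Team X", "Opponent Y", 640.0),
]

STRONG_CLUBS = {"Bayern Munich", "Borussia Dortmund"}

def opponent_group(opponent):
    """Keep Bayern and BVB as their own groups; lump everyone else together."""
    return opponent if opponent in STRONG_CLUBS else "Other"

def mean_by_group(rows):
    """Mean starter score per (season, team, opponent group)."""
    grouped = defaultdict(list)
    for season, team, opponent, score in rows:
        grouped[(season, team, opponent_group(opponent))].append(score)
    return {key: mean(values) for key, values in grouped.items()}

means = mean_by_group(starters)
shift = (means[("2015/2016", "Team X", "Bayern Munich")]
         - means[("2015/2016", "Team X", "Other")])
print(shift)   # positive means a more defensive selection against Bayern
```

Because the grouping key includes the season and the team, each club is only ever compared with itself within one season, which is the point of the season-level control.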
Once the data is crunched, we’ll take a look at a number of “three humped” charts like this one:
However, before we look at results, we first need a quick overview of how to interpret the charts. These charts are called “kernel densities,” which is a fancy way of saying they are smooth histograms. The plot above covers every player during the 2008/09 Bundesliga season. On the chart, the further right you go, the more defensive the player; the higher the line, the more players there are around that level of Defensiveness/Defensive ability. This “density” shows us the distribution of players across this one ability-grouping.
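If "smooth histogram" sounds hand-wavy, here is a bare-bones sketch of a Gaussian kernel density estimate in plain Python. The scores, the bandwidth of 30, and the grid are all hypothetical choices for illustration; the charts in this article may well have been drawn with different settings.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Every sample contributes a small bell curve centered on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical Defensiveness scores forming three "humps".
scores = [300, 320, 310, 500, 520, 510, 700, 690, 720, 710]
density = gaussian_kde(scores, bandwidth=30.0)

step = 10
grid = list(range(200, 801, step))
curve = [density(x) for x in grid]
area = sum(curve) * step   # close to 1, as a density should be
```

The bandwidth plays the role of the bin width in a histogram: a smaller value gives a bumpier curve, a larger one smooths the humps together.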
Findings for How Clubs Approach Bayern:
What we are really interested in seeing is whether teams that play against Bayern have a materially different distribution of players’ Defensive, Offensive, and Speed abilities compared to when they are not playing Bayern. This is easy to show visually with “kernel densities.”
In chart 2 below, you can see two “densities” (the grey color = teams playing clubs other than Bayern; the red color = teams playing against Bayern). You can see that even though there are still general peaks at the midfield and defensive back positions, the entirety of the teams’ Startelf shifts to the right. In 2015/2016, this effect is especially apparent; notice how the midfielder bump almost entirely disappears, shifting to somewhere between midfielders and defensive backs, which hints at the use of very defensive-minded midfielders against Bayern.
What about offensively? Do teams shift their starting eleven away from attacking abilities?
Actually, they don’t against Bayern! This was surprising to me (see chart 3 below), but I’ve got some hypotheses about why. Teams consistently shift *some* of their Startelf toward being more offensive, but in a very distinct way. What’s happening is what we call a “bimodal distribution” of Offensiveness. You can see in the right-hand “mass” (say from 750+) that teams are not shifting *all* of their selection to be more offensive, just a subset; this is especially clear in the 2008/2009, 2009/2010, 2014/2015, and 2015/2016 seasons. To me, this effect is an indication of selection for counter-attacking chances against Bayern.
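One rough way to put a number on "two humps" is to count the local peaks of the density curve. The sketch below is only an illustration under stated assumptions: the Offensiveness scores are invented, the bandwidth of 40 is arbitrary, and peak counting is my stand-in for eyeballing the chart, not a formal test of bimodality.

```python
import math

def kde_curve(samples, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at every grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def find_modes(grid, curve):
    """Return the grid positions where the curve is a strict local maximum."""
    return [
        grid[i]
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    ]

# Hypothetical Offensiveness scores: a main mass of starters plus a small
# attack-minded subset on the right.
vs_bayern = [400, 420, 440, 450, 460, 480, 500, 780, 800, 820]

grid = list(range(200, 1001, 10))
modes = find_modes(grid, kde_curve(vs_bayern, 40.0, grid))
print(len(modes), "peaks at", modes)
```

The number of peaks depends heavily on the bandwidth, so a count like this is best read alongside the chart and not as proof on its own.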
You can see similar dynamics in the Speediness distribution against Bayern in chart 4 below. Generally, teams are drastically increasing the Defensive abilities of their selections while also shifting toward an Offensiveness + Speediness subset, i.e. defend like hell and counter with 1 or 2 attack-minded, pacey individual players. It seems this has been the conventional formula for facing Bayern in the Bundesliga, at least over much of the past decade.
Findings for How Clubs Approach BVB:
Now, perhaps the dynamic we found above is idiosyncratic, limited to when sides are facing Bayern … but I suspect this is a Startelf pattern employed against most dominant teams. We can test this hypothesis by tracking how clubs responded to BVB’s dramatic rise in performance across the 2008/2009 to 2015/2016 seasons. This time period provides a natural pseudo-experiment to see if the distribution of player abilities shifts over time when teams play BVB, comparing the seasons before their 2010/2011 championship with the seasons after, once BVB was again confirmed as a Bundesliga powerhouse.
Visual inspection of the Defensiveness of starting elevens (below) supports this hypothesis. You can see that up through the 2010/2011 season, the distribution of defensive abilities fielded by teams against BVB is fairly similar to when teams play anyone else. However, starting with the 2011/2012 season, BVB are shown more respect, and you can see that each year after, the defensive ability of the players fielded against BVB shifts clearly to the right (more defensive).
Offensiveness and Speediness also shift to the right after the 2010/2011 season, suggesting that teams rely on the counter-attack.
Two Approaches: against Bayern vs. against BVB
Just for comparison, we can look at playing Bayern vs. playing BVB vs. playing any other team side by side. Defensively, teams eventually started fielding more defensive lineups against BVB than even against Bayern; however, Offensiveness and Speediness are still adjusted more heavily against Bayern.
Defensiveness traits of players selected for the Startelf:
Attacking traits of players selected for the Startelf:
Conclusion and Next Steps:
In this analysis, we clearly see that the adjustments in starting eleven selection against strong teams are measurable. Teams hedge toward Defense, with a few Offensive-minded, Speedy players added, when facing strong opponents like Bayern Munich and Borussia Dortmund.
This article is really just skimming the surface of prescriptive data analytics for Football. In the next article, we will use the same data set to dig a bit deeper into what surely every big club is attempting to do analytically—identify key factors, features, characteristics, or patterns in clubs and key drivers of success for teams that do perform well against them. Without access to event-data like Opta, we’ll continue this exploration using the EA Sports FIFA data set.
Speaking of data, for fancy and not-so-fancy statistical reasons, identifying the key drivers of success in a game is *insanely* challenging, but it could be made slightly easier by using expected goals (xG) instead of goals or goal difference in a game … so if anyone has xG values for each Bundesliga match going back many years, let us know!
Latest posts by Josh MacCarty (see all)
- Exploring Prescriptive Analytics in the Bundesliga: Playing Dead against Bayern? - January 6, 2020