A quick introduction to our Analytics Challenge data set
In case you missed it, we’ve announced a football analytics challenge! We’ll be releasing some data that can be used to evaluate defensive linemen, including individual player alignment pre-snap and play outcomes. Then, we ask you to determine which D-line position is the most important!
This is the first time we’ve run a competition like this. It felt like as good a time as any to start, because we can give the competition a purpose beyond the little sports analysis bubble we live in.
There isn’t an entry fee for the competition, but we ask that all participants donate any amount they would like to the United Negro College Fund. We as a company want to do more to promote racial equality, and with your help, we can take a step (or many steps!) towards that goal.
(By the way, even if you’re not interested in the competition, feel free to donate via the GoFundMe page we created for the challenge.)
For those who want to explore some of our previously unreleased data, this brief article should give you the lay of the land before you embark on the challenge!
“What’s in the box”
The data set used for the challenge combines both play-level information and player-level information from weeks 9 to 17 of the 2019 season into a single file.
The play-level section includes basic play-by-play plus some extra data points like the Expected Points Added (EPA) of the play. There are also some interesting details about the play that might be relevant to your analysis, including a few data points specific to pass plays and a few specific to run plays.
The player-level section includes the identity of all of the up-front defenders on the play, their positions as named on the roster, their alignment on that play in particular, and a number of stats they might have accumulated.
A player is included if he (a) was in a three-point stance, (b) lined up standing on the edge on the line of scrimmage, or (c) usually lines up as a DL, even if he might have been standing up or off the ball on this particular play.
Some of the stats included in the file are given at both a play level and a player level. For example, InterceptionOnPlay will tell you if anyone intercepted the pass, and Interception will tell you if the specific player referenced in that row intercepted the pass.
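To make that distinction concrete, here is a minimal Python sketch. It assumes the file has one row per defender per play and that each play carries some identifier. The PlayId and PlayerName column names and the sample rows are hypothetical; only InterceptionOnPlay and Interception come from the description above.

```python
# Hypothetical rows: one per defender per play. "PlayId" and "PlayerName" are
# assumed column names for illustration, not confirmed columns in the file.
rows = [
    {"PlayId": 1, "PlayerName": "A", "InterceptionOnPlay": 1, "Interception": 0},
    {"PlayId": 1, "PlayerName": "B", "InterceptionOnPlay": 1, "Interception": 1},
    {"PlayId": 2, "PlayerName": "A", "InterceptionOnPlay": 0, "Interception": 0},
    {"PlayId": 2, "PlayerName": "C", "InterceptionOnPlay": 0, "Interception": 0},
]

# Play-level flags repeat on every row of the play, so keep one value per play.
play_flags = {row["PlayId"]: row["InterceptionOnPlay"] for row in rows}
plays_with_interception = sum(play_flags.values())

# Player-level flags can be summed directly across rows.
player_interceptions = sum(row["Interception"] for row in rows)

# Summing the play-level flag across rows counts the play once per defender.
naive_sum = sum(row["InterceptionOnPlay"] for row in rows)

print(plays_with_interception, player_interceptions, naive_sum)
```

The point of the sketch is the last line: if the layout assumption holds, play-level columns need to be reduced to one value per play before you total them.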
A little extra info on defensive alignment
Most of the data we’re releasing for this challenge is pretty self-explanatory to anyone who has played around with football data before. The defensive alignment info is probably the biggest exception.
For starters, we have what we’re calling RosterPosition and OnFieldPosition. The former is just what we have the player labeled as on the roster. The latter is his position on the given play. In this context, on-field position basically comes down to “did you have your hand on the ground?” If “yes,” then you’re a defensive lineman. If “no,” then you’re a linebacker.
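One quick way to see how often the two labels disagree is to count each pairing. This is only a sketch: the "DE", "DT", "LB", and "DL" values below are assumed labels, so check what the file actually contains before relying on them.

```python
from collections import Counter

# Hypothetical rows. The roster labels ("DE", "DT", "LB") and on-field labels
# ("DL", "LB") are assumptions about how the columns are encoded.
rows = [
    {"RosterPosition": "DE", "OnFieldPosition": "DL"},
    {"RosterPosition": "LB", "OnFieldPosition": "DL"},  # linebacker with his hand down
    {"RosterPosition": "LB", "OnFieldPosition": "LB"},
    {"RosterPosition": "DT", "OnFieldPosition": "DL"},
]

# Count every (roster, on-field) pairing to see where the labels diverge.
position_pairs = Counter(
    (row["RosterPosition"], row["OnFieldPosition"]) for row in rows
)

for (roster, on_field), count in sorted(position_pairs.items()):
    print(roster, on_field, count)
```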
The one piece that requires a little more football know-how is the technique (i.e. alignment) of the defenders on each play. In the file it’s called TechniqueName.
The technique of a defender is encoded using a (mostly) numeric system where your alignment is measured by which offensive player you line up against and on which side of that player you line up. See this image from the SIS Football Rookie Handbook:
Looking at this image, you can see that when people refer to “3-technique” or “5-technique” they’re talking about lining up just outside of either the guard or tackle. And the same structure is used for either side of the center, so you might have multiple players with the same technique on a given play, just on different sides of the center. The player’s side of the ball is encoded with SideOfBall, which is from the defense’s perspective.
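Because the same technique can appear on both sides of the center, it usually makes sense to treat alignment as the combination of SideOfBall and TechniqueName. The sketch below illustrates this on one made-up play. The "3" and "5" technique labels, the "L" and "R" side codes, and the PlayerName column are assumptions about the encoding, not confirmed values.

```python
from collections import Counter

# Hypothetical defenders on a single play. The "3"/"5" labels and the "L"/"R"
# codes are assumed encodings; check the actual values in the file.
defenders = [
    {"PlayerName": "A", "TechniqueName": "5", "SideOfBall": "L"},
    {"PlayerName": "B", "TechniqueName": "3", "SideOfBall": "L"},
    {"PlayerName": "C", "TechniqueName": "3", "SideOfBall": "R"},
    {"PlayerName": "D", "TechniqueName": "5", "SideOfBall": "R"},
]

# Technique alone is ambiguous here: two defenders share each label.
by_technique = Counter(d["TechniqueName"] for d in defenders)

# Technique plus side identifies each alignment uniquely on this play.
by_alignment = Counter((d["SideOfBall"], d["TechniqueName"]) for d in defenders)

print(by_technique)
print(by_alignment)
```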
There’s also another data point that isn’t quite alignment-related but does convey specific information about what a player was doing on a given play. The IsRushing column tells you whether the given player was rushing the passer on designed pass plays. That column will always be zero on designed run plays.
A few more notes on the data
Unsurprisingly, there are run plays and pass plays in the data set. The EventType column tells you whether the play was a pass or a run—not by design, just in result. So a scramble would be counted as a run play for this purpose. There are also “Challenge pass” and “Challenge run” event types, which are just passes or runs where a replay review changed the call on the field.
For the purposes of this kind of analysis, it’s likely fine to just assume that the “challenge” version of each event type is the same as the regular one.
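If you go that route, folding the challenge types into the regular ones takes only a few lines. Here is a minimal sketch, assuming the regular labels are spelled "pass" and "run" (the helper name normalize_event_type is made up for this example). It lowercases everything so that differences in capitalization don't matter.

```python
def normalize_event_type(event_type: str) -> str:
    """Fold 'Challenge pass' / 'Challenge run' into the plain pass / run labels."""
    label = event_type.strip().lower()
    prefix = "challenge "
    if label.startswith(prefix):
        label = label[len(prefix):]
    return label


# Assumed spellings of the four event types described above.
events = ["pass", "Challenge pass", "run", "Challenge run"]
normalized = [normalize_event_type(event) for event in events]

print(normalized)
```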
We have included RunDirection and UsedDesignedGap to help you analyze run plays based on where the play was designed to go and whether the offense succeeded in running that direction.
The run directions are gap-based using the A-B-C-D naming convention (moving from inside to outside). A run to the left B gap, for example, was intended to go between the guard and the tackle on the left side.
If a run was intended to go between the right guard and the center and the ball carrier bounced the run outside the tackles, RunDirection would be “Right A Gap” and UsedDesignedGap would be set to 0.
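As one example of how these two columns might be combined, the sketch below computes the share of runs that went through the designed gap for each run direction. It assumes the rows have already been reduced to one per play and that UsedDesignedGap is a 0/1 flag. Only the "Right A Gap" spelling appears in the description above; "Left B Gap" assumes the same naming pattern.

```python
from collections import defaultdict

# Hypothetical run plays, one row per play. "Left B Gap" is an assumed label
# that follows the pattern of the documented "Right A Gap" value.
runs = [
    {"RunDirection": "Right A Gap", "UsedDesignedGap": 0},
    {"RunDirection": "Right A Gap", "UsedDesignedGap": 1},
    {"RunDirection": "Left B Gap", "UsedDesignedGap": 1},
    {"RunDirection": "Left B Gap", "UsedDesignedGap": 1},
]

totals = defaultdict(int)
used = defaultdict(int)
for run in runs:
    totals[run["RunDirection"]] += 1
    used[run["RunDirection"]] += run["UsedDesignedGap"]

# Share of runs that actually went through the designed gap, per direction.
designed_gap_rate = {
    direction: used[direction] / totals[direction] for direction in totals
}

print(designed_gap_rate)
```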
For pass plays, in addition to basic information like whether the pass was completed or intercepted, we have also included the air yards on the throw (ThrowDepth). At both the play level and player level, we’ve included information about Pressure (hits, hurries, knockdowns, sacks) and PassBreakup (defensed, batted, deflected, or intercepted passes).
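A natural first cut with the player-level Pressure column is a pressure rate per pass rusher, using IsRushing to restrict the denominator to snaps where the player actually rushed. The sketch below is one way to do that, not a prescribed method. It assumes Pressure is a 0/1 flag at the player level, and the PlayerName column is a hypothetical identifier.

```python
from collections import defaultdict

# Hypothetical player-level rows from pass plays. "PlayerName" is an assumed
# column name, and Pressure is assumed to be a 0/1 flag for that player.
rows = [
    {"PlayerName": "A", "IsRushing": 1, "Pressure": 1},
    {"PlayerName": "A", "IsRushing": 1, "Pressure": 0},
    {"PlayerName": "A", "IsRushing": 0, "Pressure": 0},  # did not rush on this play
    {"PlayerName": "B", "IsRushing": 1, "Pressure": 0},
]

rush_snaps = defaultdict(int)
pressures = defaultdict(int)
for row in rows:
    if row["IsRushing"] == 1:
        rush_snaps[row["PlayerName"]] += 1
        pressures[row["PlayerName"]] += row["Pressure"]

# Pressures per pass-rush snap; players with no rush snaps are left out.
pressure_rate = {name: pressures[name] / rush_snaps[name] for name in rush_snaps}

print(pressure_rate)
```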
While we understand that the value of defensive players can be affected by their ability to draw offensive penalties (or by committing penalties themselves), we removed all plays with an accepted penalty from the data. There is enough gray area in how to analyze plays with penalties that we chose to take them out of the picture.
It’s going to be a fun month while we have this challenge going! If you have any questions about the data set or the competition in general, don’t hesitate to e-mail email@example.com.