Highlights from submissions to our first Football Analytics Challenge
BY ALEX VIGDERMAN
Many (hopefully all) of you know that we recently concluded the initial judging of our first Football Analytics Challenge. We released some previously-locked-down defensive alignment data to the public and asked people to come up with an answer as to which defensive line position is the most valuable. To go with the competition, we also asked registrants to donate whatever they could to the United Negro College Fund.
To bring in 133 donations totaling $3,300 so far was beyond our expectations. And we ended up with a solid crop of 34 submissions for the competition, with the finals being presented on YouTube tomorrow night (Wednesday July 29)!
While we are obviously excited to show you the research that was done by the finalists, we didn’t want to push the work of the other 31 teams off to the side. So here are a few highlights from the rest of the participants.
As a company that dabbles in multiple sports, we appreciate it when analysts draw from multiple sports in their work. Both Nate Rowan and Sam Chinitz cited baseball’s Weighted On-Base Average (wOBA) as the inspiration for their approach to valuing the events on a play. Rowan called his key metric “Points Gained,” which essentially measures the value of a charting data point by taking the difference in EPA/play between plays with and without that event occurring.
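To make the "with and without" comparison concrete, here is a minimal sketch of that kind of calculation. It is not Rowan's actual code: the function name `points_gained`, the play records, the `pressure` flag, and the EPA values are all made up for illustration, and the sign convention (EPA from the offense's point of view, so a negative number favors the defense) is an assumption.

```python
from statistics import mean


def points_gained(plays, event):
    """Mean EPA on plays where `event` occurred minus mean EPA on plays where it did not.

    `plays` is a list of dicts with an "epa" value and a boolean flag per charted event.
    EPA is assumed to be from the offense's perspective.
    """
    with_event = [p["epa"] for p in plays if p[event]]
    without_event = [p["epa"] for p in plays if not p[event]]
    if not with_event or not without_event:
        raise ValueError("need at least one play with and one play without the event")
    return mean(with_event) - mean(without_event)


# Toy, made-up plays (not from the competition data set).
plays = [
    {"epa": -0.5, "pressure": True},
    {"epa": -1.0, "pressure": True},
    {"epa": 0.5, "pressure": False},
    {"epa": 0.25, "pressure": False},
]

gain = points_gained(plays, "pressure")
print(f"EPA/play difference for 'pressure': {gain:+.3f}")
```

A raw difference like this does not adjust for game situation, so it is best read as a starting point rather than a finished metric.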
Matthew Reyers, Meyappan Subbaiah, Dani Chu, and Lucas Wu leveraged two key resources outside of the provided data set to aid their research. The first was the nflWAR paper by Yurko et al. (which multiple submissions referenced), and the second was the predicted yards at the time of the handoff from the 2019-20 NFL Big Data Bowl winners.
Sam Struthers and Adrian Cadena used ideas about division of credit from the Yurko paper to distribute EPA among the players who had a chance to be involved on a play. They also estimated the extent to which edge pressure affects the performance of the interior line and vice versa, which was a unique approach.
Alex Stern invoked multilevel modeling (which does a good job of measuring player-to-player variation when sample sizes can differ wildly) to evaluate the same concept of Individual Points Added. In the passing game, the model focused mostly on generating pressure, a decision that many teams made thanks in part to recent research from Timo Riske of Pro Football Focus.
Calvin Smith used a linear model to predict the EPA of a play based on the existence and direction of pressure. Unsurprisingly, avoiding pressure altogether is the most valuable outcome for the offense, while outside pressure is the most effective at reducing the offense’s EPA.
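As a rough illustration of a linear model on pressure direction, the sketch below fits ordinary least squares with an intercept for "no pressure" and indicator variables for inside and outside pressure. This is an assumed setup, not Smith's actual specification: the helper `fit_ols`, the two-direction encoding, and the EPA numbers are hypothetical, and a real model would likely include other predictors.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Toy, made-up plays: (pressure direction, offensive EPA).
plays = [
    ("none", 0.4), ("none", 0.2),
    ("inside", -0.1), ("inside", -0.3),
    ("outside", -0.6), ("outside", -0.8),
]

# Design matrix columns: intercept (no-pressure baseline), inside flag, outside flag.
X = [[1.0, float(d == "inside"), float(d == "outside")] for d, _ in plays]
y = [epa for _, epa in plays]

intercept, inside_effect, outside_effect = fit_ols(X, y)
print(f"baseline EPA (no pressure): {intercept:+.2f}")
print(f"inside pressure effect:     {inside_effect:+.2f}")
print(f"outside pressure effect:    {outside_effect:+.2f}")
```

With only indicator variables in the model, each coefficient is simply the gap between that group's average EPA and the no-pressure average; the regression framing becomes more useful once situational controls are added.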
Matt Colón, Silas Morsink, Robbie Thompson, and Peter Gofen were one of a couple of teams (including one of the finalists) who used Madden ratings to help quantify player talent. The group’s approach to evaluating play outcomes was what stood out the most, however. They figured that defensive linemen don’t have much impact on the specific final result of the play, but they do affect, roughly, what kind of play it was. So, when evaluating the contributions of each defensive line position, instead of using actual play results, they sorted plays into many different kinds of play results (e.g. “Rush big loss”, “Screen under pressure”, “Medium pass”) and replaced each play’s EPA with the average EPA for its kind.
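A minimal sketch of that substitution might look like the following. The function name `smooth_epa_by_category`, the field names, and the EPA values are hypothetical, and how the team actually assigned plays to categories is not covered here.

```python
from collections import defaultdict
from statistics import mean


def smooth_epa_by_category(plays):
    """Attach to each play the average EPA of all plays sharing its result category."""
    by_category = defaultdict(list)
    for play in plays:
        by_category[play["category"]].append(play["epa"])
    category_mean = {cat: mean(values) for cat, values in by_category.items()}
    # Return new records so the original EPA stays available for comparison.
    return [{**play, "epa_smoothed": category_mean[play["category"]]} for play in plays]


# Toy, made-up plays using two of the category labels mentioned above.
plays = [
    {"category": "Rush big loss", "epa": -1.5},
    {"category": "Rush big loss", "epa": -2.5},
    {"category": "Medium pass", "epa": 0.5},
    {"category": "Medium pass", "epa": 1.5},
]

smoothed = smooth_epa_by_category(plays)
for play in smoothed:
    print(play["category"], play["epa"], play["epa_smoothed"])
```

The effect is to strip out play-specific noise (a broken tackle, a dropped pass) that a defensive lineman has little control over, while keeping the part of the outcome tied to the kind of play that was forced.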
Dan Rees broke down plays using charting data in ways similar to what we do ourselves within our Total Points statistic. When judging a player’s opportunities, he also focused on the range of possible EPA values on a play, which he called the play’s EPA Range, instead of just the EPA itself. David Schmerfeld also took a Total-Points-esque angle on valuing plays, and added explicit measures of “Indirect Impact” that allowed interior linemen to receive identifiable credit for their more subtle play-to-play value.
Keegan Abdoo and Mehmet Erden took a similar approach, with a linear model that controlled for situational factors to estimate the EPA contribution of a streamlined set of charting data points on both run plays (forcing the rusher to bounce or cut back) and pass plays (pressuring the quarterback or breaking up the pass).
A few teams used clustering to robustly characterize player positions using some combination of roster position and play-to-play alignment. One of the better implementations of that belonged to James Hyman, Colin Krantz, Brendan McKeown, and Kushal Shah, who used a random forest to model the most likely roster position for a player (including a Hybrid DE/DT position) and then combined those with defensive line techniques to feed the clustering algorithm.
We’re so glad to have received so many great submissions to our competition. Feel free to check out work by the finalists or by anyone else in the competition on the competition’s GitHub repository.