Modeling the Subjective: 2019 Gold Glove Awards
By Chris Weikel and Sam Weber
The Rawlings Gold Glove, given annually to the MLB players who exhibit superior defensive performances, is a fickle and ever-changing award. Despite how much weight the Gold Glove is granted when discussing Hall of Fame careers, its actual inputs are vague and amorphous. The award has long been voted upon by MLB managers and coaches, but since 2013, in an attempt to combat its subjectivity, the SABR Defensive Index metric has accounted for 25% of the vote.
This statistic is a combination metric that integrates Ultimate Zone Rating’s (UZR) zone-based method with the hybrid play-by-play/zone-based formula of Defensive Runs Saved (DRS) and Chris Dial’s Runs Effectively Defended. This change has allowed us to build a model that attempts to predict the award recipients, as we now have enough seasons of winner data under the new criteria.
Because 75% of the vote still rests with managers and coaches, we first had to test whether those voters actually take any public defensive statistics into account. Below is a visualization of the Pearson correlation coefficients between major defensive metrics and Gold Glove winners, bucketed into three-year stretches since 2003, the first year that DRS was available.
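As a rough illustration of the calculation behind that chart, the sketch below computes a Pearson coefficient between one defensive metric and a 0/1 winner flag inside each three-year bucket. The player-seasons are invented, and the `pearson` and `bucket_start` helpers are hypothetical names used only for this example; this is not our data or our code.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def bucket_start(year, first_year=2003, width=3):
    """First season of the three-year bucket that contains `year`."""
    return first_year + ((year - first_year) // width) * width

# Hypothetical player-seasons: (season, DRS, won Gold Glove as 0/1).
rows = [
    (2003, 12, 0), (2003, -3, 1), (2004, 8, 1),
    (2004, 1, 0), (2005, 5, 0), (2005, 9, 1),
    (2015, 20, 1), (2015, 2, 0), (2016, 15, 1),
    (2016, -4, 0), (2017, 18, 1), (2017, 3, 0),
]

# Group the (metric, winner flag) pairs by bucket.
by_bucket = defaultdict(list)
for season, drs, won in rows:
    by_bucket[bucket_start(season)].append((drs, won))

# One coefficient per bucket, in chronological order.
correlations = {
    start: pearson([d for d, _ in pairs], [w for _, w in pairs])
    for start, pairs in sorted(by_bucket.items())
}
for start, r in correlations.items():
    print(f"{start}-{start + 2}: r = {r:.2f}")
```

With a binary winner flag, the Pearson coefficient is the same quantity as a point-biserial correlation, so a higher value simply means winners in that bucket tended to have better numbers in the metric.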
The earliest correlations are very weak because in 2003 the voters’ buy-in for defensive metrics was almost nonexistent. They strengthen steadily over time, with the 2013 shift to more analytical selection methods marking the final large jump in bucket 2. The current correlations convinced us that modeling this very subjective award is now possible, as long as we also take into account voter biases like previous award wins and flashiness (for the latter we used the Good Fielding Play component of DRS: SIS Video Scouts reward players for making notable plays that would not be acknowledged in a box score).
Our final model is a binary logistic regression with variables ranging from the DRS components and UZR (up to September 23) to previous Gold Gloves won. The model also incorporates a Gold Gloves-per-age factor to help weed out aging winners, while adding extra weight to young stars.
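To make the mechanics concrete, here is a minimal sketch of how a logistic regression turns a player’s inputs into a win probability. We assume here that the Gold Gloves-per-age factor is simply prior wins divided by age. The feature names, coefficients and intercept are hypothetical values chosen for illustration, not the fitted values from our model.

```python
from math import exp

def gold_gloves_per_age(prior_gold_gloves, age):
    """Prior Gold Gloves divided by age (assumed form of the factor)."""
    return prior_gold_gloves / age

def win_probability(features, coefficients, intercept):
    """Logistic model: sigmoid of the intercept plus the weighted feature sum."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

# Hypothetical coefficients for illustration only (NOT fitted values).
coefficients = {"drs": 0.15, "uzr": 0.10, "good_fielding_plays": 0.02, "gg_per_age": 12.0}
intercept = -5.0

# Two invented players with identical metrics and awards but different ages.
young_star = {"drs": 15, "uzr": 8.0, "good_fielding_plays": 30,
              "gg_per_age": gold_gloves_per_age(3, 26)}
veteran = {"drs": 15, "uzr": 8.0, "good_fielding_plays": 30,
           "gg_per_age": gold_gloves_per_age(3, 35)}

p_young = win_probability(young_star, coefficients, intercept)
p_veteran = win_probability(veteran, coefficients, intercept)
print(f"young star: {p_young:.3f}, veteran: {p_veteran:.3f}")
```

Because the per-age term shrinks as a player gets older, two players with the same trophy count and the same current-year metrics receive different probabilities, with the younger one favored.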
We performed rigorous 10-fold cross-validation testing and determined our model to be the best predictor, with a .97 sensitivity and .44 specificity. The second figure may seem low, but on the training and validation sets the model does not know that only one player at each position, in each league, can win each year; it simply labels as a winner anyone above a probability cutoff, and that cutoff is set so the model picks the appropriate proportion of winners for that sample.
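The cutoff logic and the two evaluation figures can be sketched as follows. This is a simplified illustration on a single invented held-out fold, with hypothetical helper names; it ignores tied probabilities and is not the code we ran.

```python
def cutoff_for_proportion(probabilities, winner_share):
    """Cutoff that flags roughly `winner_share` of the sample as winners."""
    ranked = sorted(probabilities, reverse=True)
    n_flagged = max(1, round(winner_share * len(ranked)))
    return ranked[n_flagged - 1]

def sensitivity_specificity(actual, predicted):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical held-out fold: model probabilities and actual outcomes.
probabilities = [0.81, 0.64, 0.40, 0.22, 0.15, 0.09, 0.05, 0.03, 0.02, 0.01]
actual = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Share of players in the fold who actually won.
winner_share = sum(actual) / len(actual)
cutoff = cutoff_for_proportion(probabilities, winner_share)

# Everyone at or above the cutoff is labeled a winner, regardless of position.
predicted = [1 if p >= cutoff else 0 for p in probabilities]
sensitivity, specificity = sensitivity_specificity(actual, predicted)
print(f"cutoff={cutoff}, sensitivity={sensitivity}, specificity={specificity}")
```

Note that nothing in this step stops two players at the same position from both clearing the cutoff, which is why a separate per-position selection is still needed afterward.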
We then filter the winners by probability and assign the award to the highest-probability player in each league, at each position, in each year, among those who played enough innings to qualify. We excluded both catchers and pitchers, as their limited metrics and more specialized defensive requirements call for different modeling than the other fielders.
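That selection step amounts to a grouped maximum. A minimal sketch, assuming one row of model output per player-season; the rows, the field names and the `MIN_INNINGS` threshold are all hypothetical and do not reflect the actual qualification rule.

```python
MIN_INNINGS = 700  # hypothetical qualifying threshold, not the official rule

# Hypothetical model output: one row per player-season (names and numbers invented).
rows = [
    {"year": 2019, "league": "AL", "position": "1B", "player": "Player A", "innings": 1250, "prob": 0.62},
    {"year": 2019, "league": "AL", "position": "1B", "player": "Player B", "innings": 410, "prob": 0.71},
    {"year": 2019, "league": "AL", "position": "1B", "player": "Player C", "innings": 1100, "prob": 0.35},
    {"year": 2019, "league": "NL", "position": "1B", "player": "Player D", "innings": 1300, "prob": 0.48},
    {"year": 2019, "league": "NL", "position": "1B", "player": "Player E", "innings": 1180, "prob": 0.55},
]

def pick_winners(rows, min_innings=MIN_INNINGS):
    """Keep the highest-probability qualified player per (year, league, position)."""
    winners = {}
    for row in rows:
        if row["innings"] < min_innings:
            continue  # too few innings to qualify
        key = (row["year"], row["league"], row["position"])
        if key not in winners or row["prob"] > winners[key]["prob"]:
            winners[key] = row
    return winners

winners = pick_winners(rows)
for (year, league, position), row in sorted(winners.items()):
    print(year, league, position, row["player"], row["prob"])
```

In this invented example, Player B has the highest probability among AL first basemen but is skipped for lack of innings, so the pick falls to Player A.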
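MLB uses three finalists to build up suspense, so here are our model’s projected Gold Glove winner and top three finalists for each award. At each position, the first three names are the National League finalists and the last three are the American League finalists. We also included extra information on the top probability earner.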
- Paul Goldschmidt
- Anthony Rizzo
- Eric Hosmer
- Matt Olson
- Ronald Guzman
- Carlos Santana
Overall Probability Leader 1B: Matt Olson
Oakland’s Matt Olson is an artist. He makes an at-times-mundane position, first base, as exciting as shortstop. Anchoring one of the better defensive infields in the league, Olson allows teammates Matt Chapman and Marcus Semien to shine while also making quite a few highlight plays himself.
Olson was by far the rangiest first baseman in the league, posting 12 runs saved from the Range and Positioning DRS component alone. Rizzo, by contrast, thrives in handling difficult throws, securing 32 of these attempts for his teammates (second to Pete Alonso’s 33). Rizzo gets a huge bump here, but unfortunately the other aspects of his defense bring him back down to the pack.
Our model saw Olson’s consistent, across-the-board production and decided it outweighed the previous Gold Glove resumes of Goldschmidt and Rizzo, so it selected him as the top contender.
- Kolten Wong
- Max Muncy
- Adam Frazier
- DJ LeMahieu
- Yolmer Sanchez
- Hanser Alberto
Overall Probability Leader 2B: DJ LeMahieu*, Kolten Wong
Although the Yankees’ DJ LeMahieu was the model’s selection at second base, we decided to discuss the player with the second-highest probability, Kolten Wong, because of LeMahieu’s significant use this season at multiple other positions, such as first and third. Even though Wong has never won a Gold Glove, he has been seen favorably by DRS. In the past two years, he’s taken a major step forward, accumulating 19 total runs saved last season and 14 this season (his previous high was 9 in 2014). Compared to LeMahieu, he also seems to handle difficult line drive outs well, amassing five of these GFPs compared to LeMahieu’s one at the position.
- Nolan Arenado
- Evan Longoria
- Brian Anderson
- Matt Chapman
- David Fletcher
- Kyle Seager
Overall Probability Leader 3B: Nolan Arenado
Nolan Arenado comes in as the overall probability favorite to win the Gold Glove at third base in 2019. Three 20-plus DRS seasons in the last six years have helped the Rockie win the NL award every year since 2013. His 2013 season was particularly stunning: he amassed 14.6 UZR and 17.7 Fielding Runs Above Average to go along with his 30 total Runs Saved. Since our model takes into account the subjectivity of voting, Arenado is helped tremendously by that unbroken streak. Second overall was Oakland’s Matt Chapman, who’s arguably had better defensive seasons at third base in recent years, though he was still great in 2019.
With Chapman’s sole win coming last year, the model heavily favored a six-time Gold Glover in Arenado to be the top pick. That’s not to say Arenado hasn’t been superb throughout his career, as evidenced by plays like this one that will keep him a formidable force at the position for years to come.
- Nick Ahmed
- Trevor Story
- Javier Baez
- Andrelton Simmons
- Francisco Lindor
- Adalberto Mondesi
Overall Probability Leader SS: Andrelton Simmons
At shortstop, we have another case of a perennial winner taking home the overall top spot. Andrelton Simmons of the Angels comes in as the favorite at the position. A winner in 2013, 2014, 2017, and 2018, Simmons was for a time considered possibly the best defender in Major League Baseball. His range and ability to make tough ground ball outs (like this one) contribute strongly to his Good Fielding Play totals and other metrics like UZR.
Like Arenado, his 2013 season was pretty remarkable. Simmons totaled 14.8 UZR, 27.2 FRAA and 41 DRS, the second-highest total for the statistic behind Kevin Kiermaier’s 2015 season.
Last year’s NL winner Nick Ahmed came in second and, like Chapman, was penalized by the model for having fewer previous awards.
- David Peralta
- Joc Pederson
- Marcel Ozuna
- Alex Gordon
- Andrew Benintendi
- Michael Brantley
Overall Probability Leader LF: David Peralta
David Peralta would be our theoretical overall winner in left field. Peralta’s a name that may not have gotten a lot of attention playing in Arizona this year, but the 32-year-old outfielder was an anchor at the position. He put up an impressive 6.2 UZR and 10 Defensive Runs Saved this season, considerably better than his runner-up Alex Gordon, who had only a 3.2 UZR and -1 DRS. Again, we see a six-time winner in Gordon being lifted by his previous awards, but Peralta has performed well enough to win at a position that’s somewhat thin on talent.
- Victor Robles
- Lorenzo Cain
- Harrison Bader
- Kevin Kiermaier
- Jackie Bradley Jr.
- George Springer
Overall Probability Leader CF: Kevin Kiermaier
Kevin Kiermaier of the Rays comes in as the top probability winner, with Victor Robles behind him. Both have had strong seasons, with Kiermaier leading Robles in UZR but Robles posting 22 DRS to Kiermaier’s 13. In what’s become a theme of this piece, Kiermaier is helped by the model favoring players with previous wins, as he’s a two-time Gold Glover and Robles has yet to win one. As previously mentioned, some of Kiermaier’s seasons have been remarkable, especially his 2015 campaign, in which he totaled 42 DRS, the highest total since the statistic’s inception in 2003. He’s certainly not a bad choice for the award by any means.
- Jason Heyward
- Cody Bellinger
- Hunter Renfroe
- Mookie Betts
- Josh Reddick
- Max Kepler
Overall Probability Leader RF: Jason Heyward
Right field turned out to be effectively a tie between the Dodgers’ Cody Bellinger and the Cubs’ Jason Heyward. A quick look at their 2019 defensive stats makes this a bit striking: Bellinger had 19 Runs Saved and a 9.5 UZR, while Heyward had 7 Runs Saved and a 2.4 UZR.
If we’re looking at this year’s numbers, the Gold Glove shouldn’t even be a contest between these two. Heyward, though, is helped tremendously, probably too much, by the fact he’s won five times by his age-29 season, whereas Bellinger has yet to win. Before any previous Gold Glove inputs were added to the model, Bellinger was the overall favorite to win, showing that subjectivity of voting and previous wins can play a major role in who ends up with the hardware.
This was our first attempt at building a Gold Glove model and it’s clear it needs some tweaking. It performs fairly well on most occasions but does tend to overvalue players who have won previous Gold Gloves. One potential way to fix this overemphasis on past winners is to eliminate more of the older years from the training data. We do need to go back in time relatively far in order to incorporate enough positive results in the sample, but voting in the pre-defensive index days was far more friendly to past winners than the current system (for example: Derek Jeter).
Removing some of the pre-2013 years — along with adding team record as a small adjustment to account for more bias — could make the model more robust and accurate. Overall, our model appears to be a successful way to judge Gold Glove contenders. Nevertheless, the true measure of whether our model performed admirably won’t come until we see the final votes.