By Kat Ravichandran, Josephine Elting, Tommy Fitzgibbons, and Nai Hola
The NFL season kicks off tonight, and with it, the inevitable complaints about refereeing. One can hardly get through a Sunday without social media blowing up about biased officiating. By scraping the nflpenalties.com database, we constructed a comprehensive dataframe containing every penalty called from 2009 to 2024. Below, we investigate the NFL referees the way we would any game player: by the numbers.
We kicked off with the most pressing questions. The most called penalty? Offensive Holding. The most penalized position? Defensive backs. The most penalized team? The 2011 Oakland Raiders, who hold the record with 163 penalties. Soon, we were pushing our questions from just who gets flagged to why the flags are thrown. Are the referees truly neutral arbiters, or does the context matter?
Is Officiating Biased in Favor of the Home Team?
This age-old question is not as simple as it may seem. While, as the graph below illustrates, the home team consistently receives fewer flags, some discrepancies are to be expected.

Road teams, facing boisterous crowds and timing issues, are more susceptible to simple errors. This could explain why road teams commit 59% of accepted Neutral Zone Infractions and 58% of Defensive Offside penalties.
Yet this adjustment does not explain why, across more than 15 seasons, road teams have drawn 53% of Defensive Pass Interference calls. While three percentage points above an even split might seem slight, in the context of 20,000 flags, this gap suggests more than road teams merely adjusting to a new stadium. It is hard to argue that visiting defensive backs err so much more often than their home counterparts simply because of crowd noise. Indeed, it seems much easier to understand how crowd noise could push referees to be stricter with the visiting team. This trend is consistent with other penalties that might evoke strong fan reactions, such as Unsportsmanlike Conduct, where almost 55% of accepted calls go against the road team.
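To give a sense of why a few percentage points can matter at this scale, here is a minimal sketch of a one-sample z-score against an even home/road split. The function name is ours, and treating 20,000 as the number of flags behind the 53% figure is an assumption made purely for illustration. The calculation also assumes flags are independent of one another, which real games do not guarantee.

```python
import math

def road_share_z_score(road_share: float, n_flags: int) -> float:
    """z-score of an observed road-team share against a 50/50 null."""
    # Standard error of a proportion when the true share is 0.5.
    standard_error = math.sqrt(0.5 * 0.5 / n_flags)
    return (road_share - 0.5) / standard_error

# Illustrative inputs only: 53% is the road share quoted above, while
# using 20,000 as the sample size behind it is an assumption.
z = road_share_z_score(0.53, 20_000)
print(f"z = {z:.2f}")
```

Under these assumptions, the share sits about 8.5 standard errors from an even split, far more than chance alone would typically produce. A test like this says nothing about whether the gap comes from players or from referees.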
Having found that where the game is played could tilt the balance, we next explored whether when a flag is thrown might do the same.
Does the Time of the Penalty Matter?
The graph below visualizes penalty rates by quarter for the ten most frequently called penalties.

We immediately see that procedural penalties, like Neutral Zone Infractions and False Starts, are front-loaded in the first half, as players acclimate to rowdy fans and game-time nerves. We next see that judgment penalties, like Defensive Holding, Defensive Pass Interference, and Illegal Use of Hands, climb at the end of halves, as the stakes rise.
While players might be getting more aggressive as the clock wanes, such distinct rises suggest referees themselves are more inclined to throw the flag in crunch time, giving them an outsized influence on game outcomes. In the following graph, we overlay the average number of flags called per five-minute period, sized by the average number of plays in that period, to illustrate a similar trend in both halves of the game: referees start light on penalties, but flags spike in the last five minutes of each half, as tensions heighten. Penalties are not simply rising steadily through the game, but rather spiking at predictable moments.
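The binning behind a chart of this kind can be sketched roughly as follows. The record layout (quarter, seconds left in the quarter), the function names, and the toy flags are all hypothetical; our actual dataframe columns may differ, and overtime is deliberately left out.

```python
from collections import Counter

QUARTER_SECONDS = 15 * 60
BIN_SECONDS = 5 * 60

def five_minute_bin(quarter: int, seconds_left_in_quarter: int) -> int:
    """Map a regulation game clock to one of 12 five-minute bins (0-11)."""
    if not 1 <= quarter <= 4:
        raise ValueError("only regulation quarters 1-4 are handled")
    elapsed = QUARTER_SECONDS - seconds_left_in_quarter
    # Clamp so a flag at 0:00 stays in its own quarter's final bin.
    bin_in_quarter = min(elapsed // BIN_SECONDS, 2)
    return (quarter - 1) * 3 + bin_in_quarter

def flags_per_game_by_bin(flags, n_games):
    """Average flags per game in each five-minute bin."""
    counts = Counter(five_minute_bin(q, s) for q, s in flags)
    return {b: counts.get(b, 0) / n_games for b in range(12)}

# Toy records (hypothetical): (quarter, seconds left in the quarter).
toy_flags = [(1, 850), (2, 110), (2, 45), (4, 200), (4, 30), (4, 0)]
rates = flags_per_game_by_bin(toy_flags, n_games=2)
print(rates)
```

Bins 5 and 11 are the last five minutes of each half. Dividing each bin's count by the number of plays in that bin, rather than by games, would turn these averages into per-play rates.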

Of course, beyond individual games, penalties also follow larger, annual trends – as NFL rule changes shape the flags the refs throw.
For example, in a 2014 response to the dominance of teams like the Seahawks’ Legion of Boom, the NFL emphasized illegal contact and defensive holding – with referees instructed to strictly enforce existing rules against contact beyond five yards downfield. We can see in the graph below how the rates of both calls spiked.

Similarly, for the 2021 season, the NFL prioritized enforcing unsportsmanlike conduct during dead-ball periods, penalizing taunting, baiting, and ‘demeaning acts.’ As the graph below shows, a spike immediately followed. To put it simply – and unsurprisingly to anyone who has ever been employed – when the NFL tells the referees to throw a flag, they throw the flag.

By now, we have seen that penalties can vary by context – where the game is played, how much time is left on the clock, or when league priorities shift. But, what about who is actually throwing the flags? Referees are not interchangeable cogs in the machine, but humans, who, like any player, can display a distinctive style.
Does the Officiating Crew Matter?
As you can guess, having come this far, the answer is yes.
Take Bill Vinovich, who has averaged the fewest penalties per game every year save two since 2016. Yet, even in 2018, when his crew recorded a mean of 6.7 flags per game to Shawn Hochuli’s 10.2, Vinovich led the league in Defensive Pass Interference call rate. A team facing such a crew puts itself at a competitive disadvantage if it does not consider these numbers. Even last season, it mattered whether your game was officiated by Vinovich’s crew or, say, Brad Allen’s, which averaged a 60% higher rate of fourth-quarter flags.
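For readers who want to reproduce crew comparisons like these, a minimal sketch follows. It assumes each flag is stored as a (crew, game id) pair; that layout, the function names, and the "Crew A" and "Crew B" toy data are hypothetical and do not describe any real crew.

```python
from collections import defaultdict

def mean_flags_per_game(flags):
    """Average flags per game for each crew from (crew, game_id) records."""
    per_game = defaultdict(lambda: defaultdict(int))
    for crew, game_id in flags:
        per_game[crew][game_id] += 1
    # Note: games with zero flags never appear in flag-level records.
    return {
        crew: sum(games.values()) / len(games)
        for crew, games in per_game.items()
    }

def relative_difference(rate: float, baseline: float) -> float:
    """How much higher `rate` is than `baseline`, as a fraction."""
    return (rate - baseline) / baseline

# Toy data (hypothetical crews and games, not real figures).
toy = (
    [("Crew A", "g1")] * 6 + [("Crew A", "g2")] * 8
    + [("Crew B", "g3")] * 10 + [("Crew B", "g4")] * 11
)
means = mean_flags_per_game(toy)
gap = relative_difference(means["Crew B"], means["Crew A"])
print(means, f"{gap:.0%} higher")
```

Restricting the records to fourth-quarter flags before calling the function would give the kind of late-game comparison described above.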
If teams are adjusting game plans not only to opponents but also to the stripes, then referees are no longer neutral adjudicators, but actors shaping the game itself.
And it is not just that different crews throw different flags, but that each referee, like all of us, adapts over time. Perhaps one of the most striking examples is Alex Kemp, whose missed face mask call on November 14th, 2022, was widely criticized across America after the game ended the Eagles’ undefeated season. Since the missed call, Kemp has called face mask penalties at a distinctly higher rate than both his own prior average and the league average. Here, as shown in the graph below, we have a referee seeking to avoid repeating his own mistake – a valiant but decidedly irregular officiating style.

Conclusion
The intention of this article is not to throw flags on NFL referees for failing to perfectly officiate a high-paced and complex competition. The intention is to highlight that officiating is not an infallible system because we, as humans, are not perfect. We are influenced by crowds and crunch time, league trends and personal preferences. We offer this article in support of the many proposed officiating improvements – from a Sky Judge assisting calls to an expanded scope for coaches’ challenges. To ensure fair playing fields in the NFL, we must first ensure they are properly officiated.