By David Roher


In my previous post, I analyzed parity by looking at the standard deviation of teams’ records in a given year. At the suggestion of a couple of astute commenters, here is a different way of looking at parity. For each team in each year, I measured the standard deviation of its number of wins (standardized to 162 games) over the past 5 years. Then I took the mean of all of those standard deviations for each year, and the result is the graph above. **Unlike the previous graph, the higher the value, the greater the parity**.
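The calculation above can be sketched in a few lines of Python. This is a minimal illustration with a made-up two-team dataset, not the actual data I used — real win totals would come from a source like the Lahman database — and the team names and numbers here are invented for demonstration:

```python
from statistics import pstdev

# records[team][year] = (wins, games played) — toy data, not real seasons
records = {
    "A": {2000: (95, 162), 2001: (90, 162), 2002: (88, 162),
          2003: (70, 162), 2004: (60, 162)},
    "B": {2000: (60, 162), 2001: (75, 162), 2002: (85, 162),
          2003: (92, 162), 2004: (99, 162)},
}

def parity(records, year, window=5):
    """Mean across teams of the SD of 162-game-standardized wins
    over the `window` seasons ending in `year`."""
    sds = []
    for team, seasons in records.items():
        wins = [w / g * 162                      # standardize to a 162-game season
                for y, (w, g) in seasons.items()
                if year - window < y <= year]
        if len(wins) == window:                  # skip teams without a full 5-year history
            sds.append(pstdev(wins))
    return sum(sds) / len(sds)

print(round(parity(records, 2004), 2))
```

Note the guard for teams without a full window of seasons — the same reason a commenter below flags expansion teams.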

Once again, I hope you can come to your own interesting conclusions in the comments, but here are some things I noticed:

– Unlike the previous graph, there’s no clear trend in parity over the course of baseball history. When you remove the three 5-year spikes (more on those later), the values are fairly consistent.

– That said, it does look like there was a definite increase at the start of the 1980s (spike aside). This corresponds roughly with free agency taking hold (the ruling granting it came in 1975). That’s not enough evidence to say anything conclusive, but the potential implications are interesting. Today, we connect free agency with uneven spending. But an equally important result is the freer flow of players from one team to another, a factor which would definitely increase the variance in a team’s wins over a 5-year period.

– Here’s what I found most interesting: the location of those three spikes. They started in 1915, 1981, and 1994. Those last two years immediately rang a bell: they were the years of the two largest work stoppages in baseball history. There was also a labor-relations upheaval in 1915: that was the year the Federal League, the last serious attempt at a third major league, folded, with a couple of its teams absorbed into the AL and NL. Presumably, there was also a flood of players from the now-defunct league looking for jobs in the two surviving ones. So the 1915 spike makes sense. But the ’81 and ’94 strikes involved no inherent player movement – any effect would have been indirect. The financial damage done by the strikes likely caused teams to reevaluate their spending.

I think the 1981 and 1994 surges reflect the shortened seasons of those years (and 1995). Obviously, win totals were radically different. I’m guessing that explains why the spikes last exactly 5 years — once the strike year drops out of the 5-year window, the SD returns to a normal level.

Also, you have to be careful with expansion teams — they shouldn’t be included until they have at least 5 years of history.

That’s a good point, but there are other years in which the number of games changed that don’t result in a spike. Each season is standardized to 162, so that wouldn’t cause a problem in and of itself. But a short season could absolutely do something – I’m just not sure that’s what’s being reflected here.

David: I don’t know why, but your standardization didn’t work. If I use raw win totals, I get these results, which are almost identical to your graph:

1976-80: 8.0

1977-81: 14.7

But if I first convert everything to winning % (which adjusts for games played), I get this:

1976-80: 8.0

1977-81: 8.15

The spike disappears if you adjust for games played.
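The artifact the commenter describes is easy to reproduce with a toy example. Below, a hypothetical team plays at the same .549 level every year, but one season is cut to 107 games (as in 1981). The SD of raw win totals spikes, while the SD of totals properly rescaled to 162 games stays near zero — the numbers here are invented purely to illustrate the effect:

```python
from statistics import pstdev

# (wins, games) for five seasons of a team playing .549 ball;
# the last season is strike-shortened to 107 games
seasons = [(89, 162), (89, 162), (89, 162), (89, 162), (59, 107)]

raw_sd = pstdev([w for w, g in seasons])            # raw win totals
std_sd = pstdev([w / g * 162 for w, g in seasons])  # rescaled to 162 games

print(round(raw_sd, 1))  # large — driven entirely by the short season
print(round(std_sd, 1))  # near zero — the team's true level never changed
```

If the “standardized” series still shows the spike, the rescaling step (wins ÷ games × 162) was probably applied after, or instead of, where it needed to be.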

There’s a good lesson here: if you see a change this dramatic — a near doubling of the variance in one year, vanishing five years later — double- and triple-check your work. It’s almost always a data or calculation error. The real world just doesn’t change that fast, especially when you’re using a 5-year average: ’76–’80 and ’77–’81 share 4 years in common.