How Important is a Good April?

by Andrew Mooney

This post also appeared on Boston.com.

Predictably, the Red Sox’ uneven start to the season brought with it more than its fair share of hand-wringing. That’s understandable, given the way the beginning and end of last year played out, but there was still an amazing outpouring of angst over five or six games out of 162. Lest we forget, in 2011, the Sox rebounded from an 11-15 April, which included a season-opening 2-10 stretch, to ascend temporarily to the top of the American League in August. A midseason turnaround like that leads me to wonder: how much does the first month of the season matter in determining a team’s final record?

To investigate the question, I examined the April records of all 30 MLB teams for the past five seasons (resulting in 150 “seasons” in all), then matched them up with the teams’ records at the end of the season. The resulting plot is shown below. Each MLB team has five points on the plot (one for each of the past five seasons), each representing the April winning percentage and end-of-season winning percentage for a single season. I also included a line drawn through the points to describe the average trend of the data.
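
To make the setup concrete, here is a rough sketch of how a plot like this could be put together. The file name and column names below are hypothetical; they just stand in for wherever the April and final records come from.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per team-season (30 teams x 5 seasons = 150 rows),
# with each team's April and end-of-season winning percentages.
df = pd.read_csv("april_vs_final.csv")  # columns: team, season, april_wpct, final_wpct

# Scatter of April W% against final W%, plus a least-squares trend line
plt.scatter(df["april_wpct"], df["final_wpct"], alpha=0.6)
slope, intercept = np.polyfit(df["april_wpct"], df["final_wpct"], 1)
xs = np.linspace(df["april_wpct"].min(), df["april_wpct"].max(), 100)
plt.plot(xs, slope * xs + intercept, color="red")
plt.xlabel("April winning percentage")
plt.ylabel("End-of-season winning percentage")
plt.show()
```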

After performing a simple linear regression, I found that a team’s record in April was highly statistically significant in predicting that same team’s record at the end of the year. The R-squared of the model, a statistical measure of how well fluctuations in the response variable (in this case, end-of-season winning percentage) are described by corresponding changes in the explanatory variable (winning percentage in April), was 0.257. This means that 25.7 percent of the variation in end-of-season winning percentage can be explained by teams’ April winning percentage.
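
For the curious, the regression itself is a one-liner. This sketch assumes the same hypothetical april_wpct and final_wpct columns as above.

```python
from scipy import stats

# Simple linear regression: end-of-season W% as a function of April W%
result = stats.linregress(df["april_wpct"], df["final_wpct"])

print(f"slope = {result.slope:.3f}, p-value = {result.pvalue:.4f}")
print(f"R-squared = {result.rvalue ** 2:.3f}")  # reported above as roughly 0.257
```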

It’s an interesting finding, since the average team played 26 games in April, only 16.0 percent of its 162-game schedule. This implies that April games carry more weight in a team’s ultimate regular season fate than their share of the schedule would suggest. By this reasoning, the first month of the season is worth the equivalent of roughly 42 games in determining a team’s final record.
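
One way to read that 42-game figure is to treat the R-squared as the share of the 162-game schedule that April is effectively worth:

```python
r_squared = 0.257            # from the regression above
schedule = 162

print(0.160 * schedule)      # ~26 games actually played in April
print(r_squared * schedule)  # ~41.6, i.e. roughly 42 games' worth of predictive weight
```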

A closer analysis of the data also reveals a few tidbits of interest. As illustrated in the graph, only one team (the 2009 Colorado Rockies) with a winning percentage of .400 or lower in April finished the season with a winning record. Similarly, of the 28 teams that won 60 percent or more of their games in April, 23 ended their years above .500.

Though the difference between an April winning percentage of .400 and one of .600 is only about five games in the standings, a gap that could seemingly be made up without too much difficulty over the course of the next 136 games, it’s a disparity that, in practice, is rarely overcome.

I also tested whether a team’s Pythagorean expectation, its expected winning percentage based purely on the differential between total runs scored and total runs allowed, was a better predictor of final winning percentage. A team’s Pythagorean expectation in one season has been shown to forecast its win-loss record in the following season better than its actual win-loss record does, so I thought a team’s Pythagorean expected winning percentage in April might also correlate more strongly with end-of-season winning percentage. This method did slightly better than the previous one (R-squared = 0.266), but not enough to be a practically significant improvement.
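
For reference, Pythagorean expectation in its classic form raises runs scored and runs allowed to an exponent of 2 (other exponents, such as 1.83, are also common). A quick sketch:

```python
def pythagorean_wpct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and allowed (classic Bill James form)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# Example: a team that scores 120 runs and allows 100 through April
print(round(pythagorean_wpct(120, 100), 3))  # ~0.590
```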

Clearly, the first few weeks of the MLB season provide a limited amount of information about a team and its players — Chris Shelton, anyone? — but it appears they tell us more than we might initially think. A number of theories might be proposed to explain this phenomenon; for instance, it could be that a team’s early season start is important to establishing the clubhouse mentality that will prevail the rest of the season, creating a sort of self-fulfilling prophecy. Or perhaps, near the end of the year, when playoff berths are cemented and teams are eliminated from contention, the games matter less and thus aren’t as predictive of a team’s final record.

However, one thing is evident: April is not just any other month. The rates at which teams burst out of spring training into the regular season have effects that last throughout the rest of the year. Someone might want to mention this to Bobby Valentine; something tells me questioning the heart of one of your scrappiest players doesn’t really galvanize the guys into action.

Comments

  • Aren’t you introducing a little auto-correlation by not comparing April to May-October instead of the entire season?

  • Can you provide R-squared results for all months in the season? I have a hunch that most months will have a larger R-squared value than percent of games that month of the entire season.

  • I did this for 2000-2009 to get a bit more data, and for each month (April-September), and got the following R^2:

    April: 0.32
    May: 0.35
    June: 0.31
    July: 0.36
    August: 0.42
    September: 0.51

  • Agreed, this data is nice but ultimately presented in a vacuum. Before claiming that “April is not just any other month,” it would be nice to know where April stands when compared to those other months of the year, or even with randomly sampled ~25 game streaks within a season. I would be surprised if many playoff teams (roughly .550 or better) had a 25-game streak below .400, in April or otherwise.

  • Nice article. Adding the 45-degree line to the plot would visually help because the x and y axes have different scales. This would also put the red trend line into better perspective and make interpretation easier.
