by Diego Martinez
Editor’s Note: HSAC is excited to partner with Sportico, a new publication from Penske Media Corporation. You can read a summary of this article, as well as outstanding coverage of sports business and media, on the Sportico website here.
Consider this Cooperstown curiosity: Among the 20 Hall of Famers inducted as members of the Red Sox or the Reds, only two have been pitchers (Pedro Martinez, 2015; Eppa Rixey, 1963). Meanwhile, since their 1958 move to Los Angeles, the only Dodgers players enshrined have been pitchers. Just a cursory look at a franchise’s very best suggests that even through changes in ownership, management and player personnel, some teams simply gravitate to an overarching identity over time.
Using FanGraphs data from 1969 to 2019, I applied k-means clustering, a machine learning algorithm, to broadly distinguish team types (offense-focused, pitching-reliant, neutral), then analyzed them by franchise to see whether trends emerged over the course of an organization’s history.
Of course, the run-scoring environment of 1969 and the early 1970s was drastically different from today’s. To account for this, I standardized each statistic on a per-year basis. For example, the Red Sox led the league with 197 home runs in 1969, while the Twins led the league with 307 in 2019. In my cluster analysis, I look not at these nominal values but at their value relative to other teams in that particular season. Thus, when these two home run totals are standardized by year, both have a high z-score, and the two teams can be clustered similarly.
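A minimal sketch of that per-season standardization, using only the Python standard library. The 197 and 307 totals come from the text above; every other team name and total is a made-up placeholder, and the use of the population standard deviation is an assumption, since the article does not say which variant was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_by_season(rows, stat):
    """Return {(season, team): z-score of `stat` within that season}."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row[stat])

    z_scores = {}
    for row in rows:
        values = by_season[row["season"]]
        mu = mean(values)
        sigma = pstdev(values)  # population std dev (an assumption)
        z = 0.0 if sigma == 0 else (row[stat] - mu) / sigma
        z_scores[(row["season"], row["team"])] = z
    return z_scores

# Only the Red Sox and Twins totals are from the article;
# "Team A" through "Team F" are hypothetical placeholders.
rows = [
    {"season": 1969, "team": "Red Sox", "HR": 197},
    {"season": 1969, "team": "Team A", "HR": 150},
    {"season": 1969, "team": "Team B", "HR": 120},
    {"season": 1969, "team": "Team C", "HR": 100},
    {"season": 2019, "team": "Twins", "HR": 307},
    {"season": 2019, "team": "Team D", "HR": 250},
    {"season": 2019, "team": "Team E", "HR": 220},
    {"season": 2019, "team": "Team F", "HR": 180},
]

z_scores = standardize_by_season(rows, "HR")
print(z_scores[(1969, "Red Sox")], z_scores[(2019, "Twins")])
```

Although the raw totals differ by more than 100 home runs, both league leaders land well above their own season's mean, which is what lets a clustering step treat them as similar.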
Offense vs. Pitching Clustering
I began with a very broad clustering of Hitting vs. Pitching for every team with a record above .500. I chose .500 as the benchmark on the assumption that teams above it were relatively competitive; it did not make sense to label a team as a good hitting or pitching team if it had a losing record. In the plot above, we can almost draw the line y = x, with points above the line being better pitching teams (red) and those below being better hitting teams (blue).
As shown in the plot above, most franchises fall within 20% of the mean, which means they have fielded winning teams that pitched well in some years and hit well in others. But there are some fascinating outliers that appear to exhibit tendencies, perhaps inspired by organizational philosophies, to build successful teams in a particular way. Let’s take a look at three examples scattered across the spectrum.
Beginning with the Cubs, a franchise squarely categorized as neutral, their focus over time seems erratic, bouncing between short stretches of good hitting (bottom row) or good pitching (top row) in their quest to break the Billy Goat Curse. The Mets, on the other hand, have only had a handful of winning seasons behind a hitting-oriented team. From the 1969 Miracle Mets, led by Tom Seaver, to today’s team, anchored by dual aces Jacob deGrom and Noah Syndergaard, the Amazin’s have often been built around strong pitching. Finally, when you think about the Reds, the Big Red Machine of the 1970s should automatically come to mind. Looking at their history, the Reds have mostly seen success behind offensively dominant teams, particularly in the 1970s and the second half of the 1990s. The notable dearth of plot points in recent years is because Cincinnati has only had a few winning seasons since 2000. But with the huge investments they made in signing Nick Castellanos and Mike Moustakas this offseason, the Reds appear to be returning to the high-powered offenses that once defined the franchise.
While the broad distinctions are illustrative, I wanted to drill down on specific identifying philosophies, so I broke down each category even further.
Beginning with offense, I chose variables to highlight different aspects of hitting: the ability to get on base (OBP), the ability to make contact and put the ball in play (BABIP), and the ability to hit for power (HR, ISO). With k = 6, signifying six clusters, I was able to explain 70% of the variation in offenses from 1969 to 2019.
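For readers unfamiliar with the technique, here is a rough sketch of k-means (Lloyd's algorithm) in plain Python, together with one common way to express "variation explained": the between-cluster sum of squares divided by the total sum of squares. Whether the figures in this article were computed exactly this way is an assumption. The six data points are invented z-score vectors in the order (HR, ISO, BABIP, OBP), and the example uses k = 2 only to keep the toy data small.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k observed points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return centers, labels

def explained_variation(points, centers, labels):
    """Between-cluster SS / total SS, computed as 1 - within / total."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1 - within / total

# Hypothetical z-score vectors (HR, ISO, BABIP, OBP), not real team data:
# three contact-leaning offenses, then three power-leaning offenses.
points = [
    (-0.9, -0.7, 0.9, 0.5),
    (-1.0, -0.8, 1.0, 0.4),
    (-0.8, -0.6, 0.8, 0.6),
    (0.9, 0.7, -0.9, -0.4),
    (1.0, 0.8, -1.0, -0.5),
    (0.8, 0.6, -0.8, -0.3),
]

centers, labels = kmeans(points, k=2)
share = explained_variation(points, centers, labels)
print(labels, round(share, 3))
```

On real data the result depends on the starting centers, so it is common to run the algorithm from several seeds and keep the solution with the smallest within-cluster sum of squares.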
The k-means clustering yielded fairly distinguishable categories of offenses based on z-scored statistics. The k-means algorithm defined 6 cluster centers as follows where each value represents the average z-score for the variable within the cluster. In the table below, high positive values correspond to being very good in regard to that particular offensive category, while negative numbers correspond to being below average.
| Cluster | Cluster Description | Team Example | Player Example | HR Z Score | ISO Z Score | BABIP Z Score | OBP Z Score |
|---|---|---|---|---|---|---|---|
| 1 | Contact Dependent | 1982 Cardinals | Tim Anderson | -0.94 | -0.70 | 0.93 | 0.47 |
| 2 | Pure Hitting | 2018 Red Sox | Joey Votto | 0.52 | 0.77 | 1.54 | 1.47 |
| 3 | Below Average | 2012 Astros | Jackie Bradley Jr. | -0.47 | -0.47 | -0.33 | -0.47 |
| 4 | Power Dependent | 2010 Blue Jays | Khris Davis | 0.93 | 0.70 | -0.96 | -0.42 |
| 5 | Three True Outcomes | 2018 Yankees | Joey Gallo | 1.59 | 1.48 | 0.09 | 0.97 |
| 6 | Average All Around | 1981 Dodgers | Adam Eaton | 0.30 | 0.39 | 0.31 | 0.48 |
Some clusters are intuitive and easy to define, like the Contact Dependent cluster, which includes the Whiteyball Cardinals of the 1980s, or the Power Dependent cluster, which features the 2010 Blue Jays, who led the league in home runs but were below average in most other offensive categories. A few of the others were harder to define and differentiate. I did not include strikeouts in the k-means clustering, but, interestingly, the average z-score for strikeouts in cluster 5 was positive, which is why I labeled the group Three True Outcomes (HR, BB, K). The Pure Hitting cluster, which is marked by high values across the board, could best be personified by vintage Joey Votto. On a team level, that means a lineup that gets on base with its contact skills and plate discipline, with some pop mixed in.
To simplify the data, I organized the six clusters onto a spectrum, ranging from contact dependent to power dependent.
| Cluster | Cluster Description | Power Index |
|---|---|---|
| 5 | Three True Outcomes | 0.75 |
| 6 | Average All Around | 0.50 |
This index has nothing to do with offensive success, and higher values do not signify better teams. The index only takes into account how important power was to the offense. Even though cluster 2 (Pure Hitting) has positive power ratings, the k-means clustering distinguished this group by its ability to get on base and get hits on balls in play, whereas teams in the Power Dependent and Three True Outcomes clusters presumably take more of a swing-for-the-fences approach, such that power defines the offense.
In the graphic above, notice the trend lines and how they relate to 0.5, which represents “neutral” in regard to power vs. contact. Many organizations fluctuate or remain somewhere in the middle, signifying no distinct trend when it comes to reliance on power; others, however, stay well above or below the line for the majority of their histories. Is keeping a certain trend over history the key to success? Not necessarily. After fielding mostly neutral teams, the White Sox spiked toward power hitting in the early 2000s, leading to their first World Series win since 1917. Where some teams change and adapt to compete, others have stayed true to their offensive identities. Organizations such as the Orioles, Athletics, and Blue Jays seem to always have power-driven offenses, while the Royals, Cardinals, and Pirates are more prone to success through contact and getting on base.
Another fascinating result, given where they play, is the Colorado Rockies’ consistently being at or below 0.5 on the power index for the majority of their existence. They have, perhaps surprisingly, never led the majors in home runs, and they have only finished in the Top 5 a handful of times over their history. Instead of building lineups around pure power hitters, the Rockies tend to develop and acquire players who make consistent, hard contact, knowing the power numbers will come as a built-in byproduct of playing in Denver. Rockies legends like Larry Walker and Todd Helton are excellent examples.
Distinguishing philosophies around pitching was more of a challenge. An entire staff will never be uniform; it usually consists of a healthy mix of different types of pitchers. Thus, at the risk of oversimplification, I differentiated teams by starting rotations vs. bullpens over their histories. I used two variables to cluster teams’ pitching: Starter WAR and Reliever WAR. FanGraphs’ version of WAR accounts for park factors, includes a leverage adjustment for relievers, and is based on FIP (Fielding Independent Pitching). Thus, it depends more on what a pitcher can control, without taking into account the defense that plays behind them or the ballpark they pitch in. With these two variables and five clusters, k-means clustering was able to explain 72.3% of the variation.
The clusters were defined as follows:
| Cluster | Focus | Cluster Description | Starter WAR | Reliever WAR |
|---|---|---|---|---|
| 1 | Starters Focus | Plus Starters, Below Average Bullpen | 0.18 | -0.89 |
| 2 | Bullpen Focus | Plus Bullpen, Below Average Starters | -0.13 | 0.47 |
| 3 | Below Average Pitching | Below Average Pitching | -1.24 | -0.21 |
| 4 | Starters Focus | Plus Starters, Average Bullpen | 1.39 | 0.26 |
| 5 | Bullpen Focus | Plus Bullpen, Average Starters | 0.32 | 1.59 |
I found there is much more variability with pitching than with hitting. It was not uncommon to see the bullpen dominate one year, and then see the team carried by great starting pitching the next. That’s understandable given how challenging it is to fill the hole of losing an ace or a lights-out reliever to injury.
That said, there were still some teams that showed a penchant for one over the other. The Dodgers, Diamondbacks and Mets were more often starting-pitching oriented, while the Yankees, Mariners and Athletics seem to have built stronger bullpens over their respective histories. However, even in making these observations, I am a bit wary. The Yankees have had some all-time bullpens, including Hall of Famers Goose Gossage and Mariano Rivera, but they haven’t exactly lacked in the starters department either: Andy Pettitte, Roger Clemens, C.C. Sabathia and now Gerrit Cole.
Very interestingly, the Dodgers are the only franchise never to have had a pitching staff classified as below average in both the rotation and bullpen by my clustering analysis. High-quality pitching seems to be in the Dodgers’ DNA.
Aggregating the data, I created team profiles that broadly characterize organizational philosophies. In the table above, the darker the red, the stronger a franchise’s propensity for fielding a team dependent on that characteristic. Of course, talent and strengths ebb and flow over the course of 50 years; still, it is interesting to find that the A’s and Orioles, for instance, can be characterized by power over contact in 80% of their seasons dating back to 1969. Furthermore, it is intriguing how different organizations have taken different paths to the same result. In the overall focus, the Blue Jays, Cardinals, and Red Sox were all classified as offensively minded over the 1969-2019 period. In their more specific offensive focus, however, each was different: the Blue Jays were very power oriented, the Cardinals were contact oriented, and the Red Sox fell in the neutral range, fielding both power-hitting and contact-oriented teams over their history.
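One way to picture the aggregation step is as a simple tally: give each qualifying season a single label, then compute the share of seasons carrying each label. The sketch below assumes that setup, which is a guess at how a figure like "80% of their seasons" could be produced; the franchise and its ten season labels are hypothetical.

```python
from collections import Counter

def focus_shares(season_labels):
    """Return the fraction of seasons assigned to each label."""
    counts = Counter(season_labels)
    total = len(season_labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical franchise: ten classified seasons, not real data.
season_labels = ["power"] * 8 + ["contact", "neutral"]

shares = focus_shares(season_labels)
print(shares)
```

With only a handful of classified seasons, a single relabeled year moves a share by several percentage points, so small samples should be read with caution.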
Though the pitching analysis did not produce as significant results, it did show that most of the pitching-oriented franchises are in the National League; one would presume this has a lot to do with how the DH affects the American League.
Shorter Period (2000-2019)
Finally, I repeated the same k-means clustering analysis on the period from 2000 to 2019. My question was whether the turn of the century and the changes brought about by the Moneyball Era have led to significant changes in the tendency to field similarly constructed teams.
With only 20 seasons of data for each franchise, the sample size is substantially smaller. Thus, a string of a few years oriented one way or another can carry a lot of weight when classifying philosophies over this period. Some teams shift slightly from neutral in the 1969-2019 analysis to a somewhat higher percentage in a particular focus from 2000-2019, like the Yankees, or the reverse, like the Phillies. Additionally, many teams’ identities hold true even in the ever-changing game of the 21st century, such as the Brewers, Dodgers, Red Sox, and Blue Jays.
However, there are also some major swings in philosophy between the two periods. The Athletics are one example: the k-means clustering now defines them as a substantially pitching-oriented franchise since 2000 (80%). Many people forget that the Billy Beane A’s immortalized in Moneyball were anchored by a superb starting rotation in the early 2000s. Furthermore, within pitching the A’s focus is neutral, as they have also been a factory of high-caliber relievers and a leader in bullpen innovation over the 2000s. Another interesting change is the Giants, who are no longer classified as an offensive organization. This lines up with their move in April 2000 to what is now Oracle Park, which is known to be fairly pitcher-friendly. More generally, the 2000-2019 period produces more significant results with regard to pitching focus. Whereas I struggled to find trends in the longer period from 1969 to 2019, the 2000-2019 period shows that franchises have had much higher propensities to focus on building out either their rotations or their bullpens.
Though this analysis highlighted certain tendencies, it can’t really explain why, say, the Blue Jays tend to be power reliant. Certainly, home ballparks must play a role, especially for teams like the Rockies at Coors Field or the Padres at Petco Park. But is it the park that leads teams to perform better in certain areas or are decision-makers building their rosters with their home ballpark in mind?
The trends we see may be a function of coaching staffs at both the major and minor league levels, with strengths in developing certain types of players or skills. GMs and owners can understandably have a strong influence over the identity of a team, as can a franchise’s own history. It’s common for club legends to take on special advising roles in the front office or to find them at spring training, lending advice to current players.
With some organizations, history does seem to repeat itself. For some, that’s a recipe for success. For others, looking at their past might be a key to turning their fortunes around.
Editor’s Note: If you have any questions about this article, please feel free to contact Diego Martinez on Twitter (@drmartinez31) or via email (firstname.lastname@example.org).