Peter Majors
Introduction
The New York Yankees are projected to be one of the best teams in all of Major League Baseball in 2021. As of this writing, Fangraphs projects that they will win 96 games and accumulate 48.2 fWAR over what is looking to be a full 162-game season. These projected totals are both the second most of any team in baseball and cement the Yankees as serious World Series contenders for yet another season.
This season, the Yankees’ five most accomplished starting pitchers boast a combined 6 All-Star Game Selections, 9 top-5 Cy Young finishes, and 2 Cy Young Awards. Their rotation as a whole possesses a healthy balance of experience and youth, which if all goes well, could prove very beneficial for them come playoffs. However, as many have been quick to point out, the team’s starting rotation possesses a bill of health that is far from ideal.
Outside of obvious ace Gerrit Cole, three of the team’s four projected starters at the beginning of the season (Corey Kluber, Jameson Taillon, and Domingo Germán) threw a combined one inning in 2020. The exception is Jordan Montgomery, who pitched 44 innings last season and joins Cole as the only other Yankees starter with a substantial workload in 2020.
Kluber, Germán, and Taillon each missed the 2020 season for different reasons. Kluber suffered a season-ending tear of his teres major in his first inning of action, Germán missed all of 2020 because of a suspension stemming from a domestic violence incident, and Taillon did not appear in any games because he was still recovering from his 2019 Tommy John procedure.
Purposefully omitted from this collection of pitchers is Luis Severino – who was not on the Yankees’ Opening Day roster because he is still recovering from his own Tommy John procedure. However, the two-time All-Star and former Yankees ace has already begun throwing off the mound and is expected to return to the team sometime this summer.
Based on the number of pitchers the Yankees have returning from injury, it seems they believe that enough of their arms will return and perform well enough to propel them deep into the postseason, where their elite offense and formidable bullpen can win them big games. With over a half-dozen pitchers in serious contention to make the starting rotation out of Spring Training, the issue for the Yankees was never the quantity of arms in their possession, but rather the quality of the innings those arms can provide.
Corey Kluber and Domingo Germán both missed time in 2020 for somewhat unusual reasons; Jameson Taillon and Luis Severino did not. As the graph below makes abundantly clear, Jamo and Sevy were not alone. In 2020, the year in which Severino underwent Tommy John surgery, a total of 29 major league pitchers underwent the procedure.
Taillon underwent his second Tommy John procedure in 2019, along with 16 other major league arms. The unfortunate reality is that more and more pitchers are missing time to have their UCLs reconstructed, and at least for now, there does not seem to be any end in sight.
While nobody enjoys seeing any professional athlete undergo such a devastating procedure, the sheer number of pitchers who have had the surgery and recovered has produced an incredible amount of data. Data on how the body reacts to the procedure, what the proper recovery timetable should look like, and how a pitcher’s “stuff” is affected by the procedure is, for the most part, publicly available.
The Yankees plan to rely heavily on two pitchers coming back from Tommy John surgery, Luis Severino and Jameson Taillon, for the bulk of their innings during the latter half of this season. Therefore, I thought it would be interesting to project how they might fare in their first season back in action.
The Process
For this project, I utilized a publicly available Tommy John surgery database as well as this Fangraphs leaderboard containing every individual pitching season from 1974 to 2020.
Just as a disclaimer, this project was my first time using the statistical programming language R, so much of this article will include the code I used to perform my analysis. A copy of my code and all the files used in this analysis can be found on Github here. Feel free to skip this part if you are not at all interested in understanding the thought process or the code behind my research.
I began by loading in all the relevant packages and assigning all the necessary data for my analysis.
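# plyr is loaded first so that it does not mask dplyr's verbs (mutate, summarise, rename)
library(plyr)
library(tidyverse)
library(lubridate)
# Tommy John surgery database, Fangraphs leaderboard, and the data used for the visualizations
tj1 <- read_csv("tommy_john_injuries.csv")
fg1 <- read_csv("fangraphs_leaderboard_complete.csv")
df_viz <- read_csv("tommyjohnresearch_df_bf.csv")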
Then, I filtered the observations in the Tommy John database down to Major League pitchers who returned to Major League Baseball following their surgery.
# Keep the columns from Player through Post-TJ MLB IP/PA
tj2 <- select(tj1, 'Player':'Post-TJ MLB IP/PA')
# Keep MLB pitchers who have a recorded return date at the same level
tj3 <- filter(tj2, Level == "MLB",
              Position == "P",
              !is.na(`Return Date (same level)`))
After this, I altered the data type for a column of data and then combined the Tommy John database with the aforementioned Fangraphs data. This left me with a data frame containing every season of each major league pitcher’s career who returned to Major League Baseball following Tommy John surgery.
tj3$fgid <- as.double(tj3$fgid)
df <- left_join(tj3, fg1, by = "fgid", copy = TRUE)
In order to sort through this data, I had to make the dates in my data frame workable. To do so, I separated them into month, day, and year. This allowed me to perform my next step: filtering my data by the season in which each pitcher returned from Tommy John surgery.
df1 <- df %>%
  separate(`TJ Surgery Date`, sep = "/",
           into = c("TJ Month", "TJ Day", "TJ Year")) %>%
  separate(`Return Date (same level)`, sep = "/",
           into = c("Return Month", "Return Day", "Return Year"))
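An alternative to splitting the date strings is to parse them as dates and pull the year out as a number, which avoids comparing a numeric season to a character year later on. Below is a minimal sketch in base R. It assumes the dates are stored as month/day/four-digit-year strings; the `toy` data frame, its column names, and its values are made up for illustration and do not come from the database.

```r
# Hypothetical rows standing in for the surgery and return date columns.
toy <- data.frame(
  tj_date     = c("7/15/2014", "3/2/2016"),
  return_date = c("8/20/2015", "5/11/2017"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date objects.
tj_parsed     <- as.Date(toy$tj_date, format = "%m/%d/%Y")
return_parsed <- as.Date(toy$return_date, format = "%m/%d/%Y")

# Extract the year as a number so it can be compared with a numeric season.
toy$tj_year     <- as.numeric(format(tj_parsed, "%Y"))
toy$return_year <- as.numeric(format(return_parsed, "%Y"))

print(toy)
```

With numeric years, the later `as.double()` conversions would no longer be needed, although the approach in the article works as written.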
I further narrowed these observations to return seasons in which the pitcher threw at least 30 innings and started at least half of his appearances. I did this to ensure that each observation had a large enough sample of innings to be somewhat reliable while not limiting my own sample of pitchers too much. I decided upon 30 innings because I expected both Taillon and Severino to throw at least that many innings in 2021.
The starter requirement is there because Taillon and Severino are both expected to work out of the rotation in 2021. To apply it, I created a new column containing each pitcher’s games started as a proportion of games appeared in, and kept only observations of at least 50% in that category. Following these changes, I was left with 90 observations.
dfafter <- df1 %>%
  group_by(fgid) %>%  # grouping also keeps fgid attached through the later select() calls
  filter(Season == `Return Year`) %>%
  mutate(starter_ratio = GS/G) %>%
  filter(IP >= 30, starter_ratio >= .50)
Following this, I created a new data frame containing statistics from each pitcher’s season prior to and during the year in which they underwent Tommy John surgery. This involved creating a “diff” column, which allowed me to filter out all seasons which were not equal to or one year prior to the year in which each player had their procedure.
dfbefore <- left_join(dfafter, df1, by = "fgid", copy = TRUE)
dfbefore1 <- select(dfbefore, 'Player.y':'FIP-.y')
dfbefore1$Season.y <- as.double(dfbefore1$Season.y)
dfbefore1$`TJ Year.y` <- as.double(dfbefore1$`TJ Year.y`)
# diff = surgery year minus season (0 = year of surgery, 1 = year before surgery)
dfbefore1[['diff']] <- dfbefore1[['TJ Year.y']] - dfbefore1[['Season.y']]
dfbefore2 <- dfbefore1 %>%
  filter(diff %in% (0:1))
Then, I had to verify that the seasons I was drawing pre-surgery statistics from added up to at least 30 innings and that the pitcher started at least half of his appearances in them. This lowered my observations to 82 pitchers, a total which I felt comfortable using in my analysis.
dfbefore3 <- dfbefore2 %>%
  group_by(fgid) %>%
  dplyr::summarise(sum_ip = sum(IP.y))
dfbefore4 <- left_join(dfbefore3, dfbefore2, by = "fgid", copy = TRUE)
dfbefore5 <- dfbefore4 %>%
  filter(sum_ip >= 30) %>%
  mutate(starter_ratio = GS.y/G.y) %>%
  filter(starter_ratio >= .50)
The last major change I made before finishing up my data wrangling was calculating the weighted averages for each of the statistics I was interested in exploring. Since my data includes seasons ranging back over four decades, advanced metrics such as xFIP- or CSW% were not available for all of my observations. Because of this, I decided to use ERA-, FIP-, K%, and BB% in my analysis.
dfbefore6 <- dfbefore5 %>%
  group_by(fgid) %>%
  dplyr::summarise(ERA_minus_before = weighted.mean(`ERA-.y`, IP.y),
                   FIP_minus_before = weighted.mean(`FIP-.y`, IP.y),
                   K_pct_before = weighted.mean(`K%.y`, IP.y),
                   BB_pct_before = weighted.mean(`BB%.y`, IP.y))
dfafter1 <- select(dfafter, c('ERA-', 'FIP-', 'K%', 'BB%')) %>%
  dplyr::rename(ERA_minus_after = 'ERA-',
                FIP_minus_after = 'FIP-',
                K_pct_after = 'K%',
                BB_pct_after = 'BB%')
I wrapped things up by combining my data frame containing the weighted statistics before each pitcher’s Tommy John surgery with the data frame containing each pitcher’s statistics immediately following surgery. I used a weighted mean to ensure that any large disparities between innings pitched in the season(s) preceding a pitcher’s Tommy John procedure would be properly taken into account.
df_final <- left_join(dfbefore6, dfafter1, by = "fgid", copy = TRUE)
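To see why the innings weighting matters, here is a small sketch comparing a plain average with an innings-weighted one. The pitcher and both seasons are hypothetical, with numbers made up for illustration: a full season followed by a short, injury-shortened one.

```r
# Hypothetical pitcher: a full season, then a short season before surgery.
ip        <- c(180, 20)   # innings pitched in each season (made-up values)
era_minus <- c(90, 130)   # ERA- in each season (made-up values)

simple_avg   <- mean(era_minus)               # treats both seasons equally
weighted_avg <- weighted.mean(era_minus, ip)  # weights each season by innings

print(simple_avg)    # 110
print(weighted_avg)  # 94
```

The plain average lets 20 rough innings count as much as 180 good ones, while the weighted version stays close to the season that makes up most of the workload.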
Now that all of the boring code is out of the way, let’s take a look at the results.
The Analysis
After narrowing the data to Major League pitchers who underwent Tommy John surgery, returned to the major leagues, threw at least 30 innings during the season in which they returned, threw at least 30 combined innings in the year prior to and the year of their Tommy John procedure, and appeared as the starting pitcher at least half of the time during those seasons, these were my findings.
Here we can observe notable increases in the average ERA- and FIP- of pitchers following Tommy John surgery. In the context of these statistics, an increase means the pitchers’ ERAs and FIPs were becoming worse with respect to the league. We also see a slight decrease in K% as well as a minor uptick in BB% following the procedure. This suggests that pitchers, across a variety of metrics, tended to perform worse directly following their return from Tommy John surgery than before.
Intuitively, this makes sense. Any player coming back from an injury with a recovery time that typically exceeds one year should not be expected to immediately perform at the same level as their pre-injury self. In an art as fragile as pitching, technique and feel are just as important as physical health - and recapturing those skills can take time.
In addition to average declines in performance, we see a notable increase in the standard deviations of ERA- from 19.5 to 26.2, FIP- from 15.6 to 20.8, and BB% from 2% to 2.6%. Interestingly, we do not see this pattern occur in K%. In fact, K% remains incredibly consistent across nearly all of the measures of spread in the above table. We will explore this more later on. While the above table is somewhat useful in understanding the increased levels of variance in performance following Tommy John surgery, there is a much better way to conceptualize the effect of this procedure.
Below is a smoothed density estimate plot made using R's signature ggplot2 package which contains the FIP- of the 82 pitchers in my analysis before and after their Tommy John procedures. The dotted lines represent the mean of each set of observations. Looking at this graph, it is clear how much less concentrated the FIP- of pitchers in this sample was around the mean post-surgery.
According to FIP-, pitchers were only 7% worse than league average during their first season back. However, the standard deviation in FIP- among pitchers increased 33% during the season in which they returned, and this graph allows us to understand the magnitude of this difference. While most pitchers pre-surgery were concentrated around the mean, pitchers directly following their procedure were roughly evenly distributed between 90 and 120 FIP-.
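The 33% figure follows directly from the standard deviations quoted earlier. As a quick check, the same calculation can be run for all three statistics whose spread increased; the vector names here are my own labels, while the values are the ones reported in this article.

```r
# Standard deviations before and after surgery, as reported in the article.
sd_before <- c(ERA_minus = 19.5, FIP_minus = 15.6, BB_pct = 2.0)
sd_after  <- c(ERA_minus = 26.2, FIP_minus = 20.8, BB_pct = 2.6)

# Relative increase in spread, expressed as a percentage.
pct_increase <- 100 * (sd_after - sd_before) / sd_before

print(round(pct_increase, 1))  # ERA- 34.4, FIP- 33.3, BB% 30.0
```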
So, what about ERA-? Surely, since ERA is less dependent on outcomes that pitchers should be in control of (strikeouts, walks, and home runs), it should have experienced more spread, right?
The answer to that question is a pretty clear “yes”. Looking at the smoothed density estimate plot above, we can see that immediately prior to Tommy John surgery, the ERA- of pitchers in this sample was far less concentrated around the mean than in the FIP- density estimate plot. As mentioned above, this is because pitchers have significantly less control over their ERA- than FIP-. This is something we could have surmised but is still encouraging to see in our results.
In this graph, we see that pre-surgery observations of ERA- plateaued between 90 and 120, while the post-surgery observations were far more spread out across the spectrum of ERA-. We also see far more post-surgery observations to the right of the mean, signifying that pitchers, on the whole, performed worse during their first season back from Tommy John surgery according to ERA-. This is also reflected in the spread between the means of pre- and post-surgery ERA-, which at 12 is nearly double that of FIP-. These increases in spread and shifts in skewness were detailed in the above table, but are far easier to conceptualize thanks to these density estimates. The main takeaway here is that pitchers tended to perform worse directly following surgery, and this change is most felt in ERA-.
Density plots are helpful for conceptualizing the variance of these statistics as it pertains to the body of observations as a whole. But how do pre-and post-surgery statistics correlate to one another on the individual observation level? Below is a table containing the correlation of statistics from our sample, the correlation of those same statistics year-to-year across the league, and the difference between each of them.
In order to calculate the correlation metrics for typical pitchers, I used the same set of parameters as for my Tommy John observations (minimum 30 innings in a single season and pitchers with at least 50% of appearances as the starting pitcher), without, of course, limiting my search to those who received Tommy John surgery.
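For readers curious how a year-to-year correlation like this can be computed, here is a minimal sketch in base R. It is an illustration under my own assumptions rather than the exact code behind the table: the `seasons` data frame, its column names, and its values are made up, and it pairs each pitcher-season only with that same pitcher's immediately following season.

```r
# Made-up seasons for three hypothetical pitchers (not real data).
seasons <- data.frame(
  fgid   = c(1, 1, 1, 2, 2, 3, 3),
  Season = c(2016, 2017, 2018, 2017, 2018, 2016, 2018),
  K_pct  = c(22.0, 23.5, 24.0, 18.0, 17.5, 27.0, 25.0)
)

# Copy the table, shift the season back by one, and rename the stat so that
# each row of `next_year` lines up with the season that preceded it.
next_year <- data.frame(
  fgid       = seasons$fgid,
  Season     = seasons$Season - 1,
  K_pct_next = seasons$K_pct
)

# Inner join: keeps only pitcher-seasons that have a following season on record.
pairs <- merge(seasons, next_year, by = c("fgid", "Season"))

# Pearson correlation between a season's K% and the next season's K%.
r_k <- cor(pairs$K_pct, pairs$K_pct_next)

print(pairs)
print(r_k)
```

Pitcher 3 drops out of `pairs` because his two seasons are not consecutive, which is the behavior one would want for a strict year-to-year comparison.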
As you can see, outside of K%, there really was only a moderate correlation among the pitching statistics in question directly prior to and following Tommy John surgery. While FIP- and BB% each displayed a somewhat encouraging set of results, ERA- clearly possessed the worst pre-and post-surgery correlation. Outside of BB%, year-to-year correlations for typical starting pitchers closely resembled those of Tommy John victims. This should not be awfully surprising, since BB% experienced a roughly 30% increase in its standard deviation following surgery.
The above table demonstrates just how well statistics immediately prior to and directly following Tommy John surgery mirror year-to-year correlations. This should give us more confidence about other statistics with high year-to-year correlations, including Barrel%, GB/FB, SwStr%, and Contact% during a pitcher’s first season back following UCL reconstruction surgery.
Now that we’ve taken a look at how each of the statistics in our sample correlates to one another following Tommy John surgery, let’s answer a more important question: how can they be used to predict a pitcher’s ERA- directly following their return? After all, run prevention is the name of the game - and no statistic, despite its numerous flaws, communicates this better than ERA-.
To answer this, I ran a multiple regression model using each of the statistics in question pre-surgery with the goal of predicting post-surgery ERA-. In order to determine which combination of these variables to use in my model, I ran correlation tests for each of them against post-surgery ERA-. As a baseline for comparison, I also ran correlation tests for ERA-, FIP-, K%, and BB% against the next year’s ERA-. Below are the results of those tests.
As you can see, there is not a very high correlation between each of these statistics pre-surgery and ERA- post-surgery. Again, we see a good amount of consistency between typical year-to-year correlations and correlations among the Tommy John sample. This means that we should put faith into statistics that correlate well with a pitcher’s future ERA- in typical samples in the context of Tommy John recovery. Just as a note, K% experienced a negative correlation because higher K%s correspond with lower ERA-s.
Upon examining these results and running a few different regression models, the highest R-squared I could achieve with any combination of the above statistics in my Tommy John sample was .14 (.37 correlation), while the highest adjusted R-squared I could attain was .12. The combination of inputs that yielded these results was ERA- and FIP-, which, given the above table, makes sense. This means that, after adjusting for the number of predictors in the model, we can attribute only about 12% of the variance in a pitcher’s ERA- in the season following their return from Tommy John to their ERA- and FIP- directly prior to their procedure.
After running another regression model using ERA- and FIP- for typical year-to-year correlations, I achieved a slightly worse set of results. The R-squared, as well as the adjusted R-squared for this model, turned out to be .13 (.36 correlation). Thus, both of these models turned out to do a poor job of predicting a pitcher’s ERA- in future seasons, whether or not they were returning from Tommy John surgery. While the results of these tests were slightly discouraging, they highlighted one important truth: that ERA- is extremely difficult to predict.
The results of these models mean that using either of them to predict the ERA- of pitchers immediately following their return from Tommy John surgery would be irresponsible.
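The relationship between these three numbers can be checked with a few lines of arithmetic. The sketch below uses the standard adjusted R-squared formula and assumes that all 82 pitchers in the sample entered the regression with two predictors; the helper function `adjusted_r2` is my own name for illustration.

```r
# Adjusted R-squared for a model with p predictors fit on n observations.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

r2 <- 0.14  # R-squared reported for the Tommy John sample
n  <- 82    # pitchers in the sample (assumed to all be used in the fit)
p  <- 2     # predictors: pre-surgery ERA- and FIP-

multiple_r <- sqrt(r2)             # correlation between fitted and actual ERA-
adj        <- adjusted_r2(r2, n, p)

print(round(multiple_r, 2))  # 0.37
print(round(adj, 2))         # 0.12
```

For a model fit with `lm()`, the same two quantities are available as `summary(fit)$r.squared` and `summary(fit)$adj.r.squared`.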
Applications
Now that it’s clear just how difficult it is to predict a pitcher’s ERA- upon their return from Tommy John surgery, let’s take a look at what we actually can predict about their performance post-surgery. In doing so, let’s return our attention to Luis Severino and Jameson Taillon.
*The statistics above are weighted by innings pitched for Jameson Taillon because he threw at least 30 innings in the season during which he had Tommy John Surgery, while those for Luis Severino are not.
As stated in the table containing correlation metrics for statistics before and after Tommy John surgery, K% displayed a relatively high level of correlation. According to Fangraphs, Severino had a 28.2 K% in 2018, his last full season before he underwent Tommy John surgery, while Taillon had a weighted K% of 22.2% over the 2018 and 2019 seasons. For context, the average K% in MLB for the 2018 season was 22.3%.
Going back to the first table in this article, we see a .5% decrease in the mean of K%. Furthermore, based on the recently discussed correlation metrics, we can be confident that individual observations of pre-surgery K% should replicate themselves following a pitcher’s immediate return.
With these two things in mind, I think it is safe to assume that Severino will strike out approximately 27.8% of the batters he faces in 2021, a rate that still lies comfortably above league average. While we do not have much information on Severino at this point in his recovery, we do have his metrics in prior years, which for K% might be all we need. So Yankees fans, you should remain confident that your former ace will continue to strike guys out at a great clip in 2021.
For Taillon, we can apply these same sets of findings, but also must take into account that he has drastically shortened his arm path, which has affected the amount of spin he is able to impart on the baseball. In his first start of the season, he averaged 2473 rpm on his fastball, which is noticeably higher than his average of 2328 rpm in 2019. His slider has also substantially increased its spin, now averaging 2613 rpm, which is a 140 rpm jump from 2019. His curveball did not experience this same effect but still ranked in the 82nd percentile of curveball spin in 2019 according to Baseball Savant. Jamo only averaged 93.3 mph on his fastball during his first start after sitting at 95 mph in 2019. However, the season is still young and his velocity should pick up with the warmer weather.
Based on these early spin rates as well as the fact that Taillon plans to utilize high fastballs and low breaking balls more often, I think it’s safe to assume that Taillon will be striking out more batters this season on a rate basis than during any other time in his career. An example of such a combination can be seen in the overlay below.
While the information contained in this analysis suggests that we should expect a slight decrease in strikeouts from Taillon, his changed circumstances set him up for more strikeouts during his upcoming campaign. I would argue that because of these increased spin rates and strikeout-conducive approach, he may be a bit of an outlier. I project that Taillon will strike out approximately 24.5% of the batters he faces in 2021.
Outside of K%, there really is a lot of room for speculation in terms of pitchers returning from Tommy John surgery. As we have seen, ERA- and FIP- have proven to be equally insignificant in projecting a pitcher’s ability to prevent runs, while BB% has turned out to be borderline useless in this area. Without promising correlation metrics to back my projections, I will refer back to the table at the start of this article, which details the effect that Tommy John surgery had on starting pitchers in the short term as a whole.
In that table, the means of ERA-, FIP-, and BB% underwent a 12%, 7%, and 2% increase respectively. Using these altered means in conjunction with my far more reliable K% estimate, these are my somewhat sloppy Luis Severino and Jameson Taillon projections for this season.
For context, I am projecting Severino’s 2021 campaign to be similar to that of Hyun-Jin Ryu’s 2020, during which the southpaw sported an 89 ERA-, 74 FIP-, 28.6 K%, and 6.1 BB% in his first season in Toronto. I am projecting Taillon’s 2021 to somewhat resemble that of 2020 Andrew Heaney, who sported a 101 ERA-, 84 FIP-, 25 K%, and 6.8 BB% across 66 ⅔ innings as the Angels’ number two starter.
I realize that the above table does not represent the results of a clean mathematical model. But as we've seen in this analysis, predicting pitching statistics using such models is extremely difficult. Some statistics almost have to be left to chance, while others can be estimated with a good amount of confidence.
Overall, I think the Yankees are still in a good place if Severino and Taillon can return to a form that resembles the above stat lines. While they will still need to rely heavily on ace Gerrit Cole come playoffs, Taillon and Severino will remain solid complements to him in the Yankees' decorated starting staff.
Conclusion
Pitching is difficult. And there’s an argument to be made that trying to predict how pitchers will perform is even more difficult. If this exercise taught me anything, besides how to manipulate and visualize data in R, it is just how delicate pitching statistics truly are.
To sum up the findings of my research, pitching performance tends to decline directly following a starting pitcher’s return from Tommy John surgery. However, this decline is not felt equally among different types of statistics. Metrics which tend to correlate well year-to-year among typical starting pitchers also seem to correlate well before and after Tommy John surgery. Based on ERA-, FIP-, K%, and BB% alone, it is very difficult to predict a pitcher’s future ERA-, whether or not they are returning from having their UCL reconstructed. Although its mean barely increases, a pitcher’s BB% becomes far more unpredictable following Tommy John surgery and correlates far worse than in typical year-to-year samples. Strikeouts are king.
Although I was constrained by the metrics available to me in this analysis, I believe that they allowed me to accurately assess the effects of Tommy John surgery on a few of the most important pitching statistics. In this exercise, I chose to prioritize how closely my sample of observations resembled the pitchers I was interested in learning about over the quality of the statistics I had access to. This is a tradeoff that I definitely paid for - not so much in my mean estimates or in my correlation analysis, but certainly in my attempts at a regression model.
Further work can be done on this topic on a variety of fronts, including determining how the statistics of pitchers who have undergone Tommy John surgery change in the years following their immediate return, how more advanced metrics can be used to predict the run prevention ability of pitchers directly following Tommy John surgery, and how relief pitchers fare in the season following their return.
This was a pretty long article - and if you got to this point without skipping I seriously applaud you. I hope you got something out of it and enjoyed learning about the efficacy of pitchers following Tommy John surgery as much as I did writing about it.
Until next time.
Peter Majors is the Founder and President of the Fordham Sports Analytics Society and a junior at Fordham's Gabelli School of Business. He is majoring in Accounting Information Systems, minoring in Computer Science, and plans to attend graduate school for Applied Statistics and Decision-Making. He is currently seeking a Summer 2022 internship in Data Science, Information Systems, or Baseball Analytics.