By: Etienne Busnel, Nick McNulty, Naman Lakhani, Kenzo Valentin, and Andrew Pollack

Introduction
Imagine offering a baseball fan the chance to build a starting rotation featuring Shohei Ohtani, Tyler Glasnow, Clayton Kershaw, Tony Gonsolin, and Dustin May. Most would jump at the opportunity. After all, this collection of talent includes Cy Young winners, solid back-end options, and a two-way superstar. Despite assembling this dream rotation, the 2024 Los Angeles Dodgers found themselves in a nightmare scenario: every single one of these pitchers landed on the injured list.
And yet, they still won the World Series.
The Dodgers' success wasn’t just about having stars—it was about preparing for the inevitability that some of those stars wouldn’t be available. They stockpiled depth, ensuring that when injuries struck, they had enough quality arms to stay competitive. Their ability to account for playing time uncertainty proved just as valuable as the high-powered offense that carried them to a championship.
In Major League Baseball, the best ability is availability. Front offices don’t just need to know how good a player is; they also need to know how much that player can realistically be expected to play. This analysis tackles that challenge by forecasting playing time (plate appearances for hitters and batters faced for pitchers) for the 2024 season, providing a roster-agnostic look at how much teams can count on their players to take the field.
Feature Creation
Examining the distribution of plate appearances (PA) and batters faced (BF) from the 2021-2023 seasons reveals key trends in player usage. Both distributions show a significant spike at zero, representing depth players who only made brief appearances before returning to the bench or minors.
For plate appearances, the distribution remains relatively steady from 100 PA up to around 700. In contrast, batters faced declines around the 300 BF mark, likely because starting and relief pitchers fill different roles.

Given the differences between batting and pitching projections, we developed separate models for each, using role-specific features. The individual predictions for batting and pitching were then combined to estimate a player's total playing time.
We selected the following features for each model:
Batting-Specific
Total PA: Number of plate appearances a player had over the previous seasons.
Age: Players tend to regress with age.
Bats: Specifies if they were a lefty, righty or switch hitter.
OPS: OPS had the highest correlation with plate appearances of all the traditional slash line stats.
xwOBA: The wOBA value expected based on a batter's contact quality.
Average Lineup Position: Players batting higher in orders get more plate appearances.
Home Runs per Fly Ball Percentage: What percent of fly balls were home runs.
Line Drive Ratio: What percentage of batted balls were line drives.
Ground Ball Ratio: What percentage of batted balls were ground balls.
Strikeout Rate
Walk Rate
Pitching-Specific
Total BF: Number of batters a pitcher faced over the previous seasons.
Age: Players tend to regress with age.
SP Percent: Percent of pitcher appearances where the pitcher started the game.
Zone Chase Percentage: What percent of pitches were either thrown in the strike zone or were chased by the batter.
Average BF Per Outing: How many batters did the pitcher face in an appearance on average.
wOBA: wOBA value earned by batters when facing the pitcher.
xwOBA: The wOBA value they were expected to earn.
In Play Ratio: What percent of pitches thrown were hit into play.
Hit Type Ratios: What percent of batted balls were ground balls/fly balls/popups/line drives.
Strikeout Rate
Walk Rate
Batting Average Allowed (BAA)
WHIP
Average Fastball Velocity
For numerical features like OPS, we calculated a weighted average over the last three seasons to emphasize recent performance. We weighted each player's stats by 5 for the most recent season, 3 for two years ago, and 2 for three years ago, a recency weighting of the kind commonly used in baseball projections. This ensures that more recent data has a larger influence, reflecting the typical trajectory of player performance.
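The 5/3/2 weighting can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the OPS values are hypothetical, and dropping the weight of a missing season and renormalizing is an assumption, since the article does not say how missing seasons were handled.

```python
def weighted_three_year(values, weights=(5, 3, 2)):
    """Recency-weighted average of up to three seasons.

    `values` is ordered most recent season first. A missing season is
    passed as None; its weight is dropped and the remaining weights are
    renormalized (an assumption, not something the article specifies).
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    if not pairs:
        return None
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical OPS values for 2023, 2022, and 2021 (most recent first).
ops = weighted_three_year([0.800, 0.750, 0.700])

# Same player, but with no 2022 season on record.
ops_missing_year = weighted_three_year([0.800, None, 0.700])
```

With all three seasons present, the most recent season carries half of the total weight (5 out of 10).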
Additionally, to prevent players with limited plate appearances from disproportionately skewing the data, we regressed every player's statistics toward the average for their position by the equivalent of 5 plate appearances. For instance, a player whose only plate appearance produced a hit would see their OPS adjusted from 1.000 to approximately .800, bringing it in line with more realistic expectations based on their position's overall performance. A player with 600 plate appearances, by contrast, would see very little adjustment.
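One common way to implement this kind of regression is to add a fixed number of "phantom" plate appearances at the positional average. The sketch below assumes that formulation; the function name is hypothetical, and the positional mean of .760 is an illustrative value chosen so the one-plate-appearance example lands at .800, not a figure from the article.

```python
def regress_to_mean(player_rate, player_pa, position_mean, prior_pa=5):
    """Shrink a rate stat toward the positional mean.

    Equivalent to adding `prior_pa` plate appearances of exactly
    average performance to the player's observed sample.
    """
    return (player_rate * player_pa + position_mean * prior_pa) / (player_pa + prior_pa)


POSITION_MEAN_OPS = 0.760  # hypothetical positional average

# One plate appearance at 1.000: pulled most of the way to the mean.
small_sample = regress_to_mean(1.000, 1, POSITION_MEAN_OPS)

# 600 plate appearances at .900: barely moves.
full_season = regress_to_mean(0.900, 600, POSITION_MEAN_OPS)
```

The adjustment shrinks as the sample grows because the 5 phantom plate appearances make up a smaller and smaller share of the total.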
For the modeling process, we selected an ElasticNetCV model to limit potential overfitting, given the small sample size and the inherent randomness of baseball. After training the batting and pitching models, we obtained the following feature weights.
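As a rough illustration of the fitting step, the sketch below trains scikit-learn's `ElasticNetCV` on synthetic data and reads back the feature weights. Everything about the data is an assumption: the column names, the distributions, and the made-up target are stand-ins, and standardizing the features first is a design choice of this sketch, not something the article describes.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the article's data): 300 hitters, 4 features.
feature_names = ["prev_pa", "age", "ops", "lineup_pos"]  # hypothetical names
n = 300
X = np.column_stack([
    rng.uniform(0, 700, n),        # weighted previous plate appearances
    rng.uniform(21, 40, n),        # age
    rng.normal(0.720, 0.080, n),   # regressed OPS
    rng.uniform(1, 9, n),          # average lineup position
])
# Made-up target: next-season PA driven mostly by previous PA, plus noise.
y = 0.6 * X[:, 0] + 200 * (X[:, 2] - 0.720) + rng.normal(0, 60, n)

# Standardize so coefficient magnitudes are comparable across features,
# then let cross-validation choose the penalty strength and L1/L2 mix.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, random_state=0),
)
model.fit(X, y)

enet = model.named_steps["elasticnetcv"]
weights = dict(zip(feature_names, enet.coef_))
```

Because the features are standardized, the entries of `weights` can be compared directly, which is what makes a feature-weight chart like the one discussed here meaningful.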


Analysis
One observation was that traditional performance metrics, such as OPS for batters and WHIP for pitchers, did not have as large an impact on the models as anticipated. While we expected previous plate appearances to have the biggest impact, it accounted for an even greater share than we had predicted. This aligns with the reality that players, regardless of their lineup position, may see similar playing time if they are near-everyday contributors. For example, a cleanup hitter might have more plate appearances than a player batting seventh, but if both are everyday starters, their overall plate appearances may not differ significantly. Players with weaker stats also tend to land on the bench or in the minors quickly, so seasons that pair poor performance with heavy playing time rarely appear in the data.
Despite this, performance stats such as OPS still received much larger weights in the batting model than their counterparts did in the pitching model, which depended mostly on previous playing time and on whether the pitcher was a starter or a reliever. A pitcher's actual and expected performance metrics are usually closely tied to the ability to strike batters out or induce weak ground balls that are easily fielded, so we expected them to matter more. Their small weights may be due to a team's need to cycle through a rotation of starters every five days. It doesn't matter if your fourth starter has an ERA two runs higher than your ace; he will still get the same number of starts (as long as he isn't benched). With pitching injuries on the rise in MLB, a pitcher who can eat innings also seems to have gained value. It is also easier to push a batter down the lineup or create a platoon in response to poor play than to find a new quality pitcher.
Using batters faced instead of outs recorded also likely contributes to this. While more plate appearances are always good for a hitter (although the total depends on the team's ability to hit), more batters faced is not necessarily a sign of success. Baseball is not the NBA, where players have a set amount of time on the court. A pitcher who is perfect through seven innings but is pulled after giving up his first hit will record fewer batters faced than a pitcher who allows ten baserunners through a rocky five. The same goes for relievers: a better reliever can enter with two outs and get the final out without first giving up two hits. As such, it would be interesting to see how projections change when accounting for outs recorded instead of batters faced.
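The starter comparison can be checked with simple arithmetic. The sketch below uses a simplifying assumption that is not in the article: every batter either makes exactly one out or reaches base, so double plays and outs on the bases are ignored. The function name is hypothetical.

```python
def batters_faced(outs_recorded, baserunners_allowed):
    """Batters faced when each batter either makes one out or reaches base.

    Simplifying assumption: no double plays and no outs made on the bases.
    """
    return outs_recorded + baserunners_allowed


# Perfect through seven innings (21 outs), pulled after the first hit.
dominant = batters_faced(outs_recorded=21, baserunners_allowed=1)

# Ten baserunners allowed over five innings (15 outs).
rocky = batters_faced(outs_recorded=15, baserunners_allowed=10)
```

Under these assumptions the dominant outing totals 22 batters faced against 25 for the rocky one, even though it recorded six more outs.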
Future Areas to Consider
More seasons of data: Baseball's inherent randomness makes additional seasons crucial for accurately assessing player performance, identifying aging trends, and using a season as a validation set.
Injury data: Without specific injury history, the model can't fully account for the risk of injuries, which limits its ability to adjust projections for injury-prone players.
Minor league data: The model struggles to predict rookie performance, as it lacks insight into whether a late 2023 call-up is a top prospect or a temporary fill-in.
Fielding: The model does not account for defensive skills, which can influence playing time.
General aging trends: More data would improve the model's ability to capture and predict aging patterns for different player archetypes.