Why early-season ERA tells you almost nothing
Every May, the same story cycles through baseball broadcasts. A pitcher nobody had on a Cy Young short list in spring training is sitting on a 1.74 ERA through eight starts. Highlights play. The graphic shows a ratio of comically low earned runs to a heroic innings total. The broadcast asks, with appropriate gravity, what changed. And by August, the ERA has crept toward four, the narrative has dissolved, and the broadcast has moved on to the next early-season story it will also overstate.
The honest answer most Mays is that nothing changed. Eight starts is not enough information for ERA to mean what people use it to mean. The number is mostly sample-size noise sitting on top of a smaller signal, and the noise component is large enough to support almost any narrative the broadcast wants to attach to it. Knowing which numbers actually settle quickly and which ones don't is the difference between reading early-season baseball sensibly and being permanently surprised by July.
Why ERA is fragile in small samples
ERA is a ratio of runs allowed to innings pitched, scaled to nine innings. The denominator at the eight-start mark is usually around 45 to 50 innings, which sounds like a substantial sample but isn't. A single five-run inning in a bad start moves an ERA in that range by close to a full run. Two bad starts buried inside an otherwise excellent sample will tank a number that, given another two months of data, would have stabilized into something perfectly normal.
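The arithmetic behind that claim is easy to check. The sketch below is illustrative only: the pitcher's line (15 earned runs in 46 innings) is a made-up example, and innings are treated as plain decimals rather than the .1/.2 box-score notation.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs allowed, scaled to nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * earned_runs / innings


def era_jump(earned_runs: float, innings: float,
             extra_runs: float, extra_innings: float = 1.0) -> float:
    """How much ERA moves when one more stretch of pitching is added to the line."""
    before = era(earned_runs, innings)
    after = era(earned_runs + extra_runs, innings + extra_innings)
    return after - before


if __name__ == "__main__":
    # Hypothetical line: 15 earned runs over 46 innings, then one five-run inning.
    print(f"before:  {era(15, 46):.2f}")
    print(f"after:   {era(20, 47):.2f}")
    print(f"change: +{era_jump(15, 46, extra_runs=5):.2f}")
```

One inning out of 47 accounts for a quarter of the earned runs in that line, which is why the number swings so hard.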
The fragility cuts both directions. A pitcher with a 5.40 ERA through eight starts has usually had one or two disaster outings that account for most of the runs, with the other six starts looking competent or better. The broadcast graphic that flashes the ERA in May treats the whole sample as if it were a representative slice of true talent. It isn't. It's eight specific games, weighted by their results rather than their inputs, and any one of those games can dominate the line.
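A small simulation makes the spread concrete. This is a rough sketch under loud assumptions: the pitcher's true talent is fixed at a 4.00 ERA, runs in each inning are drawn independently from a Poisson distribution (real run scoring is clumpier than that, so the real spread is probably wider), and 48 innings stands in for eight starts. All names here are hypothetical.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_era_spread(true_era: float = 4.00, innings: int = 48,
                        trials: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return the 5th and 95th percentile ERA across many simulated samples.

    Assumption: each inning's runs are an independent Poisson draw with
    mean true_era / 9. This is a simplification, not a model of real scoring.
    """
    rng = random.Random(seed)
    runs_per_inning = true_era / 9
    eras = []
    for _ in range(trials):
        runs = sum(poisson_draw(rng, runs_per_inning) for _ in range(innings))
        eras.append(9 * runs / innings)
    eras.sort()
    return eras[int(0.05 * trials)], eras[int(0.95 * trials)]


if __name__ == "__main__":
    low, high = simulate_era_spread()
    print(f"Same pitcher, same talent: ERA from {low:.2f} to {high:.2f} in 90% of samples")
```

Even in this simplified model, an unchanged pitcher produces lines that look like an ace in one sample and a back-end starter in another.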
What stabilizes when
Russell Carleton's stabilization research — the foundational public work on this question — gives defensible answers about how many plate appearances or batters faced are required before a pitching statistic carries more signal than noise. The numbers are larger than most fans assume.
Strikeout rate stabilizes earliest, after roughly 70 batters faced. At about 25 batters per start, that's roughly three starts for a healthy starter. Walk rate takes longer, around 170 batters, or about seven starts. Ground-ball rate comes in around the same place. Home runs per fly ball, the number that drives most of the year-to-year volatility in pitcher ERAs, doesn't stabilize until well over 600 batters faced, which is essentially a full season. Batting average on balls in play against, the other big driver of ERA volatility, takes even longer. By the time you have enough plate appearances for BABIP-against to be a meaningful number for a pitcher, the season is more or less over.
This means that in May, a pitcher's K-rate is starting to tell you something real, their walk rate is on its way, and almost every other component of their ERA is still dominated by random variation that hasn't shaken out. The headline number — ERA itself — is built mostly on components that haven't stabilized, sitting on top of components that have. The signal is in the wrong place.
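One way to put those thresholds to work is sketched below. The threshold table uses the batters-faced figures quoted above (600 is a floor for home runs per fly ball, since the text says "well over"). The regression formula is a common reading of a stabilization point, in which the observed rate gets half the weight once the sample reaches the threshold. It is an assumption here, not a claim about Carleton's exact method. The 22% league-average strikeout rate is purely illustrative.

```python
# Stabilization points in batters faced, taken from the figures quoted above.
STABILIZATION_BF = {
    "strikeout rate": 70,
    "walk rate": 170,
    "HR/FB": 600,  # the text says "well over 600", so treat this as a floor
}


def stabilized_stats(batters_faced: int) -> list[str]:
    """List the stats whose stabilization point the sample has reached."""
    return [name for name, needed in STABILIZATION_BF.items()
            if batters_faced >= needed]


def regress_to_mean(observed: float, league_avg: float,
                    sample_bf: int, stabilization_bf: int) -> float:
    """Blend an observed rate with the league average.

    Assumed interpretation: at the stabilization point the observed rate
    and the league average each get half the weight.
    """
    weight = sample_bf / (sample_bf + stabilization_bf)
    return weight * observed + (1 - weight) * league_avg


if __name__ == "__main__":
    bf = 200  # roughly eight starts
    print(stabilized_stats(bf))
    # Hypothetical pitcher striking out 30% of hitters against a 22% league rate.
    estimate = regress_to_mean(0.30, 0.22, sample_bf=bf,
                               stabilization_bf=STABILIZATION_BF["strikeout rate"])
    print(f"regressed strikeout rate: {estimate:.3f}")
```

The same blend applied to a slow-stabilizing stat pulls the estimate most of the way back to league average, which is the numerical version of "held lightly."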
The numbers that work in May
The two pitching stats worth taking seriously this early in the season are strikeout rate and walk rate, ideally combined as K-BB% (strikeout rate minus walk rate, each expressed as a percentage of total batters faced). These are the same two ingredients as the strikeout-to-walk ratio, expressed in a way that doesn't blow up when the walk rate is near zero, and they stabilize fast enough to mean something by the forty-inning mark, which is roughly 170 batters faced.
A pitcher whose K-BB% has jumped four or five percentage points off their established norm is doing something different, and the difference is large enough to detect through the noise of a small sample. A pitcher whose ERA has jumped four or five tenths of a run off their established norm has, statistically, told you almost nothing yet. The two numbers point at very different things even when they correlate at the season level.
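For reference, here is a minimal sketch of the K-BB% calculation next to the raw strikeout-to-walk ratio. The pitcher lines are invented for illustration; the point is only to show why the subtraction form behaves better when walks are scarce.

```python
def k_minus_bb_pct(strikeouts: int, walks: int, batters_faced: int) -> float:
    """K-BB%: strikeout rate minus walk rate, in percentage points."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return 100 * (strikeouts - walks) / batters_faced


def k_per_bb(strikeouts: int, walks: int) -> float:
    """Strikeout-to-walk ratio; undefined when the pitcher has no walks."""
    if walks == 0:
        raise ZeroDivisionError("K/BB is undefined with zero walks")
    return strikeouts / walks


if __name__ == "__main__":
    # Two hypothetical 200-batter lines that differ by a single walk.
    for walks in (1, 2):
        print(f"walks={walks}: "
              f"K/BB={k_per_bb(50, walks):.1f}, "
              f"K-BB%={k_minus_bb_pct(50, walks, 200):.1f}")
```

One extra walk halves the ratio but moves K-BB% by half a percentage point, so the subtraction form is the steadier early-season read.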
Stuff-based metrics — pitch velocity, spin rate, induced vertical break, the shape and quality of each pitch in the arsenal — are the other category that stabilizes quickly because they're physical measurements rather than rate stats. A starter who has added two miles per hour to their fastball or whose slider has gained two inches of horizontal break is genuinely throwing a different pitch than they were last season. The change is real on start one. Whether it produces a true talent shift in run prevention takes longer to confirm, but the underlying physical change is observable immediately.
FIP, xFIP, and SIERA help, but not as much as people think
Fielding Independent Pitching, expected FIP, and the Skill-Interactive ERA estimator are all attempts to strip ERA down to its more stable components — strikeouts, walks, and home runs allowed, with adjustments for the components most influenced by luck. They are meaningfully better than ERA for evaluating starters in larger samples because they down-weight the components that are mostly noise.
In early-season samples they are better than ERA but not dramatically so. FIP still includes home runs allowed, which haven't stabilized. xFIP normalizes home runs to a league-average rate, which is more robust at small samples but introduces its own assumption that the pitcher's true home-run rate is league-average. SIERA does the most sophisticated weighting, but it inherits the same problem that the underlying components are still partially noise at the May mark.
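The difference between the first two estimators is visible in their standard public formulas, sketched here. The additive constant is recalculated each season to put FIP on the ERA scale. The 3.10 used below, like the 10% league home-run-per-fly-ball rate, is an illustrative assumption rather than a value for any particular year. SIERA is omitted because its formula is considerably longer.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    `constant` is an illustrative stand-in; the real value changes by season.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls: int, bb: int, hbp: int, k: int, ip: float,
         league_hr_per_fb: float = 0.10, constant: float = 3.10) -> float:
    """Expected FIP: swaps actual home runs for fly balls times a league rate.

    `league_hr_per_fb` is an assumed round number, not a measured one.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    expected_hr = fly_balls * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


if __name__ == "__main__":
    # Hypothetical eight-start line: 48 IP, 50 K, 12 BB, 2 HBP, 50 fly balls.
    for homers in (2, 5, 8):
        print(f"HR={homers}: FIP={fip(homers, 12, 2, 50, 48):.2f}, "
              f"xFIP={xfip(50, 12, 2, 50, 48):.2f}")
```

Because each home run carries a weight of 13, a few fly balls clearing or not clearing the fence move FIP substantially over 48 innings, while xFIP stays put by assuming the league rate.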
The cleanest framing is that any ERA estimator is a better predictor of future ERA than current ERA is, and that's true even in May. But none of them is a substitute for waiting until the components themselves have settled. By the All-Star break, FIP and xFIP carry meaningful signal. In May, they carry a weaker signal than K-BB% and the stuff numbers, and they should be read with a wider error bar than the broadcast graphic suggests.
What changes the calculus
The one early-season pitching signal that should change your read is a clear stuff change. A pitcher who shows up in April with a new pitch, a new release point, or a velocity increase is doing something different in ways that are physically observable, not statistically inferred. Those changes carry information about the pitcher's future performance that an eight-start ERA does not. The same is true for a pitcher who has visibly lost velocity — that signal arrives earlier than the ERA does, and it usually means trouble.
Everything else — the hot start, the cold start, the sudden Cy Young case, the surprise collapse — should be held lightly until the sample catches up to the narrative. Most of those stories will be revised by July. The pitchers who are actually different from the version of themselves you knew will reveal themselves in their K-rate and their stuff first, and in their ERA last. Reading the components instead of the headline is the single biggest improvement available to a fan trying to make sense of early-season pitching.
The honest answer in May
If you want one rule for how to react to early-season ERA, it's this: assume any ERA below 2.50 will rise and any ERA above 5.00 will fall, unless you can identify a specific, observable, physical change in the pitcher that explains the new number. Regression to the mean is the default, and most of the surprising May lines are not evidence of a true talent shift. They're variance wearing a story.
Wait until the sample is big enough to mean what you want it to mean. June and July do most of the real narrative work in baseball. May mostly produces broadcasts.