Pitcher wins are a useless stat (and what to use instead)
For roughly 150 years, the pitcher win was treated as the headline evaluation stat for starting pitchers. A pitcher's career record — 300 wins for the all-time greats, 200 for very good ones — was a rough proxy for Hall of Fame credentials, and an individual season was framed around whether the pitcher had reached 20 wins. The Cy Young award, in its early decades, was effectively given to whoever had won the most games. Every major baseball card and broadcast graphic still leads with the W-L line.
The win is also, on inspection, almost worthless as an evaluation of pitching skill. A large share of its variation comes from the pitcher's team, another share comes from the random timing of when runs score relative to when the pitcher is on the mound, and the rest is a muddled mix of pitching, fielding, and bullpen behavior. Bill James was making this argument in the 1970s; nearly every front office has agreed with him since the late 1990s; and the win is still on every broadcast graphic. The win is the second-most stubborn bad stat in baseball, behind only batting average, and it deserves to be retired with the same finality.
What a win actually requires
For a starting pitcher to be credited with a win, three things must all happen. He must pitch at least five innings. His team must be ahead when he leaves the game. And the team must never give that lead back: if the bullpen lets the other side tie the game or go ahead, the starter's win is gone, even if his team comes back to win later. Only the first of these is mostly in his hands, and none of them measures how well he pitched. He can throw eight innings of one-run ball and lose, because his team was shut out. He can give up six runs in five innings and win, because his offense scored eleven before he left. The win records whether the team was ahead at a specific moment, not whether the pitcher was good.
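The rule is simple enough to write down as a predicate, which makes it obvious how little of it is about pitching quality. This is a minimal sketch of the three conditions as described here, not the full scoring rule; the `Start` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Start:
    innings_pitched: float     # innings the starter completed
    team_runs_at_exit: int     # runs his own team had scored when he left
    opp_runs_at_exit: int      # runs the opponent had scored when he left
    lead_held_to_end: bool     # True if the lead was never tied or lost afterward


def starter_gets_win(s: Start) -> bool:
    """Simplified version of the three conditions for a starter's win."""
    pitched_enough = s.innings_pitched >= 5
    left_with_lead = s.team_runs_at_exit > s.opp_runs_at_exit
    return pitched_enough and left_with_lead and s.lead_held_to_end


# Six runs allowed in five innings, but eleven runs of support: a win.
ugly_win = Start(5, 11, 6, True)
# Hypothetical: eight innings, one run allowed, no support. He never had a lead.
tough_luck = Start(8, 0, 1, False)
# Hypothetical: seven shutout innings, then the bullpen gives the lead back.
blown_lead = Start(7, 3, 0, False)

print(starter_gets_win(ugly_win), starter_gets_win(tough_luck), starter_gets_win(blown_lead))
```

Note that runs allowed only enters the function through a comparison with the offense's total. Nothing in it asks whether the starter pitched well.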
Look at any starting pitcher's career and the year-to-year fluctuation in his win total is enormous compared to the fluctuation in his actual underlying performance. The same pitcher with the same ERA can go 18-7 one year and 11-13 the next, simply because the offense scored differently behind him and the bullpen held leads differently. The performance was constant. The wins were noise.
The empirical case
Run the regression. Starting pitchers' wins do correlate with how well they pitched, but the correlation is much weaker than the one between rate stats such as ERA or FIP and the same underlying skill. A pitcher's ERA in a given season explains roughly 35-40% of the variance in his win-loss record. His FIP (fielding independent pitching, which counts only the outcomes the pitcher most directly controls) explains about 25-30% of the variance in W-L. The remaining majority is team context and luck.
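"Variance explained" here means the R-squared of a one-variable linear regression, which is just the squared Pearson correlation. A minimal sketch of the calculation, assuming you supply your own season-level data; the function name is hypothetical and the toy numbers below are invented purely to show the call, not real seasons.

```python
from math import sqrt


def r_squared(xs, ys):
    """Share of the variance in ys explained by a one-variable linear fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / sqrt(var_x * var_y)  # Pearson correlation
    return r * r


# Invented values, only to demonstrate usage. Replace with real ERA and
# winning-percentage columns to reproduce the kind of figure quoted above.
toy_era = [2.50, 3.00, 3.50, 4.00, 4.50]
toy_win_pct = [0.600, 0.640, 0.480, 0.550, 0.430]
toy_r2 = r_squared(toy_era, toy_win_pct)
print(f"R^2 on toy data: {toy_r2:.2f}")
```

Running the same function with run support in place of ERA is the comparison that matters: whichever input produces the larger R-squared is the better predictor of the record.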
Run support (the average runs scored by a pitcher's team in his starts) is, by itself, a stronger predictor of his W-L record than his actual pitching is. That's the cleanest single diagnostic of why the stat is broken. A pitcher whose team averages 6 runs in his starts will usually have a better record than a pitcher whose team averages 3, even if the second pitcher gives up fewer runs per nine innings.
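One way to see the size of the effect is Bill James's Pythagorean expectation, which estimates a team's winning percentage from runs scored and runs allowed. Applying it to a single pitcher's starts is a rough approximation, and the runs-allowed figures below (4.5 and 3.0 per game) are hypothetical; only the 6 and 3 runs of support come from the comparison above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed per game."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Worse pitcher (hypothetical 4.5 runs allowed per game), 6 runs of support.
well_supported = pythagorean_win_pct(6.0, 4.5)
# Better pitcher (hypothetical 3.0 runs allowed per game), 3 runs of support.
poorly_supported = pythagorean_win_pct(3.0, 3.0)

print(f"{well_supported:.3f} vs {poorly_supported:.3f}")  # 0.640 vs 0.500
```

Under these assumptions the pitcher who allows a run and a half more per game still comes out with the better expected record, purely because of the offense behind him.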
The all-time examples
Felix Hernandez won the AL Cy Young in 2010 with a 13-12 record, which would have been disqualifying by traditional standards. He led the league in ERA, threw more innings than any starter in the AL, and pitched for a Mariners team that averaged barely 3.4 runs per game in his starts. By every actual measure of pitching, he had a historically great season. By win totals, he looked unremarkable. The award was the clearest public sign yet that the writers had finally stopped reading W-L as the primary signal.
The reverse case is Bob Welch's 1990 Cy Young, awarded for a 27-6 season. Welch pitched well that year, with a 2.95 ERA across 238 innings, but he was not the best pitcher in the league. Roger Clemens, by every modern measure, had a meaningfully better year, with a lower ERA, more strikeouts per nine, and a higher WAR. Welch had more wins because the A's scored 5.5 runs per game in his starts. The award went to the W column. In retrospect, most analysts agree that vote was wrong, and the wrongness was specifically caused by overweighting the win total.
What to use instead
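For starting pitchers, the standard sabermetric evaluation suite is ERA, FIP, xFIP, and pitcher WAR. None of them is perfect, but each captures something the win column doesn't. (xFIP is a variant of FIP that swaps a pitcher's actual home runs for the number expected from his fly balls at a league-average home-run rate.)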
ERA (earned run average, or earned runs allowed per nine innings) is the simplest. It's not park-adjusted in its raw form, and it credits or blames the pitcher for what his fielders do behind him, but it's a much better summary than W-L. League-average ERA in the modern game sits around 4.00; anything under 3.00 is excellent and anything under 2.50 across a full season is historic.
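The arithmetic is one line. As a quick sketch, here it is applied to the Welch season mentioned earlier; the 78 earned runs is the total implied by a 2.95 ERA over 238 innings, used here as an illustrative input rather than a sourced figure.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


# 78 earned runs is the total implied by a 2.95 ERA over 238 innings.
welch_1990 = round(era(78, 238), 2)
print(welch_1990)  # 2.95
```

Nothing about the offense or the bullpen enters the calculation, which is exactly why it travels better than W-L.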
FIP (fielding independent pitching) strips out everything a pitcher doesn't directly control. It measures only walks, strikeouts, hit-by-pitches, and home runs allowed, scaled to look like ERA. The argument for FIP is that those four outcomes are the only ones a pitcher fully owns. The argument against is that some pitchers genuinely induce weak contact and deserve to be measured on what they actually allowed. Both arguments are right. The cleanest use is to look at FIP and ERA together: if a pitcher's ERA is much lower than his FIP, he's either inducing weak contact (a skill), pitching in front of a good defense, or getting lucky on balls in play (neither of which is his skill). The distinction matters for projecting the next season.
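For reference, the commonly published form of FIP weights those four outcomes and adds a constant that puts the result on the ERA scale. A minimal sketch follows; the constant is recalculated each season so that league FIP matches league ERA, so the 3.10 default here is an assumed placeholder, and the stat line is hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding independent pitching in its commonly published form.

    The constant varies by season; 3.10 is an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 20 HR, 45 BB, 5 HBP, 200 K in 200 innings.
example_fip = fip(hr=20, bb=45, hbp=5, k=200, ip=200.0)
print(f"{example_fip:.2f}")  # 3.15
```

Hits on balls in play appear nowhere in the formula. That is the whole design: whatever the fielders did or didn't catch is left out.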
Pitcher WAR collapses all of this into a single number denominated in wins above a replacement-level pitcher. The public versions, most notably FanGraphs' fWAR and Baseball Reference's bWAR, can disagree because they take opposite sides of the FIP-vs-ERA question: fWAR is built on FIP, while bWAR starts from the runs a pitcher actually allowed. A starting pitcher who accumulates more than 4 WAR in a season had a very good year regardless of which version you use; over 6 WAR is excellent; over 8 is historic.
The cultural problem
The reason wins won't die is not that nobody knows the alternatives. Every Cy Young voter in the modern era knows what FIP is. Every broadcast booth has been corrected on the question dozens of times. The reason wins survive is the same reason batting average survives: they're among the oldest stats in the sport, they're easy to explain, and they have century-long records attached. The 300-win club, the 20-win season, the award votes that weighted them for decades: these are cultural artifacts, not analytical claims.
The compromise the sport has settled into is that wins now appear on graphics alongside ERA and WHIP, and the analysts who follow the game closely have learned to mentally substitute the better stats when evaluating a pitcher. The win column is still the lead line on the broadcast scroll, but the people writing the trade rumors and the contract offers and the award ballots have all stopped weighting it. The audience is now several years behind the analysts on this. The graphic will catch up eventually.
The short version
When you see a starting pitcher's record, at any point in the season, look past it. If his ERA is under 3.50 and his FIP is under 4.00, he's pitching well regardless of whether he's 14-4 or 11-13. If his ERA and FIP are bad, he's pitching badly regardless of whether his record looks good. The win column is the team's stat, dressed up as the pitcher's. Read the pitcher's actual columns instead.