Thursday, November 3, 2011

Arrogance and uncertainty

by Jason Wojciechowski


(He's gone all internet-anonymous, so I'm not going to shout him out too hard, but the Fire Jerry Manuel guy inspired everything here. He'd be better at this than I am, but he's got more important things to accomplish.)

"The free thinkers are the Margin of Error"


Margins of error and uncertainty matter. We know some things in baseball to a reasonable degree of certainty -- $X caught the fly ball to end the seventh inning of the game between $TeamA and $TeamB on May 23rd, 1957; $Y hit two homers in his career against Cincinnati; $Z struck out seven batters in his first start of the season in 1997. (Note: reasonable. Especially as we go back in time, there are errors and uncertainty around even simple box-score things like how many hits or strikeouts a player had.)

There are millions of things we do not know, however. How hard was that fly ball hit? Where was $X positioned at the beginning of the play? How far and fast did he run to get to the ball? How direct was his route? How did the atmosphere affect the flight of the ball and $X's ability to run, throw, and catch? How far did $Y's homers travel? Where were the pitches located? What kind of pitches were they? How much did they break? Did they miss the catcher's target? What about $Z's pitches? Did he tire as he went through the game? Were his mechanics affected? How did that change the quality of his pitches? How were the hitters he faced feeling that day? How much did he benefit from the strike zone? From his catcher's framing and pitch sequencing? Did the defense behind him help or hurt? Did he pitch effectively to that defense?

These days, with PITCHf/x, stringers from data collection services recording game information in detail, physicists interested in baseball questions, and massive libraries of video, we have better answers to many of these questions than we will ever have for that game in 1957. Still, there is error in all of these measurements. The PITCHf/x cameras have a known margin of error and can develop bias as the time since they were last calibrated grows. Stringer data? Who knows, since what would we measure it against?

Then there are the sample issues. Forget about rookies with just 30 at-bats in the big leagues. Even a full season of data for a player contains an unpredictable mix of matchups, parks, injury situations, and competitiveness levels such that we can hardly be confident that the 650 plate appearances or 200 innings of results represent the player's true talent. As the sample grows larger, these issues tend to even out across the population of players, but, of course, the player whose skill we're trying to measure hasn't maintained static ability across that entire sample. Two thousand at-bats in a laboratory over a couple of weeks might give us a great picture of how good a player is, but two thousand at-bats over three years? That's tough.

Whether we're looking backward or forward (or, most often, backward so that we can look forward), we can't be sure of what we have.

Uncertainty about uncertainty

Even uncertainty has uncertainty. When we project a player for three WAR in 2012, it's not as if we know he'll land between one and six WAR. He might post negative WAR. He might bat .420 with 80 homers. Suppose two teams are bidding on a free agent and both project the same most likely performance for him, but one team sees a wider spread of outcomes and is in a position where that additional upside is worth more to it in playoff probability. That team could reasonably value the player differently. We can't be certain of a player's range of outcomes around some mean or median any more than we can be certain of that average in the first place.
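To put rough numbers on the two-team example, here is a minimal sketch. It assumes, purely for illustration, that both teams model the player's 2012 WAR as a normal distribution centered on 3 WAR, one with a standard deviation of 1 and the other with 2, and that 5 WAR is a hypothetical level at which extra wins start to move the second team's playoff odds. None of these numbers come from a real projection system.

```python
from statistics import NormalDist

# Hypothetical projections: same mean (3 WAR), different spreads.
# Normality and these parameters are illustrative assumptions only.
narrow = NormalDist(mu=3.0, sigma=1.0)
wide = NormalDist(mu=3.0, sigma=2.0)

# Hypothetical WAR level where the extra wins start to matter for playoff odds.
THRESHOLD = 5.0


def chance_above(dist: NormalDist, threshold: float) -> float:
    """Probability that the player's WAR exceeds the threshold."""
    return 1.0 - dist.cdf(threshold)


for label, dist in (("narrow", narrow), ("wide", wide)):
    print(f"{label}: mean {dist.mean:.1f} WAR, "
          f"P(WAR > {THRESHOLD}) = {chance_above(dist, THRESHOLD):.3f}")
```

Both projections have the same mean, but under these assumptions the wider one gives roughly a 16% chance of clearing 5 WAR versus about 2% for the narrower one. That gap is what a team in the right competitive position might pay for.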


Given everything I've said above, and given further that none of it is novel, why do relatively sophisticated writers (people who own The Book, subscribe to Baseball Prospectus, write on Fangraphs, comment on Beyond the Box Score, or tweet with the leading lights of the public sabermetric community) still write as if they're dead certain about anything? Why do comment sections on blogs turn ugly even without the "you can't measure heart" people getting involved?

Why, in short, are some otherwise excellent analysts so sure of themselves?


The other day, Nate Silver put a one-sided proposition out to political pundits: raise your hand if you'd give up writing about politics in the event that Herman Cain wins the Republican nomination for President. Somehow, a small handful of writers actually publicly signed up, so confident were they that the pizza man simply would not maintain his momentum all the way to the end of the race. This was before the sexual harassment scandal came about, so put aside for a moment that these pundits were probably right. Nate's point was simply that there are no guarantees. An 0.1% chance is still an 0.1% chance. Things happen.


Believe it or not, my ideal world does not involve analysts mentioning the phrases "margin of error" and "we can't be sure" while verbally shrugging sixteen times per blog post. Style is important. Taking positions and defending them vigorously is one of the aspects of being a sports fan that many of us love. I, of all people, don't want to take that away.

What my ideal world does involve is the end of pretending that we know things. We may have strong evidence for things. It may be the best practice to act as if a player will do X or Y but not Z because Z is tremendously unlikely. But we do not know things and the tone, the tenor of our analysis should reflect that.

Instead of "this is a stupid trade because Player X is 10 wins better than Player Y," we ought to be examining the possible range of outcomes over the effective life of the trade and getting a sense of the probability distribution of those outcomes. Maybe (hopefully, in fact) we come to the same place. Maybe the Dan Haren trade from Arizona to Anaheim was a terrible one because Arizona would need a 99th-percentile outcome to get value equal to or better than what it gave up. But the analysis has to be more than "Haren is an N-win player, and that's worth $X, and these prospects coming back are B-level, and that's only worth $Y." Assigning singular values to past-season performance is incredibly difficult (look at the different values the major WAR(P) implementations spit out), so why on earth would we think we could do that for future performance?

Speaking of Nate Silver, do you know what I like about Baseball Prospectus's PECOTA projections? They're presented, at least on their website, in percentile form: a range of possible outcomes is published. (I'm shouting out PECOTA because I'm familiar with it. It's possible that other projection systems that I've never worked with, like Brian Cartwright's Oliver, do the same thing.) This is a model for how we should be analyzing baseball in general. Again, this does not have to be explicit, at least not all the time. If our minds are right, the written word will reflect our proper approach even if we never utter the word "percentile." We will also probably face fewer accusations of boorish behavior and invite less of said behavior from readers.
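As a rough illustration of what "percentile form" means, here is a minimal sketch that turns a single projection into a small percentile table. It assumes, purely for illustration, a normal distribution with a hypothetical mean of 3 WAR and standard deviation of 1.5; this is not PECOTA's actual method, just the simplest way to show a range instead of a point.

```python
from statistics import NormalDist

# Hypothetical projection: normal, mean 3 WAR, sd 1.5.
# An illustrative assumption only; real projection systems build
# their outcome distributions differently.
projection = NormalDist(mu=3.0, sigma=1.5)

PERCENTILES = (10, 25, 50, 75, 90)


def percentile_table(dist, percentiles=PERCENTILES):
    """Map each percentile to the WAR value at that point of the distribution."""
    return {p: dist.inv_cdf(p / 100) for p in percentiles}


for p, war in percentile_table(projection).items():
    print(f"{p:>2}th percentile: {war:4.1f} WAR")
```

Under these assumptions the median sits at 3.0 WAR while the 10th and 90th percentiles land near 1.1 and 4.9 WAR, which says far more than "he's a three-win player."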


To preempt the cries of "nofunnik": we will still get to call people stupid in my ideal world! There are scads of terrible analysis in the world, just outright awful, uncritical, lazy, backward thought. When some columnist starts yammering on about how Brandon McCarthy was hurt in Texas and therefore he's not a good pitcher now, you can call that person dumb if you like! I won't stand in your way. There's a clear error of logic, a failure to appreciate that the past and the present are two different things. I would still encourage civility, but that's a different branch of discussion. It requires no certainty of projection or past performance to reduce this (purely hypothetical, surely!) columnist's argument to shreds of pixels lying bloody in your wake. Shred away at those who cannot even get from A to B without tripping over their shoelaces, friends. Just be careful when you delve into the murkier realm of actual baseball analysis.


Maybe I'm banging my head against a brick wall. Maybe the nerd-with-superior-knowledge is, by acculturation, simply unable to avoid feeling superior and 100% correct about his area of expertise. I think better of you all than that, though. I think we can come to understand, through the Cardinals winning the World Series, through the Red Sox chicken-and-beering their way to oblivion, through Adam Dunn hitting like Ty Cobb (the 2011 version of Ty Cobb, you know, the dead and decomposed one), that we just don't know things and that the world will be a happier place when we admit it.


SaberTJ said...

Fantastic post. Couldn't agree more.

Adam Krueger said...

Agree with SaberTJ, very well-reasoned. Unfortunately, the "arrogance of certainty" is not just a problem among baseball analysts; it is a problem within the entire scientific community. Great (recent) evidence for that? The Global Warming issue.

Matt Collins said...

This is great work

Gerry said...

Bravo! Between traditional media bashing for profit, bloggers using personally biased interpretations of current scientific theory (see UZR) to bash with Biblical certitude, and guys at a pub bashing everything that doesn't fit within their personal life experiences, there's not much prognostication going on that has true basis in fact. A lot of speculation, much of it both authoritative and wrong, that makes the speculator feel good, but too often at the expense of some really good people doing their best at a game we all wish we could play at the level of the player or manager or owner being bashed. Too much pompous use of damning phrases like "AAAA" or "too old" or "too inexperienced" or "put a fork in him". Too much like reading a data-based Enquirer or watching an angst-filled soap opera. It's a game. As we age we realize these guys are no longer our childhood heroes and that, in fact, many are still children themselves. Thinking out loud here, but I hope whatever "truths" these ever-evolving stats may provide are used for more than belittling those whose performance is being measured; that just as understanding music makes the Mahler concert a richer experience, greater understanding of the grand olde game will rekindle, at a sophisticated and mature level, the joy we found in baseball as kids. I would rather use enhanced knowledge of the game and its players to enjoy the performance and not devolve into a self-made critic looking for flaws around every corner.