Monday, November 1, 2010

Matchups Revisited

By Bill

In 792 career plate appearances in August, Rusty "The Red Baron" Greer hit an impressive .353/.431/.579. In months other than August, Greer hit a solid but much less striking .294/.377/.456. In no other month did he hit below .289 or above .297, and in no other month did he hit more than 20 career home runs (he hit 32 in Augusts). Greer even went 9-for-9 in steals in August, and was a putrid 22-for-37 in other months. Is that enough to conclude that Rusty Greer was just really good at playing baseball in August?

The Common Man wrote a very, very good piece on Thursday in which he looked at pitcher-batter matchups for the top 60 pitchers by innings pitched since 1950. The results, while very interesting, were unsurprising; they were all over the map. When you're just looking for a very large number of plate appearances against one pitcher, you're looking almost exclusively at very-good-to-great hitters against very-good-to-great pitchers, and of course you're going to see some hitters who did very well against a certain pitcher, some who did poorly, and others (most of them) who did exactly what one would expect, i.e., slightly less well against that very good pitcher than they did overall.

It's all great stuff. TCM notes that even the most frequent matchup he could find -- Pete Rose against Phil Niekro, which happened 266 times -- is equivalent to only about 60 full games. He then points out that through about that many plate appearances to start 2010, Brennan Boesch looked like a superstar, but then finished out the year looking very much like the AAA player he actually is (well, no, he looked even worse than that).

All of which makes his conclusion something more than puzzling. Explaining away the Boesch example, he claims that matchup stats are "less prone to streaky runs" than stats from continuous stretches, and that fluky streaks are less likely in matchups because hitters and pitchers know each other better. From there, TCM draws the following best-guess conclusion:
So where’s the line? TCM is tempted to put the number around 150, though obviously 200 is even more instructive. The trouble, as you can see above, is that players are increasingly unlikely to hit that mark in today’s game, making Hitter vs. Pitcher data little more than a fun exercise.

Well, that last part is certainly true. The venerable Derek Jeter hasn't faced a single pitcher 150 times, and isn't likely to reach that mark against his most frequent opponent, the even more venerable Tim Wakefield (currently 127). With 30 teams, interleague play and a great deal of player movement from year to year, this just isn't something that's going to be useful going forward.

What I'm stuck on, though, is landing on a number like 150 or 200. That seems crazily low to me, and I don't think it's supported by anything TCM wrote.

For one thing, TCM himself writes off several 150-plus-PA performances, like Ron Fairly against Bob Gibson and Joe Morgan against Don Sutton, as unhelpful. I think that alone tells you that the 150-PA cutoff doesn't work. I may not know a lot about the scientific method, but I am fairly confident that a data set in which you have to pick out the items that fit your preconceived notions as meaningful, and throw out the ones that don't, is not a good data set. If Rickey Henderson's 117 PA against Frank Tanana mean something, then Fairly's 179 against Gibson have to mean something. If Joe Morgan's underwhelming numbers against Sutton mean nothing, then his .919 OPS in 136 PA against Tom Seaver means even less. The whole point of trying to identify the line at which these numbers become meaningful is -- or it seems to me ought to be, at least -- to make it so we don't have to make those kinds of judgment calls. If a guy has 150 plate appearances against another guy, that's it: we know how well those guys are going to do against each other. If that's true, then there's your line.

And I think it's abundantly clear that that's not true in this case. Brennan Boesch's first 267 plate appearances are a great example, actually. And I don't believe his success can be attributed to pitchers not knowing how to pitch to him; none of the new minor leagues he joined had that problem. I also don't believe it's explainable by streakiness. The Book showed that a hitter being on a hot or cold streak has no predictive value for his immediate future. Streaks are streaks until they just happen to stop. Not that I don't think they happen sometimes -- maybe a guy really does just start seeing the ball better for a while, or is healthier, or whatever -- but I think they're almost entirely a creature of perception. I don't think it's substantially more likely that a hitter will have a "hot" 267 consecutive plate appearances than that he'll have a "hot" 267 plate appearances over his entire career against one pitcher.

So I think it's clear, by TCM's own examples, that 150 or even 200 plate appearances just isn't a big enough sample. But how big is big enough? I think the Rusty Greer example above shows that it's something more than four times that big. I can't think of any reason that Greer would actually do better in August, but back to his norms again in September. (And Greer, believe it or not, just happened to be the first player I thought to look up. Willie McCovey's OPS is 76 points higher in August than in July, in 3400 combined PA; Albert Pujols' jumps 93 points from July to August, in over 2300 PA; while Harmon Killebrew's drops 93 for the same two months, in over 3300 PA.)

If monthly splits don't work for you for one reason or another, look at lefty-righty splits. Switch-hitter Mark Teixeira OPSed .940 against lefties in 2010 (223 PA) and just .799 vs. righties (486 PA); in 2009, he was a little less great against lefties (.911 in 205 PA), but hugely better against righties (.951 in 499). It's certainly not hard to find jumps and dips like that from year to year, in a lot more than 150 PA. Again, Teixeira was the first player I happened to look at. Check out Ryan Howard's and Curtis Granderson's year-to-year numbers against lefties, to name two more commonly discussed examples.

The one big difference between these types of numbers and pitcher-batter matchups, of course, is that the latter all come against the same one guy, so it at least feels like they should be more dependable. Even that is an oversimplification, though; both pitchers and hitters are going to be entirely different players at age 25 than at age 35, and to rack up 150-200 PA against the same one pitcher, a hitter is going to have to face him in both phases.

But even beyond that, I just don't think it holds up. The 247 PA Duke Snider had against Robin Roberts (that we have a record of) constitute right around 3% of Snider's total career PA, about the highest you'll find among any of these guys. They came across 12 seasons, an average of 21 per year (and a high of 33). Is it possible that certain hitters get to know certain pitchers better, and vice-versa, and that this makes their numbers more reliable? Sure. But how do we know when that has happened, and how reliable does it make them? At any rate, I'm confident that it doesn't make the numbers reliable after even 200 plate appearances. There's just too much randomness, too many other variables with a sample that size, and the players' familiarity with each other is, at most, one countervailing consideration that doesn't come close to smoothing out all those uncertainties.

We'll close by going back to The Book again. It teaches that after 200 plate appearances, one standard deviation for hitters with a .330 true-talent OBP is 33 points. What this means is that 68% of those players, after 200 plate appearances, will have an OBP between .297 and .363. Even after 1000 PA, the standard deviation is still 15 points, putting that 68% between .315 and .345; their true talent is exactly the same, but some of them will look pretty awful, some slightly above average. And that still only covers 2/3 of batters with that talent, the rest of whom will be more than one standard deviation away from that .330 level.
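Those figures are consistent with the ordinary binomial formula for the spread of a rate stat, sqrt(p(1 - p) / n). Here's a minimal Python sketch that reproduces them, assuming every plate appearance is an independent trial with the same on-base chance (a simplification; the function names are just illustrative, not anything from The Book):

```python
import math


def obp_sd(true_obp: float, pa: int) -> float:
    """Binomial standard deviation of observed OBP after `pa` plate appearances.

    Assumes every PA is an independent trial with the same on-base chance.
    """
    return math.sqrt(true_obp * (1 - true_obp) / pa)


def one_sd_range(true_obp: float, pa: int) -> tuple:
    """Range covering roughly 68% of hitters with this true-talent OBP."""
    sd = obp_sd(true_obp, pa)
    return true_obp - sd, true_obp + sd


for pa in (200, 1000):
    low, high = one_sd_range(0.330, pa)
    print(f"{pa:>4} PA: sd = {obp_sd(0.330, pa):.3f}, "
          f"68% of hitters between {low:.3f} and {high:.3f}")
```

At 200 PA this prints an SD of 0.033 and a range of 0.297 to 0.363; at 1000 PA, 0.015 and 0.315 to 0.345.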

Account for some increased reliability because you're dealing with the same hitter and the same pitcher, and I'm still not close to trusting any numbers I see after 200 plate appearances. Get to 500, and then maybe we can start talking (with a lot of caveats). If you want to learn something, it seems you're better off with Rob's idea (not a new one) of studying how the hitter has done against "families" of pitchers -- not just against Matt Cain, but against a much larger group of guys who kind of throw like Matt Cain. Or even better, chuck matchup stats entirely and start tracking how hitters do against certain types and speeds of pitches from left- and right-handed pitchers. That might really tell you something.
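Running the same binomial formula in reverse gives a rough sense of how far 150, 200 or even 500 PA are from "reliable." This is a sketch under the same independent-trials assumption, using the .330 hitter from The Book's example; it ignores any extra reliability from hitter-pitcher familiarity, and `pa_needed` is a hypothetical helper written just for this illustration:

```python
import math


def pa_needed(true_obp: float, target_sd: float) -> int:
    """Smallest PA count whose binomial OBP standard deviation is at most target_sd.

    Inverts sd = sqrt(p * (1 - p) / n) to get n = p * (1 - p) / sd**2.
    """
    return math.ceil(true_obp * (1 - true_obp) / target_sd ** 2)


TRUE_OBP = 0.330  # hypothetical true-talent hitter, as in The Book's example

# How noisy are the cutoffs under discussion?
for pa in (150, 200, 500):
    sd = math.sqrt(TRUE_OBP * (1 - TRUE_OBP) / pa)
    print(f"{pa} PA: one SD is about {sd * 1000:.0f} points of OBP")

# And how many PA does it take to shrink one SD to 20 or 15 points?
for target in (0.020, 0.015):
    print(f"{target * 1000:.0f}-point SD needs about {pa_needed(TRUE_OBP, target)} PA")
```

Under these assumptions one SD is still about 38 points at 150 PA and about 21 points at 500 PA, and it takes roughly 1,000 PA (983, by this formula) to get it down to 15 points.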

All I'm saying, really, is that Eddie Mathews may have fared terribly in 229 plate appearances against Don Drysdale, but if they're both back in their primes right now, I'm taking Mathews over the next 229.


Anonymous said...

"Or even better, chuck matchup stats entirely and start tracking how hitters do against certain types and speeds of pitches from left- and right-handed pitchers. That might really tell you something."

Since you keep quoting "The Book", you may have noticed that the same book looked at the predictability of matchup stats against "families" of pitchers, exactly what you are talking about, and found little or no predictive value to that either.

Anonymous said...

I misspoke a little. The Book did not specifically look at types of pitches by speed and repertoire, only by general type, such as control or power, high or low K or BB, etc. And obviously platoon issues (L/R AND G/F) are relevant.

Bill said...

Right. And the sets of "family" data they looked at varied from one-year to three-year samples, and I think the most PA any one batter in those samples had against a given family of pitchers was around 80. I would think you'd need much larger samples than that. By that point, maybe, either (a) the "family" gets too broad to be of any use anymore or (b) the data goes back so far that the hitter's skill set has changed by the time you have enough of it, but it seems to me it might be worth a harder look.

Anonymous said...

Good article.

I would just note that there are absolutely discernible differences in hitter/pitcher matchups, but these are not clearly recognizable in statistical form.

If a batter smashes the ball 10 times, but always right at a fielder, he'll be 0 for 10 against that pitcher.

But anyone who saw those 10 at bats will know that the pitcher was lucky.

Similarly a batter who bloops 5 hits in 12 at bats against a great pitcher isn't good, he's lucky.

Brian Tung said...

Just a quick thought: I'm not sure exactly what the standard deviation on slugging percentage is per AB, but I'm guessing it's in the neighborhood of 0.8 or 0.9. The s.d. for OPS per PA is a little higher, probably pretty close to 1. (The s.d. for OBP per PA is probably very close to 0.5, but because it's correlated to slugging percentage, its effect is significantly reduced. There's also the effect of PAs that aren't ABs.) Close enough for government work.

That suggests that 100 PAs would be enough to reduce the s.d. on OPS to about 0.100, and 200 PAs would be enough to reduce it to 0.070. That still sounds to me like there's quite a bit of variation, but might be enough to identify really severe preferences.
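That square-root scaling is easy to check in a few lines of Python. A sketch only: the per-PA OPS standard deviation of 1.0 is the rough guess above, not a measured value, and `ops_sd` is a helper invented for this illustration.

```python
import math

# Assumed per-PA standard deviation of OPS (the rough guess of "pretty close to 1").
SD_OPS_PER_PA = 1.0


def ops_sd(pa: int, sd_per_pa: float = SD_OPS_PER_PA) -> float:
    """Standard deviation of a hitter's observed OPS over `pa` plate appearances.

    Treats PAs as independent, so the spread shrinks with the square root of the sample.
    """
    return sd_per_pa / math.sqrt(pa)


for pa in (100, 200, 500):
    print(f"{pa} PA: OPS sd is about {ops_sd(pa):.3f}")
```

With that assumed figure it gives 0.100 at 100 PA and about 0.071 at 200 PA; doubling the sample only cuts the noise by a factor of about 1.4.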

Worth noting: Any sizable corpus will have "statistically significant" instances. Consider, for example, the ESP investigator who studied something like a thousand volunteers, and found one that was a "three-sigma esper!" The problem is that a three-sigma result in a given direction is roughly a 1-in-750 event (about 1 in 370 counting both directions), so you'd expect to see a volunteer or two score that high.
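A quick check of that arithmetic, assuming the volunteers' scores are normally distributed (an assumption for illustration; `normal_tail` is a helper written for this sketch, built on Python's standard `math.erfc`):

```python
import math


def normal_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


volunteers = 1000
one_sided = normal_tail(3.0)  # at least three sigma above the mean
two_sided = 2 * one_sided     # at least three sigma in either direction

print(f"P(3+ sigma high): {one_sided:.4f}, about 1 in {1 / one_sided:.0f}")
print(f"P(3+ sigma either way): {two_sided:.4f}, about 1 in {1 / two_sided:.0f}")
print(f"Expected 'espers' among {volunteers} volunteers: {volunteers * one_sided:.2f}")
```

The expected count on the high side comes out to about 1.35, so finding one "three-sigma esper" among a thousand people is exactly what chance alone predicts.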