Thursday, March 22, 2012

Negro League Database Diving

By The Common Man (with an assist from Bill)

This morning, Baseball Reference dropped a bombshell when they went live with their Negro League database that covers 1903-1948. This represents the most complete public airing of Negro League statistics that we’ve ever seen, and baseball fans everywhere should be incredibly grateful to the National Baseball Hall of Fame, Seamheads.com, Sean Forman and his Baseball Reference team for making these publicly available. What a treasure trove of data.

Previously, we’ve had to rely on incredibly incomplete data and oral histories (many of which have been wonderful to read and hear, though they are highly subjective) to try and understand the black game in the age of segregation. This shines a beacon on a terrifically understudied and little understood part of baseball’s history.

The Common Man and Bill spent much of the morning combing through the stats and passing little treasures back and forth. We have a lot more to do to get a more complete picture of the database, but here are our ten favorite things we learned this morning:


#10 The first player to hit 20 homers in a Negro League season was either Heavy Johnson or Candy Jim Taylor in 1923.

Johnson was a star who got a late start in organized ball, after spending much of World War I in Hawaii playing for the 25th Infantry Wreckers with Bullet Joe Rogan. In his first full season, he hit .390/.406/.720 in 226 plate appearances. The next year, he topped that, hitting .405/.460/.717 with the aforementioned 20 homers in 426 plate appearances. He continued to star for another 5 seasons in the league, though he’d never hit more than 5 homers in a season ever again.

Taylor was 36 before the Negro National League was founded in 1920, but had played professionally since he was 19. Taylor had little success in his first go-arounds in the league in 1920 and 1922. But in ’23, at 39 years old, Candy Jim’s bat was extra sweet. In just 244 plate appearances, he also hit 20 homers with a .372/.438/.712 line. He would hit just four more homers in the rest of his career, but went on to be a successful manager of the St. Louis Stars, Richmond All-Stars, and Homestead Grays.

#9 Cool Papa Bell didn’t steal many bases.

That’s probably not an accurate statement. Bell probably stole a lot of bases that simply weren’t counted for one reason or another. There are definite holes still in the data, and the research teams are working to fill them. What we know is that he’s credited with 132 steals in 21 seasons, but that just doesn’t seem credible. We also see a season where he’s not credited with a single walk in 275 PAs. That’s probably missing data too. As it is, at .316/.363/.420, he seems far more like the Lou Brock of the Negro Leagues (albeit a Lou Brock who is an excellent defender in center field, by all accounts) than Tim Raines or Rickey Henderson.

#8 The Homestead Grays really didn’t like to walk.

Speaking of holes in the data, as Bill pointed out to me, according to the stats we have available the 1943 Grays walked one time in more than 1900 plate appearances. The lone walk was credited to Josh Gibson, which raised his OBP from his .486 batting average all the way to .489. This lax record keeping did not carry over to the pitching side, where Grays pitchers were hung with 198 BB in 636.2 innings. It’s a bummer, because it would be nice to see where all those walks went. But this is one of the major problems when you’re reconstructing stats from box scores and incomplete records.

#7 Josh Gibson was a freaking god.

We already knew this, yes. But we didn’t really have much beyond first-hand accounts to go on. But look, for instance, at Gibson’s 1936, when he hit .451/.526/.756. Or 1937, when he hit .392/.422/.907. Or even his .365/.500/.865 1939 campaign. Or maybe you’d prefer 183 plate appearances of .486/.489/.862 (remember, that OBP is way low) in 1943. For his Negro League career, Gibson is credited with hitting .359/.413/.644 as a catcher in 13 seasons before dying at 35, shortly before the 1947 season. The highest MLB OPS by a catcher with more than 1500 career plate appearances is Mike Piazza’s .922. Gibson’s 1.057 beats that by more than 130 points, and that’s with at least an entire season's worth of walks not counted in Josh’s final record.

#6 Satchel Paige was too.

Another of the giants of the Negro Leagues, Satchel Paige, doesn’t look so great at first glance. His 1927 season, for instance, saw him allow 4.55 runs per 9 innings for the Birmingham Black Barons. And his RA ranged pretty consistently between 3.00 and 4.00. But look closer. Consider, for one thing, that Paige was working with bad fields in what was a high-offense era in baseball history. And while it’s hard to see without a RA available on team pages, he looks to have consistently been the best pitcher on his own teams. Finally, consider his strikeout rate. As Bill pointed out this morning, he’s credited with striking out over eight batters per nine innings for his Negro League career. Meanwhile, in Paige’s prime from 1920-1936, when he struck out 8.1 batters per 9 innings, the Major League average for K/9 was around 3.3, and the highest mark of any Major League pitcher was Dazzy Vance’s 7.6 K/9 in 1924. In other words, Paige struck out roughly two and a half times as many batters as Major League pitchers on average, and K’ed more on average than any Major League pitcher could do at his best. That’s remarkable.
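For anyone who wants to check the strikeout comparison at home, here is a minimal Python sketch that uses only the rates quoted above. The function name `rate_ratio` is our own invention, and the inputs are the rounded figures from this post, so treat the results as approximate.

```python
def rate_ratio(pitcher_k9: float, baseline_k9: float) -> float:
    """Return how many times larger one K/9 rate is than another."""
    if baseline_k9 <= 0:
        raise ValueError("baseline K/9 must be positive")
    return pitcher_k9 / baseline_k9


# All three inputs are the rounded figures quoted in the post.
PAIGE_K9 = 8.1        # Paige's K/9 in his prime
MLB_AVG_K9 = 3.3      # approximate Major League average
VANCE_1924_K9 = 7.6   # Dazzy Vance's 1924 mark

vs_average = rate_ratio(PAIGE_K9, MLB_AVG_K9)
vs_vance = rate_ratio(PAIGE_K9, VANCE_1924_K9)

print(f"Paige vs. MLB average: {vs_average:.2f}x")
print(f"Paige vs. Vance's best: {vs_vance:.2f}x")
```

With these rounded inputs, the first ratio lands near 2.5 and the second just above 1.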

#5 There was a huge talent gap between the best and worst players in the Negro Leagues.

This is probably something we should have expected, especially in the early days of the Negro National League. There were huge stars, but there were also incredibly bad players. It’s similar to how the National Association was in 1871, or the National League in 1876, or the American League in 1901, or the Federal League in 1914. Some black players were slow to jump to the league, waiting to see how it fared before leaving secure jobs elsewhere. So we get players like Bingo DeMoss, a second baseman who played regularly from 1920-1928 and had a career .235/.296/.285 batting line, which would be the 42nd worst OPS of all time among players since 1900 with more than 2000 plate appearances. His .371 OPS in 1922 (he “hit” .149/.203/.168) would be tied for 9th worst since 1900 among players with more than 175 PAs, and he kept his job for six more seasons. (By the way, four of the seasons worse than Bingo’s belong to the immortal Bill Bergen.)
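OPS is just on-base percentage plus slugging percentage, so the figures above are easy to double-check. Here is a rough Python sketch; the helper names `parse_slash_line` and `ops` are ours, and the only inputs are DeMoss's batting lines as quoted in this post.

```python
from typing import Tuple


def parse_slash_line(line: str) -> Tuple[float, float, float]:
    """Split an 'AVG/OBP/SLG' string such as '.149/.203/.168' into floats."""
    avg, obp, slg = (float(part) for part in line.split("/"))
    return avg, obp, slg


def ops(obp: float, slg: float) -> float:
    """On-base plus slugging, rounded to three places like a stat line."""
    return round(obp + slg, 3)


# Batting lines exactly as quoted in the post.
demoss_lines = [
    ("DeMoss, career", ".235/.296/.285"),
    ("DeMoss, 1922", ".149/.203/.168"),
]

for label, line in demoss_lines:
    _, obp, slg = parse_slash_line(line)
    print(f"{label}: OPS {ops(obp, slg):.3f}")
```

The same two helpers work on any slash line in this post, which makes it easy to sanity-check an OPS figure against the line it came from.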

#4 Pop Lloyd was an ageless wonder.

Bill found this out and it’s amazing. From 1924-1929, his ages 40-45 seasons, Pop Lloyd hit .358, which is 14 points higher than his career batting average. And he wasn’t playing sparsely either. He was in the top 20 in the league for plate appearances in 1929, and was 7th in the league in batting average (.370), 5th in OBP (.430), and 8th in slugging percentage (.541). After that, he spent three more seasons traveling around and barnstorming, hitting .369/.409/.431 at 46 years old and .400/.438/.600 in 16 plate appearances as a 48 year old. He seems to have stopped playing after that, perhaps after ascending to heaven in a chariot of fire.

#3 We don’t know nearly enough about these players.

Charles Smith was a mighty outfielder whom Satchel Paige called one of the two best hitters in the Negro Leagues (probably after Gibson). He packed a wallop in 1927 and 1928 but barely played for Brooklyn in the Eastern Colored League (he may have been barnstorming, or playing elsewhere for most of those seasons). In 1929, however, he settled in with the New York Lincoln Giants for the full season, and hit .465/.538/.994 with 19 homers in the American Negro League as a 28 year old. He never played in the Negro Major Leagues again. Instead, he barnstormed and hit .429/.531/.701. Playing in Cuba in 1932, however, he caught yellow fever, and died. But what did he do in 1931? Where did he play? How did he do? What was he doing during those years when he could have been playing in the Negro Leagues, but apparently chose not to? Lost.

#2 There are probably still more Hall of Famers we could cull from this data.

This is the first time either of us has heard of Tubby Scales, a second baseman who played 19 seasons in the official Negro Leagues, and barnstormed with independent black ballclubs in several others. In 1923, a 22 year old Scales hit .408 to win the Negro League batting title, besting Heavy Johnson, Cristobal Torriente, Oscar Charleston, Bullet Joe Rogan, and Turkey Stearnes. For his career, he’s credited with hitting .316/.386/.510 in 1838 plate appearances (numbers brought down significantly by his final 5 years in Baltimore when he was between 41 and 45 years old). Plus, he must have been similarly successful as he toured the country. His .896 career OPS would rank 2nd among all 2B since 1900 with more than 1800 plate appearances, behind only Rogers Hornsby.

#1 The greatest season in baseball history might just belong to Bullet Joe Rogan.

Rogan is an incredibly deserving Hall of Famer, who won 117 games from 1920-1928 and had a career RA of 3.66. While probably not a better pitcher than Satchel, Rogan was a much better hitter, raking at a .343/.395/.522 clip for his career. Indeed, in 1929, unable to pitch anymore, he still played regularly in the outfield, getting 312 plate appearances and hitting .356/.443/.564 (his 1.007 OPS was 7th in the league). But his best season, and perhaps the greatest season in baseball history, was undoubtedly 1925. Rogan won 17 games (and lost only 2) for the Kansas City Monarchs, allowing just 2.31 runs per nine innings in 171.1 innings (which seems to be a really low mark, from what we've seen). He completed 17 of his 18 starts, and had 5 shutouts. He also came to bat 160 times over the course of the season and hit .381/.442/.590 with 7 doubles, 8 triples, 2 homers, and 5 stolen bases. It looks like the Monarchs played between 85 and 90 games that season, to give you an idea of how his numbers might look extrapolated over a full year.
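To put rough numbers on that extrapolation, here is a back-of-the-envelope Python sketch. It rests on two assumptions of ours: that a "full year" means the 154-game Major League schedule of the era, and that counting stats scale in a straight line with team games, which is generous to any pitcher. The helper names `innings_to_float` and `prorate` are made up for this example.

```python
def innings_to_float(innings: str) -> float:
    """Convert box-score notation ('171.1' means 171 1/3 innings) to a number."""
    whole, _, outs = innings.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def prorate(value: float, games_played: int, full_schedule: int = 154) -> float:
    """Scale a counting stat linearly from games_played to a full schedule.

    The 154-game default is our assumption about what a full year means.
    """
    return value * full_schedule / games_played


# Rogan's 1925 totals as quoted in the post.
WINS = 17
INNINGS = innings_to_float("171.1")
TRIPS_TO_PLATE = 160

# The post estimates the Monarchs played between 85 and 90 games.
for games in (85, 90):
    print(
        f"{games} team games -> "
        f"{prorate(WINS, games):.1f} wins, "
        f"{prorate(INNINGS, games):.0f} innings, "
        f"{prorate(TRIPS_TO_PLATE, games):.0f} trips to the plate"
    )
```

Under those assumptions Rogan projects to roughly 29 to 31 wins and around 300 innings, with close to 280 or 290 trips to the plate on the side.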

--------------------------------------------------------------------

And that’s just the tip of the iceberg. The Common Man and Bill can’t wait to dig through these records more, and hope against hope that more data will come to light that will paint an even more detailed picture for historians and fans who are interested. And from the bottom of our cold, black hearts, we want to say thanks to everyone involved.

4 comments:

Josiah said...

That was certainly interesting, I'll have to comb through the baseball reference stuff, too. One of my favorite parts of the New Bill James Historical Abstract was reading his section on the Negro Leagues.

Lady Wezen said...

This was great. Good work you two.

Mike Lynch said...

Thanks for the shout out. We're hoping to have fresh data at Seamheads.com soon, including some nice features that will complement our Negro Leagues database as well as the one at Baseball-Reference.com. Stay tuned!

Vidor said...

Is RA just another way to write ERA, or is it a different statistic?