Ian hates player interviews. So, when he learned that his friend Mike Duffy was founding an interview blog, he was a little disappointed. However, he quickly realized that Mike was having immediate success. So, he decided, if you can’t beat ’em, join ’em! Ian Joffe is a 16-year-old Los Angeles native, who loves engineering, podcasts, swimming, cats, and especially baseball. In terms of MLB, he leans far on the SABR side, spending hours a day in the depths of BBR and Fangraphs. As Pythagoras once put it, “All is number!” He enjoys writing statistical analyses and having friendly arguments with the anti-SABR Mike. Let’s wish his Red Sox a great season!
Third base has become by far the deepest position in baseball, with players who could place near the top at other positions not even making the overall list. Almost all ranked players are good enough to make an argument for number one overall, but the blog ended up voting for Cleveland corner Jose Ramirez. Ramirez remains underrated after an amazing 2018 with a Trout-like 39 bombs, 34 steals, 146 wRC+, and 8.0 fWAR. #2 third baseman Alex Bregman, who even outhit Ramirez with 157 wRC+, comes in second with a .394 OBP and 7.6 fWAR. Manny Machado, who played shortstop in 2018 but is moving back to third after signing for 300 million dollars with San Diego, now ranks third. Machado hit 37 home runs and stole 14 bases in 2018 with 141 wRC+, totaling up to 6.2 fWAR. He is followed by Nationals star (at least until the end of the season) Anthony Rendon, who topped 6 WAR and 140 wRC+ for the second year in a row in 2018. An outstanding 38 home runs, his fewest in four years, bring recent extendee Nolan Arenado to the fifth overall spot. While his defense dropped off a little last year, it remains good enough to bring him his third year in a row of 5 fWAR or above, 5.7 in 2018. The star glove of Oakland’s Matt Chapman carries him to the sixth spot after his 6.5 WAR sophomore year. 29 DRS wasn’t his whole game though, and he also had an extremely strong 137 wRC+. Former MVP Kris Bryant is #7, as despite experiencing a power outage, he gave the Cubs a .374 OBP, and hopes to rebound to his old form, when he put up three consecutive seasons of 6 wins above replacement. Dodgers’ hot corner hitter Justin Turner has had injury issues, but is a superstar when healthy, as shown by his .312/.406/.518 line and 154 wRC+. He places eighth. Veteran utilityman Matt Carpenter earned ninth overall slashing .257/.374/.523 last year. His OBP has not dipped below .365 for eight years and he just set a career high with 36 home runs as a 33-year-old.
Eugenio Suarez hopes to improve even further after building on his 2017 breakout last year, when he hit 34 homers with a .366 on-base percentage and 135 wRC+, just a tad below Carpenter’s numbers. He hopes to help lead a new era of Reds offense in the coming year. Of course, no third base roundup would be complete without mention of superprospect Vladimir Guerrero Jr. While he didn’t make any list today, he could become a mainstay as soon as the coming season.
After experiencing an influx of rookies a few seasons ago, shortstop has surprisingly become a beacon of young offensive talent. The position is highlighted by unanimous number one Francisco Lindor. He will miss the first month of the season on the IL, but he keeps the spot due to his 7.6 fWAR, 38 home runs, 25 stolen bases, and 14 DRS in 2018. Corey Seager barely played last year, but he was still selected for the second spot due to his remarkable consistency in the two seasons before that. In 2016, he slashed .308/.365/.512 with 7.0 fWAR, and in 2017 his line read .295/.375/.476 with 5.9 fWAR. Infamous PTBNL Trea Turner ranks third. The speedster led the NL with 43 stolen bases in 162 games in 2018, and, while he hit a respectable .271, he hopes he can regain some of the hitting ability he showed when he hit .342 as a rookie. The still-young Carlos Correa follows Turner. Despite an awful 2018 in which he batted .239, he was an MVP candidate the year before with 152 wRC+ and 5.2 fWAR in only 109 games. Fifth overall is Andrelton Simmons, whose defensive wizardry (21 DRS last year, and he has never dipped below 19 in a season) brought his fWAR up to 5.5 last season for the Angels. After he fell off from his 2016 power super-breakout, people thought Trevor Story was done by the end of 2017. But he proved the general public wrong last year when he hit 37 homers and stole 27 bases in Denver. Boston shortstop Xander Bogaerts had his strongest season yet in 2018, with 133 wRC+ and 4.9 fWAR. The pending free agent has had a WAR over 4.6 for three of the past four years. Despite his breakout, Javier Baez sits at eighth on the table, likely due to criticism of his high BABIP and low walk rate. Still, he put up strong totals with 34 home runs and 21 stolen bases. Yankees SS Didi Gregorius comes in at #9. His 1.156 April OPS was one of the best in history, although he fell off later in the season, and will miss considerable time in 2019.
Finally, Phillies acquisition Jean Segura ranks tenth after hitting over .300 for the third consecutive year and stealing 20 bases for the sixth.
Second base scored as one of the weaker offensive positions on the diamond last year, but that didn’t stop us from filling the list with interesting names for next year. Despite falling from MVP candidacy to “mere all-star level,” Jose Altuve still leads the second base list with a .316 batting average and a power/speed combo that can threaten a 20/20 season, or better. Speedy Royal Whit Merrifield takes over the two spot, batting .304 with 45 stolen bases in 2018, a count that led an MLB that is emphasizing speed less and less. Underrated Reds keystone Scooter Gennett took home the three-spot after batting .310 last season. His modest yet distinct power broke out a couple of seasons ago, and he smashed 23 home runs last year. #4 second baseman Ozzie Albies astonished in April, when he hit nine home runs with a 158 wRC+ mark. He fell back in the second half, but hopes to once again lead the Braves early and throughout the season in 2019. Brian Dozier had a down year last season, marked by a .240 BABIP, but he still hit 21 home runs with 12 steals. Dozier had 5.0 fWAR in 2017, and 6.2 the year before that. Gleyber Torres, entering his sophomore season with the Yankees, had a very solid season at the dish and finished third in AL Rookie of the Year voting with a line of .271/.340/.480 in 123 games. He places sixth, just ahead of veteran Robinson Cano, who was dealt crosstown from Gleyber to the Mets over the offseason. Last year with the Mariners, Cano managed a .303 batting average with 2.9 fWAR in a season that was cut in half by a PED suspension. Another sophomore, Rays second baseman Joey Wendle, finished eighth on our overall chart. While his .353 BABIP draws criticism, it’s impossible to overlook his .300 BA, strong speed and defense, and 3.7 total fWAR. Cesar Hernandez occupies our nine hole, coming off a season in which he got on base at a .354 clip and went 15/19 on home runs and steals. Finally, the 38-year-old Ben Zobrist rounds out the top ten.
After rebounding from a horrid 2017, Zobrist put up 3.6 fWAR backed up by a .303 BA and .378 OBP.
While first base as a position has shallowed in recent years, the addition of designated hitters to this list made it one of the most talented. It’s highlighted by solidified stars at the top, and underrated breakouts at the tail. Boston DH J.D. Martinez, to whom much of the team’s 2018 World Series campaign is attributed, tops the chart. Martinez’s bat was one of the most potent this decade, as shown by his 170 wRC+ from a .330 batting average and 43 home runs. The Cardinals’ newest import, Paul Goldschmidt, ranks second after having his sixth consecutive season of at least 130 wRC+ and fifth out of six seasons with at least 140 with the Diamondbacks last year. Goldy narrowly edged out the comparable Freddie Freeman, who earned the third overall spot slashing .309/.388/.505 in 2018, helping lead his Braves to a division title. 2017 NL MVP Giancarlo Stanton took a step back after being traded to the Yankees for 2018, but he still swatted 38 home runs while staying healthy for the second year in a row. #5 first baseman Joey Votto also lost some power in 2018 (which Ian wrote about here), but the walk machine maintained a 17.3% walk rate, leading to his .417 OBP. The Twins’ newest addition, slugger Nelson Cruz, comes in at the six spot. Cruz has hit at least 37 home runs for five years in a row now, and has maintained an on-base percentage above .360 in four of the past five. Khris Davis has easily overtaken his homophone counterpart in this category, after setting a career high with 48 bombs last season and, perhaps even more impressively, having a batting average of exactly .247 for the fourth year in a row. The eighth spot on the list went to Cubs first baseman Anthony Rizzo, who finished 2018 with a .376 OBP, which is actually his lowest in five years. Coming in ninth was Dodgers breakout Max Muncy, whose .263/.391/.582 line led him to 162 wRC+. Despite never being a top prospect, he paced for a 6 fWAR full season, and hopes to build on his success.
Rhys Hoskins broke onto the scene in 2017, and continued to impress last year with 34 home runs and a very respectable .354 OBP, earning him the final spot on our list.
For the most part, the American League was easier to try to predict than the National. The Astros and Indians should each win their divisions easily. Boston and New York will have a tough fight, but I really like the Yankees’ Paxton acquisition and, given a bounceback by Sanchez and a full season out of Judge, I think they have the upper hand. That rounds out the four AL superpowers, leaving no other clear option. I chose the Rays to take the final spot because of my love of bullpenning, and continued belief that the strategy remains underrated. It could legitimately shave 100 runs off the pitching staff, and will make up for a less-than-spectacular offense. The Red Sox, however, will easily defeat Tampa in the wild card game, and move on to face Houston, who will out-pitch Boston to reach the championship series. Meanwhile, the Indians and Yankees will get locked in a 5-game series that New York just barely edges out. While the Yankees may continue to display strength in the CS, they are still defeated by the Astros, who return to the World Series for the second time in three years.
The easiest division in the senior circuit to predict was the West. The Dodgers are one of the deepest teams I have seen, as assets like Max Muncy, Joc Pederson, and several pitchers continue to get underrated. While others have had trouble predicting the NL East, I think the Nationals will run away with it. It’s true that there are four teams that may contend to make the playoffs, but the Nationals have both the best hitting (even without Harper), and by far the best pitching (Scherzer, Strasburg, and the underrated Corbin could make a historic group). Early season injuries appear to be a potential issue, but I still expect a strong bullpen and better-than-Chicago offense to carry the Brewers to a division title over the Cubs, although neither team will win too many games. The Cubs pick up the first wild card spot, and are met by the Phillies, who out-pitch the Braves and out-hit the Mets. While I have the Cubs over the Phillies for the regular season, the Cubs lack a strong ace, and will be defeated by Nola in the wild card game. But, the Dodgers make quick work of the Phillies in the next series. The Nationals will also rather easily bring down the Brewers’ weak rotation to move past the DS (yes, I know I say this every year and yes, I know the Nationals never actually succeed). While Washington does give L.A. a run for their money, the Dodgers’ depth ultimately leads to D.C.’s defeat in the Championship Series, leading Los Angeles back to the World Series for the third time in a row.
Justin Verlander outduels Clayton Kershaw to win game one of the 2017 rematch World Series, as does Gerrit Cole against Walker Buehler in game two. The Dodgers, however, pull through to win games three and four, tying the series at two apiece. The two teams split games five and six, leading to a rubber match (sound familiar?), but Houston’s bullpen outlasts the tragic Dodgers once again in game seven, as the Astros become World Series champions for the second time in three years, and the Dodgers are handed yet another loss, in a kind of cruel Shakespearean/Sisyphean crossover.
The American League has a clear top three. Mike Trout, Mookie Betts, and Jose Ramirez all put up WARs over 8.0 last season, and should repeat similar feats in 2019. Trout is the best of the three, and will be making a handsome sum of money for the next 12 years because of it. The NL is much less top-heavy in both teams and individual players. I ended up going for Harper as MVP, mostly because I couldn’t find anyone who I think is better. Goldschmidt, a similar player, will finish in second place with power, some steals, and a high OBP for St. Louis. I expect a lot of regression out of Christian Yelich, who had unsustainable BABIP, HR, and fly ball numbers in 2018, but I sneaked him in at third on the chance that he does not regress as expected.
Another recent extendee, Chris Sale, will win the AL Cy Young award with an incredibly strong K/BB ratio. Verlander has been extremely consistent and just had his best season yet at 36 years old. Bauer’s ceiling narrowly brings him ahead of Verlander’s teammate Gerrit Cole for the third place spot, and his FIP from last season suggests a mid-two ERA could be coming once again. In the NL, Max Scherzer is a safe pick to carry the award after turning in his fourth straight season with an ERA under 3.00. Jacob deGrom was incredible last year, and while some regression is expected, he will still be stellar even with regressed numbers. Finally, the best pitcher of this generation, Clayton Kershaw, will win third place in voting, and that could go up if he can stay healthy.
I went with the easy pick for AL Rookie of the Year, and while I don’t guarantee Vlad’s success like many optimists are trying to do, he certainly has the best chance at it. If he gets brought up early enough, Rogers also has a chance to be an elite hitter, especially in Colorado. Cash and Counsell are similar in their masterful bullpen use, and that, in addition to the lineup manipulation both will need to make the playoffs, earns the two managers awards. Gary Sanchez is the best catcher in baseball, and will rebound to prove so this season after lucking into a dismal 2018 BABIP, and Josh Donaldson will also return to old form with his power and walks. Finally, I chose two pitchers, Bieber and Pivetta, to take home breakout player honors. They put up 3.42 and 3.30 xFIPs in 2018, respectively, and should move towards those numbers, maybe even with improvement from experience, in the coming season.
This is my favorite part of the annual predictions column. I won’t be right on all or most of these, but I think all of them have a real chance at happening, and with each I’m trying to make a statement about a team or player.
Trout, Betts, and Ramirez put up the highest combined WAR in history out of three AL hitters. The three are all truly spectacular in all five categories, and will likely play more total games than last season.
Every member of the Dodgers’ closing day rotation – that’s five of Kershaw, Buehler, Ryu, Maeda, Hill, Stripling, and Urias – puts up a better ERA than any other pitcher in the NL West. All seven have sub-three ERA potential, and the only other strong candidate I can think of outside of the team is the elderly Zack Greinke.
Matt Carpenter leads the NL in OPS. Carpy is known for his consistent on-base prowess, and hit 36 home runs last season.
Justin Verlander and Gerrit Cole combine for a 12.5 K/9, an extreme strikeout feat that looks possible based on their 2018 numbers.
The Indians have four starters with ERA’s under 2.80. While the rotation doesn’t have quite as much depth as the Dodgers’, there are still a plethora of ultra-talented arms (Kluber, Carrasco, Bauer, Clevinger, Bieber).
Anthony Rendon and Brian Dozier combine for 12 WAR for the Nationals. Rendon is an easier call to make up his share, as he remains possibly the most underrated player in baseball, and Dozier should bounce back to all-star form.
Jon Gray finishes with an ERA under 3.30. He manages to pitch in Coors, breaks the Rockies’ pitching curse, and becomes an extremely valuable asset.
The AL has four 100 game winners and four 100 game losers. There are some seriously good (Astros, Indians, Red Sox, and Yankees) and seriously bad (Orioles, Tigers, Royals, Jays) rosters in that league.
Fewer than 10 closers get 25 saves. Teams are finally starting to realize that the best pitcher should face the best batters.
Bullpenning becomes a largely accepted strategy in MLB, as shifting did a few years ago, and low-budget teams must start to search for the next big strategic advantage.
If you liked this article, please follow The K Zone on Twitter and be the first to know when more original research, opinion, and interviews, come out!
Images Attributed to:
Forbes Matt Carpenter
It is a well-documented fact that Joey Votto is one of my favorite baseball players. I wrote my very first opinion article about how good he really was, and I have drafted him in fantasy baseball for several years in a row. However, this year my seemingly everlasting love for the Reds’ first baseman hit a snag. Votto’s home run power plummeted in 2018 to 12 total bombs, his lowest full-season total ever, yet he did that despite maintaining his regularly high average exit velocity (88.1 mph) and launch angle (13.3 degrees). His line drive rate (31.4%) also remained exceptional. My first thought was that Votto was having a lot of near misses, balls that were hit hard but died on the warning track. But the Statcast data contested that theory too, as his barrel rate of only 6.7% matched his low home run total.
So, Votto had his normal high average exit velocity and strong launch angle, yet he was rarely getting barrels, which are defined as a combination of the two. My theory became that he was still hitting balls hard and still hitting balls high, but in 2018 those types of hits did not coincide on the same at-bats. He had a lot of soft flyouts, and a lot of hard groundouts, but few well-hit balls angled for the stands. At first thought, one would think those two events (hitting balls hard and hitting balls high) are independent. In other words, doing one does not make the other more likely on any specified at-bat. If this were the case, then Votto would be a victim of bad luck. One could expect his hard hits to coincide with his high hits at a normal rate again next season, and we could treat 2018’s lack of intertwined hard and high hits like a low BABIP, which will regress towards a mean. However, it is also possible that the two events are dependent, and that certain types of players are better at doing both at once than others. In that case, it is possible that Votto has experienced a legitimate decline in his skill level.
To test whether the events were independent or not, I examined data from 332 hitters that had at least 150 balls in play in 2018. The goal was to examine how often their hard hits and high hits actually coincided, versus how often they should have coincided, and to test whether those numbers differed by a reasonable margin. For this study, I looked at a statistic that I am calling crossover (CR), which is defined as a batted ball hit with at least 99 mph of exit velocity and at least 22 degrees of launch angle. It’s similar to barrels, but a little less complicated. Barrels did not work for my purpose because their required launch angle differs based on exit velocity. The numbers 99 and 22 are admittedly somewhat arbitrary, but were decided upon by looking at where the distribution of home runs started to accelerate. Crossover rate, or CR%, is defined as crossovers divided by crossover opportunities. Crossover opportunities, in turn, are the sum of a player’s hard hit balls and high hit balls, minus crossovers (so that each crossover only counts as one opportunity). The league average CR% was 13.1%, and Joey Gallo led the league at 43.7%, although that number is over 10 points higher than the next best, which is Tyler Austin at 32.5%. From there, the distribution is right-skewed.
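The CR% definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described, not the spreadsheet behind the article; the `BattedBall` class, the function name, and the five sample batted balls are all made up for the example, and the cutoffs are treated as inclusive ("at least").

```python
from dataclasses import dataclass

HARD_HIT_MPH = 99.0   # exit-velocity cutoff named in the article
HIGH_HIT_DEG = 22.0   # launch-angle cutoff named in the article


@dataclass
class BattedBall:
    exit_velocity: float  # mph
    launch_angle: float   # degrees


def crossover_rate(balls):
    """CR% = crossovers / (hard hits + high hits - crossovers)."""
    hard = sum(b.exit_velocity >= HARD_HIT_MPH for b in balls)
    high = sum(b.launch_angle >= HIGH_HIT_DEG for b in balls)
    crossovers = sum(
        b.exit_velocity >= HARD_HIT_MPH and b.launch_angle >= HIGH_HIT_DEG
        for b in balls
    )
    # Subtracting crossovers keeps a ball that is both hard and high
    # from being counted as two opportunities.
    opportunities = hard + high - crossovers
    return crossovers / opportunities if opportunities else 0.0


# Hypothetical batted balls, not real Statcast data.
sample = [
    BattedBall(101.2, 25.0),  # hard and high: a crossover
    BattedBall(103.5, 4.0),   # hard only
    BattedBall(88.0, 31.0),   # high only
    BattedBall(99.0, 22.0),   # exactly on both cutoffs: a crossover
    BattedBall(76.4, -8.0),   # neither
]
cr_pct = crossover_rate(sample)
print(f"CR% = {cr_pct:.1%}")
```

In this toy sample there are three hard hits, three high hits, and two crossovers, so there are four opportunities and the rate comes out to one half.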
Next, I made a formula to determine the expected crossover rate of every player based on their hard hit rate and high hit rate. A player’s total expected crossovers (xCR) is the product of his hard hit rate and his high hit rate, times his number of balls in play. To find expected crossover rate (xCR%), put xCR over the sum of hard hits and high hits minus xCR, as with the observed CR%. My final statistic was CRd, or crossover differential. CRd is defined as CR% minus xCR%, times 100 (to make it more readable). A positive CRd indicates that a player had more crossovers than expected, and a negative CRd indicates that a player had fewer crossovers than expected. A CRd of 0.0 means that the player’s crossover rate is the same as the expected rate. Here is the distribution of CRd:
Interestingly, the league average value was -2.3. The league leader in CRd was, once again, Joey Gallo with an astronomical 18.1, with Tyler Austin next at 10.5. After Austin came a new name, Matt Joyce, at 10.4. At the bottom of the charts was Yuli Gurriel, at -13.0, followed by Jose Bautista at -12.6.
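For readers who want to reproduce the xCR% and CRd definitions, here is a minimal sketch that follows the wording above. The function name and the counts passed in are hypothetical, and the rates are treated as fractions of balls in play, which is my reading of the definition.

```python
def crossover_differential(balls_in_play, hard_hits, high_hits, crossovers):
    """Return CRd = (CR% - xCR%) * 100 from raw counts."""
    if balls_in_play <= 0 or hard_hits + high_hits == 0:
        raise ValueError("need balls in play and at least one hard or high hit")

    hard_rate = hard_hits / balls_in_play
    high_rate = high_hits / balls_in_play

    # Expected crossovers if hitting hard and hitting high were independent.
    x_cr = hard_rate * high_rate * balls_in_play

    cr_pct = crossovers / (hard_hits + high_hits - crossovers)
    x_cr_pct = x_cr / (hard_hits + high_hits - x_cr)
    return (cr_pct - x_cr_pct) * 100


# Hypothetical hitter: 400 balls in play, 100 hard hits, 120 high hits.
# Independence would predict 0.25 * 0.30 * 400 = 30 crossovers.
neutral = crossover_differential(400, 100, 120, 30)
above = crossover_differential(400, 100, 120, 40)
print(round(neutral, 2), round(above, 2))
```

A hitter whose crossovers match the independence expectation lands at a CRd of zero, and one who pairs hard and high contact more often than expected lands above it.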
The next step was to determine if having a higher CRd than expected meant that a player was lucky, or meant that a player was skilled. To do this, I analyzed how consistent CRd was between two halves. If a player’s first half CRd was predictive of the second half, it could be legitimate skill. If it was not, CRd is due to luck. Here is the scatter plot comparing the two halves for players who had sufficient balls in play in each:
From that plot, it looks like there is a very significant correlation between crossover differentials in each half. The statistics back the eye test up, as the graph produces a resounding r value of 0.51 and a P-value just over 10^-12, meaning the probability of crossover differential being entirely luck is, for all intents and purposes, zero. In fact, this makes crossover rate seem like even more of a controllable, intentional skill than the extreme peripheral of hard hit rate itself, which has an r value of 0.33.
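The half-to-half check boils down to a Pearson correlation between each player’s first-half and second-half CRd. A rough sketch of that calculation is below; the six pairs of values are invented for illustration and are not the data behind the r of 0.51, and the P-value step is left out.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical CRd values, one entry per player.
first_half = [5.0, -3.0, 1.5, -8.0, 10.0, 0.0]
second_half = [3.0, -1.0, 2.5, -6.0, 7.0, -2.0]
r = pearson_r(first_half, second_half)
print(f"r = {r:.2f}")
```

An r near zero would point toward luck, while a clearly positive r means a player’s first half tells you something about his second.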
If half-to-half correlation is strong, I would expect the year-to-year correlation to be even stronger, due to the larger sample. My assumption was correct:
This chart churned out a correlation coefficient of 0.61 and another near-zero P-value. Interestingly, there was a lot less variation in 2017 than 2018, and no extreme upper outliers. I can’t explain exactly why that is, but I can confirm that players like Gallo, who led the league in 2018, also did so in 2017, just with a lower overall number. To build on the case of the high stability of CRd, look at how close most players’ 2018 numbers were to their 2017:
Change in CRd (In Either Direction)
Almost a quarter of individuals differ by less than one percentage point between two years of tracking this statistic. 60% of players will deviate in CRd by less than 3 percentage points between two seasons. That’s a very low deviation between years, especially compared to very volatile statistics like batting average. To be honest, these results are the opposite of what I expected. I thought that hard hits and high hits would be independent of one another, and that differentiation would be up to luck. I thought that the statistic would regress to a league average, not a career average. But, it appears that my initial hypothesis was wrong. CRd is a very stable peripheral that is grounded heavily in the skill to do two important things at the same time.
Let’s get back to my friend Joey Votto. My original expectation was that he was getting unlucky by having his hard hits and high hits fall on different at-bats. I was wrong for two reasons. First, crossover rate is not up to luck. Second, his 2018 CR% was actually higher than his xCR%, 16.6% to 12.2%, so, even if it were luck, that would not explain his drop in power. Instead, we have to look at Votto’s case through what we do know: that crossover differential is based in skill, meaning if a player keeps the same skills, they should keep a similar CRd. Votto dropped 3.37 points in CRd between 2017 and 2018, from 7.68 to 4.31. That drop puts him in the bottom 20% of the league in year-to-year change in CRd, which is a convincing argument that he has legitimately down-skilled. Votto is still an incredibly valuable MLB and fantasy asset due to OBP alone, but he is 35 years old, and I sadly must admit that it’s possible we will never see his old power totals again. In fact, based on what I have found in this article, I would not bet that he will hit for power again.
I used this set of stats to analyze Joey Votto, but you could, of course, just as easily apply it to any player. For your convenience, I have taken all the statistics invented for this article and written them into the following Google Sheet files: 2018 Stats, 2017 Stats, and Stats Glossary.
Up through 2015, baseball was noticing a troubling trend: Tommy John surgeries – in the major leagues, minor leagues, and even among youth – were on the rise. In more recent years, the number of torn UCL’s has started to fall back, at least among professionals, but the concern is still ever-present, especially given the 12-16 month recovery time and far-from-perfect success rate. The rise in Tommy Johns has led a lot of doctors and baseball analysts to chime in with theories on why so many more players are needing the surgery. In this article, I wanted to test a few of the leading theories on which risk factors are significant in increasing the odds of needing Tommy John.
I’ll get to my own research later, but I wanted to start with a few theories that have already been tested by others. The first have to do with pitch selection, and these theories are, to say the least, contradictory. Some hypothesize that an increase in fastballs thrown has led to the spike in Tommy Johns, but at the same time others argue that breaking ball usage ultimately does pitchers in. I had the hardest time finding research to back up the curveball theory. An entry to the American Sports Medicine Institute’s journal found that there is no correlation between throwing curves and needing Tommy John. In terms of the fastball theory, one study from the Journal of Shoulder and Elbow Surgery argues that there is a correlation between fastball usage and torn UCL risk. However, a later study (I couldn’t find the original link) from the American Sports Medicine Institute says that there is no correlation between pitch selection and Tommy John surgery. There is a potential lead here, but it’s not conclusive. High fastball selection may or may not be a Tommy John risk.
One theory that seems to have more widespread backing is that higher velocity raises the risk of Tommy John surgery. This article, by the American Journal of Sports Medicine, suggests that higher velocity may very well lead to higher risk of elbow injury. Another piece, also from the AJSM, makes the same case, and goes as far as to say that pitch velocity is the most predictive element of Tommy John surgery, although its r^2 is still only 0.07. Specifically, it suggests that peak pitch velocity, as opposed to mean velocity, is a risk factor. These findings are corroborated by this Fangraphs community research article, which details exactly how the data was found.
Based on all of that, it seems that while pitch selection is not a fully proven theory, there is evidence that high velocity leads to heightened Tommy John risk. That begs the question, “What can be done about it?” The obvious answer is “throw less hard,” but it’s very unlikely that pitchers will be willing to sacrifice an essential part of their game to reduce health risks. That especially goes for younger pitchers who are being judged for their tools rather than a career’s worth of stats. In today’s game, when draft signing bonuses are so large, and initial free agent contracts are even more massive, it is borderline unreasonable to ask a young pitcher to risk all their value to improve their health. Additionally, the players most at risk are high-velocity pitchers, and high-velocity pitchers are the ones who depend most on their speed (when combined with other tools), and are therefore least likely to be able to make a change without taking a potential hit to their value. In my research, I wanted to look at changes that I thought could be made in pitchers without really hurting their value.
One proposed theory for the increase in Tommy Johns is sports specialization. This theory is not only a logical causation, but is heavily respected in orthopedic circles, and seems scientifically sound. Unlike the other theories, there is page after page on Google championing this one, but here are the first three. As the theory goes, high school baseball players, especially those looking for scholarships, are always looking to gain a competitive edge. So, a few decide to do baseball year-round, in order to get better. Then, to catch up, others had to do the same thing. Soon enough, every serious baseball player was practicing baseball all year in high school. I’ve seen a few different colloquial explanations as to why this is bad – “the UCL needs rest;” “an arm only has so many bullets;” “one needs to strengthen different muscles” – and I’m not sure which is the closest to the real scientific explanation, but either way the negative effect of specialization seems like a widely accepted theory among doctors and casual baseball fans alike.
To test the theory, I grabbed a data set of pitchers who had Tommy John surgery between 2015 and 2018. I then built a Python webscraper to sort through data from MaxPreps, which keeps track of all high school athletes and their statistics. The program searched for the player on MaxPreps, and then checked how many sports he had played in high school. Unfortunately, I was only able to get data on about a third of the players who had Tommy John surgery during the given period. Some players had unusual last name configurations (I’m talking to you, Jose de Leon), others did not go to high school in the United States, and some went to high school before MaxPreps was founded in 2002 and later popularized. The biggest issue, though, was that several high school players had the same name as those I was looking for. I was able to further filter my search using state, but if two people had the same name and played high school baseball in the same state, which happens more often than one would imagine, I had to remove them from my data set. In total, I was left with a sample size of 28, which, while small, is still reasonable enough to mean something.
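The name-and-state filtering step can be illustrated without any scraping at all. The sketch below is not the scraper itself, which is not shown here; it assumes the search results have already been collected into plain dictionaries, and every field name and record in it is hypothetical.

```python
from collections import Counter


def unambiguous_matches(targets, candidates):
    """Keep only targets whose (name, state) pair matches exactly one candidate.

    targets: list of (name, state) tuples for the pitchers being looked up.
    candidates: list of dicts with 'name', 'state', and 'sports' keys,
        standing in for whatever the search step returned.
    """
    counts = Counter((c["name"], c["state"]) for c in candidates)
    by_key = {(c["name"], c["state"]): c for c in candidates}
    # A count of 0 means no profile was found; 2 or more means the name
    # is ambiguous within that state, so the player is dropped.
    return [by_key[key] for key in targets if counts.get(key, 0) == 1]


# Hypothetical records, not real players.
candidates = [
    {"name": "Player A", "state": "CA", "sports": ["Baseball"]},
    {"name": "Player B", "state": "TX", "sports": ["Baseball", "Football"]},
    {"name": "Player B", "state": "TX", "sports": ["Baseball"]},
    {"name": "Player C", "state": "FL", "sports": ["Baseball", "Basketball"]},
]
targets = [("Player A", "CA"), ("Player B", "TX"), ("Player D", "NY")]

kept = unambiguous_matches(targets, candidates)
multi_sport = [p for p in kept if len(p["sports"]) > 1]
print(len(kept), len(multi_sport))
```

Dropping ambiguous names costs sample size, but it avoids crediting a pitcher with a different student’s sports history.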
Out of those 28 MLB players who had a torn UCL, 7 played multiple sports and 21 only played baseball. That’s a 25% multiple-sport rate. In a control sample of random baseball players who I could specify on the MaxPreps database, 155 out of 596 played multiple sports in high school, or 26.0%. Based on this, it is safe to conclude that I found no evidence that sports specialization is a Tommy John risk (my chi-square derived P-value was a hardy 0.904). To be clear, I was dealing with a limited sample. My research also says nothing about the very real risk of needing Tommy John surgery while in high school. But, based on that, I see little reason to believe that playing multiple sports in high school leads players to have significantly better odds of staying healthy in the majors.
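For anyone who wants to check the test, the counts above form a 2x2 table: 7 multi-sport and 21 baseball-only players in the Tommy John group, against 155 multi-sport and 441 baseball-only players in the control group. Here is a minimal sketch of a Pearson chi-square test on that table using only the standard library. It assumes no continuity correction, which may differ slightly from whatever tool produced the reported P-value, and the function name is my own.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, the upper-tail probability of a
    # chi-square variable equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: Tommy John group, control group. Columns: multi-sport, baseball-only.
stat, p_value = chi_square_2x2(7, 21, 155, 596 - 155)
print(f"chi-square = {stat:.4f}, p = {p_value:.3f}")
```

With multi-sport rates of 25.0% and 26.0%, the statistic is tiny and the P-value comes out at roughly 0.9, so the two groups are statistically indistinguishable at this sample size.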
The most common method of preventing injury in MLB is the pitch count. Every team practices it, and pays special attention to the number 100. According to common knowledge, high pitch counts risk injury, and managers will take pitchers out when the count gets high because of it. That’s not to say that injury risk is the only reason pitchers are removed late in the game; batters get better multiple times through the order, pitchers get worse as they fatigue, and relievers are often just better than starters. But, injury risk is usually part of the equation, and almost every manager would probably say that high pitch counts do risk injury. So, pitch count is the second factor that I set out to test.
For this test, I gathered data on starters alone, because they are more similar to each other in use (at least for now). Like last time, I was not able to look at every starting pitcher in my data set, so I once again ended up with a very small sample, only 13 starters. So, for the last time, I want to reiterate that because of that, my research is more of a starting point on the subject than an end. Anyways, the first look I took at pitch count had to do with the game of the injury. Here were the results:
The points seem to be scattered rather randomly across the number line. The chunking of data points is a little odd, but I would expect the gaps to fill as n increased. The overall lesson here, though, is that pitch count does not seem to contribute to torn UCL risk. A pitcher is about as likely to tear their ligament on the 40th pitch as on the 90th. In fact, one might note that there are zero data points past the 100th pitch. In a larger sample, there may have been a few, but the point is clear: removing pitchers before or around the 100th pitch does absolutely nothing to decrease injury risk within that game. Tommy John risk does not increase as the game goes on, and players should not be pulled early simply to avoid getting hurt, because that does not work.
While pitch count has no influence on injury odds within a game, it is possible that high pitch counts have a hangover effect, making a pitcher more likely to get hurt in their next start. So, looking at the same lucky 13 pitchers, I charted their pitch count from the start before the one in which they got hurt:
The average among these pitchers was 85.3, 7 pitches below the league average for a start. Even when those lower two outliers are removed, the mean only goes up to 93.5, a pitch and a half above the league average. From this data, it does not appear that lowering pitches in the previous start leads to lowered injury risk overall, for pitchers who got hurt threw about the same number of pitches in their pre-injury start as the average pitcher who will not get hurt. So, that’s one more reason that pitch count should be ignored as a factor for injury risk.
Pitch count on a start-by-start basis appears to be a complete non-factor in the Tommy John question. Still, though, I wanted to give pitch count one last chance and take a look at seasonal trends. Perhaps each individual start is insignificant, but if a player throws too many pitches in one season, they become a higher risk for the surgery. So, to test this, I found the number of pitches thrown in the season of the injury for Tommy John recipients.
Think of the blue histogram as an extended version of the league average dot, showing how many pitches every pitcher has thrown per season from 2015-2018. The singular red dots exist only on the x-axis, and show the seasonal pitch count of injured pitchers. For the first half of the graph, the density of the red dots seems to match the density shown by the blue histogram. But then, there are no dots at all in the second half of the chart. This shows that seasonal pitch count has no effect on injury risk. Injured pitchers did not throw more pitches than other pitchers. In fact, injured pitchers are completely left out of the upper range of the graph, a range that many healthy pitchers got to. This is to say that there were no pitchers from 2015 to 2018 who got Tommy John surgery because they threw an abnormally high number of pitches in the year of their surgery.
Single-year trends showed no evidence that season pitch count had an effect on Tommy John risk. The very last step of my study was to examine multi-year trends. First, I took a look at changes in pitches from year to year. Often, teams will say that normally they would not cap a pitcher, but because he only threw so many pitches in the previous season, that number could only increase by a certain amount next season. For example, if a pitcher throws 1500 pitches in 2017, the manager may conclude they can only throw 2000 pitches in 2018. To test the theory of pitch increase, I charted the percent change in pitches per season for injured starters. Unfortunately, I did not have access to minors pitch data, so I had to remove rookies from the set, once again shrinking the sample.
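The percent change itself is simple arithmetic. As a small sketch (the function name `percent_change` is just for illustration, not taken from my original code), here it is applied to the hypothetical 1500-to-2000 cap from the example above:

```python
def percent_change(previous, current):
    """Percent change in pitches from the previous season to the current one."""
    if previous <= 0:
        raise ValueError("previous season pitch total must be positive")
    return (current - previous) / previous * 100

# The hypothetical cap from the text: 1500 pitches in 2017, 2000 in 2018.
increase = percent_change(1500, 2000)
print(f"{increase:.1f}%")  # prints 33.3%
```

A negative result means the pitcher threw fewer pitches than the year before, which is why rookies without a previous major league season could not be charted.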
Like all graphs before, the pattern on this one shows how small the effect of pitch count is on injury risk. Most pitchers who needed Tommy John threw far fewer pitches in the season of the injury compared to the season before, not far more. Only one pitcher experienced a severe workload increase, and only two had small workload increases. Torn UCLs had no tendency to occur more often in pitchers with heavy workload increases.
Since two-year trends seemed to have no effect on Tommy John risk, the next multi-year trend to turn to is a player’s career span. I don’t have the pitch count data for the players’ careers, and even if I did it wouldn’t mean much because it wouldn’t account for the bullets on their arm in high school, college, and perhaps most significantly MLB practice. However, I do have player ages. The career-pitches “the arm only has so many bullets” theory suggests that players tear their UCLs after a certain number of career pitches; thus, as a pitcher’s total career pitches increase, their odds of hurting themselves increase too. Age increases along with total pitches, so it would follow that as age increases, likelihood of surgery also would increase. This is not the trend that I found in the real player ages. The average MLB player age in the seasons that I studied was 28.9. Rather tragically, the average age of a Tommy John pitcher was a year younger, 27.9. This discredits the theory that injury risk is directly proportional, or even somewhat proportional, to career pitches thrown, as players who were younger actually turned out to have the higher injury risk in the data.
From all my research and all the research of my peers, we are left with few clues about the causes of torn UCLs. The only useful piece of information was the study about fastball velocity, but even that barely had any predictive power. This inability to find causes does not signify a weakness in modern research, but rather a weakness in the traditional views of health. It’s easy to look at injuries as directly caused events. Just as I might stub my toe because I jammed it against the wall, a pitcher is assumed to tear his UCL as a direct effect of some more complex cause. Instead, health should be looked at as a skill, like, for example, batting. Everyone is born with some degree of batting skill, whether it be very high or very low. People can improve on that natural ability through techniques like diet and practice, and then bring their total skill level to the plate. But, once at the plate, their chance of getting a hit is rather random. A .333 hitter, for example, gets a hit in one of every three at bats, and it is more or less random which of those three at bats the hit comes in. Similarly, players are born with good or bad health skills, which they can work on improving through techniques like proper stretching. But, once they bring that health skill to the table, which may be represented by a probability of injury, the moment at which an injury occurs within that probability is more or less random. It’s impossible to know if an injury is more likely on the 10th or 100th pitch of a game or season because the pitch at which the injury occurs should be viewed as randomized. This skill-based view of health is more accurate to the data, and scientifically assumes the null hypothesis. If teams thought like that, it might lead them to more successful pitcher use.
If you liked this article, follow us on Twitter and be the first to know when more original research, articles, and interviews come out on the site.
– The K Zone –
December 3rd, 2018
One in 49 Million, by Ian Joffe
The hitting streak is among the most exciting phenomena of the game of baseball. We like to think of them as incredible feats, accomplished only by a unique combination of mental and physical skills manifesting themselves over a month-long period. There is another view, however, on the creation of hitting streaks: that they are actually statistical likelihoods which are all but bound to occur within a given period of time, controlled by data’s randomness alone. Both explanations seem reasonable. The perfectly robotic sabermetrician would argue for the latter, for in a game driven by statistics, things like the hitting streak can be predicted rather perfectly using data and probability. But the first argument, too, has logical merit. Players are human, and it’s very possible that they are able to get “locked in” to some mechanical or psychological state that increases their odds of getting hits in each game.
To determine which argument is true, and if hitting streaks exist as anything more than statistical illusions, I compared data from Baseball Reference’s Play Index about real hitting streaks to simulated data from a Python program I wrote that determines the odds of certain hitting streaks occurring over a given time period. If the real MLB data matches the statistically expected data, it is reasonable to assume that real hitting streaks are based in nothing more than statistical probabilities, but if the MLB data is distinguishable from the expected results, it would appear that there is something special going on with players who have lengthy streaks.
To find the number of expected streaks in a given period of time, one must apply a geometric distribution, which is based on a string of events, each of which is labeled a success or a failure. A success, whose probability is denoted by $p$, is in this case a game played without a hit. A failure, then, is a game with a hit. To find the number of trials (games) it takes for a batter to not get a hit, or the number of consecutive games with a hit before a batter fails to get one, one applies two conditions. First, a batter must fail to get a hit in the game in question ($p$), and second, the hitter must get a hit in all previous games ($(1 - p)^{x-1}$), where $1 - p$ is the probability of a hit (or more specifically, the odds of not not getting a hit), and $x$ is the number of games in the streak, the last game being the one without a hit. So, the formula for the expected frequency of both conditions occurring is the product of the two, or $(1 - p)^{x-1} p$. The data that I used extends from 2000-2018, over which the MLB batting average was .260. The average player had 3.134 at bats per game during that period (although this is a very, very slight overestimate because in order to avoid adding too many games without at bats for players like AL pitchers, I had to purge from my data players with fewer than one at bat per game on average). So $p$, the probability of not getting a hit in a game, equals $(1 - .260)^{3.134}$, or 0.389. From this, I was able to plug in and find the expected number of each length of hitting streak.
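The formula above can be sketched in a few lines of Python. This is an illustration rather than my original program: the names `streak_probability` and `expected_streaks` are made up for the example, and `total_streaks` is a hypothetical input, since the total number of streaks in the data is not given here. It also treats every at bat as an independent event with the league-average chance of a hit.

```python
BATTING_AVERAGE = 0.260    # MLB average for 2000-2018, from the article
AT_BATS_PER_GAME = 3.134   # average at bats per game, from the article

# p: probability of a hitless game, assuming independent at bats.
p = (1 - BATTING_AVERAGE) ** AT_BATS_PER_GAME

def streak_probability(x, p=p):
    """Geometric pmf: a hit in each of the first x - 1 games, then a hitless game x."""
    if x < 1:
        raise ValueError("x counts games, so it must be at least 1")
    return (1 - p) ** (x - 1) * p

def expected_streaks(x, total_streaks):
    """Expected number of streaks that end in game x.

    total_streaks is a hypothetical input: the total count of streaks
    (of any length) in the period being studied.
    """
    return total_streaks * streak_probability(x)

print(f"p = {p:.3f}")
for x in (2, 5, 11, 21):
    # x includes the final hitless game, so the hitting streak lasts x - 1 games.
    print(f"{x - 1:>2}-game hitting streak: {streak_probability(x):.6f}")
```

Each extra game in a streak multiplies the probability by $1 - p$, about 0.611, which is why the expected counts fall off so quickly at longer lengths.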
To find the real number of hitting streaks since 2000, I wrote a script that put together data from Baseball Reference’s Play Index. The longest hitting streak in that period is Dan Uggla’s 33-game streak back in 2011, so I calculated the odds of each streak length up to there. Here were the results:
Looking at the shorter streaks where length < 9, the expected values are actually greater than the observed values, which suggests that getting in a short groove has no psychological or mechanical advantage. Having a three-game hitting streak does not make a player any more likely to have a four-game hitting streak. So, where did the extra frequencies go? For starters, the observed number of one-game streaks is much higher than the expected number, which is strange. I have no explanation for that. But, a lot of frequencies went to longer streaks as well. Here’s the graph zoomed in on lengths > 10:
There’s a critical point after about 10 games where the observed frequencies overtake the expected frequencies, and they do so by a very significant amount. The chi-square P-value was way under 0.001. That’s probably because this effect becomes even more exaggerated as the hitting streaks get longer. Here’s the data for hitting streaks longer than 20 games:
The observed values start to lose their perfect exponential curve because of the smaller sample, but the effects are still very clear. Very, very few hitting streaks over 20 games are expected. Yet, many occurred. In total, the model expected 10.28 hitting streaks longer than 20 games in the 19-year period. We got 81 – an increase by nearly a factor of eight. The model predicted 1.49 hitting streaks of 23 games. The actual value: 14. The odds of a hitting streak like Dan Uggla’s occurring during the new millennium were just over 1 in 100. I would say we should consider ourselves lucky to be able to see such incredible statistical feats – and we are – but this is clearly more than luck. There is no way that so many of these lengthy hitting streaks occurred in a non-mental, non-physical game of randomness. While there is little evidence to suggest a 4-game hitting streak is any more likely than expected, it is clear that players are far more likely to go on hitting streaks over 20 games than statistics would expect. A player who already has a hit in 22 games is much more likely than expected to get a hit in the 23rd. This is probably because there’s little pressure involved on a short streak. I doubt a hitter would even be aware that they have a hit for four games in a row. But, as the streaks climb above 10 and 20 and the media starts to pay attention, it’s impossible not to be aware of them. The players who perform well on the big stage start to improve. Based on the data, we can be all but certain that the mental factor is there.
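The numbers in that paragraph hang together under the geometric model, which can be checked with a short sketch. It assumes that “longer than 20 games” means 21 or more, and it reuses the batting average and at bats per game from earlier; the function name `expected_exact` is only for illustration.

```python
p = (1 - 0.260) ** 3.134   # probability of a hitless game, from the article's inputs
q = 1 - p                  # probability of a game with at least one hit

expected_over_20 = 10.28   # model's expected streaks longer than 20 games (from the article)
observed_over_20 = 81      # streaks longer than 20 games actually observed (from the article)

def expected_exact(n):
    """Expected number of streaks of exactly n games, for n of 21 or more.

    In a geometric model, streaks of exactly n games make up a
    q ** (n - 21) * p share of all streaks that reach 21 games.
    """
    if n < 21:
        raise ValueError("this shortcut only covers streaks of 21 or more games")
    return expected_over_20 * q ** (n - 21) * p

print(f"observed / expected, over 20 games: {observed_over_20 / expected_over_20:.2f}")
print(f"expected 23-game streaks: {expected_exact(23):.2f}")
print(f"expected 33-game streaks: {expected_exact(33):.4f}")
```

The ratio lands just under eight, the 23-game figure matches the 1.49 quoted above, and the 33-game figure comes out close to 0.01, consistent with odds of about 1 in 100 for a streak as long as Uggla’s.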
I found this a rather relieving conclusion. Some of my previous articles, like those about taking revenge on old teams, or players on their birthdays, found little evidence for a mental factor in baseball. They suggested that the game is perfectly predictably random. This data, however, suggests otherwise. It shows that there is an element to how hitters perform above the statistics. It’s still incredibly scientific – my opinion is that psychology and next level sports medicine will be the next Moneyball-esque breakthrough in the game – but it shows that players are more than numbers. I love statistics, which you know because you just read my article, but it’s still nice to think that players operate on a field above the random, and from this, one can argue that they do.
Of course, I couldn’t finish an article about hitting streaks without mentioning Joe DiMaggio. His 56-gamer in 1941 is still the gold standard for hitting streaks, and feels as unbreakable as a record gets. The purely statistical odds of any player having such a streak since the dead ball era are 1 in 49,000,000. In other words, he did something in one short century that should have taken about five billion years, roughly the age of the Earth, to accomplish. Yeah, DiMaggio was pretty great.
Sources Cited: Fangraphs, Baseball Reference, Ms. Christine Robbins, Statistics How To