-The K Zone-

February 2, 2017


Statology: One Through Nine, by Ian Joffe

“I [stat], you [stat], he she me [stat]s, [stat]ology, the study of [stat]s! It’s first grade Spongebob!” -Patrick Star

I have long criticized the Angels for their strategies and decisions, but one thing they do (now less often than before) that I admire is occasionally batting Mike Trout second. Statistics have a lot to say about batting orders, and it is all rooted in what I like to call “statology”. To explain what I mean by this, I want to take all of you back to differential calculus. You have likely forgotten differentiation, whether unintentionally or very intentionally, so as a reminder, the idea is that you can take an equation for a variable and unpack it into a new equation that affects the original one. For example, if an equation represents the position of an object, you can unpack, or differentiate, it to get the equation that affects position, which is speed. Then you can differentiate again to find the equation that affects speed, which is acceleration. When you differentiate statology, you get plain stats. In other words (literally, Latin words), statology means the study of stats. When I wrote about Joey Votto, I drew conclusions based on the stats alone. Now, in discussing lineup orders, you have to take that extra step. My conclusions will be based on simulations (a form of statology), which, when unpacked, are based on stats. Stats are the differentiated version of simulations.

Moving from calc to high school statistics, one learns that sample size is of high importance. This rule is never more important than when dealing with statology. By increasing sample size, you can drive error down to thousandths of a percent and greatly improve accuracy. This is where the simulation comes in. If I wanted to use MLB data to determine how good the Angels’ lineup construction is, I might only have a few hundred games to study, and a relatively high margin of error. However, with simulations, a good processor, and time, we can run as many games as we want and get the margin of error down to a minuscule number. We can also control other variables, like opponent and weather. For these reasons, statology’s simulations are a widely accepted way to draw conclusions, such as how a good lineup is built.
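To make the sample size point concrete, here is a minimal sketch of how the margin of error shrinks as you simulate more games. Everything in it is an assumption for illustration: the 55% "true" win probability, the function names, and the use of the standard 95% normal-approximation margin for a proportion. It is a toy coin-flip model, not a real baseball simulator.

```python
import math
import random


def simulate_win_rate(n_games, true_win_prob=0.55, seed=0):
    """Play n_games coin-flip 'games' and return the observed win rate.

    true_win_prob is a made-up number used only for illustration.
    """
    rng = random.Random(seed)
    wins = sum(rng.random() < true_win_prob for _ in range(n_games))
    return wins / n_games


def margin_of_error(p_hat, n_games, z=1.96):
    """Approximate 95% margin of error for an observed proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n_games)


# One MLB season, then progressively larger simulated samples.
for n in (162, 10_000, 100_000):
    p = simulate_win_rate(n)
    print(f"{n:>7} games: win rate {p:.4f} +/- {margin_of_error(p, n):.4f}")
```

The margin falls with the square root of the number of games, so cutting it to a tenth of its size takes a hundred times as many games. That is cheap for a processor and impossible for a real schedule.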

Simulations are very useful for disproving information that we previously thought was clear fact. It would appear obvious that a hitter batting first gets more ABs, and therefore ends more games, than a two-hole or three-hole hitter, but simulation evidence shows otherwise.

[Figure: BOOFigure1b.jpg] Source: http://www.fangraphs.com/community/where-to-bat-your-best-hitter-a-computational-analysis-part-1/

In actuality, the second and third hitters end more games than the first hitter, while the fourth and, surprisingly, ninth hitters end nearly as many games. So, it is a myth that you should bat your best hitter first to get him the most opportunities. As is evident from the simulations, it would be much more logical to put your best hitter second or third.
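For a feel of how a simulation like this can be set up, here is a rough sketch that counts which lineup slot makes the final out of a game. It is not the FanGraphs model and should not be expected to reproduce its numbers. The assumptions are mine and very crude: all nine hitters share one hypothetical on-base probability of .320, every game lasts exactly 27 outs, and there are no double plays, caught stealings, walk-offs, or extra innings. The function names are also made up for this example.

```python
import random
from collections import Counter


def final_out_slot(rng, on_base_prob=0.32, outs_per_game=27):
    """Return the 1-based lineup slot of the batter who makes the last out.

    Assumes identical hitters and exactly one out per failed plate appearance.
    """
    outs = 0
    slot = 0  # 0-based position in the batting order
    while True:
        if rng.random() >= on_base_prob:  # batter is retired
            outs += 1
            if outs == outs_per_game:
                return slot + 1
        slot = (slot + 1) % 9  # next batter, wrapping from ninth to first


def share_of_games_ended(n_games=100_000, seed=1):
    """Simulate n_games and return the share of games ended by each slot."""
    rng = random.Random(seed)
    counts = Counter(final_out_slot(rng) for _ in range(n_games))
    return {slot: counts[slot] / n_games for slot in range(1, 10)}


shares = share_of_games_ended()
for slot, share in shares.items():
    print(f"slot {slot}: ends {share:.2%} of games")
```

A more serious version would give each slot its own hitter profile and model baserunning and game state, which is where the differences between slots in the published analysis come from.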

It is not only important to get at-bats, but it is also important to hit when runners are on base. Traditionally, people have thought that the cleanup hitter will most often bat with runners on base (ROB). This is somewhat true: the fourth man will come to bat with lots of runners in scoring position (RISP), but the second hitter will bat with RISP almost as often. And, considering the second hitter is more likely to get an extra at-bat in the game, it is a superior strategy to put your best hitter, in terms of power and overall talent, second.

Your leadoff hitter is an interesting case. Sabermetricians have never been big fans of the stolen base. It usually hurts more to get caught than it helps to advance. This goes back to the original Bill James idea that outs are a finite resource. The true goal of offense is not to score runs, but to avoid outs. So, with baserunning valued less, what should be valued in the top spot? On-base. Getting on base is the most important thing a leadoff hitter can do in order to give more opportunities to the power-hitting two-hole hitter.

Some NL teams have tried putting their pitcher eighth in the order, rather than the traditional last. This does check out to be a useful strategy, the idea being that the guy who hits last can get on base for the 2-5 mashers. However, apart from this rare exception, the 6-9 batting slots should generally be put together in descending order, with the worst hitter batting last.

In summary, your best hitter should hit second in your lineup, for he will, on a game-by-game basis, get about as many at-bats as anyone else and be presented with stronger opportunities than others. Your three and four hitters should have power, while your leadoff hitter needs on-base skill. As the simulations show, there is a clear strategy to batting order. But how much does this really matter? If you look back at the graph by Fangraphs a few paragraphs ago, the x-axis differs by tiny units, with a range of maybe 1%. Other data shows similar patterns: there is some difference between batting order positions, but very little. So, next time you think your team lost the game because that “stupid manager” messed up on the lineup card, you may be right in criticizing him, but any change probably would have resulted in a similar game.

We would really appreciate it if you followed us on Twitter and Instagram, or checked out some more great content like my dissection of the statistic WAR or Mike’s passionate argument about rookie salaries.




