This post is sort of a preview of something I'm working on. But one aspect of it was so interesting and relevant to today's discussion I thought I'd share.
One of the questions I'm considering is who should bat leadoff: Andrus or Kinsler? My gut says Kinsler because of the more complete package he offers as a hitter... but I wanted to back that up with quantifiable information.
I did some poking around for lineup optimization metrics to help answer the question. That led me to Markov chain simulators, which can be used to estimate how much each batting skill matters at each spot in the batting order.
If you don't know, a Markov simulator takes an event and the probabilities of each of its outcomes and uses a random number generator to pick one. Then it moves on to the next event in the chain. Simulating the chain of events over and over again gives a solid estimate of the value those probabilities produce.
Think of an event as a trip to the plate. You can walk, single, double, etc., and there's a percentage tied to each outcome. You roll a die to determine the outcome and then move on to the next event.
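Here's a minimal sketch of that die roll in Python. The probabilities and the `pick_outcome` name are made up for illustration; they aren't real player data or the code from my engine.

```python
import random

# Hypothetical outcome probabilities for one plate appearance (not real data).
PA_PROBS = {"out": 0.70, "walk": 0.10, "single": 0.20}

def pick_outcome(probs, u):
    """Map a uniform draw u in [0, 1) to an outcome via cumulative probability."""
    cumulative = 0.0
    outcome = None
    for outcome, p in probs.items():
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome  # guards against floating-point rounding near 1.0

rng = random.Random(2010)  # seeded so a run can be repeated
print(pick_outcome(PA_PROBS, rng.random()))
```

Passing the random draw in as `u` keeps the function easy to check by hand: any draw below 0.70 is an out, the next 0.10 is a walk, and the rest is a single.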
However, none of the work I'd seen using Markov simulators for lineup optimization uses base running information in its simulation. It all assumes a runner only advances one base at a time, never steals a base, and is never out on the base paths.
I really wanted to see the impact of base running, so I built my own Markov simulator. It incorporates traditional hit outcomes, grounding into a double play (and the number of DP opportunities), going from first to third on a single (or getting thrown out), scoring from first on a double (or getting thrown out), scoring from second on a single (or getting thrown out), and stolen base attempts, including pickoffs and times caught stealing.
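To give a feel for how one base running element fits into the chain, here's a stripped-down sketch of a half inning in Python. It is not my actual engine: every probability is a hypothetical placeholder, and it only models the first-to-third decision on a single. Steals, double plays and the other elements listed above are left out, and runners on second and third are assumed to always score on a hit.

```python
import random

# All numbers below are hypothetical placeholders, not real player data.
PA_PROBS = {"out": 0.68, "walk": 0.09, "single": 0.15, "double": 0.05, "homer": 0.03}
FIRST_TO_THIRD = 0.28   # runner on first reaches third on a single
OUT_AT_THIRD = 0.02     # runner on first is thrown out trying for third

def draw(probs, rng):
    """Pick one outcome according to its probability."""
    u, cumulative, outcome = rng.random(), 0.0, None
    for outcome, p in probs.items():
        cumulative += p
        if u < cumulative:
            break
    return outcome

def half_inning(rng, probs=PA_PROBS):
    """Simulate one half inning and return the runs scored."""
    outs, runs = 0, 0
    first = second = third = False
    while outs < 3:
        event = draw(probs, rng)
        if event == "out":
            outs += 1
        elif event == "walk":
            # Only forced runners move up.
            if first and second and third:
                runs += 1
            third = third or (first and second)
            second = second or first
            first = True
        elif event == "single":
            # Simplification: runners on second and third always score.
            runs += int(second) + int(third)
            second = third = False
            if first:
                u = rng.random()
                if u < OUT_AT_THIRD:
                    outs += 1          # thrown out at third
                elif u < OUT_AT_THIRD + FIRST_TO_THIRD:
                    third = True       # takes the extra base
                else:
                    second = True      # stops at second
            first = True
        elif event == "double":
            # Simplification: runner on first stops at third, others score.
            runs += int(second) + int(third)
            third, second, first = first, True, False
        elif event == "homer":
            runs += 1 + int(first) + int(second) + int(third)
            first = second = third = False
    return runs

rng = random.Random(2010)  # seeded so results can be reproduced
games = 10_000
average = sum(half_inning(rng) for _ in range(games)) / games
print(f"average runs per half inning: {average:.3f}")
```

Changing `FIRST_TO_THIRD` and `OUT_AT_THIRD` and rerunning is the basic idea behind measuring what a baserunner's aggressiveness is worth.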
Most of the base running statistics are available on BR.
After simulating a million games per lineup, my margin of error was around 0.04%, with the various percentages coming out 99.96% similar.
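For anyone wondering how a margin of error on simulated results can be estimated, one common approach is the standard error of the mean. The sketch below is a generic illustration and not necessarily how the figure above was computed; the `mean_and_margin` name and the five sample values are made up.

```python
import math
import statistics

def mean_and_margin(samples, z=2.576):
    """Return the sample mean and a z-based margin of error (z=2.576 is ~99%)."""
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * std_err

# Hypothetical runs-per-game results from five simulated games.
runs = [1, 2, 3, 4, 5]
mean, margin = mean_and_margin(runs)
print(f"{mean:.2f} +/- {margin:.2f} runs per game")
```

Because the standard error shrinks with the square root of the number of games, going from 10,000 to 1,000,000 games cuts the margin by a factor of ten.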
The most fascinating thing happened when I started running various lineups using 2010 data.
Mitch Moreland was consistently the most valuable person to have lead off. It never occurred to me that would be true... I entered the process just planning to evaluate Andrus and Kinsler, and wondered about Moreland maybe in the 2 slot. But when you think about it... it kind of makes sense. He has a high walk rate... decent power... not a bad baserunner.
And that's before you consider the elements I didn't include in my simulation, such as his ability to work a pitcher and his being left-handed, which would create a nice L - R - L - R top of the lineup. It's pretty interesting.
That said... this is premature. Mitch's 2010 data is a small sample and not necessarily what he'll do in 2011. It also takes 10 seconds to run 1,000,000 games on my simulator, which is fine for a few lineups... but that's about six weeks for the 362,880 possible orderings of a nine-man lineup.
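As a rough check on that runtime, assuming the same nine players and every possible batting order, the arithmetic looks like this:

```python
import math

SECONDS_PER_LINEUP = 10  # stated above: 1,000,000 games takes about 10 seconds

orderings = math.factorial(9)          # ways to order nine fixed batters
total_seconds = orderings * SECONDS_PER_LINEUP
days = total_seconds / 86_400          # seconds in a day

print(f"{orderings:,} orderings, about {days:.0f} days")
```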
Over the next week, I'm going to distill linear weights for the various elements in my engine and assign run values, so I don't have to actually simulate the games to determine a lineup's value. I'll also evaluate various projections such as ZiPS, Bill James and BP.
Here's the 2010 data I used last night just to get started.