Baseball Stat NERD

by user DNL

In the ongoing struggle between statheads and skeptics, one of the stronger arguments of the latter sect is that sabermetricians tend to ignore the situation and, instead, focus on the abstract. That was one of the results of reader mail based on not one but two of my older articles. After all, I was arguing that a solo home run and a grand slam should count the same, as far as the batter is concerned -- it is neither his fault nor to his credit that the bases were empty or juiced. So, why should it affect his stat line?

But dissenters raised an interesting and potentially valid point: regardless of who caused the baserunner situation, the batter has a greater duty in the bases loaded situation than in the bases empty one. We probably should give more credit to the guy who hits the grand slam because at the end of the day, he's doing more for his team. While you can't credit or debit the batter for the situation he's presented with, they argued, you certainly can adjust his statistics based on how he handles the situation.

I don't know if I buy that. However, a few years of contemplation have led me to a third path. Let's not credit the player for the situation with which he is presented. But let's ask that he leave it no worse than how he found it.

Not only can we measure this, but it turns out to be pretty easy.

A few years back, The Hidden Game of Baseball published a chart showing the "expected runs" a team would score in an inning, given the number of outs thus far recorded and the baserunner situation. The data was based on the 1983 season, but a few years back, Baseball Prospectus re-ran the grid for 2003:

The rows -- 0, 1, 2 -- are the number of outs when the batter's at bat begins. The columns -- None, 1st, etc. -- are the baserunner situation. The data cells are the number of runs the average team would score for the rest of that inning, given the out and baserunner posture. For example, with one out and runners on first and second, we would expect the team to score 0.909 further runs that inning. "Further," that is, because we ignore how many runs were scored thus far in the inning. Let's take an example.

Rickey Henderson leads off with a home run. Carney Lansford strikes out; Mark McGwire singles. Jose Canseco walks, and up comes Dave Henderson. The posture of this at bat: one out, runners on 1st and 2nd. Expected runs ("ER"): 0.909. The fact that a run has already scored is immaterial.

The only data point missing is "three outs," which clearly has an expected run total of zero.

What I am proposing is simply this:

Let's look at each individual plate appearance and ask three questions:
 * 1) What would the team have scored on average, given the posture of the situation (based on outs and baserunners), when the player came to bat? (Note the minor caveat below.)
 * 2) What would the team have scored on average, given the posture of the situation when the player's at bat ended, and the actual runs scored during that at bat?
 * 3) How much better or worse did the player leave his team?

That last question is the stat we are after.

Application
Let's use the 2003 numbers as if they were accurate throughout time (which they're not, but for ease of discussion, it matters little). And again, let's look at the plate appearances above, from Rickey Henderson through Dave Henderson.

In order to determine the value of a player's at bat, all we need to do is:
 * 1) Start with the expected runs value when the batter comes to bat*, as provided by the chart, but make it a negative number
 * 2) Add the expected runs value when the plate appearance ends, as provided by the chart
 * 3) Add any runs scored during that time period.

The little asterisk is there because sometimes, the outs/runners posture changes during an at bat. We really want to use the posture right before the batter's plate appearance ends, not when he comes to bat. That is, we should pretend that a runner caught stealing, for example, during a batter's at bat, was actually caught before that batter came to the plate originally.
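The three steps above reduce to a single line of arithmetic. Here is a minimal Python sketch of it; the function name `nerd` and its parameter names are illustrative labels, not part of the chart or of any existing tool.

```python
def nerd(erv_before, erv_after, runs_scored=0):
    """Net expected run differential for one plate appearance.

    erv_before:  chart value for the outs/runners posture the batter faced
    erv_after:   chart value for the posture when the plate appearance ended
                 (0.0 once the third out is recorded)
    runs_scored: runs that crossed the plate during the plate appearance
    """
    return -erv_before + erv_after + runs_scored


# A leadoff home run: none on, none out (.531) before and after, one run in.
print(nerd(0.531, 0.531, 1))  # 1.0
```

Per the caveat above, `erv_before` should be the posture right before the plate appearance ends, so a runner caught stealing mid-at-bat is already reflected in it.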

Henderson's plate appearance is easy. He comes up with none on, none out. The average team will score .531 runs given this posture. Right now, Henderson's situation has an "expected run value" -- or ERV -- of .531. Henderson leaves the situation with none on, none out. Again, a .531, and those zero out. So, his net ERV is zero. Applying step three, well, that homer scored a run, so Rickey gets +1. His "net expected run differential," or "NERD", is -.531+.531+1.000, or simply 1.000.

Lansford comes up at -.531 ERV, and leaves the field with none on, one out. That has a value of .282. No runs have scored, so Lansford gets a -.249 NERD. Yes, his out cost his team a quarter of a run.

McGwire's hit? Well, we already know that he came to the plate with a -.282. But he exits with a .535 situation (one on, one out), or a +.253 NERD.

Canseco's plate appearance starts off at -.535. The walk, though, puts runners on first and second with one out, or .909. This nets him +.374 runs.

Disaster strikes as Dave Henderson grounds into a double play. He entered at a .909, exited at a 0, and nets -.909.

Now, if you add up all those numbers, you'll get junk. You'll get .469 -- suggesting that given this inning, we'd "expect" the A's to score about .469 runs. That suggestion is incorrect, and the .469 number is meaningless. The explanation as to why is rather long and hard to articulate, but rest assured, this number is not a problem.
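For anyone who wants to check the inning's arithmetic, here is a small Python sketch. The `ERV` lookup holds only the chart values quoted in this walkthrough, and the `(outs, runners)` keys and runner labels are an illustrative encoding, not anything taken from the chart itself.

```python
# Expected-run values quoted above (2003 chart), keyed by (outs, runners).
# Only the postures this inning touches are listed; three outs is worth zero.
ERV = {
    (0, "none"): 0.531,
    (1, "none"): 0.282,
    (1, "1st"): 0.535,
    (1, "1st+2nd"): 0.909,
    (3, "none"): 0.0,
}

# (batter, posture before, posture after, runs scored on the play)
inning = [
    ("R. Henderson", (0, "none"), (0, "none"), 1),    # home run
    ("Lansford", (0, "none"), (1, "none"), 0),        # strikeout
    ("McGwire", (1, "none"), (1, "1st"), 0),          # single
    ("Canseco", (1, "1st"), (1, "1st+2nd"), 0),       # walk
    ("D. Henderson", (1, "1st+2nd"), (3, "none"), 0), # double play
]

total = 0.0
for batter, before, after, runs in inning:
    value = ERV[after] - ERV[before] + runs
    total += value
    print(f"{batter:13s} {value:+.3f}")
print(f"{'Total':13s} {total:+.3f}")

# Compare with runs scored in the inning minus the opening chart value.
runs_scored = sum(play[3] for play in inning)
opening_erv = ERV[inning[0][1]]
print(f"Runs minus opening ERV: {runs_scored - opening_erv:+.3f}")
```

Each batter's exit posture is the next batter's entry posture, so the intermediate chart values cancel when summed. That is why the .469 total is simply the one run scored minus the opening .531.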

The Albert Pujols Summary
So, let's use NERD to measure Albert Pujols' value over the first series of the season (versus the Phillies). Mostly because it's fun to call Pujols a nerd, but also to demonstrate his value -- maybe. Note that these use the averages from 2003, as per above, and will have to be recalculated at the end of the 2006 season to reflect the actual averages. But they'll probably be rather close.

Game 1
2 outs, none on = 0.109; Walk (2 outs, runner on first) = 0.237; NERD = +0.128

2 outs, none on = 0.109; HR (2 outs, none on, one run in) = 0.109 + 1; NERD = +1.000

1 out, bases loaded = 1.544; Sac fly (2 outs, second and third, one run in) = 0.541 + 1; NERD = -0.003

2 outs, runner on first = 0.237; HR (2 outs, none on, two runs in) = 0.109 + 2; NERD = +1.872

0 outs, none on = 0.531; Walk (0 outs, runner on first) = 0.919; NERD = +0.388

Five Plate Appearances. Total NERD = +3.385

Game 2
1 out, runner on first = 0.535; Strikeout (2 outs, runner on first) = 0.237; NERD = -0.298

0 outs, none on = 0.531; Flyout (1 out, none on) = 0.282; NERD = -0.249

0 outs, none on = 0.531; HR (0 outs, none on, one run in) = 0.531 + 1; NERD = +1.000

0 outs, none on = 0.531; Ground out (1 out, none on) = 0.282; NERD = -0.249

0 outs, none on = 0.531; Walk (0 outs, runner on first) = 0.919; NERD = +0.388

Five Plate Appearances. Total NERD = +0.592

Game 3
1 out, runner on first = 0.535; Pop out (2 outs, runner on first) = 0.237; NERD = -0.298

2 outs, runners on first and second = 0.454; Walk (2 outs, bases loaded) = 0.797; NERD = +0.343

1 out, runners on first and third = 1.211; Single (1 out, runners on first and second, one run in) = 0.909 + 1; NERD = +0.698

0 outs, none on = 0.531; Pop out (1 out, none on) = 0.282; NERD = -0.249

0 outs, runner on first = 0.919; Single (0 outs, runners on first and second) = 1.551; NERD = +0.632

Five Plate Appearances. Total NERD = +1.126

Series Summary
Adding up all fifteen plate appearances, we get a NERD of +5.103.

Is this surprising? Sure. This series was exceptional: Pujols averaged a NERD of about .340 per plate appearance, which works out to roughly 200 extra runs over the course of a season. He's certainly not worth that much, but it does show how exceptional this series was.
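The game and series totals can be recomputed mechanically from the plate-appearance lines. A minimal Python sketch follows, with each plate appearance entered as (ERV before, ERV after, runs scored), copied from the game logs above. The `games` layout and the `game_nerd` helper are illustrative names only.

```python
# Each plate appearance: (ERV before, ERV after, runs scored), from the logs above.
games = {
    "Game 1": [(0.109, 0.237, 0), (0.109, 0.109, 1), (1.544, 0.541, 1),
               (0.237, 0.109, 2), (0.531, 0.919, 0)],
    "Game 2": [(0.535, 0.237, 0), (0.531, 0.282, 0), (0.531, 0.531, 1),
               (0.531, 0.282, 0), (0.531, 0.919, 0)],
    "Game 3": [(0.535, 0.237, 0), (0.454, 0.797, 0), (1.211, 0.909, 1),
               (0.531, 0.282, 0), (0.919, 1.551, 0)],
}


def game_nerd(plate_appearances):
    """Sum of (after - before + runs) over one game's plate appearances."""
    return sum(after - before + runs for before, after, runs in plate_appearances)


series_total = 0.0
pa_count = 0
for name, pas in games.items():
    total = game_nerd(pas)
    series_total += total
    pa_count += len(pas)
    print(f"{name}: {total:+.3f} over {len(pas)} PA")

print(f"Series: {series_total:+.3f} over {pa_count} PA")
print(f"Per plate appearance: {series_total / pa_count:+.3f}")
```

Keeping the before and after values, rather than only the per-appearance NERD, makes it easy to rerun the totals once the 2006 chart is available.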

A Number with Context
If you had to ask me the top reason to use NERD, it's simply this: The number means something -- specifically, it means the number of runs the player is worth over average. How's that, you ask?

Well, if a player comes to bat to lead off an inning, we'd expect his team to score a certain number of runs that inning (in 2003, .531). Anything the player does to improve or reduce that number is simply a deviation from the mean, and can be credited to that player. So, a leadoff homer is worth +1.000 NERD, and, quite literally, is one run more than the team would have had over an average event. This can be applied to any situation, too.

Name Bias
Another advantage of this system is that the name one attributes to the event really does not matter. The April 16, 2006 Angels/Orioles game gives a perfect example. Here's a snapshot of the bottom of the second inning, courtesy of ESPN's play-by-play:


 * Miguel Tejada reached on infield single to second.
 * Jay Gibbons flied out to center.
 * Javy Lopez singled to deep left, M Tejada scored, J Lopez out stretching at second.
 * Jeff Conine struck out looking.

That Lopez at bat must have been pretty incredible. You can picture it as you wish, but my guess would have been that Tejada was off with the pitch, Lopez drove the ball down the line, and got caught in some sort of run-down as Tejada scored. I'd be wrong.

What the above doesn't tell you is that Lopez actually hit the ball over the center field fence. Tejada thought that Darin Erstad came down with the ball, though, and retreated to first. Lopez passed him on the basepaths. By rule, Lopez is out, and is credited for a hit based on the last base he reached safely. Tejada scores as if the ball were the home run it should have been. (Good job, guys!)

But let's use the ERV system to measure Lopez' at bat. He comes up with a runner on first and one out -- a .535. He leaves with no one on and two outs -- a .109. That's a net of -.426. He does, however, get a credit for Tejada's run scored, and therefore gets a +.574. "Single, out stretching?" Doesn't matter.

Hidden Values
Another thing we can capture is the otherwise hidden value of things like errors/mistakes and good baserunning. (To whom we attribute these values is a minor problem, but that problem was preexisting and is not worsened by NERD.) Take, again, the Lopez non-home run. We know that the way it played out, he gained +.574. But what if he had not passed Tejada on the basepaths?

Again, he would have come up with a .535 to work off of. However, he would have left the situation with one out instead of two (and again, with no one on base). Instead of a .109, he would have left with a .282. That's a net of -.253. He would have been credited with two runs -- his and Tejada's -- and should have ended up at +1.747.

Subtracting his actual +.574 from that, we see that Lopez's baserunning error cost the team 1.173 expected runs.

Another example can be seen in the Brewers/Mets game from Saturday. With one out, Gabe Gross was on first base. He attempted to steal second, but Darren Oliver made a move to pick him off. However, Carlos Delgado's throw to second went wide of the bag and rolled into the outfield; Gross advanced to third safely.

How bad was Delgado's error? Well, let's assume that he had executed properly. The result would have been two out, none on, or a .109 ERV. The actual result -- one out, runner on third -- has an ERV of 1.032. Delgado's faux pas cost the Mets .923 expected runs.
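Both of these comparisons follow the same pattern: two possible outcomes of one play, both starting from the same posture. The starting ERV therefore cancels, and only what each outcome leaves behind matters. A minimal Python sketch, in which the helper name `outcome_value` is an illustrative assumption:

```python
def outcome_value(erv_after, runs_scored=0):
    """What a play leaves behind: runs already in plus runs still expected."""
    return erv_after + runs_scored


# Lopez: the actual play leaves none on, two outs (.109) with one run in;
# a clean home run would leave none on, one out (.282) with two runs in.
lopez_actual = outcome_value(0.109, 1)
lopez_clean = outcome_value(0.282, 2)
lopez_cost = lopez_clean - lopez_actual

# Gross: the wild throw leaves a runner on third, one out (1.032);
# a completed pickoff would have left none on, two outs (.109).
delgado_cost = outcome_value(1.032) - outcome_value(0.109)

print(f"Lopez's baserunning cost his team {lopez_cost:.3f} expected runs")
print(f"Delgado's throw cost the Mets {delgado_cost:.3f} expected runs")
```

Note the direction of each subtraction: Lopez's mistake is measured against his own team's better outcome, while Delgado's is measured by how much better off the batting team ended up.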

Back to Name Bias
And once again, we see that what you call the events doesn't really matter. Anecdotally, let's assume that Gross had been caught stealing. He came up with none on, one out. He left with none on, two out. Nothing happened in between. Does it matter that he was "caught stealing"? What if he had simply popped out, or K'd?

Let's do the math.

Gross's hit gains him +.253 (the difference between none on, one out and runner on first, one out). But Gross has to be charged with a negative value as well. He was on first with one out -- an ERV of .535. He was caught stealing (but for the error), which would have resulted in a situation with an ERV of .109. Therefore, he should get -.426 charged to his bottom line. The NERD is -.173.

Had he struck out, he would have entered with a .282 posture, and exited with a .109 posture, or -.173. Same result. It works anecdotally and mathematically: if you do the math out, you'll see the reason for this is that the .535 values simply cancel out as he is erased on the basepaths.
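The cancellation is easy to see once the three chart values are written out. A short Python sketch, where the constant names are illustrative:

```python
# Chart values quoted above.
ERV_NONE_ON_1_OUT = 0.282
ERV_FIRST_1_OUT = 0.535
ERV_NONE_ON_2_OUT = 0.109

# Path A: single, then caught stealing.
single = ERV_FIRST_1_OUT - ERV_NONE_ON_1_OUT   # +.253
caught = ERV_NONE_ON_2_OUT - ERV_FIRST_1_OUT   # -.426

# Path B: a plain strikeout.
strikeout = ERV_NONE_ON_2_OUT - ERV_NONE_ON_1_OUT

# The .535 appears once with each sign in path A, so it drops out.
print(round(single + caught, 3), round(strikeout, 3))  # -0.173 -0.173
```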

Problems
As with any statistical yardstick, NERD has a few problems. Let's draw them up.

Value Allocation
The biggest problem is that we can't always properly allocate the change in baserunner posture. In other words, even though we may go from "runner on first" to "runners on first and third," it's hard to say who should get the credit for that change.

Taking an example, let's say that Derrek Lee is on first, and Jacque Jones singles cleanly to right. Lee advances to third. There are no outs.

What we know for certain:
 * When Jones came up, the ERV of the situation was .919.
 * Regardless of who the runner or relevant fielders are, the worst-case scenario is runners on first and second, no outs. This has an ERV of 1.515.
 * The ERV of the actual result -- first and third, no outs -- is 1.869.

Therefore, at the very least, Jones should be credited with +.596 NERD. But the problem is that Lee went to third, not second, and we don't know how to account for that. It is probably a mix of where Jones hit the ball (credit to Jones), Lee's speed (credit to Lee), and/or the right-fielder's arm and ability to get to the ball (debit to the defense). The NERD created from Lee advancing to third is .354.

As a rule of thumb, my instinct is to assign ERV to the most independent actor, and in this case, it'd be Lee. Basically, this arises from an aversion to assigning ERV to a defensive player unless otherwise necessary, and because if Lee were thrown out at third, he should certainly bear the brunt of the NERD drop-off from a first and second, no out situation to the runner on first, one out posture. That is, Jones should still get the +.596. Unfortunately, this is difficult, so for now, I'm assigning everything to the batter.
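To make the rule of thumb concrete, here is a minimal Python sketch of splitting one play's NERD between the batter and the runner. This is not the bookkeeping actually used here, where everything goes to the batter. The function name `split_credit`, the idea of a scorer-chosen "floor" posture, and the choice to give any runs scored to the batter are all assumptions made for illustration.

```python
def split_credit(erv_before, erv_floor, erv_actual, runs_scored=0):
    """Split a play's NERD between the batter and the advancing runner.

    erv_floor:  chart value of the worst-case posture the hit guarantees
                (a judgment call by whoever keeps the books)
    erv_actual: chart value of the posture that actually resulted
    Runs scored are assigned to the batter here, which is an assumption.
    """
    batter = erv_floor - erv_before + runs_scored
    runner = erv_actual - erv_floor
    return batter, runner


# Jones singles with Lee on first, no outs (.919). Floor: first and second
# (1.515). Actual result: first and third (1.869).
batter, runner = split_credit(0.919, 1.515, 1.869)
print(f"Jones {batter:+.3f}, Lee {runner:+.3f}")  # Jones +0.596, Lee +0.354
```

If the runner is thrown out taking the extra base, `erv_actual` falls below `erv_floor` and the runner's share goes negative while the batter keeps his guaranteed credit.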

Playing for One Run
This is a minor quibble, but should be raised.

In general, baseball is a pretty clean sport, statistically speaking, because the marginal value of each additional run is pretty close to 1. That is, one run is worth 1 unit more than no runs; seven runs is worth 1 unit more than 6, etc.

This is not true, however, when the game gets late -- especially when we're dealing with the bottom of the 9th. Say, for example, that the home team is up, the game is tied, and we're in the final frame. The second run has a marginal value of 0 -- it simply doesn't matter.

Therefore, the matrix above has little use. Here's a quick example.

If the leadoff runner gets on, it is not clear that the next batter should swing away instead of sacrificing the runner over. Yes, "runner on first, no outs" has an ERV of .919, and "runner on second, one out" has an ERV of .706, for a loss of .213. But that measures the number of expected runs lost; it does not tell us whether the odds of at least one run scoring have changed. And that's all we care about. Sure, the sacrifice may have cost us a big inning, but has it gained us the ability to score one run? When one run is all you need, the big inning is a non-issue.

Not Enough Context
Classic sabermetrics often limit the context, isolating the event itself. (That is, a homer is a homer, regardless of runners on, outs, etc.) I am adding a great deal of context, but I am only adding little bits. I'm neglecting the score, the inning, etc. I'm also discounting intangible stuff, like whether the pitcher seemed distracted by the runner on first, or whether a LOOGY was brought in to face the batter, etc.

Really Tedious Bookkeeping
There's no system in place to easily keep track of the change in posture. Right now, one has to go through each at bat to determine the NERD for that player. That's no fun and takes a lot of time.
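Some of the tedium could be handed to a script once each plate appearance is logged as a before value, an after value, and the runs scored. A rough Python sketch of such a tally follows. The record layout and the name `tally_nerd` are assumptions for illustration; no such system exists yet.

```python
from collections import defaultdict


def tally_nerd(plays):
    """Sum NERD per player from (player, erv_before, erv_after, runs) records."""
    totals = defaultdict(float)
    for player, erv_before, erv_after, runs in plays:
        totals[player] += erv_after - erv_before + runs
    return dict(totals)


# A few plate appearances from earlier in this article, using the 2003 chart values.
plays = [
    ("Pujols", 0.109, 0.237, 0),    # walk with two outs, none on
    ("Pujols", 0.109, 0.109, 1),    # solo home run with two outs
    ("Lansford", 0.531, 0.282, 0),  # leadoff-posture strikeout
]

for player, total in tally_nerd(plays).items():
    print(f"{player}: {total:+.3f}")
```

The hard part remains producing the log itself, since someone still has to read the posture before and after every plate appearance off the play-by-play.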

Conclusion
NERD, beyond its hyper-witty name, gives us a view of a player's value without having to quantify the actual event in question -- all that is done for us. I think it works.

How's that for a conclusion?

Date
Mon 04/17/06, 9:14 am EST