## Measuring Accuracy

How do all the offensive statistics measure up in an accuracy comparison? To answer that question, I put together a study to examine how well each statistic relates to run scoring. Unfortunately, setting up the study wasn't a straightforward proposition, because sabermetricians don't agree on how accuracy is best calculated. As a matter of fact, the two greatest minds in sabermetrics disagree about it. Bill James uses standard deviation (really Root Mean Square Error, or RMSE) and absolute errors to compare accuracy. Pete Palmer uses regression equations.

How did I decide to answer the question? Rather than restrict my study to either Mr. James' or Mr. Palmer's accuracy standard, I decided to employ both. That way, I figured, anyone looking it over could choose whichever standard they prefer.

Because this is a baseball book and not a math book, I won't take up a bunch of time explaining the entire process in detail. If you don't already have an understanding of standard deviations, regression equations, correlation and the like, I regret to tell you that you won't find an explanation of them here. The reasons are simple. First, this book can't hold an infinite number of pages. If I had to explain every math term and concept here, it would come at the expense of other material, material that is a heck of a lot more interesting to read than mathematical techniques. Second, I'm not properly qualified to teach them to you. I'm a baseball analyst, not a math professor. Rather than trying to learn it from me, you'd be better off taking a good introductory college statistics class. Of course, you might not have the time or desire to do that. If that's true, but you'd still like to get a basic understanding of the methodology, I suggest you pick up a
copy of …

## How did I generate run totals for the different stats?

As I said in the “Why Do We Need Another Player Evaluation Method” essay, rate statistics need an additional step to express the measure in runs. How did I do this? I divided the team measure by the league average, then multiplied the result by the league runs per out and by the team's outs.

Here's an example using batting average and the 1955 Boston Red Sox. In 1955, the Red Sox hit .264 and consumed 4,145 outs. The league batted .258 and scored .170 runs per out. To generate the Red Sox run total, I put all the numbers together:

.264 / .258 × .170 × 4145 = 721.037 runs

Using this method, I produced run estimates for every stat included in the study, for every team from 1955 to 1997 (1,002 team seasons). For the statistics that are already expressed in runs, things were much simpler: I directly compared estimated runs with actual runs.

## What numbers did I calculate?

I calculated figures that correspond to the methodologies used by Bill James and Pete Palmer.

## Bill James Methodology

I applied the standard that Bill James used in his …
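
The rate-to-runs conversion and Bill James' error measures (RMSE and absolute error, per the opening of this article) fit together in a short pipeline: estimate runs, compare them with actual runs, and summarize the misses. The sketch below is illustrative only. The function names are hypothetical, and dividing by n (rather than n − 1) in the RMSE is an assumption, not a detail taken from the study.

```python
"""Sketch: convert a team rate stat to runs, then score estimates
with RMSE and mean absolute error (hypothetical helper names)."""
import math


def rate_to_runs(team_rate: float, league_rate: float,
                 league_runs_per_out: float, team_outs: int) -> float:
    """Team rate relative to league, times league runs per out, times team outs."""
    return team_rate / league_rate * league_runs_per_out * team_outs


def rmse(estimated: list[float], actual: list[float]) -> float:
    """Root mean square error, in runs (assumes division by n)."""
    n = len(actual)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n)


def mean_abs_error(estimated: list[float], actual: list[float]) -> float:
    """Average absolute miss, in runs."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# The 1955 Red Sox example from the text.
red_sox_1955 = rate_to_runs(0.264, 0.258, 0.170, 4145)
print(f"{red_sox_1955:.3f}")  # 721.037

# Hypothetical estimates vs. actual runs for two made-up teams.
est = [700.0, 650.0]
act = [710.0, 640.0]
print(rmse(est, act), mean_abs_error(est, act))  # 10.0 10.0
```

RMSE punishes big misses more heavily than mean absolute error does, which is one reason the two standards can rank the same set of statistics differently.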
## Pete Palmer Methodology

In …
## What are the numbers?

## Rate Stat Scorecard

## Run Stat Scorecard

Although most of the statistics' abbreviations are defined elsewhere, a few aren't. Included are a couple of RC spin-offs: …

## What should you make of these numbers?

I'd prefer that you draw your own conclusions, but since I realize that you may want my opinion, here are a few things to keep in mind. (Feel free to stop reading here if my opinion doesn't really interest you.) :)

- XR was developed with the same data used in the validation study, so XR gets something of a helping hand. Because of that, I also supply a few other comparisons. The first is a decade-by-decade comparison (with only RMSE and SE regr, the standard error of the regression). The numbers indicate that XR holds its accuracy advantage across different periods.
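
Pete Palmer's regression standard, and the SE regr figure used in the decade comparison, can be sketched as an ordinary least-squares fit of actual team runs on a statistic. This is a minimal sketch under assumptions: it takes SE regr to mean the standard error of a simple linear regression with n − 2 degrees of freedom, and `regression_summary` is a hypothetical helper, not code from the study.

```python
"""Sketch: fit actual runs on a statistic by ordinary least squares and
report the slope, intercept, correlation (r), and standard error of the
regression (assumed to use n - 2 degrees of freedom)."""
import math


def regression_summary(x: list[float], y: list[float]) -> dict[str, float]:
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "slope": slope,
        "intercept": intercept,
        "r": sxy / math.sqrt(sxx * syy),
        # Two parameters were estimated, leaving n - 2 degrees of freedom.
        "se_regr": math.sqrt(sse / (n - 2)),
    }


# Hypothetical estimated vs. actual runs for four made-up team seasons.
estimated = [700.0, 650.0, 812.0, 745.0]
actual = [710.0, 640.0, 800.0, 751.0]
print(regression_summary(estimated, actual))
```

For a decade-by-decade comparison, one would group team seasons by decade and call the function on each group; a lower `se_regr` means the statistic tracks runs more tightly in that period.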
## Decade Match-up

- On the team level, there really isn't much difference between most of the run estimation methods. Although XR comes out on top, the gap isn't earth-shattering. Of course, as Jay and I pointed out in the “Deciphering the New Runs Created” essay, the numbers generated on the team level don't necessarily equal the numbers used for player comparisons. For most methods, then, accuracy on the player level differs from accuracy on the team level. This is true even for Palmer's Batting Runs (BR): BR contains a term for Outs On Base in team calculations but drops it in player calculations, so its accuracy for individual players is worse than this study indicates.
- EQA's accuracy lies in the eye of the beholder. If you figure it the way Clay Davenport tells you to, it ranks at the top; if you don't, it doesn't. As a matter of fact, if you compare EQA to GRAB, you won't find much difference. This indicates to me that EQA isn't really much different from OPS.
- OPS doesn't possess the accuracy that Pete Palmer's study implies. David Grabiner pointed out a possible explanation: OPS is considered more accurate than my study shows because Palmer's OPS accuracy claims are based on OPS's correlation with OTS. In other words, the claims assume that if OPS had a perfect linear relationship with OTS, OPS would be exactly as accurate as OTS. I investigated this and found that OPS's correlation with OTS was indeed very high (.998556). I then looked at David's suggested modification, 1.2*OBP+SLG (GRAB), which had a correlation coefficient (R) of .999173. Although that figure is pretty high, it's still not a perfect correlation. Getting closer requires a much more involved formula. David sent me one that produces an almost perfect correlation (1.00000000):

$$
\begin{aligned}
\mathrm{OTS} &= .333 \times .400 + (\mathrm{OBP} - .333) \times .400 + .333 \times (\mathrm{SLG} - .400) + (\mathrm{OBP} - .333)(\mathrm{SLG} - .400) \\
&\approx -.333 \times .400 + \frac{1.2\,\mathrm{OBP} + \mathrm{SLG}}{3} + (\mathrm{OBP} - .333)(\mathrm{SLG} - .400)
\end{aligned}
$$

The first line is simply OBP × SLG expanded around a .333 OBP and .400 SLG baseline. The second regroups the linear terms (using .400/.333 ≈ 1.2) to show that OTS is roughly GRAB/3, minus a constant, plus a cross term.
You might be thinking, "OK, what does this really mean?" Well, it means that although OPS is a very good quick-and-dirty method, it's not as accurate as some of its proponents claim. So if you want a good quick estimate, OPS works fine; if you want a more accurate assessment, you're better off using one of the other methods.

- My study shows that, with the proper selection of event values, a linear formula is more accurate than the non-linear formulas across the different run-scoring periods.
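
The OPS/OTS argument above can be checked numerically. Because GRAB is proportional to the linear part of OTS around the .333/.400 baseline, only a constant (which doesn't affect correlation) and the cross term separate them; adding the cross term back should push the correlation with OTS to essentially 1. The sketch below uses hypothetical team OBP/SLG lines invented for illustration, so the printed figures are not the study's numbers, and the function names are hypothetical.

```python
"""Sketch: how closely OPS, GRAB, and Grabiner's formula track OTS
(OBP * SLG) on hypothetical team data."""
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def grabiner_expansion(obp: float, slg: float) -> float:
    """OBP * SLG written around a .333 OBP / .400 SLG baseline (exact)."""
    return (0.333 * 0.400 + (obp - 0.333) * 0.400
            + 0.333 * (slg - 0.400) + (obp - 0.333) * (slg - 0.400))


def grabiner_approx(obp: float, slg: float) -> float:
    """GRAB/3 minus a constant plus the cross term (treats .333 as 1/3)."""
    return (-0.333 * 0.400 + (1.2 * obp + slg) / 3
            + (obp - 0.333) * (slg - 0.400))


# Hypothetical team OBP/SLG lines, for illustration only.
obp = [0.318, 0.331, 0.342, 0.325, 0.350, 0.310]
slg = [0.384, 0.410, 0.432, 0.398, 0.447, 0.371]

ots = [o * s for o, s in zip(obp, slg)]
ops = [o + s for o, s in zip(obp, slg)]
grab = [1.2 * o + s for o, s in zip(obp, slg)]
approx = [grabiner_approx(o, s) for o, s in zip(obp, slg)]

for name, values in (("OPS", ops), ("GRAB", grab), ("approx", approx)):
    print(f"r({name}, OTS) = {pearson_r(values, ots):.8f}")
```

Run on real team seasons instead of these made-up lines, the same comparison is what the correlation figures quoted above describe.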
## Linear vs. non-Linear Match-up

- Although the other formulas move around in the rankings, XR stays near the top in both run-scoring environments. This finding directly contradicts James' assertion in the *Historical Baseball Abstract* that linear formulas cannot accurately estimate runs because "run scoring is not linear." Although I agree with Mr. James that run scoring is not linear, I don't believe that this fact prevents the use of a linear formula to estimate runs created. As long as the frequency and distribution of events are pretty stable (and they have been), run scoring can be looked at as linear: the more positive events a team packs into a game, the more runs it scores. My numbers confirm this assertion.

- XR is designed for use from 1955 onward. Although you can use the formula for seasons before 1955, I haven't confirmed its validity for that period. Having said that, I've already begun work on other versions for seasons prior to 1955.
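
The stability argument above can be illustrated with a small sketch. It is an assumption-laden illustration, not how XR was built: it takes James' basic Runs Created, (H + BB) × TB / (AB + BB), as a stand-in non-linear model, computes each event's marginal run value at a hypothetical league-average team line, and then freezes those values as linear weights. The team lines and function names are hypothetical.

```python
"""Sketch: fixed linear weights tracking a non-linear run model when a
team's event mix stays near a stable baseline (hypothetical data)."""
from dataclasses import dataclass


@dataclass
class TeamLine:
    h: float   # hits
    bb: float  # walks
    tb: float  # total bases
    ab: float  # at-bats


def basic_rc(t: TeamLine) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (t.h + t.bb) * t.tb / (t.ab + t.bb)


def marginal_weights(base: TeamLine) -> dict[str, float]:
    """Partial derivatives of basic RC at the baseline line.

    These become per-event linear weights that stay fixed afterward."""
    a = base.h + base.bb
    b = base.tb
    c = base.ab + base.bb
    return {
        "h": b / c,
        "bb": b / c - a * b / c ** 2,
        "tb": a / c,
        "ab": -a * b / c ** 2,
    }


def linear_estimate(t: TeamLine, w: dict[str, float]) -> float:
    """Purely linear estimate: each event count times its fixed weight."""
    return w["h"] * t.h + w["bb"] * t.bb + w["tb"] * t.tb + w["ab"] * t.ab


# Hypothetical league-average baseline and a better-hitting team.
league = TeamLine(h=1450, bb=550, tb=2200, ab=5500)
team = TeamLine(h=1500, bb=600, tb=2350, ab=5550)

weights = marginal_weights(league)
print(f"basic RC: {basic_rc(team):.1f}")
print(f"linear:   {linear_estimate(team, weights):.1f}")
```

Because basic RC scales in proportion when every event scales together, the weighted sum reproduces it exactly at the baseline; the gap between the two grows only as a team's event mix drifts away from that baseline.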
## Closing thoughts

As I mentioned at the top of the article, I'm a baseball analyst, not a math professor. With that in mind, I'd like to encourage input from all the math experts reading this article. Although I'm confident that I've got a good handle on the topic, I'm open to other ideas about how to examine the accuracy question. Also, if any of you math experts have written (or can write) something that explains the underlying math in a simple, easy-to-understand manner, and you're willing to share your knowledge, please contact me. I'd really like to include a nice explanation in future versions of this article.

Speaking of future versions, I encourage everyone with Internet access to check out the web version of this article. Since my web site doesn't suffer from the space constraints of the printed word, I'm free to include more, more, more data. You can find the webified version at http://www.baseballstuff.com/btf/scholars/furtado/accuracy.htm.

In closing, I suggest that anyone interested in other looks at the same question read John Jarvis' "A Survey of Baseball Player Performance Evaluation Measures" and Alidad Tash's "Win and Run Prediction in Major League Baseball".