Since the subject of MLEs has come up quite a bit [in the baseball newsgroups] in the last week, I've decided to print the basic concepts on how to come up with MLEs. There are a few variations of MLEs, depending on whether you use 1 or 3 year park effects, or whether you use actual minor league park effects rather than estimated ones (when Bill James proposed this system in the 1985 Baseball Abstract, minor league park effects were difficult to come by due to the unavailability of home/road data for minor leaguers). However, the differences are small, which is OK. It's what the numbers mean that's important, not what they exactly say. If one person finds the MLE of Joe Schmo to be 253/330/418 and another person finds it to be 258/327/425, they're still saying the same thing.

One thing to remember is that MLEs are not a prediction of what the player will do, just a translation of what the player actually did into its major league equivalent. This is useful for predictions, however, because MLEs have strong predictive value, as strong as major league statistics (which was the goal of this).

Bill James stated that MLEs were the most important concept that he had ever come up with. The normal season-to-season fluctuation in batting average at the major league level is 25 points. I figured the season-to-season changes for every major league player who has had five years or more of 300 at bats, and the average annual change in batting average was between .024 and .025 [...] Note that this has been tested for batting average, slugging percentage, and on-base percentage over the last few years and the methods still work as well as they did in the early 1980s.
As I go through the simpler version of the process that I choose to use (which is usually very close to what STATS comes up with; they don't tell exactly what M factors they use, so it's hard to reproduce exactly), I will use two players from different parks in two different leagues: Danny Clyburn and Paul Konerko. First let's look at their raw minor league statistics for 1997. Normally, I also park-adjust the 2B/3B/HR by individual factors rather than one single factor, but it doesn't improve the result all that much in most instances, as it really doesn't affect the qualitative results, just the quantitative (I'm more interested in qualitative results).

Here are the raw statistics for the two players I'm using to demonstrate MLEs:

Player    AB   R    H  2B  3B  HR  RBI  BB   SO
Clyburn  520  91  156  33   5  20   76  53  107
Konerko  483  97  156  31   1  37  127  64   61

Player     BA   OBP   SLG
Clyburn  .300  .372  .498
Konerko  .323  .407  .621

The first thing we need to do is adjust for the level of league and park. Some players play in hitters' parks/leagues, some play in pitchers' parks/leagues.

First, Clyburn. Clyburn played in the International League last year, in which 9.597 runs per game were scored. He played his home games in Rochester, where run-scoring was inflated by approximately 4% over the last 3 years (unfortunately, I don't have 1 year stats available). Since only half of a player's games are at home, the 4% becomes a multiplier of 1.02. So, we'd expect a game between two league-average teams at Rochester to produce (1.02+[.10*.02])*9.597 runs. When we calculate it, we find that Clyburn's production came in a 9.81 run per game context. In the American League last year, 9.862 runs per game were scored.

The (.10*.02) term is there because you can't just use road totals: a league-average player would still get to play some games in Rochester.
That correction isn't crucial, and if you leave it out it won't change things much, so if you prefer, you can get this number simply from 1.02*9.597.

Clyburn: 9.808/9.862 = 1/1.006

We'll call this number PL, for park/league adjustment. Clyburn has a PL ratio of 1.006. This indicates that Clyburn's raw statistics won't take a nose dive due to his home park or his league.

Now, we do the same for Konerko. The PCL last year scored 11.532 runs per game. Albuquerque is a pretty darn good hitters' park and increased run scoring by 17%. Once we do what we did for Clyburn, we end up with Konerko's stats being produced in a 12.616 run per game context.

Konerko: 12.616/9.862 = 1/.782

Now, let's put Clyburn's and Konerko's PL ratios side by side.

PL Ratios
Clyburn  1/1.006
Konerko  1/.782

It's clear that since Konerko played in a park in which runs were easy to come by, his raw stats will suffer much more because of the park differences.

Next, we have to adjust for the calibre of competition. A player ordinarily loses about 18% of his offensive ability relative to the league in moving from AAA to the majors. When we adjust for this, we get "m":

Clyburn   Konerko
  1.006     0.782
 * 0.82    * 0.82
-------   -------
  0.825     0.641

This tells us that Clyburn, upon moving to the major leagues, will probably retain about 83% of his offensive punch, while Konerko will retain about 64%.

The other thing we need to find is "M". It's merely the square root of "m".

             m      M
Clyburn  0.825  0.908
Konerko  0.641  0.801

Now, we can start to adjust.

RAW
Player    AB   R    H  2B  3B  HR  RBI  BB   SO
Clyburn  520  91  156  33   5  20   76  53  107
Konerko  483  97  156  31   1  37  127  64   61

MLE
Player    AB   R    H  2B  3B  HR  RBI  BB   SO
Clyburn
Konerko

Now, all we need is the park factors for the major league stadiums. To avoid getting too technical, if a stadium has a park factor of 104 for something, use the multiplier 1.02 rather than 1.04 (not particularly accurate, but close enough). To be consistent, let's use three year factors again. Here are the multipliers.
I'm gonna refer to park multipliers as PM. Which PM to use is pretty self-explanatory.

         R     H     2B     3B    HR     BB     SO
BAL  0.995  0.98  0.985  0.805  1.05  0.985  1.025
LA   0.895  0.88  0.865  0.745  0.89  0.97   0.995

First, we need to find the MLE hits. To get them, we multiply minor league hits * .98 * M * PM

Then, the MLE doubles. To get them, we multiply minor league doubles * M * PM

Then, the MLE triples. To get them, we multiply minor league triples * m * .85 * PM

Then, the MLE homers. To get them, we multiply minor league homers * m * PM

Then, for the RBI and R, we multiply them each by m and then by the PM.

For walks, we do minor league walks * m * PM

For strikeouts, we simply do minor league strikeouts * 1.05 * PM

After we do that, we get this:

MLE
Player    AB   R    H  2B  3B  HR  RBI  BB   SO
Clyburn       75  136  30   3  17   62  52  115
Konerko       55  108  21   0  21   72  40   64

Now for the at-bats, we have to do something a little different. First, we need to know how many outs both players made in the minors this year. For this, we just need to find AB - H for Clyburn and Konerko.

Clyburn: 520 - 156 = 364
Konerko: 483 - 156 = 327

Then, to get the MLE at-bats, we just add the number of outs they really made to the number of MLE hits.

Clyburn: 136 + 364 = 500
Konerko: 108 + 327 = 435

So finally, after using the complete stats to calculate batting average, on-base percentage, and slugging percentage, we end up with:

MLE
Player    AB   R    H  2B  3B  HR  RBI  BB   SO
Clyburn  500  75  136  30   3  17   62  52  115
Konerko  435  55  108  21   0  21   72  40   64

Player     BA   OBP   SLG
Clyburn  .272  .343  .446
Konerko  .248  .312  .441

Neither of these differs much from STATS' MLEs. Keep in mind that Konerko would still be easily the better long-term bet. After all, he put up that MLE at age 21 last year, while Clyburn was 23. A year or two makes a big difference in minor league development.

My advice is to set up a spreadsheet. :-)

To send comments, critiques, or criticisms, e-mail Dan Szymborski at czerny@baseballstuff.com.
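If you'd rather script it than build a spreadsheet, the steps above can be sketched in a few lines of Python. This is only a rough illustration, not the exact method STATS or anyone else uses. The function names (m_factor, mle, rates) are made up for the example, RBI is assumed to use the runs multiplier since there is no separate RBI column, and everything is rounded to whole numbers only at the end. Because the article's intermediate rounding isn't spelled out, a few columns come out differently from the tables (the walk formula as written, for instance, gives Clyburn fewer walks than the table shows), so treat the output as a check on the method rather than a reproduction of every number.

```python
from math import sqrt

MAJOR_RPG = 9.862      # 1997 American League runs per game (from the article)
AAA_RETENTION = 0.82   # offense retained moving from AAA to the majors
HOME_SHARE = 0.10      # the article's small ".10" correction term


def m_factor(league_rpg, park_effect):
    """Return m; park_effect is the full park effect, e.g. 0.04 for +4%."""
    half = park_effect / 2  # only half of a player's games are at home
    context = (1 + half + HOME_SHARE * half) * league_rpg
    pl = MAJOR_RPG / context  # park/league ratio (the "PL" number)
    return pl * AAA_RETENTION


def mle(raw, m, pm):
    """Translate a raw minor league line using m and park multipliers pm."""
    big_m = sqrt(m)  # the article's "M"
    line = {
        "R": raw["R"] * m * pm["R"],
        "H": raw["H"] * 0.98 * big_m * pm["H"],
        "2B": raw["2B"] * big_m * pm["2B"],
        "3B": raw["3B"] * m * 0.85 * pm["3B"],
        "HR": raw["HR"] * m * pm["HR"],
        "RBI": raw["RBI"] * m * pm["R"],  # assumption: RBI uses the R multiplier
        "BB": raw["BB"] * m * pm["BB"],
        "SO": raw["SO"] * 1.05 * pm["SO"],
    }
    line = {stat: round(value) for stat, value in line.items()}
    # Outs carry over unchanged, so MLE at-bats = MLE hits + actual outs.
    line["AB"] = line["H"] + (raw["AB"] - raw["H"])
    return line


def rates(line):
    """Return (batting average, slugging percentage) rounded to 3 places."""
    total_bases = line["H"] + line["2B"] + 2 * line["3B"] + 3 * line["HR"]
    return round(line["H"] / line["AB"], 3), round(total_bases / line["AB"], 3)


BAL = {"R": 0.995, "H": 0.98, "2B": 0.985, "3B": 0.805,
       "HR": 1.05, "BB": 0.985, "SO": 1.025}
LA = {"R": 0.895, "H": 0.88, "2B": 0.865, "3B": 0.745,
      "HR": 0.89, "BB": 0.97, "SO": 0.995}

clyburn_raw = {"AB": 520, "R": 91, "H": 156, "2B": 33, "3B": 5,
               "HR": 20, "RBI": 76, "BB": 53, "SO": 107}
konerko_raw = {"AB": 483, "R": 97, "H": 156, "2B": 31, "3B": 1,
               "HR": 37, "RBI": 127, "BB": 64, "SO": 61}

clyburn = mle(clyburn_raw, m_factor(9.597, 0.04), BAL)   # IL, Rochester
konerko = mle(konerko_raw, m_factor(11.532, 0.17), LA)   # PCL, Albuquerque

print("Clyburn", clyburn, rates(clyburn))
print("Konerko", konerko, rates(konerko))
```

To run another player, swap in his raw line, his league's runs per game, his home park's effect, and the multipliers for the major league park he'd be moving to.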