Hi. Have you played fantasy baseball? How many hours have you poured into it only to come up short? Wanna put in a few more hours to create a system instead of relying on Yahoo rankings?
This is an idea I had, and these are my thoughts on possible steps to gain an advantage in a fantasy baseball league. To be more specific, I’ll discuss how to create $ values to use in an auction-league fantasy draft. Credit where it’s due: FanGraphs and Tom Tango’s site.
Step 1: using a projection system (protip: use the pros’)
First off, we need projections for how every player in your draft is going to do next season. Sure, you could make them by gut feel or with some sort of weighting, but people much smarter than me have already done this. They’ve used massive amounts of data and math, and they share their results with the public. There’s CHONE, Cairo, ZiPS, Marcel, Bill James, Fantistics, PECOTA, RotoChamp and a few others. Which one to choose, though? They’re all respected, and they’ve had varying levels of success. Why not choose all of them?
So instead of taking 1 projection, let’s take an average of 9 different ones. This lessens the bias any single system may have for or against certain types of players. For example, CHONE often minimizes error pretty well but has its setbacks: since it draws only on major-league results, it may undervalue young players in terms of both performance and playing time.
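If you’d rather script the averaging than do it by hand, here’s a minimal Python sketch. It assumes each system’s projections have already been loaded into a dict keyed by system, then player, then stat. The numbers are made up for illustration (they are not real CHONE, ZiPS or Marcel output), only 3 systems are used instead of 9 to keep it short, and it takes a plain unweighted mean over whichever systems project a given player.

```python
from statistics import mean

# Made-up projections: system -> player -> stat -> projected value.
# These are NOT real CHONE/ZiPS/Marcel numbers; they only illustrate the shape.
projections = {
    "CHONE": {"Jacoby Ellsbury": {"SB": 50, "HR": 9}},
    "ZiPS": {"Jacoby Ellsbury": {"SB": 58, "HR": 8}},
    "Marcel": {"Jacoby Ellsbury": {"SB": 54, "HR": 10}},
}


def consensus(projections):
    """Average each player's stats across every system that projects him."""
    collected = {}  # player -> stat -> list of values, one per system
    for system_stats in projections.values():
        for player, stats in system_stats.items():
            for stat, value in stats.items():
                collected.setdefault(player, {}).setdefault(stat, []).append(value)
    # Unweighted mean: every system gets an equal vote.
    return {
        player: {stat: mean(values) for stat, values in stats.items()}
        for player, stats in collected.items()
    }


avg = consensus(projections)
print(avg["Jacoby Ellsbury"])  # {'SB': 54, 'HR': 9}
```

An unweighted mean is the simplest choice; if you trust some systems more than others, a weighted average would slot into the same place.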
Unfortunately I can’t find the Google document with this data; it’s been taken down. Dang. But here is what I ended up with after weeding out the players unlikely to be drafted in my league. Not only does it have the projections, it also shows how I calculate the values in this post.
Step 2: calculating z-scores
The goal is to find a player’s contribution in each statistic by seeing how many standard deviations he is from the average. We’ll discuss the hitters first, then the pitchers.
With a spreadsheet (like the one above) it’s not hard to calculate the average and standard deviation of each stat across all hitters. First you need the average in each of the 5 categories, then the standard deviation.
Excel will calculate the average and standard deviation of SB in the I4:I152 range with formulas such as =AVERAGE(I4:I152) and =STDEV(I4:I152).
Let’s stick those in a safe place, the header.
Say the projection has Jacoby Ellsbury stealing 55 bases, and the average player steals 12. Some players steal very little and some steal a lot, so there’s a wide distribution of SB; in other words, this data set has a large variance in SB. But all we need is the standard deviation of SB, which is the square root of the variance.
Say the standard deviation comes out to about 12. If SB totals were roughly normally distributed, only about 0.3% of players would land 3 or more standard deviations from the mean. Ellsbury is (55 - 12) / 12 ≈ 3.6 SDs above average in SB, which shows how he can win you that category. This contribution of roughly 3.6 is called a z-score.
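We then use the same technique for 7 of the 10 stats, the counting stats (HR, RBI and SB for hitters, W and SV for pitchers, and so on). For example, for RBI: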
Pujols RBI z-score = (Pujols RBI - avg RBI) / stdev(RBI)
= (114 - 76) / 18 ≈ 2.1
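For anyone who’d rather do Step 2 in code than in Excel, here’s a minimal Python sketch. It uses the sample standard deviation (statistics.stdev, the same n - 1 formula as Excel’s STDEV); which version the original spreadsheet used isn’t stated, so treat that as an assumption. The Ellsbury and Pujols numbers come from the text; the three-player pool at the end is made up.

```python
from statistics import mean, stdev


def z_score(value, pool_mean, pool_sd):
    """Standard deviations above (+) or below (-) the pool average."""
    return (value - pool_mean) / pool_sd


def pool_z_scores(pool):
    """z-score for every player in a {player: projected stat} pool.

    stdev() is the sample standard deviation (n - 1), like Excel's STDEV.
    """
    values = list(pool.values())
    mu, sd = mean(values), stdev(values)
    return {player: z_score(v, mu, sd) for player, v in pool.items()}


print(round(z_score(55, 12, 12), 2))   # Ellsbury SB: 3.58
print(round(z_score(114, 76, 18), 2))  # Pujols RBI: 2.11

# Made-up three-player SB pool: mean 20, sample SD 10.
print(pool_z_scores({"A": 10, "B": 20, "C": 30}))
```

Running pool_z_scores once per counting stat gives the same per-column z-scores you’d get by dragging the formula down the spreadsheet.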
If you play around with the spreadsheet and are comfortable with Excel, you should see how to quickly calculate the 5 z-scores for each of the 150 hitters. You might be wondering what xH is; that’s covered in the next step.
Step 3: calculating rate-based z-scores
We still have to adjust the rate stats: WHIP, ERA and AVG. That way Roy Halladay’s 2.8 ERA over 230 innings has a greater impact than a relief pitcher’s 2.8 ERA over far fewer innings. To find Roy’s true value, you need to know how many ER he gives up, and how many ER the league-average pitcher would give up if he pitched the same number of innings. So Roy’s ER saved versus average (xER) is
= Roy’s IP * (avg ERA - Roy’s ERA) / 9
= 230 * (3.68 - 2.78) / 9 = 23 xER (ER better than the average pitcher)
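The xER arithmetic is easy to script as well. A minimal sketch using the Halladay numbers from the text; the 70-inning reliever is a hypothetical comparison, not a real projection.

```python
def expected_er_saved(ip, era, league_era):
    """Earned runs saved (xER) versus a league-average pitcher over the same innings.

    ERA is earned runs per 9 innings, so (league_era - era) / 9 is runs saved
    per inning; multiplying by innings pitched gives the total. Positive = better.
    """
    return ip * (league_era - era) / 9


starter = expected_er_saved(230, 2.78, 3.68)   # Halladay, from the text
reliever = expected_er_saved(70, 2.78, 3.68)   # hypothetical 70-inning reliever
print(round(starter, 1), round(reliever, 1))   # 23.0 7.0
```

Same ERA, but the starter saves more than three times as many runs, which is exactly the volume effect this step is meant to capture.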
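The calculations are similar for the other rate stats. Keep the signs so that a positive number always means better than average, matching xER; since a lower WHIP is better, the average WHIP comes first:
xH = Pujols AB * (Pujols AVG - all hitters’ AVG)
xWHIP = Roy’s IP * (avg WHIP - Roy’s WHIP)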
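Here’s a minimal sketch of those two conversions in Python. The signs are oriented so a positive result always means better than average (for WHIP, lower is better, so the league figure comes first). The player lines below are hypothetical round numbers, not actual projections.

```python
def x_hits(ab, avg, league_avg):
    """Hits above what a league-average hitter would collect in the same at-bats."""
    return ab * (avg - league_avg)


def x_whip(ip, whip, league_whip):
    """Baserunners (walks + hits) saved versus a league-average pitcher.

    WHIP is per inning, so the gap times innings pitched is a total.
    Lower WHIP is better, hence league_whip - whip (positive = better).
    """
    return ip * (league_whip - whip)


# Same .300 average over 600 vs 200 AB: the full-timer adds three times as many hits.
print(round(x_hits(600, 0.300, 0.270), 1), round(x_hits(200, 0.300, 0.270), 1))  # 18.0 6.0
# Same 1.10 WHIP over 230 vs 70 innings.
print(round(x_whip(230, 1.10, 1.30), 1), round(x_whip(70, 1.10, 1.30), 1))  # 46.0 14.0
```

The xH and xWHIP totals then get z-scored like any counting stat.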
Then turn xH, xER and xWHIP into z-scores the same way as in Step 2. After that you should have 5 z-scores for each hitter and 5 z-scores for each pitcher. Simply add a hitter’s 5 z-scores together to get his value; the same goes for pitchers. I will refer to this sum as the player’s raw value. You can rank players by position, or predict which players will have the highest overall rank (o-rank) next year. Woo!
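Putting Steps 2 and 3 together, here’s a minimal Python sketch that turns a pool of projections into raw values and a ranking. It assumes every category has already been oriented so bigger is better (which is why AVG shows up as xH), uses the sample standard deviation like Excel’s STDEV, and runs on a made-up three-player pool; the names and numbers are placeholders.

```python
from statistics import mean, stdev

# Made-up pool; every category is oriented so that bigger is better
# (AVG enters as xH, the hits-above-average number from Step 3).
hitters = {
    "Player A": {"R": 100, "HR": 35, "RBI": 110, "SB": 5, "xH": 20},
    "Player B": {"R": 90, "HR": 20, "RBI": 80, "SB": 40, "xH": 5},
    "Player C": {"R": 70, "HR": 15, "RBI": 60, "SB": 10, "xH": -10},
}


def raw_values(pool):
    """Sum each player's z-scores across all categories (his raw value)."""
    categories = list(next(iter(pool.values())))
    totals = dict.fromkeys(pool, 0.0)
    for cat in categories:
        # z-score this category across the whole pool, then add it to each total.
        column = [stats[cat] for stats in pool.values()]
        mu, sd = mean(column), stdev(column)
        for player, stats in pool.items():
            totals[player] += (stats[cat] - mu) / sd
    return totals


values = raw_values(hitters)
ranking = sorted(values, key=values.get, reverse=True)
print(ranking)  # ['Player A', 'Player B', 'Player C']
```

Pitchers work the same way with W, SV, K, xER and xWHIP as the categories.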
Now, it’s important to note that the weights for each z-score don’t have to be equal. Some would prefer to put less emphasis on, say, AVG, since it’s a lot easier to predict a player’s HR range than his AVG range; some players, like Adam Dunn, may alternate between hitting .230 and .270. This is especially true in a head-to-head league, where AVG is much less predictable over 1 week: even a dominant AVG team can lose the category in a short series. That’s baseball. AVG, W and ERA are the most vulnerable to bad luck, so I might recommend multiplying every player’s z-scores in ERA, W and AVG by 0.8 or so. This lessens the weight those stats carry in a player’s overall value. Pitchers have more control over WHIP, whereas ERA, W and SV can vary with good or bad luck. It would be interesting to know the optimal weights, but that’s a question for another time.
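A sketch of that down-weighting, assuming you already have each player’s z-scores keyed by category (with AVG meaning the xH-based z-score). The 0.8 multiplier is the rough figure suggested above, not a tuned value, and the sample pitcher’s z-scores are made up.

```python
# Rough luck discount from the text; categories not listed keep a weight of 1.0.
LUCK_WEIGHTS = {"ERA": 0.8, "W": 0.8, "AVG": 0.8}


def weighted_value(z_by_category, weights=LUCK_WEIGHTS):
    """Sum a player's category z-scores, scaling luck-prone categories down."""
    return sum(weights.get(cat, 1.0) * z for cat, z in z_by_category.items())


# Made-up z-scores for a hypothetical starting pitcher.
pitcher = {"W": 1.5, "SV": -0.5, "K": 2.0, "ERA": 2.0, "WHIP": 1.0}
print(round(weighted_value(pitcher, weights={}), 2))  # equal weights: 6.0
print(round(weighted_value(pitcher), 2))              # ERA/W/AVG discounted: 5.3
```

Passing an empty weights dict gives back the plain equal-weight sum, so it’s easy to compare the two rankings side by side.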
Next time: turning value into $, finding undervalued players