Wednesday, October 1, 2014

Examining Playoff Probabilities

On June 6, 2014, the Toronto Blue Jays’ record stood at 38-24. They had a 6-game lead in the division, had just won 15 of 17 games, and were the only team in the AL East with a positive run differential (+53). According to the playoff odds on the standings page, powered by Baseball Prospectus, the Blue Jays had an 86% chance of making the playoffs.

Maybe it was the cynical Jays fan in me, who had borne witness to a 20-year playoff drought, but I couldn’t help looking at those playoff odds with a high degree of skepticism. The Jays had just finished an epic winning streak, everything Encarnacion swung at was going out of the park, and the team was still relatively healthy. Meanwhile, the rotation included Stroman, who had two career starts; Hutchinson, coming off a bad injury with no real track record; and JA Happ, who is JA Happ. Everything that could have gone right to that point in the season had gone right. Buehrle was even 10-1, and if there’s one thing Buehrle has shown over his career, it’s that his final numbers rarely change.

The factor that scared me the most, though, was that there were still 100 games left in the season. We’ve seen enough division leaders blow 6-game leads in September in recent years, so am I really supposed to believe that there’s only a 14% chance the Jays get passed with almost 4 months left? Forget the Jays; it seemed crazy to me that any team could have such high playoff odds with so many games left to play.

So with the season now over, it seems prudent to review the playoff probabilities and compare them with the teams that actually made the playoffs. First, let’s quickly review how Baseball Prospectus determines the playoff odds.

To begin, a type of adjusted Pythagorean winning percentage is determined for each team. This is based on three factors: 1) a Pythagorean record based on runs scored and allowed, 2) an expected record based on normalized runs scored and allowed, and 3) an expected record based on normalized runs scored and allowed, adjusted for strength of opposition. This record is then regressed toward the mean.
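
For readers unfamiliar with the Pythagorean idea, here is a minimal Python sketch of its simplest form plus a regression toward the mean. It assumes the classic exponent of 2 and a hypothetical `weight` parameter for how much of the estimate survives regression; Baseball Prospectus’s adjusted formula and its actual regression amount are not reproduced here.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x).

    The exponent of 2 is the textbook value, an assumption here; it is not
    necessarily what Baseball Prospectus uses.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def regress_to_mean(pct: float, weight: float, mean: float = 0.5) -> float:
    """Shrink a winning percentage toward the league mean.

    `weight` (0 to 1) is the hypothetical share of the estimate that is kept:
    0 returns the mean, 1 returns the estimate unchanged.
    """
    return mean + weight * (pct - mean)


# Hypothetical team: 300 runs scored, 250 allowed.
raw = pythagorean_pct(300, 250)            # about .590
shrunk = regress_to_mean(raw, weight=0.7)  # about .563
```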

Each game is then “simulated” one million times based on the calculated winning percentages for each team. For each game, each team’s winning percentage is given a random adjustment to account for day-to-day variation, as well as a slight adjustment depending on whether it is the home or away team. The expected winning percentage for a game is determined using the log5 method. For each simulated game, a uniform random number between 0 and 1 is drawn. If the number is less than Team A’s winning probability, Team A wins the game; if it is greater, Team B wins the game. After all one million simulations, the number of times a team makes the playoffs is divided by one million to determine its playoff probability. The simulations are repeated each day for all future games. Further details are available here and here.
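
The steps above can be sketched in Python as a small Monte Carlo simulation. The `log5` function is the standard log5 formula; everything else is an assumption for illustration: the noise and home-edge sizes (`noise_sd`, `home_edge`) are hypothetical, only a division title counts as making the playoffs (no wild cards), ties are broken at random rather than by a tiebreaker game, and the default of 10,000 runs is far fewer than the one million Baseball Prospectus uses.

```python
import random
from collections import Counter


def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B, per the log5 method."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def clamp(p: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Keep an adjusted winning percentage strictly between 0 and 1."""
    return min(max(p, lo), hi)


def simulate_game(p_home, p_away, rng, noise_sd=0.03, home_edge=0.02):
    """Return True if the home team wins one simulated game.

    noise_sd and home_edge are hypothetical stand-ins for the day-to-day
    random adjustment and the home/away adjustment.
    """
    h = clamp(p_home + rng.gauss(0, noise_sd) + home_edge)
    a = clamp(p_away + rng.gauss(0, noise_sd) - home_edge)
    # Uniform draw in [0, 1): below the log5 probability means a home win.
    return rng.random() < log5(h, a)


def division_odds(wins, strength, schedule, n_sims=10_000, seed=1):
    """Share of simulated seasons in which each team finishes first.

    wins: current win totals; strength: regressed winning percentages;
    schedule: (home, away) pairs for the remaining games.
    """
    rng = random.Random(seed)
    firsts = Counter()
    for _ in range(n_sims):
        final = dict(wins)
        for home, away in schedule:
            home_won = simulate_game(strength[home], strength[away], rng)
            final[home if home_won else away] += 1
        best = max(final.values())
        leaders = [team for team, w in final.items() if w == best]
        firsts[rng.choice(leaders)] += 1  # hypothetical: ties broken at random
    return {team: firsts[team] / n_sims for team in wins}
```

For example, `division_odds({"TOR": 38, "BAL": 32}, {"TOR": 0.54, "BAL": 0.52}, [("TOR", "BAL"), ("BAL", "TOR")] * 50)` returns each team’s share of simulated division titles; the inputs are made up, so the output is not the actual 2014 odds.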

I gathered the probabilities for every team for every day of the past three seasons. Although it isn’t a completely valid assumption, I treated the probability calculated for each day as an independent event. This resulted in 16,935 samples. I then grouped the samples into 5% bins and determined how many samples in each bin corresponded to teams that made the playoffs. If the calculated probabilities are correct, we would expect the percentage of teams in each bin that made the playoffs to equal the percentage value corresponding to that bin. The results are shown in the following graph, where the black line marks the expected height of each bar.
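
The binning step might look something like the following Python sketch. The input format, a list of `(probability, made_playoffs)` pairs, is an assumption; probabilities of exactly 100% are folded into the top bin, and a value falling exactly on a bin edge can land on either side because of floating-point rounding.

```python
from collections import defaultdict


def calibration_table(samples, bin_width=0.05):
    """Group (probability, made_playoffs) samples into bins of bin_width.

    Returns {bin_lower_edge: (count, observed_rate, mean_predicted)}.
    A well-calibrated model has observed_rate close to mean_predicted.
    """
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for prob, made in samples:
        index = min(int(prob / bin_width), n_bins - 1)  # put 1.0 in the top bin
        bins[index].append((prob, made))
    table = {}
    for index in sorted(bins):
        rows = bins[index]
        count = len(rows)
        observed = sum(1 for _, made in rows if made) / count
        predicted = sum(prob for prob, _ in rows) / count
        table[round(index * bin_width, 4)] = (count, observed, predicted)
    return table
```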

The results actually fared better than I expected, with a 6.5% root mean squared error (RMSE). My suspicion that the model tends to be overconfident was correct: fewer teams in the 80% range made the playoffs than predicted, and more teams in the 20% range made the playoffs than predicted. However, the effect was not nearly as extreme as I had thought it would be.
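
For reference, a per-bin RMSE can be computed as in this Python sketch. How the error was aggregated isn’t stated above, so this is an assumption: by default every bin counts equally, and the optional `counts` argument weights bins by their number of samples instead.

```python
import math


def calibration_rmse(predicted, observed, counts=None):
    """Root mean squared error between predicted and observed playoff rates.

    predicted/observed: one value per bin (e.g. 0.825 and 0.75).
    counts: optional per-bin sample sizes used as weights; equal weights if None.
    """
    if counts is None:
        counts = [1] * len(predicted)
    total = sum(counts)
    squared = sum(c * (p - o) ** 2 for p, o, c in zip(predicted, observed, counts))
    return math.sqrt(squared / total)
```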

It’s also helpful to examine different parts of the season. The results for the first half of the season are shown in the following figure. This again shows that the model is overconfident and would benefit from pushing more teams toward the 50% mark. However, it is still quite accurate given how many games are left to play and the vast changes that can occur with so many games remaining. The RMSE is only slightly higher than for the full-season results, at 7.2%.

The results for the second half of the season are actually worse, with an 8.6% RMSE. This is surprising, since fewer games need to be simulated and so less is left up to chance. However, that also leaves less room for error and less time for teams to regress to the mean. There seemed to be a few teams in the 70% range that the model should have been more confident would make the playoffs.
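
One way to see why fewer remaining games should leave less to chance: if each game is treated as an independent coin flip with a fixed win probability p (a simplification), the standard deviation of a team’s remaining wins is sqrt(n * p * (1 - p)) for n games left. A quick Python check:

```python
import math


def remaining_wins_sd(games_left: int, win_pct: float) -> float:
    """Standard deviation of wins over games_left independent games (binomial model)."""
    return math.sqrt(games_left * win_pct * (1 - win_pct))


print(remaining_wins_sd(100, 0.5))  # 5.0 wins with 100 games left
print(remaining_wins_sd(30, 0.5))   # about 2.7 wins with 30 games left
```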

It should be noted that this is still a fairly limited sample. The 16,935 samples really represent only 90 team-seasons, of which only 30 made the playoffs. However, I’m still fairly impressed by the performance of the model, especially given that it does not make any predictions about team activity. The same winning percentage is used as the base (before the random variation is applied) for every remaining game in the season. This means the effects of the pitching rotation are ignored: the Mariners would effectively have the same odds of winning any game Felix Hernandez starts as any game Roenis Elias starts. This assumption is difficult to avoid, since rotations can be extremely difficult to predict more than five days in advance. However, the random variation applied to each game could perhaps be made more structured to account for this.
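
As a rough illustration of what more structured variation could look like, this Python sketch cycles a base winning percentage through a five-man rotation. The per-starter adjustments are hypothetical values, not estimates for any real pitcher, and are chosen to sum to zero so that the team’s average strength over a full turn of the rotation is unchanged.

```python
from itertools import cycle, islice


def rotation_pcts(base_pct, starter_adjustments, n_games):
    """Per-game winning percentages that follow a fixed rotation order.

    starter_adjustments: hypothetical +/- offsets, one per rotation slot.
    Results are clamped to stay strictly between 0 and 1.
    """
    offsets = islice(cycle(starter_adjustments), n_games)
    return [min(max(base_pct + adj, 0.01), 0.99) for adj in offsets]


# Hypothetical ace in slot 1, weakest starter in slot 5; offsets sum to zero.
games = rotation_pcts(0.52, [0.06, 0.01, 0.0, -0.03, -0.04], n_games=10)
```

The catch is that a fixed order is itself an assumption; off days, injuries, and skipped starts all reshuffle real rotations.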

Similarly, injury concerns and future trades aren’t factored into the model. Neither is the chance that a team out of the running will use most of its 40-man roster in September, dropping its expected winning percentage further. Expected run values can also be difficult to calculate after all the offseason activity, and for that reason the playoff odds are kept constant for the first month of the season.

After analyzing the odds, though, I’ll probably take them a little more seriously next season, at least until I can perform another analysis with an added year of samples. After all, large division leads early in the season still mean something. On that same June 6, the Giants had a 9-game division lead and a 98% playoff probability. They may have lost the division lead, but they still clinched a wild card berth with 3 games left to play.
