On June 6, 2014, the Toronto Blue Jays’ record stood at
38-24. They had a 6-game lead in the division, had just won 15 of 17 games, and
were the only team in the AL East with a positive run differential (+53).
According to the playoff odds on the mlb.com standings page, powered by
Baseball Prospectus, the Blue Jays had an 86% chance of making the playoffs.

Maybe it was the cynical Jays fan in me, who had sat through
a 20-year playoff drought, but I couldn’t help looking at those playoff
odds with a high degree of skepticism. The Jays had just finished an epic
winning streak, everything Encarnacion swung at was going out of the park, and
the team was still relatively healthy. The rotation, meanwhile, consisted of
Stroman, who had two career starts; Hutchinson, coming off a bad injury with no
real track record; and JA Happ, who is JA Happ. Everything that could have gone
right to that point in the season had gone right. Buehrle was even 10-1, and if
there’s one thing Buehrle has shown over his career, it’s that his final numbers
rarely change.

The factor that scared me most, though, was that there
were still 100 games left in the season. We’ve seen enough division leaders
lose 6-game leads in September in recent years, so am I to believe that there’s
only a 14% chance that the Jays fall out of a playoff spot with almost 4 months
left? Forget the Jays; it seemed crazy to me that any team could have such high
playoff odds with so many games left to play.

So with the season now over, it seems prudent to review the
playoff probabilities and compare them with the teams that actually made the
playoffs. First, let’s quickly review how Baseball Prospectus determines the playoff odds.

First, a type of adjusted Pythagorean winning percentage
is determined for each team. It is based on three factors: 1) a Pythagorean
record based on runs scored and allowed, 2) an expected record based on normalized
runs scored and allowed, and 3) an expected record based on normalized runs scored
and allowed, adjusted for strength of opposition. This record is then regressed
toward the mean.
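As a rough illustration of the first ingredient and the regression step, here is a minimal Python sketch of the classic Pythagorean expectation (runs scored squared over the sum of runs scored squared and runs allowed squared), followed by a simple regression toward .500. This is not Baseball Prospectus’s code: the exponent of 2 is the textbook choice (1.83 is also commonly used), the normalization and strength-of-opposition adjustments are omitted, and the `ballast_games` amount of regression is a hypothetical placeholder.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Classic Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def regress_to_mean(pct: float, games_played: int, ballast_games: int = 70) -> float:
    """Blend an observed winning percentage with .500.

    Adds `ballast_games` games of .500 baseball to the observed record.
    The ballast size is a hypothetical placeholder, not BP's value.
    """
    wins = pct * games_played + 0.5 * ballast_games
    return wins / (games_played + ballast_games)


if __name__ == "__main__":
    # Hypothetical team: 300 runs scored, 247 allowed (+53) over 62 games.
    pct = pythagorean_pct(300, 247)
    print(round(pct, 3), round(regress_to_mean(pct, 62), 3))
```

With those made-up run totals, the sketch gives a Pythagorean percentage of about .596, which the regression step pulls down to about .545. The fewer games a team has played, the more strongly its record is pulled toward .500.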

Each game is then “simulated” one million times based on the
calculated winning percentages for each team. For each game, the winning
percentage of each team is given a random adjustment to account for day-to-day
variations, as well as a slight adjustment depending on whether it is the home
or away team. The expected winning percentage for a game is determined using
the log5 method. For each simulated game, a uniform random number between 0 and
1 is chosen. If the number is less than the winning probability for Team A,
then Team A wins the game; if it is greater, then Team B wins the game. After
all one million simulations, the number of times that a team makes the
playoffs is divided by one million to determine the playoff probability. The
simulations are repeated each day for all remaining games. Further details are available here and here.
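The game-level step can be sketched in Python as well. The `log5` function below is the standard log5 formula; everything else is a hypothetical simplification for illustration: a single division playing only head-to-head games, a Gaussian day-to-day wobble (`daily_sd`), a fixed home/away nudge (`home_edge`), clamping to keep probabilities in range, ties split evenly rather than settled by a tiebreaker, and far fewer than one million simulations. None of these parameter values come from Baseball Prospectus.

```python
import random
from collections import Counter


def log5(p_a: float, p_b: float) -> float:
    """Probability that a team with true winning pct p_a beats one with p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def clamp(p: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Keep an adjusted winning percentage strictly inside (0, 1)."""
    return max(lo, min(hi, p))


def simulate_division(current_wins, true_pct, schedule, n_sims=10_000,
                      daily_sd=0.03, home_edge=0.02, seed=0):
    """Estimate each team's chance of finishing first in a toy division.

    current_wins: dict team -> wins so far
    true_pct: dict team -> (regressed) expected winning percentage
    schedule: list of (home, away) games still to play
    daily_sd and home_edge are hypothetical values, not BP's.
    """
    rng = random.Random(seed)
    firsts = Counter()
    for _ in range(n_sims):
        wins = dict(current_wins)
        for home, away in schedule:
            # Random day-to-day wobble plus a small home/away adjustment.
            p_home = clamp(true_pct[home] + home_edge + rng.gauss(0, daily_sd))
            p_away = clamp(true_pct[away] - home_edge + rng.gauss(0, daily_sd))
            # A uniform draw below the log5 probability means the home team wins.
            if rng.random() < log5(p_home, p_away):
                wins[home] += 1
            else:
                wins[away] += 1
        best = max(wins.values())
        leaders = [t for t, w in wins.items() if w == best]
        for t in leaders:
            firsts[t] += 1 / len(leaders)  # split ties evenly (no tiebreaker game)
    # Fraction of simulated seasons in which each team finished first.
    return {team: firsts[team] / n_sims for team in current_wins}


if __name__ == "__main__":
    schedule = [("A", "B"), ("B", "A")] * 50  # 100 hypothetical games left
    odds = simulate_division({"A": 38, "B": 32}, {"A": 0.54, "B": 0.52},
                             schedule, n_sims=2_000)
    print(odds)
```

With two hypothetical teams, A (38 wins, .540 expected) and B (32 wins, .520 expected), and 100 games left against each other, A wins a large majority of the simulated division titles; the exact figure depends on the seed and the made-up parameters. A useful sanity check on the formula is that `log5(p, 0.5)` returns `p`: against an average team, a team wins at its own rate.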

I gathered the probabilities for every team for every
day of the past three seasons. Although this isn’t a completely valid
assumption, I treated the probabilities calculated for each day as
independent events. This resulted in 16,935 samples. I then grouped the samples
into 5% bins and determined how many of the samples corresponded to teams that
made the playoffs. If the calculated probabilities are correct, we would expect
the percentage of teams in each bin that made the playoffs to equal the
percentage value corresponding to that bin. The results are shown in the
following graph, with the expected height of each bar given by the black line.
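The binning step might look like the following Python sketch. It assumes each daily sample has already been reduced to a hypothetical `(probability, made_playoffs)` pair; the input format and the function name are illustrative, not the actual analysis code behind the graph.

```python
from collections import defaultdict


def calibration_table(samples, bin_width=0.05):
    """Group (probability, made_playoffs) samples into fixed-width bins.

    Returns {bin_midpoint: (n_samples, observed_playoff_rate)} for every
    bin that contains at least one sample.
    """
    n_bins = round(1 / bin_width)
    counts = defaultdict(int)
    hits = defaultdict(int)
    for prob, made_playoffs in samples:
        # Multiply rather than divide to avoid float edge errors; 1.0 goes in the top bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        counts[idx] += 1
        hits[idx] += int(made_playoffs)
    return {
        round((idx + 0.5) / n_bins, 4): (counts[idx], hits[idx] / counts[idx])
        for idx in sorted(counts)
    }
```

If the model is well calibrated, each bin’s observed rate should sit close to its midpoint: roughly 82.5% of samples in the 80–85% bin should belong to playoff teams.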

The model actually fared better than I expected, with a
6.5% root mean squared error (RMSE). My suspicion that the model tends
to be overconfident was correct: fewer teams in the 80% range make the
playoffs than predicted, and more teams in the 20% range make the
playoffs than predicted. However, the effect is not nearly as extreme as I
would have thought.
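An RMSE like the ones quoted here can be computed from a binned table by comparing each bin’s observed playoff rate with the bin’s midpoint. The post doesn’t say whether its RMSE weights bins by sample count or treats them equally, so this hypothetical `calibration_rmse` sketch supports both; the `{bin_midpoint: (n_samples, observed_rate)}` table format is an assumption.

```python
import math


def calibration_rmse(table, weighted=False):
    """RMSE between each bin's midpoint and its observed playoff rate.

    table: {bin_midpoint: (n_samples, observed_rate)}
    weighted=False treats every bin equally; weighted=True weights each
    bin by its number of samples.
    """
    total = 0.0
    weight_sum = 0.0
    for midpoint, (n_samples, observed) in table.items():
        w = n_samples if weighted else 1
        total += w * (observed - midpoint) ** 2
        weight_sum += w
    return math.sqrt(total / weight_sum)
```

The choice matters: an unweighted RMSE lets a sparsely populated bin count as much as a crowded one, while the weighted version is dominated by the bins near 0% and 100%, where most daily samples fall.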

It’s also helpful to examine different parts of the season.
The results for the first half of the season are shown in the following figure.
This again shows that the model is overconfident and would benefit from pushing
more teams toward the 50% mark. However, it is still quite accurate given how
many games are left to play and the vast changes that can occur with so many games
remaining. The RMSE is only slightly higher than for the full-season results,
at 7.2%.

The results for the second half of the season are actually
worse, with an 8.6% RMSE. This is surprising, since fewer games need to be
simulated, so less is left up to chance. However, that also leaves less room for
error and less time for teams to regress to the mean. In the 70% range, there
seem to have been a few teams whose playoff chances the model should have rated
more highly.

It should be noted that this is still a fairly limited
sample. The 16,935 samples really come from only 90 team-seasons, of which only
30 made the playoffs. However, I’m still fairly impressed by the performance of
the model, especially given that it does not make any predictions about team
activity. The same winning percentage is used as the base for calculations
(before the random variation is applied) for every remaining game in the season.
This means the effects of the pitching rotation are ignored: the Mariners would
effectively have the same odds of winning any game that Felix Hernandez starts as
any game that Roenis Elias starts. This assumption is difficult to avoid, since
rotations can be extremely difficult to predict more than five days in advance.
However, the random variation applied to each game could be made more structured
to account for this.

Similarly, injuries and future trades aren’t factored
into the model. Neither is the chance that a team out of the running will use
most of its 40-man roster in September and drop its expected winning percentage
further. Expected run values can also be difficult to calculate after all
the offseason activity, and for that reason the playoff odds are kept constant
for the first month of the season.

After analyzing the odds, though, I’ll probably take them a
little more seriously next season, at least until I can perform another
analysis with an added year of samples. After all, large division leads early in
the season still mean something. On that same June 6, the Giants had a 9-game
division lead and a 98% playoff probability. They may have lost the division
lead, but they still clinched a wild card berth with 3 games left to play.