Tuesday, October 21, 2014

An Exciting World Series

I hate this World Series. I hate the Giants, who are trying to win their third championship in five years even though almost the exact same team was unable to make the playoffs in the other two years. I hate any sentence that involves the phrase “even number year.” I hate that the Giants’ best hitter is their catcher. Not that Buster Posey isn’t amazing, but any team that can’t get a better AVG or SLG from at least one other position should be ashamed of itself. I hate that even with Matt Cain injured and Tim Lincecum far past the point of being a viable starter, the Giants were still able to find enough quality starts to make it to the postseason.

As much as I hate the Giants though, it’s nothing compared to how much I despise the Royals. Sure, this all started because I didn’t want the Blue Jays to have the longest postseason drought in the league, but with every unlikely walkoff win I find myself hating this team a little more. Simply put, this team has no business being here, and they’re stealing an opportunity from a much more deserving team.

David Schoenfield has explained how this is basically the worst World Series matchup of all time. The Royals were 9th in runs scored, 4th in runs against, and had only the 7th best win-expectancy in the American League. I wish I could attribute their overachievement to effective managing, but 8 playoff games have been more than enough to disprove that. As pathetic as they’ve been as a team, though, individually they’ve been even worse.

No player on the Royals hit 20 home runs. Only 3 players even managed to hit 10 balls out of the park. No American League pennant winner has done even one of those since the 1959 White Sox (excluding shortened seasons). In fact only the 1982 Cardinals have won the pennant without players hitting for such power since the mound was lowered in 1969.

Highest individual RBI total on the Royals: Alex Gordon’s 74. Last AL pennant winner to not have a player reach 75 RBIs in a non-shortened season: The 1916 Boston Red Sox. That includes a lot of teams playing 154-game seasons.

The Royals don’t even walk. Alex Gordon led the team with 65 walks. Only two teams (2010 Rangers, 1990 Reds) were able to win a pennant without hitting that threshold since the mound was lowered. Alex Gordon also led the team with a .432 slugging percentage. No AL pennant winner has had a team leader that low since the mound was lowered, and it has happened only once in the NL (1973 Mets).

If there are no batting stars, maybe at least there are some pitchers to attract people to watch! Wins may be an outdated stat, but the Royals are only the second AL pennant winner (after the 2008 Rays) to not have a 15-game winner since the mound was lowered. They do have that amazing bullpen though, and if the game reaches the 7th inning before I fall asleep I’ll be excited to watch them.

I really wanted to turn this into a referendum on the current playoff format. Unfortunately, that’s tough to do. The Giants wouldn’t have made the playoffs under the previous system, but the Royals would have been the traditional Wild Card team. Expanding the first round to seven games wouldn’t even have done much, as the Royals would have been up 3-0 going into the fourth game of the ALDS at home against the Angels.

Instead I’ll write about what it means to have an exciting playoffs. The games this year have been close and competitive, with frequent extra-inning affairs and walkoff hits, many of them from unlikely sources such as Kolten Wong or Travis Ishikawa. The underdog mentality, focused on both the players and the teams, seems to bring fans to their feet and cheer. But is this really what we should be celebrating during the playoffs?

MLB plays a 162-game season precisely because the game-to-game variability in baseball requires such a large sample size before the good and bad teams can be differentiated. Unfortunately the playoffs present the exact opposite scenario. A one-game series, followed by a best-of-five series, followed by two best-of-seven series, with a ton of off days squeezed in, doesn’t truly capture the seasonal flow of baseball. That can make the playoffs exciting because it creates a situation where anyone can win. The problem is that when anyone can win, anyone can win. This year is a prime example.
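To put a rough number on how little a short series separates teams, here is a minimal sketch. It assumes every game is an independent coin flip with the same win probability for the better team, which ignores home field, rotations, and everything else; the function name is my own.

```python
from math import comb

def series_win_prob(p: float, best_of: int) -> float:
    """Chance that a team with per-game win probability p wins a best-of-N series.

    Assumes independent games with a constant p (a simplification).
    """
    wins_needed = best_of // 2 + 1
    # Sum over how many games the series winner loses before clinching.
    return sum(
        comb(wins_needed - 1 + losses, losses) * p**wins_needed * (1 - p) ** losses
        for losses in range(wins_needed)
    )

for length in (1, 5, 7):
    print(length, round(series_win_prob(0.55, length), 3))
```

Even a team that should win 55% of its games gains only a few percentage points of safety from a longer series, so an upset is never far away.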

Baseball is exciting already. There’s no need to manufacture situations to create more excitement. When that happens we end up with undeserving winners. The underdog story is fun, but it leaves a sour taste in your mouth. In the future when we look back at the narrative and storylines of the 2014 season, at no point until the final pages will we ever read that either the Royals or the Giants were the best team. That isn’t a fun twist ending, it’s a bait-and-switch.

I don’t think that we need to go back to having only 1 or 2 teams make the playoffs in each league. Different schedules and injuries aren’t necessarily well reflected in the standings. The playoff structure could use revisions though to ensure that the best teams are given their earned advantages. I’m not against the excitement of an underdog, but I do find it more exciting watching the best teams face off against each other.

Then again, if the Royals are going to win 8 straight games none of this matters anyway. 

Wednesday, October 1, 2014

Examining Playoff Probabilities

On June 6, 2014, the Toronto Blue Jays’ record stood at 38-24. They had a 6-game lead in the division, had just won 15 of 17 games, and were the only team in the AL East with a positive run differential (+53). According to the playoff odds on the mlb.com standings page, powered by Baseball Prospectus, the Blue Jays had an 86% chance of making the playoffs.

Maybe it was the cynical Jays fan in me who had been sitting witness to a 20-year playoff drought, but I couldn’t help but look at those playoff odds with a high degree of skepticism. The Jays had just finished an epic winning streak, everything Encarnacion swung at was going out of the park, the team was still relatively healthy, and the rotation consisted of Stroman who had two career starts, Hutchison coming off a bad injury with no real track record, and JA Happ who is JA Happ. Everything to that point in the season that could have gone right had gone right. Buehrle was even 10-1, and if there’s one thing that Buehrle has shown over his career it’s that his final numbers rarely change.

The factor that scared me the most though was that there were still 100 games left in the season. We’ve seen enough division winners lose 6-game leads in September in recent years, and I’m supposed to believe that there’s only a 14% chance that the Jays get passed with almost 4 months left? Forget the Jays, it seemed crazy to me that any team could have such high playoff odds with so many games left to play.

So with the season now over, it seems prudent to review the playoff probabilities and how they compared to the actual resulting playoff teams. First let’s quickly review how Baseball Prospectus determines the playoff odds.

Firstly, a type of adjusted Pythagorean winning percentage is determined for each team. This is based on three factors: 1) Pythagorean record based on runs scored and allowed, 2) Expected record based on normalized runs scored and allowed, and 3) Expected record based on normalized runs scored and allowed adjusted to strength of opposition. This record is then regressed toward the mean.
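As a rough illustration of the first and last steps, here is a sketch of the classic Pythagorean record and a simple regression toward .500. The fixed exponent of 2 is the original Bill James form, and the `prior_games` weight is a number I made up; I don’t know the exact exponent or regression weights Baseball Prospectus uses, and the normalization steps are skipped entirely.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and allowed.

    exponent=2.0 is the classic form; it is an assumption here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def regress_to_mean(pct: float, games_played: int, prior_games: int = 70, mean: float = 0.5) -> float:
    """Blend the observed percentage with a league-average prior.

    prior_games is a hypothetical weight: the fewer games played,
    the harder the estimate is pulled toward the mean.
    """
    return (pct * games_played + mean * prior_games) / (games_played + prior_games)

# Hypothetical team: 300 runs scored, 250 allowed through 62 games.
raw = pythagorean_pct(300, 250)
print(round(raw, 3), round(regress_to_mean(raw, 62), 3))
```

The regression step is the part that keeps a hot start from being taken entirely at face value.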

Each game is then “simulated” one million times based on the calculated winning percentages for each team. For each game the winning percentage of each team is given a random adjustment to account for day-to-day variations, as well as a slight adjustment depending on whether it is the home or away team. The expected winning percentage for a game is determined using the log5 method. For each simulated game, a uniform random number between 0 and 1 is chosen. If the number is less than the winning probability for Team A, then Team A wins the game; if it is greater, then Team B wins the game. After each of the one million simulations, the number of times that a team makes the playoffs is divided by one million to determine the playoff probability. The simulations are repeated each day for all future games. Further details are available here and here.
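The procedure can be sketched in a few lines. This is a toy version under my own assumptions, not the Baseball Prospectus code: the `home_edge` value is invented, the random day-to-day adjustment is left out, “making the playoffs” just means finishing in the top `spots` by wins, and ties are broken at random.

```python
import random

def log5(p_a: float, p_b: float) -> float:
    """Bill James's log5 estimate of the chance that team A beats team B."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

def playoff_odds(win_pct, wins, schedule, spots, n_sims=10_000, home_edge=0.02, seed=0):
    """Estimate each team's chance of finishing in the top `spots` by wins.

    win_pct  -- team -> adjusted winning percentage (taken as given)
    wins     -- team -> wins already banked
    schedule -- list of (home, away) games still to play
    """
    rng = random.Random(seed)
    made_it = {team: 0 for team in win_pct}
    for _ in range(n_sims):
        totals = dict(wins)
        for home, away in schedule:
            p_home = log5(win_pct[home], win_pct[away]) + home_edge  # hypothetical edge
            p_home = min(max(p_home, 0.0), 1.0)
            # A uniform draw below the home team's probability is a home win.
            winner = home if rng.random() < p_home else away
            totals[winner] += 1
        # Rank by wins; ties are broken randomly (real tiebreak rules are ignored).
        ranked = sorted(totals, key=lambda t: (totals[t], rng.random()), reverse=True)
        for team in ranked[:spots]:
            made_it[team] += 1
    return {team: made_it[team] / n_sims for team in win_pct}

# Hypothetical two-team race with six games left.
odds = playoff_odds({"A": 0.55, "B": 0.50}, {"A": 3, "B": 2},
                    [("A", "B")] * 3 + [("B", "A")] * 3, spots=1)
print(odds)
```

With one million simulations instead of ten thousand the estimates get smoother, but the structure is the same.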

I gathered all the probabilities for every team for every day for the past three seasons. Although this isn’t a completely valid assumption, I considered the probabilities calculated for each day as an independent event. This resulted in 16935 samples. I then grouped the samples into 5% bins and determined how many of the samples corresponded to teams that made the playoffs. If the calculated probabilities are correct we would expect the percentage of teams in each bin that made the playoffs to equal the percentage value corresponding to that bin. The results are shown in the following graph, with the expected height of each bar equal to the black line.
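For anyone who wants to repeat the exercise, the binning step looks roughly like this. The input format, a list of (predicted probability, made playoffs) pairs, and the use of each bin’s midpoint as its expected value are my simplifications.

```python
def calibration_table(samples, bin_width=0.05):
    """Group (probability, made_playoffs) samples into fixed-width bins.

    Returns one (bin_midpoint, observed_rate, sample_count) row per non-empty bin.
    """
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    hits = [0] * n_bins
    for prob, made_playoffs in samples:
        # A probability of exactly 1.0 goes in the top bin.
        i = min(int(prob / bin_width), n_bins - 1)
        counts[i] += 1
        hits[i] += int(made_playoffs)
    return [
        ((i + 0.5) * bin_width, hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i]
    ]

# Made-up samples: three daily readings in the low 80s, one near 12%.
demo = [(0.82, True), (0.84, True), (0.81, False), (0.12, False)]
for midpoint, observed, n in calibration_table(demo):
    print(f"{midpoint:.3f} {observed:.2f} {n}")
```

If the odds were perfectly calibrated, the observed rate in each row would match the midpoint.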

The results actually fared better than I expected, with a 6.5% root mean squared error (RMSE). My suspicions were correct that the model tended to be overconfident: not as many teams in the 80% range make the playoffs as predicted, and more teams in the 20% range make the playoffs than predicted. The effect was not as extreme as I would have thought, however.
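For reference, the RMSE here compares the expected and observed playoff rate of each bin. A minimal sketch, assuming every bin counts equally regardless of how many samples fall in it:

```python
from math import sqrt

def calibration_rmse(rows):
    """Root mean squared error over (expected_rate, observed_rate) pairs, one per bin.

    Bins are weighted equally, which is an assumption of this sketch.
    """
    squared_errors = [(observed - expected) ** 2 for expected, observed in rows]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Made-up bins showing the overconfident pattern described above.
print(calibration_rmse([(0.8, 0.7), (0.2, 0.3)]))
```

Weighting each bin by its sample count would be a reasonable alternative and could change the number somewhat.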

It’s helpful to also examine different parts of the season. The results for the first half of the season are shown in the following figure. This again shows that the model is overconfident and would benefit from pushing more teams toward the 50% mark. However it is still quite accurate given how many games are left to play and the vast changes that occur with so many games remaining. The RMSE is only slightly higher than for the full season results, at 7.2%.

The results for the second half of the season are actually worse, with an RMSE of 8.6%. This is surprising since fewer games need to be simulated, so there is less left up to chance. However, that also leaves less room for error and less time for teams to regress to the mean. In particular, there seemed to be a few teams in the 70% range that the model should have been more confident would make the playoffs.

It should be noted that this is still a fairly limited sample. The 16935 samples are really only 90 teams, of which only 30 made the playoffs. However I’m still fairly impressed by the performance of the model, especially given that it does not make any predictions about team activity. The same winning percentage is used as the base for calculations (before the random variation is applied) for every remaining game in the season. This means the effects of the pitching rotation are ignored. The Mariners would effectively have the same odds of winning any game that Felix Hernandez starts as they would in any game that Roenis Elias starts. This assumption is difficult to avoid since rotations can be extremely difficult to predict more than five days in advance. However the random variation applied to each game could become more structured to possibly account for this.

Similarly, injury concerns or future trades aren’t factored into the model. Neither is the chance that a team out of the running will use most of its 40-man roster in September and drop its expected winning percentage further. The expected run values can also be difficult to calculate after all the offseason activity, and for that reason the playoff odds are kept constant for the first month of the season.

After analyzing the odds though, I’ll probably take them a little more seriously next season, at least before I can perform another analysis with an added year of samples. After all, large division leads early in the season still mean something. On that same June 6th, the Giants had a 9-game division lead and a 98% playoff probability. They may have lost the division lead, but they still clinched a wild card berth with 3 games left to play.