
Tuesday, August 27, 2019

Value Points: Exploring Linear Regression in MLB Showdown


              Wins.  We play to win the game.  Whether you’re a fan, a statistician, a gambler, a Showdown owner, or Herm Edwards, your focus is on winning and on who’s going to be the catalyst.  Let me lead with this: this article won’t be for every fan, owner, or enthusiast.  I tried to write it to be detailed, concise, and relatable, in terms that aren’t solely geared toward me.  Yes, it’s centered on the creation of an advanced metric, but it is also a sports post.  Much like the Red Sox-Yankees rivalry, the intention isn’t for everyone to agree…it’s to win.


                I also want to add that I’m not involved with this league in any way.  I’m an all-things-sports consumer and old-school Showdown enthusiast who likes to place wagers and was looking for a leg up.  While there was some small hedging of bets, I was able to find value in futures bets on the Astros and Red Sox the past 2 summers by applying the MLB versions of these formulas.

                You’re correct in guessing that I won’t be placing bets on my predictions for this Showdown league, but being the number junkie I am, I thought it would be interesting to apply the metric at the player level, as well as to Showdown results, since the charts are based on actual MLB results.  I started this analysis intending to quietly apply the metric out of curiosity, until I realized others might have a mild interest.

                So what is linear regression?   First off, it forms the basis of most predictive sports metrics.  How it's applied may vary from stat to stat, but the overall concept remains the same.  The linear regression concept is to take an input variable (the regressor) and find how it impacts predicting an outcome (outcome variable).  As you move forward in the regression formula you look to find which specific variable has the biggest impact on the outcome...i.e. ‘what leads to W-L?’.  Sports is naturally an exercise in speculation on how to get the advantage: the lefty-righty matchup, the defensive shift, the prized free-agent signing, or leveraging prospects for the deadline’s hottest ace.  In the name of winning, the constant question is: what do I do, and how do I find the best approach to achieving it?  Enter the ‘Value Point’ (VP).
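Here’s a minimal sketch of that idea in Python: an ordinary least squares fit with one regressor and one outcome variable. The numbers are invented purely for illustration (VP per game as the regressor, wins as the outcome), not pulled from this league, and the function name is my own. This is plain simple linear regression, not the VP formula itself.

```python
# Ordinary least squares with a single regressor (x) and an outcome (y).
# The data below is made up for illustration only; it is NOT league data.

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the regressor has no variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical regressor (VP per game) and outcome (wins) for five teams.
vp_per_game = [30, 32, 34, 36, 38]
wins = [10, 12, 13, 15, 16]

intercept, slope = fit_simple_linear_regression(vp_per_game, wins)
print(f"wins ~ {intercept:.2f} + {slope:.2f} * VP/game")
print(f"predicted wins at 37 VP/game: {intercept + slope * 37:.2f}")
```

The slope is the "impact" of the regressor: in this toy data, each extra VP per game is associated with about three-quarters of a win.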

I’ll table the propaganda on winning and move on to explaining what the ‘Value Point (VP)’ is.  It’s a speculator's approach to predicting who’s winning the World Series...but making that prediction in August, when the better value is to be had.  It assigns consistent point values to the stats we all know and puts those stats on the same playing field by using constants and league averages.  Those numbers are then regressed to see how stable or sustainable that productivity is.  Finally, that productivity can be used to measure a team’s performance against its competitors, and to judge whether its record really was bad luck, or whether it’s just an average team that plays some close games.

                The VP, while detailed, is not a guarantee.  At the player level, it’s meant to show the rate at which a player generates value, and how that value impacts their team’s winning.  At the team level, the intention is to find a team’s efficiency in converting its roster’s value into wins.  It’s best to use VP metrics for projecting seasonal outcomes…i.e. what a team will be at the end of the regular season, not game-to-game projections.  This is why regression is a key consideration.  Pitchers have off days, star hitters get unforeseen days of rest, and sometimes it actually is just bad luck.

                Let’s look at the points system used in Showdown VP and some of the terms used to determine VPWp (Value Point Wins projection):






Potential gaps/cons:
  • Weights pitching and hitting equally in a game's outcome
  • Pitching values SO and IP too heavily
  • Player usage is not a driving factor…by design
  • The regression rate is too stable
  • While a “ballpark factor” is included, fielding’s influence on a game's outcome is not considered
  • Applies a metric designed for the team level to player-level stats


Let’s circle back to the basic explanation of what linear regression is:

“…take an input variable (the regressor) and find how it impacts predicting an outcome (outcome variable).  As you move forward in the regression formula you look to find which specific variable has the biggest impact on the outcome...i.e. ‘what leads to W-L?’.  “

In the VP formula, a player’s or team’s hitting and pitching points are the regressor, and the above “terms to learn” are the outcome variables.  By consistently awarding points per category, and then multiplying by ‘Ballpark Factors’, equal regression projections, and league averages, it puts the players and teams on a level playing field.  That level footing then highlights their efficiency in the opportunities they get to produce win outcomes.  In the case of VP, an opportunity for a win gets granular: for hitters, it’s based on positive outcomes in plate appearances, and for pitchers, it’s positive outcomes in batters faced.  This means there’s a winner and a loser in every interaction during a game, and those wins and losses have varying levels of impact on the overall final W-L results.
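The actual VP points table isn’t reproduced in this post, so the weights below are placeholders I made up (roughly total bases plus walks), not the real VP values. This Python sketch only illustrates the mechanics described above: award points per positive outcome, scale by a ballpark factor, and divide by opportunities (plate appearances). The function and dictionary names are hypothetical.

```python
# Hypothetical VP scoring sketch. The point weights are PLACEHOLDERS
# (roughly total bases plus walks), not the actual VP points table.
HYPOTHETICAL_HITTER_POINTS = {
    "single": 1.0,
    "double": 2.0,
    "triple": 3.0,
    "home_run": 4.0,
    "walk": 1.0,
}


def hitter_vp(event_counts, ballpark_factor=1.0, points=HYPOTHETICAL_HITTER_POINTS):
    """Award points for each positive outcome, then scale by a ballpark factor."""
    raw = sum(points[event] * count for event, count in event_counts.items())
    return raw * ballpark_factor


def vp_per_pa(event_counts, plate_appearances, ballpark_factor=1.0):
    """VP earned per opportunity, where an opportunity is a plate appearance."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return hitter_vp(event_counts, ballpark_factor) / plate_appearances


# Made-up first-half stat line for a hitter in a 1.03 ballpark.
stat_line = {"single": 20, "double": 6, "triple": 2, "home_run": 3, "walk": 10}
print(f"VP: {hitter_vp(stat_line, ballpark_factor=1.03):.2f}")
print(f"VP per PA: {vp_per_pa(stat_line, 110, ballpark_factor=1.03):.3f}")
```

Swapping in the real point values (and a matching table for pitchers’ batters faced) would be the first step before comparing any output to the numbers in this post.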

The main factor in projecting VP win formulas is the team aspect.  One might think, ‘if you are measuring a player’s productivity, why factor in the team?’  This is done because we know being the best player on a bad team clearly doesn’t translate to team winning (think Giancarlo Stanton on the Marlins).  VP is intended for projecting winning as a group, and what a player contributes to that winning.  You’re going to see that some top or solid performers are on the bottom teams in the league, and that a player’s surrounding supporting cast directly drives overall results.

                So, why regress the numbers?  The quickest and clearest example I can give you is tied to opening day.  When the Yankees won their 2019 season opener against the Orioles, they had a winning % of 100%, and by that math an ‘on-pace’ target of a 162-0 record.  Luckily for this Sox fan, the Yankees lost their next 2 games and saved baseball fans the terror that would follow if the Bronx Bombers went undefeated.  That 3-game sample gave the Yankees the full spectrum of a very literal W% and shows why regression needs to be factored in...100%, then 50%, then 33%.  By factoring in both performance and opportunities, you get a more balanced projection:

Regressed Winning % = (Wins if record is .500 + Actual Wins) / (Total Games + Wins + Losses)

Applied to the Yankees after 3 games, that looks like this:

(1.5 + 1) / (3 + 1 + 2) = 2.5 / 6 ≈ .417

                Sure, the Evil Empire has outpaced that roughly 42% projection after 3 games, but you can see how a standard W% is not always an accurate measure of success, and that a regressed W% does not automatically mean a reduced W%.  This makes it a more stable number.
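As a quick check on the regressed W% formula, here’s a small Python sketch (the function name is my own) that applies it to the Yankees’ 1-0, 1-1, and 1-2 starts from the example.

```python
def regressed_win_pct(wins, losses):
    """Regressed W% per the formula in this post:
    (wins if the record were .500 + actual wins) / (total games + wins + losses)
    """
    games = wins + losses
    if games == 0:
        raise ValueError("need at least one game played")
    return (games / 2 + wins) / (games + wins + losses)


# The Yankees' first three results from the example: W, L, L.
for wins, losses in [(1, 0), (1, 1), (1, 2)]:
    raw = wins / (wins + losses)
    print(f"{wins}-{losses}: raw W% {raw:.3f}, "
          f"regressed W% {regressed_win_pct(wins, losses):.3f}")
```

The 1-0 start regresses to .750 instead of 1.000, 1-1 stays at .500, and 1-2 comes out at about .417. Because the denominator is always twice the games played, the formula works out algebraically to the average of the raw W% and .500.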


                Since this article is not intended to be a complex showcase or explanation of VP, or even of linear regression, but instead a basic presentation of a new metric for assisting speculators, let’s just get to the results of this league’s “at the break” performances.





Players Likely to Get Better

Chipper Division: Jed Lowrie - 
Not the player most would have guessed, but he has worked his way into deserving a mention in the MVP conversation: 103.53 total VP, 3.45 regressed VP per game, and a projected VPWp of 2.95 wins.  He’s on the juggernaut that is Dat Boy X and has been contributing, not just holding his own.  At the break, he shows 40 hits, 12 BBs, 22 RBIs, and a .400-plus OBP out of the 6 hole in the line-up.  The team is a monster, which factors into his near-3 VPWp…offense feeds off offense.  He’s a $300 player performing like a $430+ player.

Sheffield Division: Alex Bregman - 
Likely less of a surprise than Lowrie.  His VP metrics show that he’s a solid performer on an average team in the Bandidos.  He sits 6th in the division in regressed VP per game at a smooth clip of 3.40.  He’s put together a well-rounded season, with a VP total of 102.01 at the break.  His table setting projects to a very respectable 2.39 wins for his team and should give this Skyway team some hope for improvement over the 2nd half.  He and his 3B/SS teammate Johan Camargo are the offensive leaders of the line-up and project to lead the team to a 26-22 record, but they likely aren’t a strong enough duo to propel the Bandidos past their cross-Skyway rivals, the Rainiers.


Players Quietly Outperforming Their Cost

Chipper Division: Mallex Smith - 
He’s hit his way into the top 20 in Value Point Wins projection (VPWp) and nearly broke the 100 VP threshold in the first half (99.84).  His productivity, while quiet at first glance, has been excellent for a $300 player.  At 3.33 regressed VP per game, he’s overcome 0 HRs thanks to 11 triples!  He has 42 hits and a .300-plus AVG, but his 39 SOs over 27 games have limited his OBP.  His value’s there, but he finds himself on the Crush, who project to be a league-average team at season’s close.

Sheffield Division: Jean Segura - 
$240 cost, but producing a higher VPWp than his crosstown Skyway counterpart Alex Bregman, who came in at a $500 cost.  His 0.70 VP per PA leads to a regressed VP per game of nearly 2.96 and a sustainable cost per VP of only $2.70.  A player’s cost per VP improves over the season as VP is accumulated, but with his small salary, Segura has already outperformed the value the Rainiers likely anticipated when he was drafted.  He’s putting up impactful small-ball numbers on a winning team.  Since part of the VP approach is a team factor, an elite offense amplifies an above-average player’s VP because teammates are there to capitalize on the previous batter’s or pitcher’s success.  It’s a team metric in a team sport, and on paper Segura has done his part.
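The cost-per-VP formula isn’t spelled out here, but dividing a player’s cost by his total VP reproduces Lowrie’s listed figure ($300 / 103.53 ≈ $2.90), so this Python sketch assumes that definition (the function name is my own). It also shows why the number improves as VP accumulates: the cost stays fixed while the denominator grows. The later VP totals in the loop are hypothetical.

```python
def cost_per_vp(cost, total_vp):
    """ASSUMED definition: player cost divided by VP accumulated so far."""
    if total_vp <= 0:
        raise ValueError("total_vp must be positive")
    return cost / total_vp


# Jed Lowrie at the break: $300 cost, 103.53 total VP.
print(f"Lowrie: ${cost_per_vp(300, 103.53):.2f} per VP")

# Cost is fixed, so the ratio falls as VP piles up.
# The 150 and 200 VP totals are hypothetical second-half scenarios.
for vp_so_far in (103.53, 150.0, 200.0):
    print(f"{vp_so_far:>7.2f} VP -> ${cost_per_vp(300, vp_so_far):.2f} per VP")
```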


Teams to Watch for Some Improvement

Chipper Division: Acuna Matata - 
This projects as either a 2nd-half monster or the unluckiest team in the league.  Going into the break at 13-14 and 4 games back of 1st, they have an uphill battle to 1st place.  When their stats are regressed, they look poised to explode and go 14-7 over the 2nd half.  Similar to the Yankees example used earlier, regressing stats is not meant to lessen a team’s numbers, but instead to stabilize them.  Powered by over 37 VP per game and a sub-$5.00 cost per VP, and as 1 of 3 teams with over 1,000 total VP, they can make some waves.  Their ability to keep generating VP at an extremely high rate will be the deciding factor in reaching their VPWp of 27 wins.  They have, however, been extremely inefficient at converting their VP into wins, which is also what could keep them a .500 team.

Sheffield Division: Wisconsin Wolves - 
Like the Matata, the Wolves look ready to pounce.  They have a blistering VPWp of 29 even though they’re 14-13 at the break.  To this point of the season, they have arguably been the most efficient team, with a total of 983 VP, 70.22 VP per W (VPW), and a ballpark factor of 1.03 (4th in the league).  They also have the highest VP/game in their division at 36.41.  If they stay the course, watch for them to challenge the Rainiers for the division’s best record.
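The season length isn’t stated, but the Matata’s 13-14 at the break plus a projected 14-7 second half implies 48 games. Assuming that, and treating VPWp as a rounded full-season win total, the implied second-half record is simple arithmetic. A small Python sketch (function name my own):

```python
SEASON_GAMES = 48  # ASSUMPTION: inferred from 13-14 at the break plus a 14-7 second half


def implied_second_half(wins, losses, projected_wins, season_games=SEASON_GAMES):
    """Split the remaining games into the W-L needed to hit a projected win total."""
    games_left = season_games - (wins + losses)
    wins_needed = projected_wins - wins
    if not 0 <= wins_needed <= games_left:
        raise ValueError("projection is out of reach or already passed")
    return wins_needed, games_left - wins_needed


print(implied_second_half(13, 14, 27))  # Acuna Matata, VPWp of 27
print(implied_second_half(14, 13, 29))  # Wisconsin Wolves, VPWp of 29
```

Under the same assumption, the Wolves’ VPWp of 29 implies a 15-6 finish from 14-13.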

Teams to Watch for a Step Back

Chipper Division: Aurora Anteaters - 
With the lowest ballpark factor in the division (0.94), the worst VP per Win rate (89.37), and a current league-worst record of 10-17, the Anteaters’ already long season doesn’t project well down the home stretch.  They also, unfortunately, are carrying one of the most bloated payrolls for the production they’re getting…2nd worst in the league in cost per VP at $5.59.

Sheffield Division: Cobra Chickens - 
Their current record puts them on pace for 21 wins, but they have a team VPWp of 19.  While a VP per Win of 66.04 would lead one to think that they are efficient at winning, a low VPW can also mean a team doesn’t generate much VP and hasn’t won many games.  That is the case for the Cobra Chickens, who are the only team currently generating fewer than 30 VP/game.  Coupled with the league-worst ballpark factor (0.83), this team is just not productive enough to win it all.  Did I mention that they have the worst cost per VP as well?  $6.31 per!  Yikes.
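VP per Win (VPW) and VP/game read as simple ratios. Dividing the Wolves’ 983 VP by their 14 wins and 27 games gives about 70.21 and 36.41 (the listed 70.22 presumably comes from an unrounded VP total). Pairing the two ratios illustrates the Cobra Chickens point: a lower VPW can simply mean a team generating less VP. The second team below is hypothetical, with made-up numbers, and the function name is my own.

```python
def team_efficiency(total_vp, wins, losses):
    """Return VP per win and VP per game for a team's record to date."""
    games = wins + losses
    if wins <= 0 or games <= 0:
        raise ValueError("need at least one win")
    return {"vp_per_win": total_vp / wins, "vp_per_game": total_vp / games}


wolves = team_efficiency(983, 14, 13)
# Hypothetical low-output team (made-up numbers): lower VPW, but under 30 VP/game.
hypothetical_low_vp_team = team_efficiency(792, 12, 15)

for name, m in [("Wolves", wolves), ("Hypothetical team", hypothetical_low_vp_team)]:
    print(f"{name}: {m['vp_per_win']:.2f} VP/W, {m['vp_per_game']:.2f} VP/game")
```

Read on its own, the hypothetical team’s 66.00 VP/W looks more "efficient" than the Wolves’ 70.21, but its VP/game shows it simply produces less.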


VP Offensive MVP

Chipper Division:
1st) Manny Machado – Leads the division in regressed VP per game at 4.15, thanks to his 0.80 VP per PA.  He also leads the Chipper division with 103 total bases (2nd most is 93), is tied for the most HRs with 10, and has a division-leading 22 extra-base hits.

2nd) Ronald Acuna – 7 HR, 30 RBI, 43 hits, 24 runs, and a .350 AVG.  The kid is well-rounded and contributing.  For him to finish 1st or 2nd in MVP voting, the Matata will have to achieve their potential over the 2nd half.  His 0.80 regressed VP per PA is tied for tops in the division, so going into the second half he looks likely to keep putting up strong numbers.

3rd) Justin Turner – He’s having a solid season and is actually hitting for a higher AVG than the two players listed above him: a .390 AVG and a division-best .464 OBP, thanks to a division-high 48 hits.  Combined with fewer than 25 Ks and 23 runs scored, you can see he puts the ball in play and gets on base.  The problem is his pop…only 4 HRs and 13 RBI.

Honorable/Sleeper Mention: Jed Lowrie – A measly $2.90 per VP, 0.73 VP per PA, and the 3rd-best VPWp in the division at 2.95.


Sheffield Division:
1st) Christian Yelich – Who the hell hits .426 over 117 PAs?  A guy who has 49 hits, a .480 OBP, 6 swiped bags, and a league-leading 29 runs scored.  Let’s not forget that the same .426 hitter slugs at a .809 rate thanks to 26 extra-base hits.  Yelich is the only player in the league with a regressed VP per PA over 1 (1.02).  His 144.09 VP stack up nearly 20 points higher than 2nd-place Machado’s.  That all combines into a monstrous projected 4.31 VPWp…as in, his productivity alone will be responsible for over 4 wins.

2nd) Bryce Harper – The league leader with 11 HRs.  He also gets to boast 30 RBIs while scoring 21 runs himself, slugging over .650 with nearly 100 total bases on the season.  With the season Yelich is having, Harper won’t outproduce him, but cutting down those strikeout numbers would have helped his .404 OBP sniff the .450 mark and push him past 100 TB.

3rd) JD Martinez – Having one of those well-rounded, quiet, Paul-Goldschmidt-in-AZ type seasons.  He also gives the Wolves 2 bona fide MVP candidates.  While all his metrics come in at impressive clips, the biggest knock would be his 85 total bases, which is good, but lower than the other ‘pop’ guys.  Impressive numbers and an excellent 1st half, but he too finds himself in the tier below Yelich.

Honorable/Sleeper Mention: Max Muncy – Second-best AVG in the Sheffield division at .362, 42 hits, 19 BBs, and a stellar .452 OBP.  42 Ks appear to have dampened his RBI count a bit.


VP Pitching MVP

Sheffield Division:
SP 1st) Blake Snell – A VPWp of 1.78, 91.57 total VP, and creating VP at a per-game clip north of 2.50.  He sports a sub-3 ERA, over 13.33 SO/9, and 63.2 IP.  His numbers look sustainable, and he’s on an above-average team that will help fuel wins.

RP 1st) Craig Kimbrel – One can easily build a case for him as the division’s Cy Young.  The stats get nastier the further down the sheet you look…a .770 ERA, 11 saves, 2 earned runs, 23.1 IP, 61 SOs.  For $220 at closer you need elite production, and he’s delivering.  His VPWp of 1.77 is just 0.01 behind division-leading Snell, and good for 4th-best among all pitchers in the league.  Filth.

RP 2nd) Seranthony Dominguez – Maybe a bit of a surprise for most, but he has an insane WHIP of .563, to go with a 7-for-7 mark in save opportunities and a 1.125 ERA.  While he’s providing 1.42 VP per game, he’s unfortunately having to close games out for the Lemons.  Good closer on a bad team…sounds like a prime deadline deal piece in the real MLB.


Chipper Division:
SP 1st) Chris Sale – 13.56 SO/9…filthy.  2.53 ERA, 1.088 WHIP, 57 IP, and .75 VP per IP.  He’s pricey, but he’s backing it up with a Chipper-pitcher-leading 91.54 total VP.  He produces like an above-average everyday hitter but plays 20-25% less.

SP 2nd) Trevor Bauer – Division-leading ERA and IP, 2nd-most SOs, but his walks are a problem at 4.89 per 9 IP.  He eats his innings, makes it hard on himself, and still gets it done.  That is why he outpaces all other Chipper pitchers with a projected VPWp of 2.18 wins.

RP 1st) Sean Doolittle – Projects to provide 1.24 wins for the Titans with his outstanding 1.216 ERA, .676 WHIP, and 6 for 6 in save opportunities.  The top-rated closer in the division, and a relief pitcher who generates VP like a #2, top-of-the-rotation SP.


RP 2nd) Jeremy Jeffress – 1 of 2 pitchers in the Chipper division to generate 1 or more VP per IP.  He’s also 6th in the division in VPWp at 1.52.  Who does that put him in company with?  Just behind Kershaw and Sale, and ahead of Syndergaard and Carrasco…that’s elite for an RP.






             So what is the driving force in VP W-L?  What are the input variables that drive winning productivity, not just productivity?  The results prove out the simple old baseball saying: ‘put the ball in play, and good things will happen.’  For hitters, it’s about contact.  For pitchers, it’s about limiting balls put in play.  If you dive into the VP spreadsheet for the league, you’ll see the top pitchers are going deep into games with high K rates, or being destructively dominant in relief.  For the hitters, it’s not about going deep.  There are plenty of small-ball style batters producing robust statistics on good teams.  It’s about a blend, a mix of all elements of pitching and hitting.  There’s nothing revolutionary in that comment that most fans don’t already know, but VP is yet another way of finding and communicating that blended value.

For this fan, the VP Showdown exercise has been unique and interesting.  While I’ll take a stand and call out projections (I am a gambler, after all), I also anticipate flaws.  The VP metric wasn’t initially created to be applied at the player level.  I feel comfortable that it’s a solid tool for projecting team-level winning, and clearly players are what drive winning outcomes, so the application does make sense if tweaked and applied correctly.  For great teams, it reinforces and justifies what we know from the eye test.  For those middle-of-the-pack, ‘we got unlucky’ teams, it clarifies whether it was bad luck or actual reality.

I tried not to dive too far into all the factors or elements of the metric, but I’m open to sharing more information if someone is interested.  Your feedback, insights, or questions are welcome, as they just strengthen the formula through the community’s refinement.  Again, I don’t carry a team in this league, don’t know any of the owners, and have no stake in the outcomes.  While I’m direct in calling out winning and losing, it’s not an attack on any team.  Baseball’s a funny sport, and projections don’t always come true.  I think that's where some of these well-constructed teams find themselves…no one would have guessed the real 2019 Chris Sale would allow 24 HRs before being shut down.  I will be keeping an eye on results, and I hope to apply the VP formulas to the end-of-season statistics with a focus on how well they predict winning in Showdown (if at all).  This truly is a shot in the dark until the season concludes, but for now, the results are aligning with and reinforcing what we do know based on the 1st half of the season.  The league, and the concept behind it, are awesome, and I'm just glad to be able to share this.

With a large enough sample size, and a shrinking window for negative ad hoc season-changing impacts, I will also be working on flagging my MLB futures wagers over the next 2 weeks.  I’ll admit my Sox don’t have it in them this year, but I’m not working off my gut; I’m looking for an advantage.


After all, as Herm taught us, “We play to win the game”.





4 comments:

  1. That’s amazing, man, thank you so much for sharing this. An awesome read, and when it comes to MVP voting I’m probably going to ask you for another write-up cuz this is amazing

  2. Glad you enjoyed man. This has been fun for me to work on recently.

    I know these types of posts can sometimes be overwhelming, but I just appreciate being able to share this with anyone who has an interest.

    1. I agree with Matt -- it was AMAZING!!!!

      And rather than overwhelming, this is exactly the sort of in-depth and thorough analysis I've been craving!

      Plus it didn't hurt that your models showed my team should be one of the best, just makes me feel good about my drafting strategy even though I'm 3 games out of the playoff race right now ;)

    2. Your strategy seems to be working.

      Interesting to see those players like Lowrie and Jeffress that are outperforming their salary or draft position, and those that are having solid seasons but underperforming due to cost.

      Similar to Josh Hader being randomly lights out last season, and how Bryce Harper is not playing to his new Phillies contract this year.

      I'll be interested to see how the rest of your season shapes up and if these projections turn out to be trash or close to the real results.

      I'll move to the background again and observe, but you know and Matt know how to get in touch if you guys need/want anything.
