Learn Sports Betting - Find an Edge - Get Sportsbook Bonus Reviews - Become an Advantage Bettor  - Beat the Bookie - Discuss in the Sports Betting Forum
 
Data Mining

Most sports bettors decide on future bets by making observations while looking over sports and sportsbook data. Unfortunately, most do not follow any formal process when they do this, and consequently make the mistake of believing that some events are correlated when they are not. An MLB bettor, for example, might observe that during the past week, left-handed pitchers won 80% of their games. He may wish he had bet on the lefties all last week, and decide that he will bet the lefties all next week. He has made a legitimate observation, but he has not done any formal analysis to check that the observation is relevant to future bets. He is essentially betting by feel, not by logic.

Let me make a poker analogy. Let's say that I play Texas Hold'em poker for 100 hands, and the hands are numbered sequentially from 1 to 100. I make the best hand at the table in hands numbered 8, 21, 35, 44, 58, 64, 72, 87, and 92. Now I want to play another 100 hands. Would my best strategy be to bet heavily on hands 8, 21, 35, 44, 58, 64, 72, 87, and 92 in this second set of 100 hands? Certainly not! While it is true that this strategy would have worked well for the past 100 hands, there's no relationship between that sequence and poker in general. With a random distribution of cards, there's no predicting which of my hands will be the best in a future series of 100 hands.

So it goes in sports betting.  The lefties may have won 80% of their games all last week, but that is not necessarily a predictor of next week's outcomes.  

Proper Use of Data in Sports Betting.

How then can our sports bettor properly follow up on his observation that lefties won 80% of the time last week? First off, he should look to collect more data. Only so many MLB games are played in a week, and even fewer of them were pitched by lefties. He should broaden the time frame to include more data points. Look back over an entire month, for example. The larger the data sample, the more likely it is to provide statistically meaningful information. Our bettor goes back over the past month and finds that lefties have won 77% of the time. That still appears to be something worth exploring. At this point the bettor has moved past a simple observation and is developing a hypothesis. His hypothesis is that left-handed pitchers are statistically more likely to win than right-handed pitchers. What now?
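To see why the bigger sample matters, here is a minimal sketch that puts a rough 95% range around an observed win rate using the Wilson score interval. The article gives only percentages, so the game counts below (8 wins in 10 games for the week, 37 wins in 48 games for the month) are made up for illustration, and the function name is my own.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    )
    return center - half_width, center + half_width

# Hypothetical counts: the article does not say how many games were involved.
week = wilson_interval(8, 10)    # 80% over one week
month = wilson_interval(37, 48)  # about 77% over one month

for label, (low, high) in (("week", week), ("month", month)):
    print(f"{label}: {low:.1%} to {high:.1%}")
```

With these made-up counts, the one-week range stretches from roughly 49% to 94%, so it cannot even rule out a coin flip. The one-month range is about half as wide, which is the whole point of collecting more games.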

The next step is to ditch all the data collected thus far!  Sounds a bit absurd, perhaps, but it must be done.  Let's go back to the poker hands example I gave above.  Say my hypothesis was that the game was rigged, and I was always going to get the best hand in the same sequence every 100 hands.  I would have to test that against a different sequence of 100 hands, not the same 100 hands I had used to formulate my hypothesis.  

This must also be done in testing a sports betting hypothesis. Certainly your hypothesis looks true for the data you have already observed (the past month), but that does not mean it can be used to predict future events. To gain any confidence in your hypothesis you need to test it against fresh data. You need to set aside the current month's worth of games and get some new data.

The best source of that fresh data is upcoming games. You should test your theory by simulating bets over the next month or two and collecting the results. Unfortunately, this is time-consuming, and you'll burn up most of the season collecting fresh data to test.

A reasonable alternative is to look backwards at other games already played.  You can't use this past month's games as they were used to form the hypothesis, but the month before that can be used.  You can possibly use all of last season, as well.  
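Keeping the two sets of games apart is easy to sketch in code. The example below assumes each game is a small record with a date and a flag for whether the lefty won. The record layout, the dates, the cutoff, and the function names are all hypothetical. Games on or after the cutoff are the ones that formed the hypothesis, and games before it are the fresh test data.

```python
from datetime import date

def split_by_date(games, cutoff):
    """Return (test_games, hypothesis_games), split at the cutoff date."""
    test_games = [g for g in games if g["date"] < cutoff]
    hypothesis_games = [g for g in games if g["date"] >= cutoff]
    return test_games, hypothesis_games

def win_rate(games):
    """Fraction of games in which the left-handed pitcher won."""
    if not games:
        raise ValueError("no games to evaluate")
    return sum(1 for g in games if g["lefty_won"]) / len(games)

# Made-up records for illustration only.
games = [
    {"date": date(2009, 5, 4), "lefty_won": True},
    {"date": date(2009, 5, 12), "lefty_won": False},
    {"date": date(2009, 5, 20), "lefty_won": True},
    {"date": date(2009, 5, 28), "lefty_won": False},
    {"date": date(2009, 6, 2), "lefty_won": True},
    {"date": date(2009, 6, 9), "lefty_won": True},
    {"date": date(2009, 6, 17), "lefty_won": True},
    {"date": date(2009, 6, 25), "lefty_won": False},
]

test_games, hypothesis_games = split_by_date(games, cutoff=date(2009, 6, 1))
print("rate in hypothesis data:", win_rate(hypothesis_games))
print("rate in fresh test data:", win_rate(test_games))
```

The only rule that matters is that no game ends up in both lists. If the win rate holds up in the games you did not look at when forming the hypothesis, that is worth far more than the rate in the games you did.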

The thing to be careful of when going back in time to get data is that the sport changes over time, making older data less applicable. This is particularly true in the NFL, where there are only a few hundred games every season. You might use up 3 or 4 seasons' worth of data to get a decent sample size when formulating your hypothesis. Now to check it you may need to look at games from 5-8 years ago. Rules have changed and new playing styles have evolved over the past 8 years. Results from that time frame may not be an accurate predictor of future games.

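Once you have your data sample to test your hypothesis, you need to know if your results are statistically significant. You can use the binomial distribution to work out how likely it is that a record at least as good as yours would turn up by chance alone, and then see whether that probability meets your threshold for significance.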
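As a rough sketch of that check, the function below adds up the binomial probabilities of winning at least a given number of games when each game is really a coin flip. The 50% baseline, the function name, and the game counts are my assumptions, not figures from the data above. A real test might use a different baseline win rate.

```python
from math import comb

def binomial_tail(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): chance of a record this good or better."""
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Hypothetical test samples: 8 wins in 10 games, and 37 wins in 48 games.
small_sample = binomial_tail(8, 10)
large_sample = binomial_tail(37, 48)

threshold = 0.05  # a common significance threshold; pick your own
print(f"8 of 10:  p = {small_sample:.4f}, significant: {small_sample < threshold}")
print(f"37 of 48: p = {large_sample:.6f}, significant: {large_sample < threshold}")
```

Going 8 for 10 works out to 56 chances in 1,024 of happening by luck, or about 5.5%, which just misses a 5% threshold. A similar win rate over 48 games is far harder to explain by luck, which is why the same percentage means more when it comes from a bigger sample.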

 

Mrs. Bull's Doghouse 2005-2009