Probability Analysis with Web Scraping and Linear Regression
Abstract
Since the dawn of sports, probability and trends have played an important role in predicting the outcome of an event. They give the public a general idea of how a match is likely to unfold, which makes them a powerful tool in analytics. This project extracts raw tennis match data from verified sources and applies mathematical models to predict the probability of the outcome of a particular match. First, using the Python programming language and a popular technique known as web scraping, the data are extracted from a verified source, such as the Association of Tennis Professionals (ATP), and validated. After the data are extracted, three different regression algorithms (decision tree, ridge, and lasso), implemented in Python, are applied with the goal of predicting the outcome of a particular tennis match. A separate section describes each algorithm in detail and how it works. Finally, the results show which of the algorithms best predicted the outcome of the different matches, and conclusions are drawn from the results of each algorithm.
Key Terms: ATP, linear regression, Python, web scraping.