A Comparative Study of Supervised Machine Learning based Models for Winner Prediction in Mixed Martial Art


Vol. 29, No. 1, (2020), pp. 477 - 485


Atul Kumar Uttam¹, Gaurav Sharma², Mayank Agrawal³, Anuj Mangal⁴

Computer Engineering and Applications Department, GLA University Mathura, U.P. India

[email protected], [email protected], [email protected], [email protected]

Abstract

Mixed martial art is a popular sport around the world, and the Ultimate Fighting Championship (UFC) is one of its most famous and flourishing organizations. As the name suggests, mixed martial art combines many fighting techniques such as boxing, wrestling, jiu-jitsu, and others. The official website ufcstats.com provides data on all UFC fights from 1993 to date, including detailed fighter characteristics. In this paper, we evaluate the accuracy of different supervised machine learning algorithms in forecasting the winner of a fight from the fighters' characteristics. Classifiers based on Decision Tree, Perceptron, Random Forest (RF), Stochastic Gradient Descent (SGD), Bayes Classifier, Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and eXtreme Gradient Boosting (XGBoost) are tested on time series data of fighters' features. An accuracy of up to 68% is achieved with these models.

Keywords: Mixed Martial Art, Supervised Machine Learning, Winner, Ultimate Fighting Championship (UFC), XGBoost, Random Forest.

1. Introduction

Mixed Martial Art (MMA) is a very popular sport worldwide. Contemporary MMA started on November 12, 1993, with the broadcast of the Ultimate Fighting Championship (UFC). Initially, the UFC had very few rules, such as no eye gouging and no biting. It is a very technical and complex sport in which the fighters' characteristics are tested against each other over three to five rounds per bout. A round lasts up to five minutes or until a fighter gives up. As in boxing, one fighter is assigned the red label and the other the blue label. The red label is assigned to the higher-ranked fighter and the blue label to the other. A bout can be won by knockout or submission within the stipulated time limit, or by a judges' decision based on points awarded for different factors when the time limit ends [10].

The paper is organized as follows. Section 2 presents a summary of the data set used in this paper. Section 3 presents a literature survey of earlier studies on UFC data. Section 4 presents an overview of the various supervised machine learning-based approaches. Section 5 presents our system model for prediction based on the machine learning algorithms described in Section 4. Section 6 presents the results and discussion. Section 7 presents the conclusion, followed by the references used for this study.


2. Data and Feature Manipulation

For this study, we use data from Kaggle [9], covering fights from November 1993 through 2019. The data consist of fighter records and their characteristics (features), with approximately 3600 records and around 160 features. According to the data, the red side has approximately 2380 wins, compared with a total of 1220 wins for the blue side. The data suggest that the red side has approximately 51.2% more wins than the blue side. Because the dataset has almost 160 features, a heat map of only ten features is generated (Figure 1) to show the correlation in the data. In the pre-processing step, unknown values in the dataset are replaced with zero and all categorical data are replaced with numeric values.

Figure 1. Heat map of the partial feature set
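The pre-processing step described above can be sketched as follows. This is only an illustration under stated assumptions: the column names `stance` and `reach` and the toy rows are hypothetical, and starting the category codes at 1 (so that 0 stays reserved for unknown values) is our choice, not something the paper specifies.

```python
def preprocess(rows):
    """Replace unknown values with 0 and map categorical strings to integer codes."""
    codes = {}  # column name -> {category: integer code}
    cleaned = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if value is None:
                # Unknown value: replaced with zero.
                new_row[column] = 0
            elif isinstance(value, str):
                # Categorical value: assign the next free code, starting at 1 (assumption).
                mapping = codes.setdefault(column, {})
                new_row[column] = mapping.setdefault(value, len(mapping) + 1)
            else:
                new_row[column] = value
        cleaned.append(new_row)
    return cleaned, codes


# Hypothetical toy records; the real dataset has around 160 features.
rows = [
    {"stance": "Orthodox", "reach": 180.0},
    {"stance": "Southpaw", "reach": None},
    {"stance": "Orthodox", "reach": 175.0},
]
cleaned, codes = preprocess(rows)
```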

3. Literature Survey

In reference [1], the authors applied seven machine learning approaches, namely Decision Tree, Perceptron, Bayes Classifier, Stochastic Gradient Descent (SGD), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), first to the raw data set they collected and then again to the processed data after feature reduction. They achieved an accuracy of 61% for winner prediction with the SVM model. A further increase in accuracy requires a larger dataset.

In reference [2], Johnson J.D., in his master’s thesis, used logistic regression to predict win/loss from a fighter’s career statistics. The author considered 14 primary variables grouped into four categories (count variables), and then derived five secondary variables from the primary variables, grouped into three categories. The logistic regression model predicted the win with 29.73%, which is not a significant outcome.

In reference [3], Gift P. explored the determinants of victory in mixed martial art from a judging point of view, focusing on the effect of non-performance factors on judging decisions. The author examined whether judges' decisions show favoritism toward titleholders and also surveyed the details of the judging process in mixed martial art.


In reference [12], Collier et al. estimated a probability-based regression model to forecast the winner from the fighters' characteristics. Their model sought to identify the effect of a fighter’s characteristics on the likelihood of winning a bout in mixed martial art. The authors identified the more significant and less significant variables and their effect on winning. The data used in that study cover November 2000 through the end of 2009.

In all these studies, the reported winner-prediction performance is modest, so a detailed study is required to assess the accuracy of machine learning-based models that use fighters’ characteristics.

4. Methodology

In this paper, we test the accuracy of the following supervised machine learning classifiers, with particular attention to the extreme gradient boosting algorithm.

4.1. Perceptron learning algorithm

The perceptron is a binary classifier based on supervised machine learning. In its multilayer form, it has an input layer of neurons, then one or a few hidden layers of neurons, and finally an output layer of neurons. The input feature vector is fed to the first layer, and the output of each layer is used as the input of the next layer. The final output layer determines the class of the input vector.

Suppose the input feature vector (X) has n dimensions and the output vector (d) has m dimensions. By processing the input feature vector (X), the perceptron generates the output vector Y(X,w), where w is the weight vector. The error between Y(X,w) and d is then calculated, and corrective adjustments are made to the weights to minimize it [6].
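The error-correction idea can be illustrated with a minimal sketch. It assumes the simplest case, a single-layer perceptron with one output and no hidden layers, and it is not the model evaluated in this paper. The toy data (logical AND), the learning rate, and the number of epochs are hypothetical choices.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Adjust the weights after every sample in proportion to the error d - Y(X, w)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, d in zip(samples, labels):
            error = d - predict(weights, bias, x)  # zero when the sample is classified correctly
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical, linearly separable toy data (logical AND).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
d = [0, 0, 0, 1]
w, b = train_perceptron(X, d)
predictions = [predict(w, b, x) for x in X]
```

On this toy problem the weights stop changing after a few passes, because every sample is then classified correctly and the error term is zero.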

4.2. The Naive Bayes Classifier

A Naive Bayes classifier is a probability-based supervised machine learning model. It can also be used for the classification task. It is based on the Bayes theorem, which is as follows.

p(x|y)=(p(y|x)p(x))/p(y)

Here p(x|y) is the probability of event x occurring given that y is true, and p(y|x) is the probability of event y occurring given that x is true. p(x) and p(y) are the probabilities of observing x and y independently of each other. The classifier assumes that the features are independent of each other and that every feature contributes equally to the outcome [11].
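As a worked illustration of the independence assumption, the sketch below multiplies a class prior by the per-feature likelihoods and normalizes the result. All probabilities are made-up numbers chosen for easy arithmetic; they are not taken from the UFC data.

```python
def naive_bayes_posteriors(priors, likelihoods):
    """Posterior per class, proportional to p(class) * product of p(feature_i | class)."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in likelihoods[cls]:
            score *= p  # independence assumption: likelihoods simply multiply
        scores[cls] = score
    total = sum(scores.values())  # plays the role of p(y) in Bayes' theorem
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical priors and per-feature likelihoods for two observed features.
priors = {"Red": 0.6, "Blue": 0.4}
likelihoods = {"Red": [0.5, 0.4], "Blue": [0.5, 0.2]}
posteriors = naive_bayes_posteriors(priors, likelihoods)
```

With these numbers the unnormalized scores are 0.12 for Red and 0.04 for Blue, so the posteriors are 0.75 and 0.25.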

4.3. The Random Forest Classifier

Random forest is one of the most widely used supervised machine learning algorithms. It is based on the decision tree concept and can be used for both classification and regression problems. It creates a number of decision trees, each built on a random subset of features. Every tree in a random forest generates a class outcome, and the class predicted by the largest number of trees becomes the final outcome of the model. A random forest can also be used to estimate the significance of features [5].
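Two ingredients of this description, random feature subsets and majority voting, can be sketched in a few lines. The feature names and the per-tree predictions below are hypothetical placeholders, and no actual trees are trained here.

```python
import random
from collections import Counter


def random_feature_subset(features, k, rng):
    """Pick k distinct features at random for one tree."""
    return rng.sample(features, k)


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical feature names; the real dataset has around 160 features.
features = ["reach", "height", "age", "wins", "losses"]
rng = random.Random(0)
subsets = [random_feature_subset(features, 2, rng) for _ in range(3)]

# Hypothetical outcomes of three trees for one fight.
tree_predictions = ["Red", "Blue", "Red"]
final_class = majority_vote(tree_predictions)
```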

4.4. Decision Tree


A decision tree is one of the basic supervised machine learning algorithms and is widely used in classification problems [13]. It works for categorical as well as continuous input and output variables. The decision tree model splits the input into two or more homogeneous sets depending on the most significant input features.

A decision tree is constructed by splitting the training data set into several distinct nodes, each consisting mostly of one type of training data subset. The features of the data set are considered one by one, and the importance of each feature is used to split the data; the process then moves to another branch and splits the data again. Once the tree has been constructed over all the features, it can predict the class label of an unknown sample, in our case the winner of the fight, which is either red or blue.
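How a single split might be chosen can be sketched as follows. The paper does not state which splitting criterion was used; Gini impurity is assumed here as one common choice, and the feature values (a hypothetical reach in centimetres) and labels are toy data.

```python
def gini(labels):
    """Gini impurity of a set of class labels (0.0 means the set is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Try every observed value as a threshold and keep the purest split."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send samples to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: one numeric feature and the winning corner.
reach = [170, 175, 180, 185, 190, 195]
winner = ["Blue", "Blue", "Blue", "Red", "Red", "Red"]
threshold, impurity = best_threshold(reach, winner)
```

A full tree repeats this search on each resulting subset until the nodes are pure or a stopping rule applies.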

4.5. Stochastic gradient descent (SGD)

Stochastic gradient descent is an iterative process for optimizing an objective function. The actual gradient (computed from the complete data set) is replaced by an approximation (computed from a randomly selected subset of the data). This reduces the computational cost, with the drawback of a lower convergence rate [8].
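The replacement of the full gradient by a mini-batch estimate can be sketched on a toy least-squares problem. The data, the learning rate, the batch size, and the number of steps are hypothetical choices made for illustration only.

```python
import random


def sgd_fit_slope(xs, ys, lr=0.1, steps=200, batch_size=4, seed=0):
    """Fit y = w * x by minimizing squared error with mini-batch gradient estimates."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(steps):
        batch = rng.sample(indices, batch_size)  # randomly selected subset of the data
        # Gradient of the mean squared error, estimated on the batch only.
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= lr * grad
    return w


# Hypothetical noise-free data generated with a true slope of 3.
xs = [i / 10 for i in range(1, 21)]
ys = [3.0 * x for x in xs]
w_hat = sgd_fit_slope(xs, ys)
```

Each step touches only four samples instead of all twenty, which is the source of the computational saving mentioned above.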

4.6. Support Vector Machine

The Support Vector Machine (SVM) is a supervised machine learning algorithm for classification. The SVM classifies the data set by calculating a separator that divides the data into different classes. First, the data set is mapped into a multidimensional feature space; this process is called kernelling. A linear, polynomial, sigmoid or radial basis function can be used for kernelling. In the next step, a separator is estimated, that is, a hyperplane that divides the data in the feature space for classification. The separator is selected so that it lies at a sufficient distance from the extreme points of the classes, known as support vectors.

In an SVM, overfitting can occur if the number of features exceeds the number of training samples.

A trained SVM model classifies new examples according to the hyperplane generated from the labeled training dataset. In a two-dimensional space, this hyperplane divides the new examples into two separate regions [6].
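The four kernel functions named above can be written out directly. This sketch uses their common textbook forms; the parameter names `gamma`, `degree`, and `coef0` and their default values are assumptions, since the paper does not report which kernel or settings were used.

```python
import math


def linear_kernel(a, b):
    """Plain dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a, b, degree=3, coef0=1.0):
    """(a . b + coef0) raised to the given degree."""
    return (linear_kernel(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(a, b, gamma=0.5, coef0=0.0):
    """tanh(gamma * a . b + coef0)."""
    return math.tanh(gamma * linear_kernel(a, b) + coef0)


# Hypothetical two-dimensional feature vectors.
u, v = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(u, v)
k_poly = polynomial_kernel(u, v, degree=2)
k_rbf_same = rbf_kernel(u, u)
k_rbf = rbf_kernel(u, v)
k_sigmoid = sigmoid_kernel(u, v)
```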

4.7. K nearest Neighbors Classifier (KNN)

KNN is a supervised machine learning algorithm in which the output class of a new input is determined by the class labels of its nearest neighbors. K stands for the number of nearest neighbors considered when making a class decision. The basis of KNN is that similar data points lie close to each other. To find the nearest neighbors, the Euclidean distance is calculated from the unknown data point (a new fighter in our case) to all known data points. The K observations in the training data set with the smallest Euclidean distance are then selected, and these K nearest neighbors vote on the class label of the unknown data point. The class with the most votes is assigned to the unknown data point [7].
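The procedure can be sketched as follows, assuming Python 3.8 or later for `math.dist`. The two-dimensional points and their labels are hypothetical toy data, and K = 3 is an arbitrary choice.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify a query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every known data point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of fighters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["Blue", "Blue", "Blue", "Red", "Red", "Red"]
near_blue = knn_predict(points, labels, (1.5, 1.5))
near_red = knn_predict(points, labels, (6.5, 6.5))
```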


4.8. XGBoost Classifier

Extreme gradient boosting (XGBoost) is a gradient-boosted decision tree method that is optimized for speed and performance. It is an implementation of gradient boosting machines [4], with contributions from several developers. XGBoost is a highly efficient framework for tree-based algorithms. It provides cross-validation and can identify important features. It supports different kinds of objective functions, such as regression and ranking. Computation with XGBoost is fast because it uses parallel computation on a single machine. XGBoost starts with the whole training dataset and fits a decision tree, which yields a first classifier that is still weak at prediction. It then builds another decision tree that concentrates on the samples the first classifier predicted wrongly, which yields a second classifier. This process is repeated a fixed number of times, which can vary with the problem set. Each decision tree provides a classifier, and the errors left by one decision tree are passed on as input to the next decision tree.

The final classifier is built from all of these decision trees. XGBoost can handle missing values easily. It is also robust to outliers and does not require feature scaling.
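The idea of building each tree on the errors of the previous ones can be illustrated with a deliberately simplified sketch. It is plain gradient boosting with one-split trees (stumps) and squared error on a single feature, not XGBoost itself, which additionally uses regularization and second-order gradient information [4]. The toy data, the number of rounds, and the learning rate are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals of the current model."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to what the previous ones got wrong."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    errors = []  # mean squared training error after each round
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return model, errors


# Hypothetical toy data: one feature, with 0 = Blue wins and 1 = Red wins.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model, errors = boost(xs, ys)
```

On this toy problem the training error shrinks with every round, which mirrors the repeated correction described above.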

5. Our System Model

In this paper, we try to find the likely winner based on the fighter characteristics using several machine learning algorithms. We use the system model depicted in Figure 2.

Figure 2. System Model

6. Result and Discussion

From Figure 3 it is apparent that the XGB classifier outperforms all other classifiers. The random forest model also performs well, with an accuracy of 67%, slightly below the 68% of the XGB-based classifier.

In contrast, the KNN, Naïve Bayes and decision tree classifiers provide poor accuracy.


Figure 3. Accuracy of different models

Table 1. Accuracy of all models

Classifier       Accuracy
Perceptron       0.650904033379694
Random Forests   0.6717663421418637
Decision Tree    0.5479833101529903
SGD Classifier   0.6230876216968011
SVM              0.6216968011126565
Bayes            0.4242002781641168
KNN              0.5326842837273992
XGB              0.6759388038942976

Table 2. Result of Perceptron based model

Model 1 - Perceptron

              Precision  Recall  F1-score  Support
Blue          0.46       0.20    0.28      242
Red           0.68       0.88    0.77      477
Accuracy                         0.65      719
Macro Avg     0.57       0.54    0.53      719
Weighted Avg  0.61       0.65    0.61      719

Table 3. Result of Random Forest based model

Model 2 - Random Forest

              Precision  Recall  F1-score  Support
Blue          0.52       0.37    0.43      242
Red           0.72       0.83    0.77      477
Accuracy                         0.67      719

Table 4. Result of Decision Tree model

Model 3 - Decision Tree

              Precision  Recall  F1-score  Support
Loss          0.35       0.38    0.36      242
Win           0.67       0.63    0.65      477
Accuracy                         0.55      719

Table 5. Result of Stochastic Gradient Descent model

Model 4 - Stochastic Gradient Descent (SGD)

              Precision  Recall  F1-score  Support
Blue          0.44       0.43    0.44      242
Red           0.71       0.72    0.72      477
Accuracy                         0.62      719

Table 6. Result of Support Vector Machine model

Model 5 - Support Vector Machine (SVM)

              Precision  Recall  F1-score  Support
Blue          0.46       0.67    0.54      242
Red           0.78       0.60    0.68      477
Accuracy                         0.62      719

Table 7. Result of Bayesian model

Model 6 - Bayesian Model

              Precision  Recall  F1-score  Support
Blue          0.36       0.92    0.52      242
Accuracy                         0.42      719

Table 8. Result of K Nearest Neighbor model

Model 7 - K Nearest Neighbor

              Precision  Recall  F1-score  Support
Blue          0.39       0.69    0.50      242
Red           0.75       0.45    0.56      477
Accuracy                         0.53      719

Table 9. Result of XGB model

Model 8 - XGB Classifier

              Precision  Recall  F1-score  Support
Blue          0.53       0.33    0.40      242
Red           0.71       0.85    0.78      477
Accuracy                         0.68      719
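The columns of the result tables are related by simple arithmetic, which the following sketch makes explicit. It uses the standard definitions (F1 as the harmonic mean of precision and recall, accuracy as the support-weighted mean of the per-class recalls) and the rounded Perceptron values from Table 2 as inputs. Because the inputs are rounded to two decimals, the recomputed values match the table only to about two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_recall(recalls, supports):
    """Overall accuracy as the support-weighted mean of the per-class recalls."""
    correct = sum(r * n for r, n in zip(recalls, supports))
    return correct / sum(supports)


# Rounded values for the Perceptron model, copied from Table 2.
f1_blue = f1_score(0.46, 0.20)
f1_red = f1_score(0.68, 0.88)
perceptron_accuracy = accuracy_from_recall([0.20, 0.88], [242, 477])
```

Rounded to two decimals these give 0.28, 0.77 and 0.65, in line with Table 2 and with the Perceptron entry of Table 1.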

7. Conclusion

We have estimated the probability of winning based on fighter characteristics using various supervised machine learning models. We conclude that the XGBoost-based model outperforms all other supervised machine learning models considered in this paper, and that a sufficient number of features and a larger dataset can enhance the accuracy of machine learning-based classifiers, as a comparison of the study in [1] with Table 1 shows. A close analysis of Tables 2 to 8 shows that the XGB classifier provides satisfactory results compared with the other classifiers on the same dataset. Still, there is a need for improvement to increase the accuracy of these models.

References

[1] Hitkul, Aggarwal K., Yadav N., Dwivedy M. (2019) ‘A Comparative Study of Machine Learning Algorithms for Prior Prediction of UFC Fights’. In: Yadav N., Yadav A., Bansal J., Deep K., Kim J. (eds) Harmony Search and Nature Inspired Optimization Algorithms. Advances in Intelligent Systems and Computing, vol 741. Springer, Singapore.

[2] Johnson, J. D. ‘Predicting Outcomes in Mixed Martial Arts Fights with Novel Fight Variables’. (Master of Science). University of Georgia, (2009).

[3] Gift, P. ‘Performance Evaluation and Favoritism: Evidence From Mixed Martial Arts’. Journal of Sports Economics, 19(8), 1147–1173, (2018). https://doi.org/10.1177/1527002517702422

[4] Tianqi Chen and Carlos Guestrin. (2016). ‘XGBoost: A Scalable Tree Boosting System’. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 785-794. DOI: https://doi.org/10.1145/2939672.2939785

[5] Leo Breiman. (2001). ‘Random Forests’. Machine Learning, 45(1), 5-32. DOI: https://doi.org/10.1023/A:1010933404324

[6] E.A. Zanaty, ‘Support Vector Machines (SVMs) versus Multilayer Perception (MLP) in data classification’, Egyptian Informatics Journal, Volume 13, Issue 3, 2012, Pages 177-183, ISSN 1110-8665, https://doi.org/10.1016/j.eij.2012.08.002

[7] Wei Meng Lee. (2019). ‘Supervised Learning-Classification Using K-Nearest Neighbours (KNN)’. Python® Machine Learning, 205–220. doi:10.1002/9781119557500.ch9

[8] Qian, Q., Jin, R., Yi, J., Zhang, L., & Zhu, S. (2014). ‘Efficient distance metric learning by adaptive sampling and mini-batch stochastic gradient descent (SGD)’. Machine Learning, 99(3), 353–372. doi:10.1007/s10994-014-5456-x

[9] Warrier R. (2019, July). UFC-Fight historical data from 1993 to 2019, Version 2. Retrieved October 2019 from https://www.kaggle.com/rajeevw/ufcdata/metadata.

[10] https://www.theguardian.com/sport/2016/mar/04/the-fight-game-reloaded-how-mma-conquered-world-ufc.

[11] Constantinou, Anthony & Fenton, Norman & Neil, Martin. (2012). pi-football: A Bayesian network model for forecasting Association Football match outcomes. Knowledge-Based Systems, 36, 322-339. doi:10.1016/j.knosys.2012.07.008

[12] Collier, Trevor & Johnson, Andrew & Ruggiero, John. (2012). Aggression in Mixed Martial Arts: An Analysis of the Likelihood of Winning a Decision. doi:10.1007/978-1-4419-6630-8_7

[13] Kamiński, B., Jakubczyk, M., & Szufel, P. (2018). A framework for sensitivity analysis of decision trees. Central European journal of operations research, 26(1), 135–159. doi:10.1007/s10100-017-0479-6