Hello folks! In this post, we’re going to look at two of the most popular classification algorithms – Naive Bayes and Decision trees.
Machine learning is a vast field with many algorithms that learn a model from data, forecast outcomes, and thereby help improve business standards and strategies.
Let’s get started.
What is Naive Bayes?
The Naïve Bayes Classifier is based on the Bayes Theorem and is a probabilistic classifier.
A classification problem in Machine Learning represents the choice of the best hypothesis given the data.
Given a new data point, we try to determine the class label that this new instance belongs to. What has been learned from previously seen data lets us classify the new point.
The Bayes Theorem
The Bayes theorem gives us the probability of event A occurring, given that event B has occurred. For instance:
Given gloomy weather, what is the chance that it will rain? Here the possibility of rain is our hypothesis, and the event representing the cloudy weather is our evidence.
According to the Bayes theorem,

P(A|B) = [P(B|A) * P(A)] / P(B)

where:
- P(A|B) is the posterior probability of A given B.
- P(B|A) is the conditional probability (likelihood) of B given A.
- P(A) is the prior probability of event A.
- P(B) is the probability of the evidence B, irrespective of the hypothesis.
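To make the formula concrete, here is a quick worked example of the rain/cloudy-weather case with made-up numbers (the probabilities below are purely illustrative, not taken from any dataset):

# Hypothetical numbers for the rain/cloudy-weather example above.
p_rain = 0.3                 # P(A): prior probability of rain
p_cloudy_given_rain = 0.8    # P(B|A): likelihood of cloudy weather when it rains
p_cloudy = 0.4               # P(B): overall probability of cloudy weather

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_rain_given_cloudy = (p_cloudy_given_rain * p_rain) / p_cloudy
print(p_rain_given_cloudy)   # 0.6 -> posterior probability of rain given cloudy weather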
How Does the Naïve Bayes Classifier Work?
To demonstrate how the Naïve Bayes classifier works, we will consider an email spam classification problem: deciding whether an email is SPAM or NOT-SPAM.
Suppose we have a total of 12 emails, 8 of which are NOT-SPAM and the remaining 4 are SPAM.
- Number of NOT-SPAM emails – 8
- Number of SPAM emails – 4
- Total Emails – 12
- Therefore, P(NOT-SPAM) = 8/12 = 0.666 , P(SPAM) = 4/12 = 0.333
Suppose that only four terms [Friend, Bid, Money, Wonderful] constitute the entire corpus. For each class, the following histogram represents the count of each word.
Now we’ll measure each word’s conditional probabilities.
The formula below measures the likelihood of the word Friend occurring, given that the mail is NOT-SPAM:

P(Friend | NOT-SPAM) = (count of Friend in NOT-SPAM emails) / (total word count in NOT-SPAM emails)
Calculating the probabilities for the whole text corpus.
Now that we have both the prior and the conditional probabilities, we can apply the Bayes theorem.
Suppose we get an email: “Offer Money” and we need to categorize it as SPAM or NOT-SPAM based on our previously determined probabilities.
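To see how the comparison works mechanically, here is a minimal sketch. The conditional probabilities below are made-up placeholders; in the actual example they would come from the word-count histogram above.

# Priors computed earlier
p_spam, p_not_spam = 4/12, 8/12

# Hypothetical likelihoods of the word "Money" in each class (placeholders only)
p_money_given_spam = 0.40
p_money_given_not_spam = 0.05

# Words that never appear in the corpus (such as "Offer") are simply skipped here;
# in practice, Laplace smoothing is used so that unseen words do not zero out the score.
score_spam = p_spam * p_money_given_spam
score_not_spam = p_not_spam * p_money_given_not_spam

# The class with the larger (unnormalized) posterior wins.
print("SPAM" if score_spam > score_not_spam else "NOT-SPAM")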
Types of Naïve Bayes Classifier:
- Multinomial – It is used for Discrete Counts. The one we described in the example above is an example of Multinomial Type Naïve Bayes.
- Gaussian – This type of Naïve Bayes classifier assumes the data to follow a Normal Distribution.
- Bernoulli – This type of Classifier is useful when our feature vectors are Binary.
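All three variants live in scikit-learn's sklearn.naive_bayes module and expose the same fit/predict interface, so switching between them is a one-line change:

from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB

# MultinomialNB -> discrete counts (e.g. word counts, as in the spam example above)
# GaussianNB    -> continuous features assumed to follow a Normal Distribution
# BernoulliNB   -> binary feature vectors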
Implementation
Let’s start by loading the dataset:

#Loading the Dataset
from sklearn.datasets import load_breast_cancer

data_loaded = load_breast_cancer()
X = data_loaded.data
y = data_loaded.target
Next, we split the dataset into training and testing sets:

#Splitting the dataset into training and testing variables
from sklearn.model_selection import train_test_split

# Keeping 80% as training data and 20% as testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)
The .fit method of the GaussianNB class requires the feature data (X_train) and the target values (y_train) as input arguments.
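A minimal sketch of that step, assuming GaussianNB from sklearn.naive_bayes; it produces the y_predicted array used in the accuracy check below:

# Fitting a Gaussian Naive Bayes model to the training data
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)          # learns class priors and per-feature mean/variance

# Predicting labels for the test set
y_predicted = model.predict(X_test)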
Now, let’s check how accurate our model is using sklearn's accuracy metric.
#Import metrics class from sklearn
from sklearn import metrics

metrics.accuracy_score(y_test, y_predicted)
Accuracy = 0.956140350877193
We got an accuracy of around 95.61%.
Feel free to experiment with the code. You can apply various transformations to the data prior to fitting the algorithm.
What are Decision Trees?
The Decision Tree is a machine learning algorithm that uses a decision model and provides an event’s outcome/prediction in terms of chances or probabilities.
It is a non-parametric, predictive algorithm that produces its result by modeling the decisions/rules framed from the characteristics of the data.
As a supervised Machine Learning algorithm, a Decision Tree learns from historical data. Based on that data, the algorithm builds a model of decisions and then predicts the outcome.
Decision Trees can fulfill any of the following tasks:
- Classification of records, based on the probability of the group to which each record belongs.
- Estimation of the value of a certain target variable.
In the upcoming segment, we will learn more about the above tasks.
A Decision Tree represents a structure resembling a flowchart that delivers the result based on probabilistic circumstances.
- Each internal node represents a condition (test) on an attribute/variable.
- Each branch represents an outcome of that condition.
- Once all the conditions along a path have been resolved, the leaf node provides the decision to be made.
Different algorithms such as ID3, CART, C5.0, etc. are used by Decision Tree to determine the best attribute to be put as the root node value, which implies the best homogeneous set of data variables.
Prediction starts at the root node, where the record's attribute value is compared against the root node's condition; the result of the comparison determines which branch to follow. This continues until a leaf node holding the expected classification or estimated value is reached.
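As a toy illustration of that traversal (a made-up weather example, unrelated to the dataset used later), a trained tree effectively behaves like a chain of nested conditions:

# Each comparison corresponds to an internal node; each return corresponds to a leaf node.
def predict_play(outlook, humidity):
    if outlook == "sunny":        # root node condition
        if humidity > 70:         # internal node condition
            return "NO"           # leaf node
        return "YES"              # leaf node
    return "YES"                  # leaf node

print(predict_play("sunny", 85))  # -> "NO"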
Now, let us take a look in-depth at the various forms of Decision Trees.
Types of Decision Trees
The following types of values are predicted by the Decision Tree:
- Classification: for categorical target variables, it predicts the class of the target value. For instance, predicting whether the outcome is YES or NO.
- Regression: for numeric/continuous target variables, it estimates the value of the target. For instance, predicting the bike rental count in the near future.
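In scikit-learn these two cases map to two separate estimators, both found in sklearn.tree:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# DecisionTreeClassifier -> categorical target (e.g. YES/NO)
# DecisionTreeRegressor  -> continuous target (e.g. bike rental count), used in the example below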
Assumptions
- Before the first iteration, i.e. before the model of decisions is built, the whole training dataset is taken as the root node.
- The values/attributes of the data are split recursively.
- A mathematical criterion (such as information gain or the Gini index) is assumed to decide which attribute is placed as the root/internal node value, i.e. the attribute that yields the most homogeneous subsets of the data, as sketched below.
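As an illustration of such a criterion, here is a small sketch that computes the weighted Gini impurity of a candidate split (the labels are made up); the attribute whose split gives the lowest impurity is preferred:

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

left = np.array(["YES", "YES", "YES", "NO"])   # records routed to the left branch
right = np.array(["NO", "NO", "YES"])          # records routed to the right branch

n = len(left) + len(right)
weighted_gini = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
print(weighted_gini)   # lower weighted impurity = better split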
Let’s implement
Having understood the working of Decision Trees, let us now implement the same in Python.
Let’s import the important libraries:
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
Next, we define a function to evaluate the error of the Decision Tree’s output.
Since this is a regression problem, we use MAPE (Mean Absolute Percentage Error) as the error metric.
def MAPE(Y_actual, Y_Predicted):
    Mape = np.mean(np.abs((Y_actual - Y_Predicted) / Y_actual)) * 100
    return Mape
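A quick sanity check of the helper with made-up numbers:

actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 180.0, 330.0])
print(MAPE(actual, predicted))   # -> 10.0, i.e. an average absolute error of 10%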
Moving forward, we separate the dataset into independent and dependent variables. We then use the train_test_split() function to split the dataset into training and testing data.
bike = pd.read_csv("Bike.csv")
X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']

# Splitting the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)
Now it is time to implement the Decision Tree algorithm. To build and train the model, we use the DecisionTreeRegressor() class.
In addition, using the predict() function, we predict the values of the target variable in the testing dataset.
After this, to get the accuracy of the model, we calculate the MAPE value (error value) and deduct it from 100.
DT_model = DecisionTreeRegressor(max_depth=5).fit(X_train, Y_train)
DT_predict = DT_model.predict(X_test)  # Predictions on testing data

DT_MAPE = MAPE(Y_test, DT_predict)
Accuracy_DT = 100 - DT_MAPE

print("MAPE: ", DT_MAPE)
print('Accuracy of Decision Tree model: {:0.2f}%.'.format(Accuracy_DT))
print("Predicted values:\n", DT_predict)
Output:

MAPE:  18.076637888252062
Accuracy of Decision Tree model: 81.92%.
Predicted values:
[3488.27906977 4623.46721311 2424. 4623.46721311 4883.19047619 6795.11
 7686.75 1728.12121212 2602.66666667 2755.17391304 4623.46721311 4623.46721311
 1728.12121212 6795.11 5971.14285714 1249.52 4623.46721311 6795.11
 3488.27906977 5971.14285714 3811.63636364 2424. 1249.52 6795.11
 6795.11 3936.34375 3978.5 1728.12121212 6795.11 1728.12121212
 3488.27906977 6795.11 6795.11 5971.14285714 1728.12121212 4623.46721311
 1249.52 3936.34375 3219.58333333 7686.75 4623.46721311 2679.5
 6795.11 2755.17391304 6060.78571429 6795.11 4623.46721311 6795.11
 6795.11 4623.46721311 4883.19047619 4623.46721311 2602.66666667 4623.46721311
 7686.75 6795.11 7099.5 1728.12121212 6795.11 1249.52
 4623.46721311 6795.11 4623.46721311 1728.12121212 4883.19047619 4623.46721311
 6795.11 6795.11 4623.46721311 1728.12121212 4623.46721311 4623.46721311
 1178.5 6795.11 4623.46721311 6795.11 6795.11 1728.12121212
 3488.27906977 6795.11 5971.14285714 6060.78571429 4623.46721311 3936.34375
 1249.52 1728.12121212 4676.33333333 3936.34375 4623.46721311 6795.11
 4623.46721311 4623.46721311 6795.11 5971.14285714 4883.19047619 4883.19047619
 3488.27906977 6795.11 6795.11 6795.11 3811.63636364 6795.11
 6795.11 6060.78571429 4623.46721311 1728.12121212 4623.46721311 6795.11
 6795.11 4623.46721311 1728.12121212 6795.11 1728.12121212 3219.58333333
 1728.12121212 6795.11 1249.52 6795.11 2755.17391304 1728.12121212
 6795.11 5971.14285714 1728.12121212 4623.46721311 4623.46721311 4676.33333333
 6795.11 3936.34375 4883.19047619 3978.5 4623.46721311 4623.46721311
 3488.27906977 6795.11 6795.11 3488.27906977 4623.46721311 3936.34375
 2600. 3488.27906977 2755.17391304 6795.11 4883.19047619 3488.27906977]