[Machine Learning in Plain Language] Algorithm Theory + Hands-On Practice: The AdaBoost Algorithm

1. Foreword

If you want to work in data mining or machine learning, you need to master the commonly used machine learning algorithms.

To understand their principles in detail, I have read the "watermelon book", Statistical Learning Methods, Machine Learning in Action and other books, and have also followed several machine learning courses. But I always found the wording rather esoteric and hard to read patiently: theory is everywhere, while hands-on practice is what matters most. So I want to write a plain-language series on machine learning algorithms, theory plus practice, in the simplest and most understandable language I can.

I personally think that understanding the idea behind an algorithm and how to use it matters more than understanding its mathematical derivation. The idea gives you an intuitive feel for why the algorithm is reasonable; the derivation merely states that reasonableness in more rigorous language. For example, the sweetness of a pear can be expressed mathematically as a sugar content of 90%, but only by taking a bite yourself can you really feel how sweet the pear is, and really understand what "90% sugar" means. If the algorithm is a pear, the primary purpose of this article is to lead everyone to take a bite. It also has the following purposes:

  • Test my own understanding of the algorithm and briefly summarize its theory.
  • Learn the core ideas of these algorithms in an enjoyable way, find the fun in them, and lay a foundation for studying them in depth.
  • Pair each theory article with a practical case, so that what is learned is actually applied. This exercises programming skills and also deepens the grasp of the theory.
  • Gather all my earlier notes and references in one place for later reference.

In learning algorithms, you should come away not only with the theory, but also with some fun and the ability to solve practical problems!

Today is the fourth part of the series: the AdaBoost algorithm, which is an ensemble method. In this part we quickly get to grips with how AdaBoost works, and finally use it to predict Boston housing prices.

Ensemble algorithms are what people generally use nowadays. They are very powerful, for example the popular xgboost, lightgbm and so on. After learning them, you will find that decision trees and KNN are just the "little brothers".

AdaBoost itself is not used much now, so why learn it? Because it is the foundation. Traditional machine learning algorithms may be less common now, but we still have to learn them, and not with a purely utilitarian mindset. With this foundation we can master more advanced algorithms such as xgboost, lightgbm and catboost much faster. If you don't even know what a decision tree is, how can you learn those? As the saying goes, however much things change, they never stray from their origin. What matters in learning is the way of thinking: hold on to what stays constant in order to cope with what keeps changing.

The outline is as follows:

  • How AdaBoost works (three cobblers with their wits combined equal Zhuge Liang)
  • An AdaBoost example (a quick way to understand the principle)
  • AdaBoost in practice: predicting Boston housing prices and comparing with the "little brother" algorithms

OK, let's go!

2. AdaBoost? Let's start with the proverb first!

Before talking about AdaBoost, let's start with a story:

There is an article in the elementary school Chinese textbook titled "Three Cobblers Are a Match for Zhuge Liang". In it, Zhuge Liang is leading his troops across a river. The current is turbulent and reefs jut out of the water everywhere, so ordinary bamboo rafts and boats can hardly get through; the first boats were swept away and sank. Zhuge Liang was at a loss and could not think of a good solution. That night, three cobblers came to offer advice. They told Zhuge Liang to buy oxen, skin each one whole starting from the belly, sew up the cut, and have the soldiers inflate the hides to make ox-hide rafts. Such rafts are not afraid of collisions. Zhuge Liang tried the method and crossed the river smoothly.


This is the story behind "three cobblers are a match for Zhuge Liang". Why tell it first? One reason is that I was afraid that opening with formal jargon and a pile of mathematical formulas would kill your interest in learning. The other is that the story teaches a truth: pool everyone's ideas and draw on others' strengths. That is what ensembling means.

In today's society it is very hard to get things done through one person's strength alone; isn't everything about teamwork? Take writing a piece of software. If a team does it, then even if the members are all students still learning, as long as everyone has a different role and is responsible for their own part, I believe the task can be finished soon. But if one person has to write it all, even an expert would be worn out. So teamwork brings not only efficiency but also time, and time is money. (That's a digression; back to ensembling.)

As said above, ensembling means pooling ideas and drawing on the strengths of many. When making a decision, we first listen to the opinions of several experts.

There are usually two types of ensemble algorithms: voting (bagging) and re-learning (boosting).

  • Bagging (voting) is like calling experts to a conference table. To make a decision, K experts (K models) each classify the sample, and the class chosen most often becomes the final result. (You may have heard of the mighty random forest: it trains many trees, and the minority yields to the majority.)
  • Boosting (re-learning) is a weighted fusion of K experts (K classifiers) into a new super expert (a strong classifier), and this super expert makes the judgment. (The mighty AdaBoost works this way.)

Note the difference between bagging and boosting in the descriptions above:

  • Boosting means "to improve": each round of training improves on the previous ones. During training the K "experts" depend on each other; when the K-th "expert" (the K-th classifier) is introduced, it is in effect an optimization over the first K-1 experts.
  • Bagging, by contrast, can be computed in parallel: the K "experts" judge independently of each other, with no dependence between them.
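
To make the contrast concrete, here is a minimal sketch in Python of the two ways of combining experts: an equal-vote majority (bagging style) and a weighted vote (boosting style). The helper names `majority_vote` and `weighted_vote` and the weights are made up for illustration, and labels are assumed to be -1 or +1.

```python
from collections import Counter


def majority_vote(predictions):
    """Bagging-style decision: every expert has one equal vote."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Boosting-style decision for labels in {-1, +1}: experts vote with different weights."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score > 0 else -1


# Three experts classify the same sample.
expert_predictions = [1, 1, -1]

bagging_decision = majority_vote(expert_predictions)
# Hypothetical weights: the third expert is trusted more than the other two combined.
boosting_decision = weighted_vote(expert_predictions, weights=[0.3, 0.3, 0.9])

print("majority vote:", bagging_decision)
print("weighted vote:", boosting_decision)
```

With equal votes the two agreeing experts win; with weights, a single highly trusted expert can overrule them. How those weights are obtained is exactly what the rest of the article is about.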


3. How AdaBoost works

(The principles here may look a bit heavy, and a few formulas and some formal language are unavoidable, but don't be afraid: AdaBoost is still easy to understand. If this scares you off, how will you ever build a Zhuge Liang? Next we will create a Zhuge Liang out of ordinary soldiers, and learning how is well worth it. Okay, down to business.)

The AdaBoost algorithm is a re-learning (boosting) method. Its full English name is Adaptive Boosting. It was proposed by Freund et al. in 1995 and is one implementation of the Boosting approach.

What is Boosting? Boosting is one kind of ensemble algorithm, and also the general name for a family of algorithms. These algorithms train multiple weak classifiers and combine them into one strong classifier, which is exactly our "three cobblers equal one Zhuge Liang". Why do it this way? Because cobblers are easy to train, while a Zhuge Liang is hard to come by. So the best way to build a Zhuge Liang is to train several cobblers and then combine them, which often gives very good results. This is the principle of Boosting. If multiple weak classifiers are combined into one strong classifier, the questions are: how are they combined, and on what basis? The answer is that the weak classifiers are combined with different weights.

Assuming the i-th weak classifier is $G_i(x)$ and its weight in the strong classifier is $\alpha_i$, the strong classifier $f(x)$ is:

$$f(x) = \sum_{i=1}^{K} \alpha_i G_i(x)$$

See, this is where Zhuge Liang comes from: many soldiers, each of different importance, weighted and then added up. This leaves two questions:

  1. How to get these weak classifiers (soldiers), that is, how to get the optimal weak classifiers (soldiers) during each iteration of training?
  2. How is the weight of each weak classifier (soldier) calculated?

Let's look at the second question first: how are the weights calculated? The first instinct is that whoever performs better gets a higher weight. And indeed, that is exactly how it works.

In a strong classifier composed of K weak classifiers, a weak classifier that classifies well should get a relatively large weight, and one that classifies poorly should get a smaller weight. So the weight is determined from the weak classifier's classification error rate on the samples. The formula is:

$$\alpha_i = \frac{1}{2} \log \frac{1 - e_i}{e_i}$$

where $e_i$ is the classification error rate of the i-th classifier and $\log$ is the natural logarithm.

Don't worry about where this formula comes from; just know that it guarantees that the lower a classifier's error rate (that is, the higher its accuracy), the greater its weight. For the full derivation, see the AdaBoost chapter of Statistical Learning Methods.
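
As a quick numerical check, the sketch below evaluates the classifier weight for a few error rates, assuming the standard AdaBoost form alpha = 0.5 * ln((1 - e) / e). The function name `classifier_weight` is just an illustrative choice.

```python
import math


def classifier_weight(error_rate):
    """alpha = 0.5 * ln((1 - e) / e), defined for 0 < e < 1."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


for e in (0.1, 0.3, 0.5, 0.7):
    print(f"error rate {e:.1f} -> weight {classifier_weight(e):+.4f}")
```

The weight shrinks as the error rate grows, reaches zero at an error rate of 0.5 (no better than guessing), and turns negative beyond that.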


Then let's look at the first question, how to choose the best weak classifier during each training iteration?

AdaBoost does this by changing the data distribution of the samples. In each round, AdaBoost checks whether each training sample was classified correctly: the weight of a correctly classified sample is reduced, and the weight of a misclassified sample is increased. So the weights of the samples in this round are determined by the classification results of the previous round. The re-weighted data set is then passed to the next classifier for training. The advantage is that, with sample weights changing dynamically each round, training focuses on the samples that are hard to classify, and the resulting combination of weak classifiers more easily achieves high classification accuracy.


Here is how to understand the process. At the start, the training samples have a probability distribution, i.e. weights. For n samples, I assume each has weight 1/n, meaning all are equally important. After we train a classifier A, the samples that A classifies correctly can be handled by A, so when we train classifier B in the next round they need less attention; B should pay more attention to the samples that A misclassified. How? Reduce the weights of the samples A classified correctly and increase the weights of those it misclassified. Then, when B is trained, it pays more attention to these misclassified samples, because getting them wrong again makes the loss shoot up (their weights are heavier). To reduce the loss, B tries its best to classify correctly the samples A could not, and the problem is solved. And what if the trained B is already very good, with a very small error, but there are still samples it cannot separate? By the same reasoning, increase their weights and hand them to C in the next round. Each round's classifier has its own specialty.

Now that the plain-language version is done, let's see how each sample's weight is calculated.

We use $D_{k+1}$ to denote the set of sample weights in round k+1 of training, where $w_{k+1,1}$ is the weight of the first sample in round k+1, and so on up to $w_{k+1,N}$, the weight of the N-th sample in round k+1:

$$D_{k+1} = (w_{k+1,1}, w_{k+1,2}, \ldots, w_{k+1,N})$$

The weight of a sample in round k+1 is determined by its weight in round k and by the accuracy of the k-th classifier. The specific formula is:

$$w_{k+1,i} = \frac{w_{k,i}}{Z_k} \exp\left(-\alpha_k y_i G_k(x_i)\right), \quad i = 1, 2, \ldots, N$$

where $y_i \in \{-1, +1\}$ is the true label of sample $x_i$. The formula may look awkward, but as before, don't worry about it; just know that it guarantees that if the current classifier misclassifies a sample, that sample's w increases, and if it classifies the sample correctly, w decreases. Here $Z_k$ is the normalization factor:

$$Z_k = \sum_{i=1}^{N} w_{k,i} \exp\left(-\alpha_k y_i G_k(x_i)\right)$$
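
The re-weighting step can be sketched in a few lines of Python. This is only an illustration under the usual AdaBoost assumptions (labels and predictions in {-1, +1}, weights multiplied by exp(-alpha * y * G(x)) and then normalized); the function name `update_sample_weights` and the toy data are hypothetical.

```python
import math


def update_sample_weights(weights, labels, predictions, alpha):
    """One AdaBoost re-weighting step for labels and predictions in {-1, +1}."""
    # y * G(x) is +1 for a correct prediction and -1 for a wrong one.
    unnormalized = [
        w * math.exp(-alpha * y * g)
        for w, y, g in zip(weights, labels, predictions)
    ]
    z = sum(unnormalized)  # normalization factor Z_k
    return [u / z for u in unnormalized]


# Hypothetical toy data: four samples, the last one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, -1, 1, -1]
predictions = [1, -1, 1, 1]
error_rate = 0.25  # total weight of the misclassified samples
alpha = 0.5 * math.log((1 - error_rate) / error_rate)

new_weights = update_sample_weights(weights, labels, predictions, alpha)
print([round(w, 4) for w in new_weights])
```

The misclassified sample ends up carrying half of the total weight, while the three correctly classified samples share the other half, so the next classifier is pushed to get that sample right.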


If you still don't see how AdaBoost is computed, the following example should clear things up!

4. AdaBoost algorithm example

Before looking at the example, let's recall two problems in AdaBoost:

  1. How to get these weak classifiers (soldiers), that is, how to get the optimal weak classifier (soldier) during each iteration of training? --- By changing the sample weights, i.e. the data distribution
  2. How is the weight of each weak classifier (soldier) calculated? --- From its error rate, via the weight formula

Now for the example. Suppose there are 10 training samples, with x taking the values 0 through 9. I want to build a strong classifier (a Zhuge Liang) through AdaBoost; how do I do it? Let's simulate it:

  • First, I assign an importance, i.e. a weight, to each of the 10 samples. At the start they are all equal, 1/10 each, so the initial weights are D1=(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1). Suppose the three base classifiers available are f1, f2 and f3. (In practice one classifier is trained per iteration; all three are given up front only to explain the process.)
  • Then we carry out the first round of training, from which we know:

The error rate of classifier f1 is 0.3: it misclassifies x = 6, 7 and 8.
The error rate of classifier f2 is 0.4: it misclassifies x = 0, 1, 2 and 9.
The error rate of classifier f3 is 0.3: it misclassifies x = 3, 4 and 5.
Going by the minimum error rate, I take f1 as the first-round classifier G1 (f3 ties with it at 0.3). Its error rate is 0.3, the lowest available, so e1 = 0.3. (Such a classifier can be trained with a decision tree.)


  • Then get the weight of the first weak classifier from the weight formula:

$$\alpha_1 = \frac{1}{2} \log \frac{1 - e_1}{e_1} = \frac{1}{2} \log \frac{1 - 0.3}{0.3} = 0.4236$$

  • Then we update the weights of the training samples according to this classifier, using the sample-weight formula from section 3.

The resulting weights are: D2=(0.0715, 0.0715, 0.0715, 0.0715, 0.0715, 0.0715, 0.1666, 0.1666, 0.1666, 0.0715).

You will find that the weights of samples 6, 7 and 8 have become larger and the others smaller (so the next classifier will focus on samples 6, 7 and 8).


  • Then we carry out the second round of training and compute the weighted error rates of the three classifiers again:

The error rate of classifier f1 is 0.1666 * 3: it misclassifies x = 6, 7 and 8.
The error rate of classifier f2 is 0.0715 * 4: it misclassifies x = 0, 1, 2 and 9.
The error rate of classifier f3 is 0.0715 * 3: it misclassifies x = 3, 4 and 5.
Among the three, f3 has the lowest error rate, so we choose f3 as the optimal classifier G2 for the second round. From the classifier weight formula we get:

$$\alpha_2 = \frac{1}{2} \log \frac{1 - e_2}{e_2} = 0.6496$$


  • Similarly, we calculate the updated sample weights for the next round.

This gives D3=(0.0455, 0.0455, 0.0455, 0.1667, 0.1667, 0.1667, 0.1060, 0.1060, 0.1060, 0.0455).

You will find that the weights of samples 3, 4 and 5, which G2 misclassified, have become larger, so the next round's classifier will focus on these three samples.


  • Next we start the third round of training and compute the weighted error rates of the three classifiers once more:

The error rate of classifier f1 is 0.1060 * 3: it misclassifies x = 6, 7 and 8.
The error rate of classifier f2 is 0.0455 * 4: it misclassifies x = 0, 1, 2 and 9.
The error rate of classifier f3 is 0.1667 * 3: it misclassifies x = 3, 4 and 5.
Among the three, f2 has the lowest error rate, so we choose f2 as the optimal classifier G3 for the third round. From the classifier weight formula we get:

$$\alpha_3 = \frac{1}{2} \log \frac{1 - e_3}{e_3} = 0.7514$$


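Suppose we run only 3 rounds of training, select 3 weak classifiers, and combine them into a strong classifier. The final strong classifier is then

G(x) = 0.4236G1(x) + 0.6496G2(x) + 0.7514G3(x),

where G1 = f1, G2 = f3 and G3 = f2 are the classifiers chosen in rounds 1, 2 and 3. The predicted class is the sign of this weighted sum.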
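
One way to check that this combination really beats each of its parts is to compute, for every sample, the margin y * G(x): each weak classifier contributes +alpha when it is right on that sample and -alpha when it is wrong, so a positive margin means the combined classifier is right. The sketch below uses only the weights and the misclassified samples quoted in the three rounds above; the helper name `margin` is an illustrative choice.

```python
# Classifier weights from the three rounds above.
alphas = {"G1": 0.4236, "G2": 0.6496, "G3": 0.7514}

# Samples (x values) that each selected classifier gets wrong, as listed above.
misclassified = {
    "G1": {6, 7, 8},     # f1, chosen in round 1
    "G2": {3, 4, 5},     # f3, chosen in round 2
    "G3": {0, 1, 2, 9},  # f2, chosen in round 3
}


def margin(x):
    """y * G(x): add alpha if the classifier is right on x, subtract it if wrong."""
    return sum(
        -a if x in misclassified[name] else a
        for name, a in alphas.items()
    )


margins = {x: round(margin(x), 4) for x in range(10)}
for x, m in margins.items():
    print(x, m, "correct" if m > 0 else "wrong")
```

Every margin comes out positive: each individual classifier makes three or four mistakes, but no sample is misclassified by enough weighted votes to flip the combined decision.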


In this way we get the Zhuge Liang we wanted. See? The process is not hard. A brief recap:

  1. Set the initial sample weights, train the classifiers, select the one with the smallest error, take its error rate, and compute its classifier weight.
  2. Recalculate the sample weights according to that classifier's errors.
  3. Move on to the next round of training; unless you stop, repeat the process above.
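
The three steps can be replayed in code for the example above. This is a sketch, not a full AdaBoost implementation: the three candidate classifiers are represented only by the samples they get wrong (taken from the example), and ties are broken in favor of the first candidate. Because it keeps full floating-point precision, the last digit or two can differ from the hand-rounded values in the text (the third classifier weight, for instance, comes out near 0.752 rather than 0.7514).

```python
import math

N = 10
# Samples misclassified by each candidate classifier (from the example above).
candidates = {
    "f1": {6, 7, 8},
    "f2": {0, 1, 2, 9},
    "f3": {3, 4, 5},
}

weights = [1.0 / N] * N  # D1: all samples equally important
history = []             # (chosen classifier, weighted error, alpha) per round

for round_no in range(1, 4):
    # Weighted error of a classifier = total weight of the samples it gets wrong.
    errors = {
        name: sum(weights[i] for i in wrong)
        for name, wrong in candidates.items()
    }
    best = min(errors, key=errors.get)
    e = errors[best]
    alpha = 0.5 * math.log((1 - e) / e)
    history.append((best, e, alpha))

    # Re-weight: misclassified samples are scaled up, the rest scaled down.
    unnormalized = [
        w * math.exp(alpha if i in candidates[best] else -alpha)
        for i, w in enumerate(weights)
    ]
    z = sum(unnormalized)
    weights = [u / z for u in unnormalized]

    print(f"round {round_no}: pick {best}, error={e:.4f}, alpha={alpha:.4f}")
    print("  next weights:", [round(w, 4) for w in weights])
```

The selection order (f1, then f3, then f2) matches the walkthrough, which shows how the changing sample weights alone steer each round toward a different classifier.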

You can think of this as using the enemy to make your own soldiers stronger. Suppose the enemy has 10 people and my side has 5 (training for 5 rounds).
First, I have each of my 5 people fight the 10 one-on-one and pick the strongest as the first-round classifier. The enemies he can beat then become less important, and the focus shifts to studying the moves of those he cannot beat.
Training continues on that basis, so that the second person selected can handle some of the enemies the first could not defeat.
In the same way, we then focus on the enemies the second person cannot beat and let the third person take them on, and so on until the end.
You will find that, although each of these five has only one specialty and none of them could win all 10 one-on-one fights against the enemy's 10, the five of them combined can win all 10.


This is how three cobblers can match Zhuge Liang. However good Zhuge Liang is, his level amounts to winning all 10 one-on-one fights; with five ordinary soldiers and 5 rounds of training, my combination can win all 10 too, and at a much lower training cost than training one Zhuge Liang.

This is the core idea of AdaBoost.

5. AdaBoost in action: predicting housing prices

Once the principle is understood, practice is the key. First, let's see how to use the AdaBoost tool.

5.1 sklearn's AdaBoost tool

We can use AdaBoost directly from sklearn. To use AdaBoost for classification, import it first:

    from sklearn.ensemble import AdaBoostClassifier

Where there is a Classifier class there is generally a corresponding Regressor class, and AdaBoost is no exception. The import for the regression tool is:

    from sklearn.ensemble import AdaBoostRegressor

Here is how to create an AdaBoost classifier:

  • For classification, the constructor looks like this:
    AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)

Let's take a look at the meaning of these parameters:

  1. base_estimator: the weak classifier. Both the AdaBoost classifier and regressor have this parameter. AdaBoost uses a decision tree by default, and generally we don't need to change it, though you can specify another classifier. (scikit-learn 1.2 and later call this parameter estimator.)
  2. n_estimators: the maximum number of iterations, which is also the number of weak classifiers. Each iteration introduces a new weak classifier to strengthen the combined classifier. The default is 50.
  3. learning_rate: the learning rate, usually chosen between 0 and 1; the default is 1.0. A smaller learning rate needs more iterations to converge, so the learning rate and the number of iterations are related: when you adjust learning_rate, you often need to adjust n_estimators as well.
  4. algorithm: which boosting algorithm to use. There are two choices, SAMME and SAMME.R, and the default is SAMME.R. They differ in how the weak classifiers' weights are calculated.
  5. random_state: the random seed; the default is None. The seed fixes the random behavior: with a given seed, other people can reproduce the same result. If you do not set it, the random numbers differ on every run.
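
As a small illustration of these parameters, the sketch below builds a classifier on synthetic data. The data set (from `make_classification`) and the chosen values are assumptions for demonstration only. The weak learner and the boosting algorithm are left at their defaults, so the snippet does not depend on how a given scikit-learn version names those two parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Weak learner and boosting algorithm are left at their defaults.
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
clf.fit(train_x, train_y)

accuracy = clf.score(test_x, test_y)
print("number of weak classifiers fitted:", len(clf.estimators_))
print("test accuracy:", round(accuracy, 3))
```

Lowering `learning_rate` while keeping `n_estimators` fixed is a simple way to see the interaction between the two parameters described above.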


  • How do we create an AdaBoost regressor?
    AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None)

The parameters for regression and classification are basically the same. The difference is that the regressor has no algorithm parameter, but has an additional loss parameter.
loss sets the loss function. There are 3 choices: linear, square and exponential. The default is linear, and generally linear gives good results.


After creating the AdaBoost classifier or regressor, we can input the training set to train it.

  • We use the fit function to pass in the sample feature value train_X and the result train_y in the training set, and the model will be automatically fitted.
  • Use the predict function to make predictions, pass in the sample feature value test_X in the test set, and then you can get the prediction result.
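
The fit and predict workflow, together with the three loss options of the regressor, can be tried out as follows. This is a sketch on synthetic data from `make_regression`; the data set and parameter values are assumptions for illustration, and which loss works best will depend on the data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for loss in ("linear", "square", "exponential"):
    regressor = AdaBoostRegressor(n_estimators=50, loss=loss, random_state=0)
    regressor.fit(train_x, train_y)      # train on the training set
    pred_y = regressor.predict(test_x)   # predict on the test set
    results[loss] = mean_squared_error(test_y, pred_y)

for loss, mse in results.items():
    print(f"loss={loss:<12} test MSE={mse:.2f}")
```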

5.2 How to predict house prices with AdaBoost

We use the Boston housing price data set that comes with sklearn and use AdaBoost to predict housing prices:

First, the data set:

This data set includes 506 housing records, each with 13 indicators and a housing price.
The meaning of the 13 indicators is documented in the data set's own description (data.DESCR in the code below).


Approach (the same routine as before):

First load the data and split it into a training set and a test set. Then create an AdaBoost regression model, fit it on the training set, and predict on the test set to get the predictions. Finally, compare the predictions with the actual values to get the error between the two.


The code is as follows:

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error
    # Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
    from sklearn.datasets import load_boston
    from sklearn.ensemble import AdaBoostRegressor

    # Load the data
    data = load_boston()
    # Split the data
    train_x, test_x, train_y, test_y = train_test_split(data.data, data.target, test_size=0.25, random_state=33)
    # Use the AdaBoost regression model
    regressor = AdaBoostRegressor()
    regressor.fit(train_x, train_y)
    pred_y = regressor.predict(test_x)
    mse = mean_squared_error(test_y, pred_y)
    print("Predicted house prices", pred_y)
    print("Mean squared error =", round(mse, 2))

Output:

    Predicted house prices [ 20.2 10.4137931 14.63820225 17.80322581 24.58931298 21.25076923 27.52222222 17.8372093 31.79642857 20.86428571 27.87431694 31.09142857 12.81666667 24.13131313 12.81666667 24.58931298 17.80322581 17.66333333 27.83 24.58931298 17.66333333 20.90823529 20.10555556 20.90823529 28.20877193 20.10555556 21.16882129 24.58931298 13.27619048 31.09142857 17.08095238 26.19217391 9.975 21.03404255 26.74583333 31.09142857 25.83960396 11.859375 13.38235294 24.58931298 14.97931034 14.46699029 30.12777778 17.66333333 26.19217391 20.10206186 17.70540541 18.45909091 26.19217391 20.10555556 17.66333333 33.31025641 14.97931034 17.70540541 24.64421053 20.90823529 25.83960396 17.08095238 24.58931298 21.43571429 19.31617647 16.33733333 46.04888889 21.25076923 17.08095238 25.83960396 24.64421053 11.81470588 17.80322581 27.63636364 23.59731183 17.94444444 17.66333333 27.7253886 20.21465517 46.04888889 14.97931034 9.975 17.08095238 24.13131313 21.03404255 13.4 11.859375 26.19214286 21.25076923 21.03404255 47.11395349 16.33733333 43.21111111 31.65730337 30.12777778 20.10555556 17.8372093 18.40833333 14.97931034 33.31025641 24.58931298 22.88813559 18.27179487 17.80322581 14.63820225 21.16882129 26.91538462 24.64421053 13.05 14.97931034 9.975 26.19217391 12.81666667 26.19214286 49.46511628 13.27619048 17.70540541 25.83960396 31.09142857 24.13131313 21.25076923 21.03404255 26.91538462 21.03404255 21.16882129 17.8372093 12.81666667 21.03404255 21.03404255 17.08095238 45.16666667 ]
    Mean squared error = 18.05

Let's compare with the performance of the "little brothers" (decision tree and KNN):

    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neighbors import KNeighborsRegressor

    # Use the decision tree regression model
    dec_regressor = DecisionTreeRegressor()
    dec_regressor.fit(train_x, train_y)
    pred_y = dec_regressor.predict(test_x)
    mse = mean_squared_error(test_y, pred_y)
    print("Decision tree mean squared error =", round(mse, 2))

    # Use the KNN regression model
    knn_regressor = KNeighborsRegressor()
    knn_regressor.fit(train_x, train_y)
    pred_y = knn_regressor.predict(test_x)
    mse = mean_squared_error(test_y, pred_y)
    print("KNN mean squared error =", round(mse, 2))

Output:

    Decision tree mean squared error = 23.84
    KNN mean squared error = 27.87

Here the mean squared error of AdaBoost is the smallest, i.e. its result is the best. Although AdaBoost uses weak learners, the strong learner formed by combining 50 or more of them beats other algorithms in many cases. That is why AdaBoost is one of the commonly used classification and regression algorithms.
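
On scikit-learn 1.2 and later, where `load_boston` is no longer available, the same three-way comparison can be run on another bundled regression data set. The sketch below swaps in `load_diabetes` purely as a stand-in; that substitution is my assumption, not part of the original experiment, so the numbers (and possibly the ranking) will differ from the Boston results above.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Stand-in data set; not the Boston data used in the text.
data = load_diabetes()
train_x, test_x, train_y, test_y = train_test_split(
    data.data, data.target, test_size=0.25, random_state=33
)

models = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "KNN": KNeighborsRegressor(),
}

mse_by_model = {}
for name, model in models.items():
    model.fit(train_x, train_y)
    mse_by_model[name] = mean_squared_error(test_y, model.predict(test_x))
    print(f"{name} mean squared error = {mse_by_model[name]:.2f}")
```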

5.3 Comparison of AdaBoost and Decision Tree Model

In sklearn, AdaBoost uses a decision tree model by default. We can randomly generate some data and then compare the classification error of three models: the weak classifier used inside AdaBoost (a decision tree stump), a full decision tree classifier, and the AdaBoost model.

To generate the data we can use the make_hastie_10_2 function in sklearn, which produces binary classification data. Suppose we generate 12000 samples, take the first 2000 as the test set, and use the rest as the training set.


Let's go straight to the code and results, and experience the power of AdaBoost:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn.metrics import zero_one_loss
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import AdaBoostClassifier

    # Set the number of AdaBoost iterations
    n_estimators = 200
    # Generate the data
    X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
    # Take the first 2000 of the 12000 rows as the test set, and the rest as the training set
    train_x, train_y = X[2000:], y[2000:]
    test_x, test_y = X[:2000], y[:2000]
    # Weak classifier
    dt_stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
    dt_stump.fit(train_x, train_y)
    dt_stump_err = 1.0 - dt_stump.score(test_x, test_y)
    # Decision tree classifier
    dt = DecisionTreeClassifier()
    dt.fit(train_x, train_y)
    dt_err = 1.0 - dt.score(test_x, test_y)
    # AdaBoost classifier (scikit-learn 1.2 and later name this parameter `estimator`)
    ada = AdaBoostClassifier(base_estimator=dt_stump, n_estimators=n_estimators)
    ada.fit(train_x, train_y)
    # Visualize the error rates of the three classifiers
    fig = plt.figure()
    # Set plt to display Chinese correctly
    plt.rcParams['font.sans-serif'] = ['SimHei']
    ax = fig.add_subplot(111)
    ax.plot([1, n_estimators], [dt_stump_err] * 2, 'k-', label=u'decision tree weak classifier error rate')
    ax.plot([1, n_estimators], [dt_err] * 2, 'k--', label=u'decision tree model error rate')
    ada_err = np.zeros((n_estimators,))
    # Iterate over the staged predictions: i is the iteration index, pred_y the prediction after that iteration
    for i, pred_y in enumerate(ada.staged_predict(test_x)):
        # Compute the error rate
        ada_err[i] = zero_one_loss(pred_y, test_y)
    # Plot the AdaBoost error rate at each iteration
    ax.plot(np.arange(n_estimators) + 1, ada_err, label='AdaBoost Test error rate', color='orange')
    ax.set_xlabel('Number of iterations')
    ax.set_ylabel('error rate')
    leg = ax.legend(loc='upper right', fancybox=True)
    plt.show()

Running results: As the figure shows, the weak classifier has the highest error rate, only slightly better than random guessing, with an accuracy slightly above 50%. The error rate of the decision tree model is clearly much lower. The error rate of the AdaBoost model, however, drops markedly once the number of iterations exceeds 25, and levels off after about 125 iterations.

So we can see that, although a single decision tree weak classifier does not perform well, the AdaBoost classifier formed by combining many of them classifies better than the decision tree model.
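
For readers who just want the numbers without a plot, here is a smaller text-only variant of the experiment. It is a sketch under reduced settings that I chose so it runs quickly (3000 samples, 100 rounds), and it relies on AdaBoost's default weak learner, a depth-1 decision tree, instead of passing one explicitly; the exact error values will depend on the scikit-learn version.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import zero_one_loss
from sklearn.tree import DecisionTreeClassifier

# Smaller than the experiment above so that it runs quickly.
X, y = make_hastie_10_2(n_samples=3000, random_state=1)
test_x, test_y = X[:500], y[:500]
train_x, train_y = X[500:], y[500:]

# A single decision stump as the baseline weak classifier.
stump = DecisionTreeClassifier(max_depth=1).fit(train_x, train_y)
stump_err = 1.0 - stump.score(test_x, test_y)

# AdaBoost with its default weak learner.
ada = AdaBoostClassifier(n_estimators=100, random_state=1).fit(train_x, train_y)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n rounds.
ada_err = [zero_one_loss(test_y, pred) for pred in ada.staged_predict(test_x)]

print(f"stump error: {stump_err:.3f}")
for n in (1, 10, 25, 50, 100):
    print(f"AdaBoost error after {n:>3} rounds: {ada_err[n - 1]:.3f}")
```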

6. Summary

Today we learned the AdaBoost algorithm, from the idea of ensembling to how AdaBoost works to a small hands-on exercise. Through today's study we find that ensemble algorithms are powerful and cheap. Many applications now use ensemble techniques. AdaBoost itself is not used much any more: whether in competitions or in everyday applications, people prefer xgboost, lightgbm and catboost. As the series goes deeper, plain-language articles on these algorithms will certainly follow. But before that, let's first understand the principle of AdaBoost, so that we can compare them later; the more we compare, the deeper the impression.

Okay, that's the end of the story of the cobblers and Zhuge Liang.

