
Ensemble and Boosting


Presented by Lunjun Wan, University of Science & Technology of China

Ensemble
An ensemble is a group of learning models solving problems jointly.

Ensemble Design
Two properties matter: individual precision and diversity.
- Three similar classifiers, each with ACC = 90%, add little over a single one.
- Three random classifiers, each with ACC = 50%, gain nothing from being combined.
- Three perfectly diverse classifiers, each with only ACC = 67%, can be combined (e.g. by majority vote) into a much stronger classifier, because their errors fall on different examples.
Handling the accuracy/diversity trade-off is crucial.

Ensemble Methods
- Boosting (Y. Freund, 1995)
- Bagging (L. Breiman, 1996)
- Negative Correlation Learning (X. Yao, 1996)
- Random Forest (L. Breiman, 2001)
- ...

Boosting
Boosting builds the ensemble sequentially: a classifier C^{(0)}(x) is trained on the original training sample, the sample is re-weighted, C^{(1)}(x) is trained on the weighted sample, and so on through C^{(2)}(x), C^{(3)}(x), ..., C^{(m)}(x). The final output is a weighted combination of the individual classifiers:
y(x) = \sum_{i}^{N_{classifier}} w_i C^{(i)}(x)

AdaBoost
AdaBoost re-weights events misclassified by the previous classifier by the factor
\frac{1 - f_{err}}{f_{err}}, \quad \text{with } f_{err} = \frac{\text{misclassified events}}{\text{all events}},
and it also weights the classifiers using the error rate of each individual classifier:
y(x) = \sum_{i}^{N_{classifier}} \log\frac{1 - f_{err}^{(i)}}{f_{err}^{(i)}} \, C^{(i)}(x)

AdaBoost (algorithm)
1. Start with weights w_i = 1/N, i = 1, ..., N.
2. Repeat for m = 1, 2, ..., M:
   a) Fit the classifier G_m(x) \in \{-1, 1\} using weights w_i on the training data.
   b) Compute err_m = E_w[1_{y \neq G_m(x)}] = \frac{\sum_{i=1}^{N} w_i I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i}.
   c) Compute \alpha_m = \log\frac{1 - err_m}{err_m}.
   d) Update w_i \leftarrow w_i \cdot \exp[\alpha_m I(y_i \neq G_m(x_i))], i = 1, 2, ..., N.
3. Output G(x) = \mathrm{sign}\big[\sum_{m=1}^{M} \alpha_m G_m(x)\big].

AdaBoost fits an additive model in a forward stagewise manner ("A Statistical View of Boosting", Friedman et al., 2000).

Additive models
Suppose we have M basis functions:
F(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)
Select a specific loss function L(y, f(x)) and solve
\min_{\{\beta_m, \gamma_m\}_1^M} \sum_{i=1}^{N} L\big(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\big)
This joint optimization is very hard to solve.

Additive models (continued)
But if we fit a single basis function at a time,
\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, \beta b(x_i; \gamma)),
the problem becomes much easier. This gives the Forward Stagewise Additive Method:
1. Start with f_0(x) = 0.
2. Repeat for m = 1, 2, ..., M:
   a) (\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma))
   b) f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)
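Before specializing the loss, it may help to see the AdaBoost steps above written out. The following is a minimal sketch, not part of the original slides: it assumes binary labels y_i \in \{-1, +1\} and uses scikit-learn decision stumps as the weak learners G_m; the function names adaboost_fit and adaboost_predict are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Minimal AdaBoost sketch following the update rules on the AdaBoost slide."""
    N = len(y)
    w = np.full(N, 1.0 / N)                 # 1. start with uniform weights w_i = 1/N
    classifiers, alphas = [], []
    for _ in range(M):                      # 2. repeat for m = 1, ..., M
        G = DecisionTreeClassifier(max_depth=1)
        G.fit(X, y, sample_weight=w)        # a) fit G_m using the current weights
        miss = (G.predict(X) != y)          # indicator I(y_i != G_m(x_i))
        err = np.sum(w * miss) / np.sum(w)  # b) weighted error rate err_m
        err = np.clip(err, 1e-10, 1.0 - 1e-10)   # guard against log(0)
        alpha = np.log((1.0 - err) / err)   # c) alpha_m = log((1 - err_m) / err_m)
        w = w * np.exp(alpha * miss)        # d) up-weight misclassified events
        classifiers.append(G)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # 3. G(x) = sign( sum_m alpha_m * G_m(x) )
    F = sum(a * G.predict(X) for G, a in zip(classifiers, alphas))
    return np.sign(F)
```

The next slides show that these update rules are exactly forward stagewise fitting under the exponential loss.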
Boosting fits an additive model (1)
Take the exponential loss L(y, f(x)) = \exp(-y f(x)) and use G_m(x) \in \{-1, 1\} from AdaBoost as the basis function. The Forward Stagewise Additive Method then reads:
1. Start with f_0(x) = 0.
2. Repeat for m = 1, 2, ..., M:
   a) (\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp[-y_i (f_{m-1}(x_i) + \beta G(x_i))]
   b) f_m(x) = f_{m-1}(x) + \beta_m G_m(x)

Boosting fits an additive model (2)
Define w_i^{(m)} = \exp(-y_i f_{m-1}(x_i)). Then
(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp[-y_i (f_{m-1}(x_i) + \beta G(x_i))]
= \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp[-y_i \beta G(x_i)]
= \arg\min_{\beta, G} \; e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \neq G(x_i)} w_i^{(m)}
= \arg\min_{\beta, G} \; (e^{\beta} - e^{-\beta}) \sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G(x_i)) + e^{-\beta} \sum_{i=1}^{N} w_i^{(m)}

Boosting fits an additive model (3)
For any \beta > 0 the minimizing classifier is
G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G(x_i)),
and solving for \beta gives
\beta_m = \frac{1}{2} \log\frac{1 - err_m}{err_m}, \quad \text{with } err_m = \frac{\sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i^{(m)}}.
The weights for the next iteration are
w_i^{(m+1)} = \exp(-y_i f_m(x_i)) = \exp[-y_i (f_{m-1}(x_i) + \beta_m G_m(x_i))] = w_i^{(m)} \exp[-y_i \beta_m G_m(x_i)]
= w_i^{(m)} \exp[\alpha_m I(y_i \neq G_m(x_i))] \cdot e^{-\beta_m}, \quad \text{where } \alpha_m = 2 \beta_m,
noticing that -y_i G_m(x_i) = 2 I(y_i \neq G_m(x_i)) - 1. The factor e^{-\beta_m} multiplies all weights equally and can be dropped, which recovers the AdaBoost update. Hence AdaBoost builds an additive model that minimizes \exp(-y f(x)).

Gradient Boosting Machine (1)
Consider the gradient descent point of view: at each stage, the pairs \{x_i, g_m(x_i)\}, where g_m(x_i) is the gradient of the loss at the current fit, are used to fit the gradient over the data distribution.

Gradient Boosting Machine (2)
Gradient Boost is a way to implement "boosting" with arbitrary loss functions by approximating ("somehow") the gradient of the loss function. Typical loss functions and their negative gradients:

| Distribution | Loss function L(y_i, F(x_i)) | -\partial L(y_i, F(x_i)) / \partial F(x_i) |
|---|---|---|
| Least-squares | \frac{1}{2} [y_i - F(x_i)]^2 | y_i - F(x_i) |
| LAD | \lvert y_i - F(x_i) \rvert | \mathrm{sign}[y_i - F(x_i)] |
| Binomial log-likelihood | \log(1 + \exp(-2 y_i F(x_i))) | 2 y_i / (1 + \exp(2 y_i F(x_i))) |
| AdaBoost | \exp(-y_i F(x_i)) | y_i \exp(-y_i F(x_i)) |
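As a concrete illustration of the least-squares row of the table, here is a minimal gradient-boosting sketch. It is not part of the original slides: it assumes scikit-learn regression trees as base learners, and the names gradient_boost_fit / gradient_boost_predict as well as the learning-rate value are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, M=100, learning_rate=0.1):
    """Least-squares gradient boosting: at each stage, fit a small tree to the
    negative gradient g_m(x_i) = y_i - F(x_i) and take a step along it."""
    f0 = float(np.mean(y))
    F = np.full(len(y), f0)             # start from a constant fit F_0
    trees = []
    for _ in range(M):
        residual = y - F                # negative gradient of 1/2 (y_i - F(x_i))^2
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residual)           # fit the base learner to {x_i, g_m(x_i)}
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(f0, trees, X, learning_rate=0.1):
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F = F + learning_rate * tree.predict(X)
    return F
```

Swapping the residual line for \mathrm{sign}(y - F), or for the binomial gradient from the table, changes only the target that the base learner fits; the surrounding loop stays the same.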