
Ensemble and Boosting


Presented by Lunjun Wan, University of Science & Technology of China

Ensemble
An ensemble is a group of learning models solving problems jointly.

Ensemble Design
Two properties matter: individual precision and diversity.
- Three similar classifiers, each with ACC = 90%, add little over a single one.
- Three random classifiers, each with ACC = 50%, gain nothing from being combined.
- Three perfectly diverse classifiers, each with only ACC = 67%, can be combined (e.g. by majority vote) into a much stronger classifier, because their errors fall on different examples.
Handling the accuracy/diversity trade-off is crucial.

Ensemble Methods
- Boosting (Y. Freund, 1995)
- Bagging (L. Breiman, 1996)
- Negative Correlation Learning (X. Yao, 1996)
- Random Forest (L. Breiman, 2001)
- ...

Boosting
Boosting builds the ensemble sequentially: a classifier C^{(0)}(x) is trained on the original training sample, the sample is re-weighted, C^{(1)}(x) is trained on the weighted sample, and so on through C^{(2)}(x), C^{(3)}(x), ..., C^{(m)}(x). The final output is a weighted combination of the individual classifiers:
y(x) = \sum_{i}^{N_{classifier}} w_i C^{(i)}(x)

AdaBoost
AdaBoost re-weights events misclassified by the previous classifier by the factor
\frac{1 - f_{err}}{f_{err}}, \quad \text{with } f_{err} = \frac{\text{misclassified events}}{\text{all events}},
and it also weights the classifiers using the error rate of each individual classifier:
y(x) = \sum_{i}^{N_{classifier}} \log\frac{1 - f_{err}^{(i)}}{f_{err}^{(i)}} \, C^{(i)}(x)

AdaBoost (algorithm)
1. Start with weights w_i = 1/N, i = 1, ..., N.
2. Repeat for m = 1, 2, ..., M:
   a) Fit the classifier G_m(x) \in \{-1, 1\} using weights w_i on the training data.
   b) Compute err_m = E_w[1_{y \neq G_m(x)}] = \frac{\sum_{i=1}^{N} w_i I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i}.
   c) Compute \alpha_m = \log\frac{1 - err_m}{err_m}.
   d) Update w_i \leftarrow w_i \cdot \exp[\alpha_m I(y_i \neq G_m(x_i))], i = 1, 2, ..., N.
3. Output G(x) = \mathrm{sign}\big[\sum_{m=1}^{M} \alpha_m G_m(x)\big].

AdaBoost fits an additive model in a forward stagewise manner ("A Statistical View of Boosting", Friedman et al., 2000).

Additive models
Suppose we have M basis functions:
F(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)
Select a specific loss function L(y, f(x)) and solve
\min_{\{\beta_m, \gamma_m\}_1^M} \sum_{i=1}^{N} L\big(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\big)
This joint optimization is very hard to solve.

Additive models (continued)
But if we fit a single basis function at a time,
\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, \beta b(x_i; \gamma)),
the problem becomes much easier. This gives the Forward Stagewise Additive Method:
1. Start with f_0(x) = 0.
2. Repeat for m = 1, 2, ..., M:
   a) (\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma))
   b) f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)
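Before specializing the loss, it may help to see the AdaBoost steps above written out. The following is a minimal sketch, not part of the original slides: it assumes binary labels y_i \in \{-1, +1\} and uses scikit-learn decision stumps as the weak learners G_m; the function names adaboost_fit and adaboost_predict are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Minimal AdaBoost sketch following the update rules on the AdaBoost slide."""
    N = len(y)
    w = np.full(N, 1.0 / N)                 # 1. start with uniform weights w_i = 1/N
    classifiers, alphas = [], []
    for _ in range(M):                      # 2. repeat for m = 1, ..., M
        G = DecisionTreeClassifier(max_depth=1)
        G.fit(X, y, sample_weight=w)        # a) fit G_m using the current weights
        miss = (G.predict(X) != y)          # indicator I(y_i != G_m(x_i))
        err = np.sum(w * miss) / np.sum(w)  # b) weighted error rate err_m
        err = np.clip(err, 1e-10, 1.0 - 1e-10)   # guard against log(0)
        alpha = np.log((1.0 - err) / err)   # c) alpha_m = log((1 - err_m) / err_m)
        w = w * np.exp(alpha * miss)        # d) up-weight misclassified events
        classifiers.append(G)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # 3. G(x) = sign( sum_m alpha_m * G_m(x) )
    F = sum(a * G.predict(X) for G, a in zip(classifiers, alphas))
    return np.sign(F)
```

The next slides show that these update rules are exactly forward stagewise fitting under the exponential loss.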
Boosting fits an additive model (1)
Take the exponential loss L(y, f(x)) = \exp(-y f(x)) and use G_m(x) \in \{-1, 1\} from AdaBoost as the basis function. The Forward Stagewise Additive Method then reads:
1. Start with f_0(x) = 0.
2. Repeat for m = 1, 2, ..., M:
   a) (\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp[-y_i (f_{m-1}(x_i) + \beta G(x_i))]
   b) f_m(x) = f_{m-1}(x) + \beta_m G_m(x)

Boosting fits an additive model (2)
Define w_i^{(m)} = \exp(-y_i f_{m-1}(x_i)). Then
(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp[-y_i (f_{m-1}(x_i) + \beta G(x_i))]
= \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp[-y_i \beta G(x_i)]
= \arg\min_{\beta, G} \; e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \neq G(x_i)} w_i^{(m)}
= \arg\min_{\beta, G} \; (e^{\beta} - e^{-\beta}) \sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G(x_i)) + e^{-\beta} \sum_{i=1}^{N} w_i^{(m)}

Boosting fits an additive model (3)
For any \beta > 0 the minimizing classifier is
G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G(x_i)),
and solving for \beta gives
\beta_m = \frac{1}{2} \log\frac{1 - err_m}{err_m}, \quad \text{with } err_m = \frac{\sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i^{(m)}}.
The weights for the next iteration are
w_i^{(m+1)} = \exp(-y_i f_m(x_i)) = \exp[-y_i (f_{m-1}(x_i) + \beta_m G_m(x_i))] = w_i^{(m)} \exp[-y_i \beta_m G_m(x_i)]
= w_i^{(m)} \exp[\alpha_m I(y_i \neq G_m(x_i))] \cdot e^{-\beta_m}, \quad \text{where } \alpha_m = 2 \beta_m,
noticing that -y_i G_m(x_i) = 2 I(y_i \neq G_m(x_i)) - 1. The factor e^{-\beta_m} multiplies all weights equally and can be dropped, which recovers the AdaBoost update. Hence AdaBoost builds an additive model that minimizes \exp(-y f(x)).

Gradient Boosting Machine (1)
Consider the gradient descent point of view: at each stage, the pairs \{x_i, g_m(x_i)\}, where g_m(x_i) is the gradient of the loss at the current fit, are used to fit the gradient over the data distribution.

Gradient Boosting Machine (2)
Gradient Boost is a way to implement "boosting" with arbitrary loss functions by approximating ("somehow") the gradient of the loss function. Typical loss functions and their negative gradients:

| Distribution | Loss function L(y_i, F(x_i)) | -\partial L(y_i, F(x_i)) / \partial F(x_i) |
|---|---|---|
| Least-squares | \frac{1}{2} [y_i - F(x_i)]^2 | y_i - F(x_i) |
| LAD | \lvert y_i - F(x_i) \rvert | \mathrm{sign}[y_i - F(x_i)] |
| Binomial log-likelihood | \log(1 + \exp(-2 y_i F(x_i))) | 2 y_i / (1 + \exp(2 y_i F(x_i))) |
| AdaBoost | \exp(-y_i F(x_i)) | y_i \exp(-y_i F(x_i)) |
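As a concrete illustration of the least-squares row of the table, here is a minimal gradient-boosting sketch. It is not part of the original slides: it assumes scikit-learn regression trees as base learners, and the names gradient_boost_fit / gradient_boost_predict as well as the learning-rate value are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, M=100, learning_rate=0.1):
    """Least-squares gradient boosting: at each stage, fit a small tree to the
    negative gradient g_m(x_i) = y_i - F(x_i) and take a step along it."""
    f0 = float(np.mean(y))
    F = np.full(len(y), f0)             # start from a constant fit F_0
    trees = []
    for _ in range(M):
        residual = y - F                # negative gradient of 1/2 (y_i - F(x_i))^2
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residual)           # fit the base learner to {x_i, g_m(x_i)}
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(f0, trees, X, learning_rate=0.1):
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F = F + learning_rate * tree.predict(X)
    return F
```

Swapping the residual line for \mathrm{sign}(y - F), or for the binomial gradient from the table, changes only the target that the base learner fits; the surrounding loop stays the same.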