On Optimizing Ensemble Models using Column Generation
Journal of Optimization Theory and Applications (IF 1.9), Pub Date: 2024-02-22, DOI: 10.1007/s10957-024-02391-9
Vanya Aziz, Ouyang Wu, Ivo Nowak, Eligius M. T. Hendrix, Jan Kronqvist

In recent years, interest has grown in integrating various optimization algorithms into machine learning. We study the potential of ensemble learning in classification tasks and how to efficiently decompose the underlying optimization problem. Ensemble learning has become popular for machine learning applications, and it is particularly interesting from an optimization perspective due to its resemblance to column generation. The challenge for learning is not only to obtain a good fit for the training data set, but also good generalization, such that the classifier is generally applicable. Deep networks have the drawback that they require substantial computational effort to reach an accurate classification. Ensemble learning can combine various weak learners, which individually require less computational time. We consider binary classification problems and study a three-phase algorithm. After initializing a set of base learners refined by a bootstrapping approach, new base learners are generated by using the solution of a linear programming (LP) master problem and then solving a machine learning sub-problem on a reduced data set, which can be viewed as a so-called pricing problem. We theoretically show that the algorithm computes an optimal ensemble model in the convex hull of a given model space. The implementation of the algorithm is part of an ensemble learning framework called decolearn. Numerical experiments with the CIFAR-10 data set show that the base learners are diverse and that both the training and generalization errors are reduced after several iterations.
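The three-phase scheme described in the abstract (bootstrap initialization, LP master problem, pricing sub-problem) closely resembles LP-boosting-style column generation. The following is a minimal, hypothetical Python sketch of that pattern, not the authors' decolearn implementation: it assumes labels in {-1, +1}, scikit-learn decision stumps as weak learners, and SciPy's HiGHS-based linprog for the restricted master.

```python
# Hypothetical sketch of LP-boosting-style column generation for a binary
# ensemble (illustration only; not the decolearn code from the paper).
# Assumes labels y in {-1, +1}, decision stumps as weak learners, and a
# recent SciPy whose linprog/HiGHS result exposes dual values.
import numpy as np
from scipy.optimize import linprog
from sklearn.tree import DecisionTreeClassifier


def solve_master(H, y, D=1.0):
    """Soft-margin LP master over base-learner predictions H (N x J):
    maximize rho - D * sum(xi) over convex weights w, with margin slacks xi.
    Returns the weights w, the dual bound beta, and dual sample weights u."""
    N, J = H.shape
    # Variable order: x = [w (J), xi (N), rho]; scipy minimizes, so negate rho.
    c = np.concatenate([np.zeros(J), D * np.ones(N), [-1.0]])
    # Margin constraints: rho - xi_i - y_i * (H w)_i <= 0.
    A_ub = np.hstack([-(y[:, None] * H), -np.eye(N), np.ones((N, 1))])
    b_ub = np.zeros(N)
    # Convexity constraint: sum_j w_j = 1.
    A_eq = np.concatenate([np.ones(J), np.zeros(N), [0.0]])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (J + N) + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds, method="highs")
    w, beta = res.x[:J], -res.fun            # beta = optimal master objective
    u = -res.ineqlin.marginals               # duals of the margin constraints
    return w, beta, u


def price(X, y, u):
    """Pricing sub-problem: fit a weak learner on the dual-reweighted sample
    and report its edge sum_i u_i * y_i * h(x_i)."""
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
    pred = stump.predict(X)
    return stump, float(np.sum(u * y * pred))


def column_generation(X, y, max_iters=20, D=1.0, tol=1e-6):
    # Phase 1: initialize the base-learner pool (a bootstrap step would go here).
    learners = [DecisionTreeClassifier(max_depth=1).fit(X, y)]
    H = np.array([h.predict(X) for h in learners], dtype=float).T
    for _ in range(max_iters):
        # Phase 2: restricted LP master over the current columns.
        _, beta, u = solve_master(H, y, D)
        # Phase 3: pricing on the reweighted data set.
        h_new, edge = price(X, y, u)
        if edge <= beta + tol:               # no improving column; stop
            break
        learners.append(h_new)
        H = np.hstack([H, h_new.predict(X)[:, None].astype(float)])
    w, _, _ = solve_master(H, y, D)          # final ensemble weights
    return learners, w
```

In this sketch the master's inequality duals u act as sample weights for the pricing step, and the test edge <= beta is the standard LP-boosting optimality check for stopping column generation; the resulting ensemble would predict sign(sum_j w_j h_j(x)).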




Update date: 2024-02-22