A gradient-based bilevel optimization approach for tuning regularization hyperparameters
Optimization Letters (IF 1.6) Pub Date: 2023-09-29, DOI: 10.1007/s11590-023-02057-x
Ankur Sinha, Tanmay Khandait, Raja Mohanty

Hyperparameter tuning in machine learning is often performed with naive techniques such as random search and grid search. However, these methods seldom find an optimal set of hyperparameters and can be computationally very expensive. The hyperparameter optimization problem is inherently a bilevel optimization task, and several studies have applied bilevel solution methodologies to it. These techniques typically assume that a unique set of weights minimizes the loss on the training set, an assumption that is violated by deep learning architectures. We propose a bilevel solution method for the hyperparameter optimization problem that does not suffer from this drawback. The proposed method is general and can be readily applied to any class of machine learning algorithms with continuous hyperparameters. The idea is to approximate the lower-level optimal value-function mapping, which reduces the bilevel problem to a single-level constrained optimization task. The resulting single-level constrained problem is then solved using the augmented Lagrangian method. An extensive computational study on three datasets confirms the efficiency of the proposed method. A comparison against grid search, random search, the Tree-structured Parzen Estimator and the Quasi Monte Carlo sampler shows that the proposed algorithm is multiple times faster and yields models that generalize better on the test set.
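To make the reduction concrete: with hyperparameters λ, weights w, validation loss F and training loss f, the bilevel problem min_λ F(λ, w*(λ)) with w*(λ) ∈ argmin_w f(λ, w) can be rewritten using the optimal value function φ(λ) = min_w f(λ, w) as the single-level problem min_{λ,w} F(λ, w) subject to f(λ, w) ≤ φ(λ). The sketch below illustrates this idea on a ridge-regression lower level, where φ(λ) happens to be available in closed form; the paper instead works with an approximation of this mapping, and the plain gradient-descent inner solver, the function names (f, F, phi, w_star) and all constants here are illustrative choices, not the authors' implementation.

```python
# Minimal sketch: value-function reduction of bilevel hyperparameter tuning,
# solved with an augmented Lagrangian. Illustrative only: the lower level is
# quadratic (ridge regression), so phi(lam) is exact rather than approximated.
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 80, 40, 10
X_tr, X_val = rng.normal(size=(n, d)), rng.normal(size=(m, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n)
y_val = X_val @ w_true + 0.5 * rng.normal(size=m)

def f(lam, w):    # lower-level (training) objective with L2 regularization
    r = X_tr @ w - y_tr
    return r @ r / n + lam * (w @ w)

def F(w):         # upper-level (validation) objective
    r = X_val @ w - y_val
    return r @ r / m

def w_star(lam):  # exact lower-level minimizer (quadratic case only)
    return np.linalg.solve(X_tr.T @ X_tr / n + lam * np.eye(d),
                           X_tr.T @ y_tr / n)

def phi(lam):     # lower-level optimal value function
    return f(lam, w_star(lam))

# Single-level problem: min F(w) s.t. g(lam, w) = f(lam, w) - phi(lam) <= 0.
mu, rho = 0.0, 1.0          # multiplier and penalty for the inequality
lam, w = 0.5, np.zeros(d)
step = 1e-3
for outer in range(20):
    for _ in range(500):    # approximately minimize L_rho over (lam, w)
        g = f(lam, w) - phi(lam)
        mult = max(0.0, mu + rho * g)   # active multiplier estimate
        # Envelope theorem: d phi / d lam = ||w_star(lam)||^2
        dF_dw = 2 * X_val.T @ (X_val @ w - y_val) / m
        dg_dw = 2 * X_tr.T @ (X_tr @ w - y_tr) / n + 2 * lam * w
        dg_dlam = w @ w - np.sum(w_star(lam) ** 2)
        w -= step * (dF_dw + mult * dg_dw)
        lam = max(1e-8, lam - step * mult * dg_dlam)
    mu = max(0.0, mu + rho * (f(lam, w) - phi(lam)))  # multiplier update

print(f"tuned lam = {lam:.4f}, validation loss = {F(w):.4f}")
```

Since φ(λ) is the lower-level minimum, g(λ, w) ≥ 0 always holds, so the constraint is active at the solution and forces w toward a lower-level optimizer while λ is driven by the validation loss; this is the mechanism that replaces the implicit-function assumptions of earlier bilevel approaches.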



Last updated: 2023-10-01