The Need for Speed: Pruning Transformers with One Recipe
arXiv - CS - Machine Learning | Pub Date: 2024-03-26 | DOI: arxiv-2403.17921
Samir Khaki, Konstantinos N. Plataniotis

We introduce the $\textbf{O}$ne-shot $\textbf{P}$runing $\textbf{T}$echnique for $\textbf{I}$nterchangeable $\textbf{N}$etworks ($\textbf{OPTIN}$) framework as a tool to increase the efficiency of pre-trained transformer architectures $\textit{without requiring re-training}$. Recent works have explored improving transformer efficiency; however, they often incur computationally expensive re-training procedures or depend on architecture-specific characteristics, impeding practical wide-scale adoption. To address these shortcomings, the OPTIN framework leverages intermediate feature distillation, capturing the long-range dependencies of model parameters (coined $\textit{trajectory}$), to produce state-of-the-art results on natural language, image classification, transfer learning, and semantic segmentation tasks $\textit{without re-training}$. Given a FLOP constraint, the OPTIN framework compresses the network while maintaining competitive accuracy and improving throughput. In particular, we show a $\leq 2$% accuracy degradation from NLP baselines and a $0.5$% improvement over state-of-the-art methods on image classification at competitive FLOPs reductions. We further demonstrate generalization across tasks and architectures, with comparative performance using Mask2Former for semantic segmentation and on CNN-style networks. OPTIN presents one of the first one-shot efficient frameworks for compressing transformer architectures that generalizes well across different class domains, in particular natural language and image-related tasks, without $\textit{re-training}$.

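To make the idea concrete, below is a minimal sketch of one-shot, retraining-free attention-head pruning in the spirit of the abstract. It is not the authors' implementation: the toy model (`ToyBlock`), the saliency score (`head_saliency`), and the budgeting helper (`prune_to_flop_budget`) are illustrative assumptions. The score masks one head and measures how much the intermediate features of that layer and all later layers drift on a small calibration batch, a crude stand-in for the paper's trajectory-style, long-range criterion; the kept-head ratio stands in for a FLOP constraint.

```python
# Hedged sketch only: names, model, and scoring rule are assumptions,
# not the OPTIN algorithm itself.
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """A tiny self-attention block whose heads can be masked individually."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # 1.0 = keep head, 0.0 = pruned head
        self.head_mask = nn.Parameter(torch.ones(heads), requires_grad=False)

    def forward(self, x):
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, n, self.heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        out = attn.softmax(dim=-1) @ v                    # (b, heads, n, head_dim)
        out = out * self.head_mask.view(1, -1, 1, 1)      # zero out pruned heads
        out = out.transpose(1, 2).reshape(b, n, d)
        return x + self.proj(out)


@torch.no_grad()
def head_saliency(model, calib, layer_idx, head_idx):
    """Drift of downstream intermediate features when one head is masked."""
    def features(blocks):
        feats, x = [], calib
        for blk in blocks:
            x = blk(x)
            feats.append(x)
        return feats

    base = features(model)
    model[layer_idx].head_mask[head_idx] = 0.0
    masked = features(model)
    model[layer_idx].head_mask[head_idx] = 1.0            # restore the head
    # Only the masked layer and layers after it can change, so sum their drift.
    return sum(torch.norm(b - m).item()
               for b, m in zip(base[layer_idx:], masked[layer_idx:]))


@torch.no_grad()
def prune_to_flop_budget(model, calib, keep_ratio=0.75):
    """Rank all heads once (one-shot) and mask the least salient heads
    until the kept-head ratio (a proxy for a FLOP budget) is met."""
    scores = [(head_saliency(model, calib, i, h), i, h)
              for i, blk in enumerate(model)
              for h in range(blk.heads)]
    scores.sort()                                          # least important first
    n_prune = len(scores) - int(keep_ratio * len(scores))
    for _, i, h in scores[:n_prune]:
        model[i].head_mask[h] = 0.0
    return model


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.ModuleList([ToyBlock() for _ in range(4)]).eval()
    calib = torch.randn(2, 16, 64)                         # tiny calibration batch
    prune_to_flop_budget(model, calib, keep_ratio=0.75)
    print([blk.head_mask.tolist() for blk in model])
```

No gradient updates or re-training follow the masking step; the point of the sketch is only that a single scoring pass over a calibration batch can drive the pruning decision under a compute budget.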
Updated: 2024-03-27