StreamliNet: Cost-aware layer-wise neural network linearization for fast and accurate private inference
Information Sciences (IF 8.1), Pub Date: 2024-03-15, DOI: 10.1016/j.ins.2024.120463
Zhi Pang, Lina Wang, Fangchao Yu, Kai Zhao, Bo Zeng

Private inference (PI) allows a client and a server to perform cryptographically secure deep neural network inference without disclosing their sensitive data to each other. Despite this strong security guarantee, existing models are ill-suited for PI: the non-linear operations they use liberally, such as ReLUs, are computationally expensive over ciphertext and therefore dominate PI latency. Previous approaches to ReLU optimization either ignore the intrinsic importance of individual ReLUs or suffer significant accuracy loss. In this work, we propose StreamliNet, an importance-driven, gradient-based framework that reduces PI latency while retaining inference accuracy. Specifically, we first present a novel proxy for ReLU importance, combined into a multivariate metric, to precisely identify layer-wise ReLU budgets. StreamliNet then automates the selection of performance-insensitive ReLUs for linearization and learns a non-linearity-sparse model in which the ReLUs retained in each layer have appropriate counts and locations. Moreover, to reduce the activation-map discrepancy, we develop a cost-aware post-activation consistency constraint that prioritizes the linearization of low-cost ReLUs while further mitigating performance degradation. Extensive experiments on various models and datasets demonstrate that StreamliNet outperforms state-of-the-art methods such as SNL (ICML 22) and SENet (ICLR 23), achieving 3.09% higher accuracy at an iso-ReLU budget or requiring 2× fewer ReLUs at iso-accuracy on CIFAR-100.
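To make the abstract's two mechanisms concrete, below is a minimal PyTorch sketch of one plausible formulation, assuming a learnable-gate approach to ReLU linearization in the spirit of SNL/SENet. All names here (GatedReLU, cost_aware_consistency, relu_budget_penalty) and the choice of cost weighting are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedReLU(nn.Module):
    """Drop-in ReLU replacement with a learnable per-channel gate.

    sigmoid(alpha) near 1 keeps the non-linearity; near 0 it degrades to
    the identity, i.e. that ReLU is linearized and incurs no ciphertext
    cost at inference time.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        # Start with gates open (sigmoid(3) ~ 0.95) so training begins
        # close to the original all-ReLU network.
        self.alpha = nn.Parameter(torch.full((num_channels,), 3.0))

    def gate(self) -> torch.Tensor:
        return torch.sigmoid(self.alpha)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m = self.gate().view(1, -1, 1, 1)     # broadcast over N, H, W
        return m * F.relu(x) + (1.0 - m) * x  # soft mix of ReLU and identity


def cost_aware_consistency(student_acts, teacher_acts, relu_costs):
    """Post-activation consistency between the gated model and a frozen
    all-ReLU reference, weighted by a per-layer ReLU cost (assumed here
    to be proportional to activation-map size). Expensive layers are
    pushed to stay consistent with the reference, so the cheap,
    low-impact ReLUs are the ones that get linearized first.
    """
    loss = 0.0
    for s, t, c in zip(student_acts, teacher_acts, relu_costs):
        loss = loss + c * F.mse_loss(s, t.detach())
    return loss


def relu_budget_penalty(gates, units_per_gate, budget):
    """One-sided hinge on the expected ReLU count under the soft gates.

    gates: per-layer gate tensors in (0, 1); units_per_gate: spatial
    size (H * W) governed by each gate entry; budget: target ReLU count.
    """
    expected = sum(g.sum() * u for g, u in zip(gates, units_per_gate))
    return F.relu(expected - budget)
```

A training step would combine the task loss with these two terms, e.g. loss = ce + lam1 * cost_aware_consistency(...) + lam2 * relu_budget_penalty(...); after training, gates below a threshold would be hard-set to the identity so the corresponding ReLUs vanish from the ciphertext computation.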

Updated: 2024-03-15