Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration
ACM Transactions on Embedded Computing Systems (IF 2), Pub Date: 2024-01-19, DOI: 10.1145/3634704
Huamei Qi, Fang Ren, Leilei Wang, Ping Jiang, Shaohua Wan, Xiaoheng Deng

Edge intelligence has emerged as a promising paradigm to accelerate DNN inference by model partitioning, which is particularly useful for intelligent scenarios that demand high accuracy and low latency. However, the dynamic nature of the edge environment and the diversity of end devices pose a significant challenge for DNN model partitioning strategies. Meanwhile, the limited resources of the edge server make it difficult to manage resource allocation efficiently among multiple devices. In addition, most existing studies disregard the differing service requirements of DNN inference tasks, such as whether they are highly accuracy-sensitive or highly latency-sensitive. To address these challenges, we propose a Multi-Compression Scale DNN Inference Acceleration (MCIA) scheme based on cloud-edge-end collaboration. We model this problem as a mixed-integer multi-dimensional optimization problem, jointly optimizing the DNN model version choice, the partitioning choice, and the allocation of computational and bandwidth resources to maximize the tradeoff between inference accuracy and latency according to the properties of the tasks. Initially, we train multiple versions of DNN inference models with different compression scales in the cloud and deploy them to the end devices and the edge server. Next, a deep reinforcement learning-based algorithm is developed to make joint decisions on adaptive collaborative inference and resource allocation based on the current multi-compression scale models and the task properties. Experimental results show that MCIA can adapt to heterogeneous devices and dynamic networks, and achieves superior performance compared with other methods.
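For intuition, the joint decision described in the abstract can be viewed as choosing a (model version, partition point) pair under a given bandwidth allocation so as to balance accuracy against end-to-end latency. The Python sketch below illustrates that search with a simple weighted utility and exhaustive enumeration; the layer profiles, device and edge capabilities, feature-map sizes, and utility form are all illustrative assumptions, not the paper's formulation or numbers, and the paper uses a deep reinforcement learning agent rather than brute force.

```python
# Illustrative only: a brute-force stand-in for the joint decision that MCIA
# makes with deep reinforcement learning. All profile numbers are assumed.
from itertools import product

# Hypothetical profiles for three compression scales of one 4-layer DNN.
MODELS = {
    "full":   {"accuracy": 0.92, "layer_gflops": [8.0, 12.0, 10.0, 6.0]},
    "medium": {"accuracy": 0.89, "layer_gflops": [4.0, 6.0, 5.0, 3.0]},
    "small":  {"accuracy": 0.84, "layer_gflops": [2.0, 3.0, 2.0, 1.0]},
}
DEVICE_GFLOPS = 5.0                  # assumed end-device compute capability
EDGE_GFLOPS = 40.0                   # assumed edge-server share for this task
FEATURE_MB = [4.0, 2.0, 1.0, 0.5]    # assumed intermediate feature sizes (MB)


def latency_s(model: str, cut: int, bandwidth_mbps: float) -> float:
    """End-to-end latency when layers [0, cut) run on the device, the rest on the edge."""
    gflops = MODELS[model]["layer_gflops"]
    device_t = sum(gflops[:cut]) / DEVICE_GFLOPS
    edge_t = sum(gflops[cut:]) / EDGE_GFLOPS
    # Uploading the intermediate feature map; nothing is sent if inference stays local.
    tx_t = 0.0 if cut == len(gflops) else FEATURE_MB[cut - 1] * 8.0 / bandwidth_mbps
    return device_t + edge_t + tx_t


def utility(model: str, cut: int, bandwidth_mbps: float, alpha: float) -> float:
    """Weighted accuracy-latency tradeoff; larger alpha means more latency-sensitive."""
    return (1.0 - alpha) * MODELS[model]["accuracy"] - alpha * latency_s(model, cut, bandwidth_mbps)


def best_decision(bandwidth_mbps: float, alpha: float):
    """Enumerate (version, partition point) pairs and keep the highest-utility one."""
    candidates = product(MODELS, range(1, 5))  # cut = 4 keeps all layers on the device
    return max(candidates, key=lambda c: utility(c[0], c[1], bandwidth_mbps, alpha))


if __name__ == "__main__":
    # A latency-sensitive task (alpha = 0.7) over an assumed 20 Mbps uplink.
    print(best_decision(bandwidth_mbps=20.0, alpha=0.7))
```

The weight alpha is one simple way to encode the task property mentioned in the abstract: a latency-sensitive task uses a large alpha and tends to pick a smaller model or a more local partition, while an accuracy-sensitive task uses a small alpha and favors the full model with edge offloading.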



