Deep learning parallel computing and evaluation for embedded system clustering architecture processor,Design Automation for Embedded Systems



Deep learning parallel computing and evaluation for embedded system clustering architecture processor
Design Automation for Embedded Systems (IF 1.4), Pub Date: 2020-03-07, DOI: 10.1007/s10617-020-09235-5
Yue Zu

Processing large volumes of information and running intelligent applications increasingly depends on embedded devices, and this trend has given machine learning algorithms a growing role. High-performance embedded computing is an effective way to address the limited computing power of embedded devices. New intelligent embedded applications based on machine learning demand more computation than traditional embedded systems can readily supply. To address this problem, this paper studies parallel optimization and implementation techniques for convolutional neural networks on the Parallella platform. It presents a parallel optimization strategy for convolutional neural networks on the clustering-architecture processor of a heterogeneous multi-core system, then studies a high-performance implementation of a convolutional neural network on the Parallella platform and implements a working convolutional neural network system. It also proposes a set of performance evaluation methods for embedded parallel processors. From the application perspective of the S698P, the eCos operating system is selected as the platform; single-core and multi-core modes are compared on the GRSIM simulator, and a parallel performance evaluation is given. Experiments show that the efficiency of deep learning tasks improves significantly compared with traditional parallel methods.
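
The abstract does not say which metrics its performance evaluation uses. As a minimal sketch, assuming the single-core versus multi-core comparison is summarized with the standard speedup and parallel-efficiency ratios, the calculation could look like the following. The function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Speedup: single-core run time divided by multi-core run time."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_multi


def efficiency(t_single: float, t_multi: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores used."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_single, t_multi) / cores


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only;
    # these are not measurements reported in the paper.
    t_one_core, t_four_cores = 8.0, 2.5
    s = speedup(t_one_core, t_four_cores)
    e = efficiency(t_one_core, t_four_cores, cores=4)
    print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency close to 1 means the added cores are almost fully used, while lower values point to communication or synchronization overhead.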


Updated: 2020-03-07
