UNIQ
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2021-03-26, DOI: 10.1145/3444943
Chaim Baskin¹, Natan Liss¹, Eli Schwartz², Evgenii Zheltonozhskii¹, Raja Giryes², Alex M. Bronstein, Avi Mendelson¹

We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of injecting noise into the weights, we find that activations can also be quantized to as low as 8 bits with only a minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to existing uniform quantization techniques for neural networks. We further propose a novel complexity metric, the number of bit operations performed (BOPs), and show that this metric has a linear relation with logic utilization and power. We suggest evaluating the trade-off of accuracy vs. complexity (BOPs). The proposed method, evaluated on ResNet18/34/50 and MobileNet on ImageNet, outperforms the prior state of the art in both the low-complexity and high-accuracy regimes. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
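
The two core ingredients the abstract describes, a non-uniform k-quantile quantizer and training-time noise injection, are easy to illustrate. The sketch below is not the authors' implementation: the bin-mean reconstruction levels, the uniform-within-bin noise model, and the per-MAC bit-cost accounting in `bops_conv2d` (multiplier cost b_w·b_a plus accumulator width) are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def kquantile_quantizer(w, k):
    """Fit a non-uniform k-quantile quantizer: bin edges are the k-quantiles
    of the empirical weight distribution, so each bin holds ~1/k of the
    weights. Each bin is represented by the mean of the weights inside it
    (bin-mean levels are an assumption for illustration)."""
    edges = np.quantile(w, np.linspace(0.0, 1.0, k + 1))
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, k - 1)
    mids = 0.5 * (edges[:-1] + edges[1:])  # fallback for empty bins
    levels = np.array([w[idx == b].mean() if np.any(idx == b) else mids[b]
                       for b in range(k)])
    return edges, levels

def quantize(w, edges, levels):
    """Hard-quantize each weight to the level of its quantile bin (inference)."""
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, len(levels) - 1)
    return levels[idx]

def inject_quantization_noise(w, edges, rng):
    """Training-time surrogate for quantization: resample each weight
    uniformly within its quantile bin, so the network is trained under
    perturbations of the same magnitude as the eventual rounding error
    (uniform-in-bin noise is an assumed model)."""
    k = len(edges) - 1
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, k - 1)
    return rng.uniform(edges[idx], edges[idx + 1])

def bops_conv2d(c_in, c_out, k_size, b_w, b_a):
    """Bit operations per output position of a conv layer: number of MACs
    times an assumed per-MAC cost of b_w*b_a for the multiply plus the
    accumulator width for the add. Multiply by the output feature-map size
    for the whole layer."""
    macs = c_in * k_size ** 2
    acc_width = b_w + b_a + int(np.ceil(np.log2(macs)))
    return c_out * macs * (b_w * b_a + acc_width)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=10_000)
    edges, levels = kquantile_quantizer(w, k=16)        # 16 levels ~ 4-bit weights
    w_noisy = inject_quantization_noise(w, edges, rng)  # used during training
    w_hard = quantize(w, edges, levels)                 # used at inference
    print("mean |w - Q(w)| =", np.abs(w - w_hard).mean())
    print("conv BOPs (4-bit w, 8-bit a):", bops_conv2d(64, 64, 3, b_w=4, b_a=8))
```

In this reading, the noisy weights stand in for the quantized ones during training so gradients still flow through real-valued parameters, while inference keeps only the hard-quantized levels; `bops_conv2d` supplies the complexity axis of the accuracy-vs-complexity trade-off the abstract proposes.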

Updated: 2021-03-26