Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited by: 1

Authors:

Dehghanpour, Alireza [1]

Kordestani, Javad Khodamoradi [1]

Dehyadegari, Masoud [1,2]

Affiliations:

[1] K N Toosi Univ Technol, Fac Comp Engn, Tehran 1631714191, Iran

[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, 193955746, Tehran, Iran

Source:

NEURAL PROCESSING LETTERS | 2023 / Vol. 55 / Issue 09

Keywords:

Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks;

DOI:

10.1007/s11063-023-11409-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A 32-bit floating-point format is often used for the development and training of deep neural networks. Training and inference in deep learning-optimized codecs can result in enormous performance and energy efficiency advantages. However, training and running inference on low-bit neural networks still poses a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a low number of bits. We tested this method on convolutional neural networks, including AlexNet. Using our method, we found that in our convolutional neural network, the accuracy achieved with 11 bits matches that of the IEEE 32-bit format. Similarly, in AlexNet, the accuracy achieved with 10 bits matches that of the IEEE 32-bit format. These results suggest that the sorting method shows promise for calculations with limited accuracy.

Pages: 12061-12078

Page count: 18
