Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Times cited: 1

Authors:

Dehghanpour, Alireza [1]

Kordestani, Javad Khodamoradi [1]

Dehyadegari, Masoud [1, 2]

Affiliations:

[1] K N Toosi Univ Technol, Fac Comp Engn, Tehran 1631714191, Iran

[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, 193955746, Tehran, Iran

Source:

NEURAL PROCESSING LETTERS | 2023 / Vol. 55 / Issue 9

Keywords:

Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks;

DOI:

10.1007/s11063-023-11409-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A 32-bit floating-point format is commonly used to develop and train deep neural networks. Training and inference in deep learning-optimized codecs can yield large gains in performance and energy efficiency. However, training low-bit neural networks and running inference with them remain significant challenges. In this study, we propose a sorting method that preserves accuracy in numerical formats with a small number of bits. We tested the method on convolutional neural networks, including AlexNet. With our method, our convolutional neural network reached the same accuracy with 11 bits as with the IEEE 754 32-bit (single-precision) format, and AlexNet reached the same accuracy with 10 bits. These results suggest that the sorting method is promising for computations with limited precision.
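The abstract does not say which ordering the sorting method uses or where it is applied inside the network. The Python sketch below is therefore only an illustration of the general idea, not the authors' algorithm. It assumes the classic strategy of summing terms in increasing order of magnitude. It also simulates a low-bit accumulator with a hypothetical helper, `round_to_precision`, which rounds every intermediate result to `p` significand bits and ignores exponent-range limits.

```python
import math


def round_to_precision(x: float, p: int) -> float:
    """Round x to the nearest value with a p-bit significand, ties to even.

    Assumption: the exponent range is unbounded (no overflow, underflow or
    subnormals), so only the significand width of a low-bit format is modeled.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    scaled = m * (1 << p)             # exact: multiplying by a power of two
    return math.ldexp(round(scaled), e - p)  # Python's round() ties to even


def low_precision_sum(values, p: int) -> float:
    """Add values left to right, rounding every partial sum to p bits."""
    acc = 0.0
    for v in values:
        # The sum is formed in double precision and then rounded, which is
        # an adequate simulation when p is well below 53.
        acc = round_to_precision(acc + round_to_precision(v, p), p)
    return acc


def sorted_low_precision_sum(values, p: int) -> float:
    """Same accumulator, but terms are added in increasing |value| order."""
    return low_precision_sum(sorted(values, key=abs), p)


if __name__ == "__main__":
    P = 11  # hypothetical significand width chosen for illustration
    # One large term followed by many terms, each below half an ulp of 1.0.
    values = [1.0] + [2.0 ** -12] * 1024
    print(low_precision_sum(values, P))         # 1.0: small terms are lost
    print(sorted_low_precision_sum(values, P))  # 1.25
    print(math.fsum(values))                    # 1.25 (exactly rounded reference)
```

In left-to-right order, each small term is rounded away as soon as it is added to 1.0. After sorting, the small terms first accumulate into 0.25, which is large enough to survive the final addition. For operands of mixed sign, increasing-magnitude order is not guaranteed to be the most accurate choice.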


Pages: 12061-12078

Number of pages: 18
