Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited by: 1
Authors
Dehghanpour, Alireza [1]
Kordestani, Javad Khodamoradi [1]
Dehyadegari, Masoud [1, 2]
Affiliations
[1] K N Toosi Univ Technol, Fac Comp Engn, Tehran 1631714191, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, 193955746, Tehran, Iran
Keywords
Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks
DOI
10.1007/s11063-023-11409-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A 32-bit floating-point format is often used for the development and training of deep neural networks. Training and inference in deep learning-optimized codecs can result in enormous performance and energy efficiency advantages. However, training and inferring low-bit neural networks still pose a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a low number of bits. We tested this method on convolutional neural networks, including AlexNet. Using our method, we found that in our convolutional neural network, the accuracy achieved with 11 bits matches that of the IEEE 32-bit format. Similarly, in AlexNet, the accuracy achieved with 10 bits matches that of the IEEE 32-bit format. These results suggest that the sorting method shows promise for calculations with limited accuracy.
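The abstract does not say how the sorting is applied, so the following is only a generic illustration of why operand order matters when accumulating in a short floating-point format. It is not the authors' algorithm. It assumes a simplified number model in which only the significand width is limited (the exponent range is left unrestricted), and the names `round_to_bits`, `low_precision_sum`, and `sorted_low_precision_sum` are hypothetical helpers introduced for this sketch.

```python
import math


def round_to_bits(x: float, mantissa_bits: int) -> float:
    """Round x to `mantissa_bits` significant bits (assumed model: exponent unrestricted)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    # Python's round() rounds exact ties to the nearest even integer.
    return math.ldexp(round(m * scale) / scale, e)


def low_precision_sum(values, mantissa_bits):
    """Accumulate left to right, rounding every operand and every partial sum."""
    acc = 0.0
    for v in values:
        acc = round_to_bits(acc + round_to_bits(v, mantissa_bits), mantissa_bits)
    return acc


def sorted_low_precision_sum(values, mantissa_bits):
    """Same accumulation, but the operands are first sorted by increasing magnitude."""
    return low_precision_sum(sorted(values, key=abs), mantissa_bits)


# One large term followed by 128 small ones; the exact sum is 1.125.
values = [1.0] + [2.0 ** -10] * 128

unsorted_result = low_precision_sum(values, 8)
sorted_result = sorted_low_precision_sum(values, 8)

print(unsorted_result)  # 1.0   (each small term is rounded away against 1.0)
print(sorted_result)    # 1.125 (small terms accumulate exactly before 1.0 is added)
```

With an 8-bit significand the spacing between representable values near 1.0 is 2^-7, so adding 2^-10 to 1.0 has no effect. Summing the small terms first lets them build up to 0.125, which is large enough to survive the final addition.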
Pages: 12061 - 12078
Page count: 18
Related papers
50 in total
  • [1] Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers
    Alireza Dehghanpour
    Javad Khodamoradi Kordestani
    Masoud Dehyadegari
    Neural Processing Letters, 2023, 55 : 12061 - 12078
  • [2] FLOATING-POINT ARITHMETIC WITH 84-BIT NUMBERS
    GREGORY, RT
    RANEY, JL
    COMMUNICATIONS OF THE ACM, 1964, 7 (01) : 10 - 13
  • [3] Arithmetic Coding for Floating-Point Numbers
    Fischer, Marc
    Riedel, Oliver
    Lechler, Armin
    Verl, Alexander
    2021 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (DSC), 2021
  • [4] Accurate and Reliable Computing in Floating-Point Arithmetic
    Rump, Siegfried M.
    MATHEMATICAL SOFTWARE - ICMS 2010, 2010, 6327 : 105 - 108
  • [5] Accurate Complex Multiplication in Floating-Point Arithmetic
    Lefevre, Vincent
    Muller, Jean-Michel
    2019 IEEE 26TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH), 2019 : 23 - 29
  • [8] Accurate evaluation of Chebyshev polynomials in floating-point arithmetic
    Hrycak, Tomasz
    Schmutzhard, Sebastian
    BIT NUMERICAL MATHEMATICS, 2019, 59 (02) : 403 - 416
  • [9] Floating-point arithmetic
    Boldo, Sylvie
    Jeannerod, Claude-Pierre
    Melquiond, Guillaume
    Muller, Jean-Michel
    ACTA NUMERICA, 2023, 32 : 203 - 290
  • [10] SIMULATING LOW PRECISION FLOATING-POINT ARITHMETIC
    Higham, Nicholas J.
    Pranesh, Srikara
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2019, 41 (05) : C585 - C602