Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited by: 1
Authors
Dehghanpour, Alireza [1]
Kordestani, Javad Khodamoradi [1]
Dehyadegari, Masoud [1, 2]
Affiliations
[1] K N Toosi Univ Technol, Fac Comp Engn, Tehran 1631714191, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, 193955746, Tehran, Iran
Keywords
Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks
DOI
10.1007/s11063-023-11409-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
A 32-bit floating-point format is often used for the development and training of deep neural networks. Training and inference in deep learning-optimized codecs can result in enormous performance and energy efficiency advantages. However, training and inference with low-bit neural networks still pose a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a low number of bits. We tested this method on convolutional neural networks, including AlexNet. With our method, the accuracy achieved with 11 bits in our convolutional neural network matches that of the IEEE 32-bit format. Similarly, in AlexNet, the accuracy achieved with 10 bits matches that of the IEEE 32-bit format. These results suggest that the sorting method shows promise for calculations with limited accuracy.
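The abstract does not say how the sorting is applied inside the network, so the following is only a generic illustration of why operand order can matter in a low-bit format, not the authors' method. It assumes IEEE 754 binary16 as a stand-in for a low-bit format and sorts by ascending magnitude before accumulating; the function names are hypothetical.

```python
import struct


def round_to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def low_precision_sum(values):
    """Accumulate left to right, rounding to binary16 after every addition."""
    total = 0.0
    for v in values:
        total = round_to_half(total + round_to_half(v))
    return total


def sorted_low_precision_sum(values):
    """Same accumulation, but with operands sorted by ascending magnitude.

    Assumption: magnitude ordering is chosen here for illustration only.
    """
    return low_precision_sum(sorted(values, key=abs))


# One large term followed by many small ones. Near 1024 the spacing between
# binary16 values is 1, so adding 0.25 to 1024 rounds back to 1024.
values = [1024.0] + [0.25] * 1024

naive = low_precision_sum(values)            # large term first
reordered = sorted_low_precision_sum(values)  # small terms first
exact = sum(values)                           # double-precision reference

print(naive, reordered, exact)
```

In this constructed case the unsorted accumulation loses every small term, while the sorted one accumulates them exactly before adding the large term. Real network activations are less extreme, so the benefit would depend on the data.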
Pages: 12061 - 12078
Page count: 18
Related papers
50 in total
  • [31] New directions in floating-point arithmetic
    Beebe, Nelson H. F.
    COMPUTATION IN MODERN SCIENCE AND ENGINEERING VOL 2, PTS A AND B, 2007, 2 : 155 - 158
  • [32] A SURVEY OF FLOATING-POINT ARITHMETIC IMPLEMENTATIONS
    ERCEGOVAC, MD
    PROCEEDINGS OF THE SOCIETY OF PHOTO-OPTICAL INSTRUMENTATION ENGINEERS, 1983, 431 : 60 - 64
  • [33] Unum: Adaptive Floating-Point Arithmetic
    Morancho, Enric
    19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016), 2016, : 651 - 656
  • [34] CORRECTION OF SUM IN FLOATING-POINT ARITHMETIC
    PICHAT, M
    NUMERISCHE MATHEMATIK, 1972, 19 (05) : 400 - &
  • [35] Binary floating-point arithmetic [1]
    Zuras, Dan
    Dr. Dobb's Journal, 2005, 30 (04):
  • [36] Floating-point arithmetic in the Coq system
    Melquiond, Guillaume
    INFORMATION AND COMPUTATION, 2012, 216 : 14 - 23
  • [37] NUMERICAL INVESTIGATION OF FLOATING-POINT ARITHMETIC
    BAKHRAKH, SM
    VELICHKO, SV
    PILIPCHATIN, NE
    SPIRIDONOV, VF
    SUKHOV, EG
    FEDOROVA, YG
    KHEIFETS, VI
    PROGRAMMING AND COMPUTER SOFTWARE, 1992, 18 (06) : 255 - 258
  • [38] A FLOATING-POINT RESIDUE ARITHMETIC UNIT
    TAYLOR, FJ
    HUANG, CH
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 1981, 311 (01) : 33 - 53
  • [39] Floating-Point Formats and Arithmetic for Highly Accurate Multi-Layer Perceptrons
    Niknia, Farzad
    Wang, Ziheng
    Liu, Shanshan
    Reviriego, Pedro
    Louri, Ahmed
    Lombardi, Fabrizio
    2023 IEEE 23RD INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY, NANO, 2023, : 587 - 591
  • [40] Optimization Modulo the Theories of Signed Bit-Vectors and Floating-Point Numbers
    Trentin, Patrick
    Sebastiani, Roberto
    JOURNAL OF AUTOMATED REASONING, 2021, 65 (07) : 1071 - 1096