50 items in total
- [1] Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, 45 (02)
- [2] Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic 2017 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2017
- [3] Batched Small Tensor-Matrix Multiplications on GPUs 2020 IEEE 27TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2020), 2020, : 305 - 314
- [4] Fast Kronecker Matrix-Matrix Multiplication on GPUs PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 390 - 403
- [6] Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed-Precision Solvers on GPUs PROCEEDINGS OF SCALA 2019: 2019 IEEE/ACM 10TH WORKSHOP ON LATEST ADVANCES IN SCALABLE ALGORITHMS FOR LARGE-SCALE SYSTEMS (SCALA), 2019, : 17 - 24
- [7] The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques COMPUTATIONAL SCIENCE - ICCS 2018, PT I, 2018, 10860 : 586 - 600
- [8] Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply 2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 634 - 643
- [10] tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores 2021 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2021), 2021, : 1 - 11