共 25 条
- [1] CUDA toolkit documentation
- [2] HARRIS M, SENGUPTA S, OWENS J D., Parallel prefix sum (scan) with CUDA, pp. 851-876, (2007)
- [3] YAN S G, LONG G P, ZHANG Y Q., StreamScan:fast scan algorithms for GPUs without global barrier synchronization[C], Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 229-238, (2013)
- [4] WANG X Y, YANG J L, ZHAO Y L, Et al., TCIM:triangle counting acceleration with processing-in-MRAM architecture[C], Proceedings of 57th ACM/IEEE Design Automation Conference, pp. 1-6, (2020)
- [5] DOTSENKO Y, GOVINDARAJU N K, SLOAN P P, Et al., Fast scan algorithms on graphics processors[C], Proceedings of the 22nd Annual International Conference on Supercomputing, pp. 205-213, (2008)
- [6] SENGUPTA S, HARRIS M, ZHANG Y, Et al., Scan primitives for GPU computing[C], Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 97-106, (2007)
- [7] LONG Y, NA T, MUKHOPADHYAY S., ReRAM-based processing-in-memory architecture for recurrent neural network acceleration[J], IEEE Transactions on Very Large Scale Integration Systems, 26, 12, pp. 2781-2794, (2018)
- [8] YANG X X, YAN B N, LI H, Et al., ReTransformer:ReRAM-based processing-in-memory architecture for transformer acceleration[C], Proceedings of IEEE/ACM International Conference on Computer Aided Design, pp. 1-9, (2020)
- [9] CHEN Y R, LI H, CHEN Y Z, Et al., Current status and prospects of neuromorphic computing, AI-View, 5, 2, pp. 46-58, (2018)
- [10] JI Y, ZHANG Y H, ZHENG W M., Approximate computing method based on memristors[J], Journal of Tsinghua University (Science and Technology), 61, 6, pp. 610-617, (2021)