25 entries in total
- [1] Ioffe S, Szegedy C., Batch normalization: accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning, pp. 448-456, (2015)
- [2] Wang L, Xiong Y, Wang Z, et al., Temporal segment networks: towards good practices for deep action recognition, European Conference on Computer Vision, pp. 20-36, (2016)
- [3] Tran D, Bourdev L, Fergus R, et al., Learning spatiotemporal features with 3D convolutional networks, International Conference on Computer Vision, pp. 4489-4497, (2015)
- [4] Hochreiter S, Schmidhuber J., Long short-term memory, Neural Computation, 9, 8, pp. 1735-1780, (1997)
- [5] Donahue J, Hendricks L A, Guadarrama S, et al., Long-term recurrent convolutional networks for visual recognition and description, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 4, pp. 677-691, (2017)
- [6] Brox T, Bruhn A, Papenberg N, et al., High accuracy optical flow estimation based on a theory for warping, European Conference on Computer Vision, Lecture Notes in Computer Science, 3024, pp. 25-36, (2004)
- [7] Xu K, Ba J, Kiros R, et al., Show, attend and tell: neural image caption generation with visual attention, International Conference on Machine Learning, pp. 2048-2057, (2015)
- [8] Sharma S, Kiros R, Salakhutdinov R., Action recognition using visual attention
- [9] Yan S, Smith J S, Lu W, et al., Hierarchical multi-scale attention networks for action recognition, Signal Processing: Image Communication, 61, pp. 73-84, (2018)
- [10] Yu T, Guo C, Wang L, et al., Joint spatial-temporal attention for action recognition, Pattern Recognition Letters, 112, pp. 226-233, (2018)