Action Recognition of Temporal Segment Network Based on Feature Fusion

Authors
Li H. [1,2,3,4]
Ding Y. [1]
Li C. [1]
Zhang S. [1,3]
Affiliations
[1] School of Information Science and Technology, Nantong University, Nantong, 226019, Jiangsu
[2] State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing
[3] Nantong Research Institute for Advanced Communication Technologies, Nantong, 226019, Jiangsu
[4] Tongke School of Microelectronics, Nantong, 226019, Jiangsu
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Feature fusion; Sparse features; Temporal segment network; Two-stream convolution network;
DOI
10.7544/issn1000-1239.2020.20190180
Abstract
Action recognition is a hot research topic and a challenging task in computer vision. Recognition performance is closely tied to the type of network input data, the network structure, and the feature-fusion strategy. At present, the main inputs to action recognition networks are RGB images and optical flow images, and the dominant architectures are based on two-stream and 3D convolution. The choice of features directly affects recognition efficiency, and many problems in multi-layer feature fusion remain unsolved. To address the limitations of RGB and optical flow images as inputs to the popular two-stream convolutional network, sparse features extracted in a low-rank space are used to capture the information of moving objects in video and to supplement the network input data. Meanwhile, to compensate for the lack of information interaction in deep networks, high-level semantic information and low-level detailed information are combined for recognition, which gives the temporal segment network a further performance advantage. Extensive subjective and objective comparison experiments on UCF101 and HMDB51 show that the proposed algorithm is significantly better than several state-of-the-art algorithms, reaching average accuracies of 97.1% and 76.7%, respectively. The experimental results show that our method effectively improves action recognition accuracy. © 2020, Science Press. All rights reserved.
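The abstract does not specify the solver used to obtain the low-rank/sparse split of the video; the following is a minimal NumPy sketch of one standard choice, robust PCA via inexact ALM (principal component pursuit). Each column of D is a vectorized grayscale frame, so the low-rank component L models the static background and the sparse component S highlights moving objects, which could then serve as the supplementary input stream described above. All function names and parameter values here are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def shrink(X, tau):
    # Soft-thresholding: proximal operator of the L1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(D, max_iter=200, tol=1e-6):
    """Decompose D ~ L + S into a low-rank background L and a
    sparse foreground S via inexact ALM (illustrative solver)."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))        # standard PCP weight
    norm_D = np.linalg.norm(D, 'fro')
    mu = 1.25 / np.linalg.norm(D, 2)      # step size from spectral norm
    rho = 1.5
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(max_iter):
        L = svd_shrink(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        residual = D - L - S
        Y += mu * residual
        mu *= rho
        if np.linalg.norm(residual, 'fro') / norm_D < tol:
            break
    return L, S

# Toy stand-in for a 16-frame grayscale clip.
frames = np.random.rand(16, 112, 112)
D = frames.reshape(16, -1).T              # (pixels, frames)
L, S = rpca(D)
sparse_frames = S.T.reshape(frames.shape) # per-frame motion maps
```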
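The abstract likewise does not detail how high-level semantic and low-level detail features are combined; below is a minimal PyTorch sketch of one common fusion operator, global average pooling followed by concatenation before the classifier. Layer names, channel sizes, and the ResNet-style shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuse low-level detail features with high-level semantic
    features before classification (illustrative configuration)."""
    def __init__(self, low_channels=256, high_channels=2048, num_classes=101):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse spatial dims
        self.fc = nn.Linear(low_channels + high_channels, num_classes)

    def forward(self, low_feat, high_feat):
        # low_feat:  (N, 256, 56, 56) from an early conv stage
        # high_feat: (N, 2048, 7, 7)  from the last conv stage
        low = self.pool(low_feat).flatten(1)
        high = self.pool(high_feat).flatten(1)
        return self.fc(torch.cat([low, high], dim=1))

# Usage: logits over the 101 UCF101 classes from both feature levels.
low = torch.randn(2, 256, 56, 56)
high = torch.randn(2, 2048, 7, 7)
logits = FusionHead()(low, high)  # shape (2, 101)
```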
Pages: 145-158