Listen to Look: Action Recognition by Previewing Audio

Cited by: 132
Authors
Gao, Ruohan [1, 2]
Oh, Tae-Hyun [2, 3]
Grauman, Kristen [1, 2]
Torresani, Lorenzo [2]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Facebook AI Res, Austin, TX 78701 USA
[3] POSTECH, Dept EE, Pohang, South Korea
DOI
10.1109/CVPR42600.2020.01047
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an IMGAUD2VID framework that hallucinates clip-level features by distilling from lighter modalities (a single frame and its accompanying audio), reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on IMGAUD2VID, we further propose IMGAUD-SKIMMING, an attention-based long short-term memory network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves the state of the art in terms of both recognition accuracy and speed.
Pages: 10454-10464
Page count: 11