Frame-by-frame annotation of video recordings using deep neural networks

Cited: 8
Authors
Conway, Alexander M. [1 ]
Durbach, Ian N. [1 ,2 ]
McInnes, Alistair [3 ,4 ]
Harris, Robert N. [5 ]
Affiliations
[1] Univ Cape Town, Ctr Stat Ecol Environm & Conservat, Cape Town, South Africa
[2] Univ St Andrews, Ctr Res Ecol & Environm Modelling, St Andrews, Fife, Scotland
[3] BirdLife South Africa, Seabird Conservat Programme, Johannesburg, South Africa
[4] Nelson Mandela Univ, Percy FitzPatrick Inst, DST NRF Ctr Excellence, Dept Zool, Port Elizabeth, South Africa
[5] Univ St Andrews, Sea Mammal Res Unit, St Andrews, Fife, Scotland
Source
ECOSPHERE | 2021, Vol. 12, Issue 03
Funding
National Research Foundation, Singapore;
Keywords
animal-borne video; automated detection; deep learning; image classification; neural networks; video classification; IDENTIFICATION;
DOI
10.1002/ecs2.3384
Chinese Library Classification
Q14 [Ecology (Bioecology)];
Discipline Classification Codes
071012; 0713;
Abstract
Video data are widely collected in ecological studies, but manual annotation is a challenging and time-consuming task that has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful in annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN summarizing each video frame with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets, and another collected by animal-borne cameras deployed on African penguins, used to classify behavior. The combined RNN-CNN gave a 25% relative reduction in test set classification error over an image-only model for penguins (accuracy improved from 80% to 85%), and substantially improved classification precision or recall for four of six behavior classes (by 12-17%). Image-only and video models classified seal activity with very similar accuracy (88% and 89%), and no seal visits were missed entirely by either model. Temporal patterns related to movement provide valuable information about animal behavior, and classifiers benefit from including these explicitly. We recommend including temporal information whenever manual inspection suggests that movement is predictive of class membership.
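The architecture described in the abstract — a CNN that summarizes each frame into a feature vector, followed by an RNN that carries temporal state across frames before classifying the clip — can be sketched minimally as follows. This is an illustrative numpy toy, not the authors' implementation: the random-projection "CNN", all weight matrices, and the 10-frame / 6-class dimensions are arbitrary assumptions chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(frame, W):
    """Stand-in for a per-frame CNN: project a flattened frame to a
    feature vector. (In practice a pretrained CNN would play this role;
    a fixed random projection suffices to illustrate the pipeline.)"""
    return np.tanh(frame.ravel() @ W)

def rnn_classify(frames, W_feat, W_h, W_x, W_out):
    """Run a vanilla RNN over per-frame features, then classify the clip."""
    h = np.zeros(W_h.shape[0])
    for frame in frames:
        x = cnn_features(frame, W_feat)
        h = np.tanh(W_h @ h + W_x @ x)  # hidden state accumulates movement cues
    logits = W_out @ h                  # classify from the final temporal state
    e = np.exp(logits - logits.max())
    return e / e.sum()                  # softmax over behavior classes

# Toy clip: 10 frames of 8x8 "video"; 6 behavior classes, as for the penguins.
frames = rng.normal(size=(10, 8, 8))
W_feat = rng.normal(size=(64, 16)) / 8
W_h = rng.normal(size=(32, 32)) / 6
W_x = rng.normal(size=(32, 16)) / 4
W_out = rng.normal(size=(6, 32)) / 6

probs = rnn_classify(frames, W_feat, W_h, W_x, W_out)
print(probs.shape)  # one probability per behavior class
```

The image-only baseline in the paper corresponds to dropping the recurrence: classifying each frame's CNN features independently and aggregating, which discards exactly the movement-related temporal patterns the abstract credits for the accuracy gain.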
Pages: 11