Engagement Recognition in Online Learning Based on an Improved Video Vision Transformer

Cited by: 1
Authors:
Guo, Zijian [1]
Zhou, Zhuoyi [1]
Pan, Jiahui [1]
Liang, Yan [1]
Affiliations:
[1] South China Normal Univ, Sch Software, Guangzhou, Peoples R China
Keywords:
DOI:
10.1109/IJCNN54540.2023.10191579
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Online learning has gained wide attention and adoption thanks to its flexibility and convenience. However, because teachers and students are separated in time and space, teachers cannot easily gauge students' level of engagement, which weakens the effectiveness of teaching. Automatically detecting student engagement is an effective way to address this problem: it gives teachers timely feedback and lets them adjust the pace of instruction. In this paper, the transformer is applied to engagement recognition for the first time, and a novel network based on an improved video vision transformer (ViViT) is proposed to detect student engagement. A new transformer encoder, named the Transformer Encoder with Low Complexity (TELC), is proposed. It adopts unit force operated attention (UFO-attention) to eliminate the nonlinearity of the original self-attention in the standard ViViT, together with a Patch Merger that fuses the input patches, allowing the network to significantly reduce computational complexity while improving performance. The proposed method is evaluated on the Dataset for Affective States in E-learning Environments (DAiSEE) and achieves an accuracy of 63.91% on the four-level classification task, outperforming state-of-the-art methods. The experimental results demonstrate the effectiveness of our method and its suitability for practical online learning applications.
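
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the two building blocks it names: a UFO-style attention layer that drops the softmax of standard self-attention so the cost grows linearly rather than quadratically with the number of tokens, and a Patch Merger that fuses the input patch tokens into a small fixed set. All class names, shapes, and hyperparameters here (the normalization used, the number of merged tokens, the embedding width) are illustrative assumptions, not the authors' TELC implementation.

    # Minimal sketch of UFO-style attention and a Patch Merger (illustrative only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UFOAttention(nn.Module):
        """Linear-complexity attention: softmax(Q K^T) V is replaced by
        XNorm(Q) @ XNorm(K^T V), so no N x N attention matrix is ever formed."""
        def __init__(self, dim, heads=8):
            super().__init__()
            self.heads = heads
            self.head_dim = dim // heads
            self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
            self.proj = nn.Linear(dim, dim)
            # learnable per-head scale used by the cross-normalization (assumption)
            self.gamma = nn.Parameter(torch.ones(heads, 1, 1))

        def xnorm(self, x):
            # L2-normalize along the channel dimension, then rescale
            return self.gamma * F.normalize(x, dim=-1)

        def forward(self, x):                              # x: (B, N, dim)
            B, N, _ = x.shape
            q, k, v = [t.reshape(B, N, self.heads, self.head_dim).transpose(1, 2)
                       for t in self.to_qkv(x).chunk(3, dim=-1)]   # (B, H, N, d)
            kv = k.transpose(-2, -1) @ v                    # (B, H, d, d), linear in N
            out = self.xnorm(q) @ self.xnorm(kv)            # (B, H, N, d)
            return self.proj(out.transpose(1, 2).reshape(B, N, -1))

    class PatchMerger(nn.Module):
        """Fuses N patch tokens into a smaller fixed set of M tokens via a
        learned soft assignment (M learnable queries attending over the input)."""
        def __init__(self, dim, num_out_tokens=8):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.queries = nn.Parameter(torch.randn(num_out_tokens, dim))

        def forward(self, x):                               # x: (B, N, dim)
            x = self.norm(x)
            attn = (self.queries @ x.transpose(-2, -1)).softmax(dim=-1)  # (B, M, N)
            return attn @ x                                  # (B, M, dim)

    if __name__ == "__main__":
        tokens = torch.randn(2, 196, 256)                    # 196 patch tokens, width 256
        fused = PatchMerger(256, num_out_tokens=8)(UFOAttention(256)(tokens))
        print(fused.shape)                                   # torch.Size([2, 8, 256])

In a TELC-style encoder one would presumably place the Patch Merger at an intermediate layer so that subsequent layers operate on far fewer tokens; the toy call at the bottom only verifies tensor shapes.
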
Pages: 8
Related Papers (50 in total)
  • [21] Dendritic Learning-Incorporated Vision Transformer for Image Recognition
    Zhiming Zhang
    Zhenyu Lei
    Masaaki Omura
    Hideyuki Hasegawa
    Shangce Gao
    [J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11 (02) : 539 - 541
  • [22] ViViT: A Video Vision Transformer
    Arnab, Anurag
    Dehghani, Mostafa
    Heigold, Georg
    Sun, Chen
    Lucic, Mario
    Schmid, Cordelia
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6816 - 6826
  • [23] Online learning from local features for video-based face recognition
    Mian, Ajmal
    [J]. PATTERN RECOGNITION, 2011, 44 (05) : 1068 - 1075
  • [24] Online learning of probabilistic appearance manifolds for video-based recognition and tracking
    Lee, KC
    Kriegman, D
    [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 852 - 859
  • [25] A human activity recognition method based on Vision Transformer
    Han, Huiyan
    Zeng, Hongwei
    Kuang, Liqun
    Han, Xie
    Xue, Hongxin
    [J]. SCIENTIFIC REPORTS, 2024, 14 (01):
  • [26] Adaptive computer vision: Online learning for object recognition
    Bekel, H
    Bax, I
    Heidemann, G
    Ritter, H
    [J]. PATTERN RECOGNITION, 2004, 3175 : 447 - 454
  • [27] Facial Expression Recognition Based on Squeeze Vision Transformer
    Kim, Sangwon
    Nam, Jaeyeal
    Ko, Byoung Chul
    [J]. SENSORS, 2022, 22 (10)
  • [28] Vision based pose recognition in video game
    Jang, Dong Heon
    Jin, Xiang Hua
    Kim, Tae Yong
    [J]. TECHNOLOGIES FOR E-LEARNING AND DIGITAL ENTERTAINMENT, PROCEEDINGS, 2008, 5093 : 391 - 400
  • [29] ViTframe: Vision Transformer Acceleration via Informative Frame Selection for Video Recognition
    Qi, Chunyu
    Li, Zilong
    Song, Zhuoran
    Liang, Xiaoyao
    [J]. 2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 383 - 390
  • [30] DeepFake detection algorithm based on improved vision transformer
    Heo, Young-Jin
    Yeo, Woon-Ha
    Kim, Byung-Gyu
    [J]. APPLIED INTELLIGENCE, 2023, 53 (07) : 7512 - 7527