EgoCom: A Multi-Person Multi-Modal Egocentric Communications Dataset

Cited by: 5
Authors
Northcutt, Curtis G. [1 ]
Zha, Shengxin [2 ]
Lovegrove, Steven [3 ]
Newcombe, Richard [3 ]
Affiliations
[1] MIT, Dept Elect & Comp Sci, Cambridge, MA 02139 USA
[2] Facebook AI, Menlo Pk, CA 94025 USA
[3] Oculus Res, Facebook Reality Labs, Redmond, WA 98052 USA
Keywords
Task analysis; Artificial intelligence; Visualization; Synchronization; Natural languages; Computer vision; Education; Egocentric; multi-modal data; EgoCom; communication; turn-taking; human-centric; embodied intelligence; VIDEOS;
DOI
10.1109/TPAMI.2020.3025105
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but our embodied human intelligence evolved with sensory input from the egocentric, first-person perspective. Towards embodied AI, we introduce the Egocentric Communications (EgoCom) dataset to advance the state of the art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning. EgoCom is a first-of-its-kind natural-conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. EgoCom includes 38.5 hours of synchronized, embodied stereo audio and egocentric video, together with 240,000 ground-truth, time-stamped, word-level transcriptions and speaker labels from 34 diverse speakers. We study baseline performance on two novel applications that benefit from embodied data: (1) predicting turn-taking in conversations and (2) multi-speaker transcription. For (1), we investigate Bayesian baselines that predict turn-taking to within 5 percent of human performance. For (2), we use simultaneous egocentric capture to combine Google speech-to-text outputs, improving global transcription accuracy by 79 percent relative to a single perspective. Both applications exploit EgoCom's synchronous multi-perspective data to improve performance on embodied AI tasks.
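The multi-speaker transcription result above rests on fusing time-stamped hypotheses from several synchronized egocentric audio streams. The abstract does not spell out the fusion rule, so the following is only a minimal illustrative sketch under stated assumptions: each wearer's stream is assumed to yield time-stamped words with confidence scores from a speech-to-text service, and the Word structure, merge_transcripts function, and overlap_tol parameter are hypothetical names introduced here, not part of EgoCom or the paper's implementation.

```python
# Hypothetical sketch (not the paper's method): build one global transcript from
# per-perspective speech-to-text outputs by keeping, for overlapping time spans,
# the hypothesis from the perspective that transcribed the span most confidently
# (typically the speaker's own headset microphone).
from dataclasses import dataclass

@dataclass
class Word:
    start: float       # seconds from conversation start (streams are synchronized)
    end: float
    text: str
    confidence: float   # score reported by the speech-to-text service
    perspective: int    # index of the wearer whose audio produced this hypothesis

def merge_transcripts(per_perspective: list[list[Word]],
                      overlap_tol: float = 0.25) -> list[Word]:
    """Greedily merge time-stamped words from all perspectives, preferring the
    more confident hypothesis whenever two words overlap in time."""
    candidates = sorted((w for ws in per_perspective for w in ws),
                        key=lambda w: w.start)
    merged: list[Word] = []
    for w in candidates:
        if merged and w.start < merged[-1].end - overlap_tol:
            # Overlapping hypotheses: keep whichever perspective is more confident.
            if w.confidence > merged[-1].confidence:
                merged[-1] = w
        else:
            merged.append(w)
    return merged

# Hypothetical usage, assuming stt() wraps a speech-to-text call per audio stream:
#   transcripts = [stt(stream) for stream in egocentric_audio_streams]
#   global_transcript = merge_transcripts(transcripts)
```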
Pages: 6783-6793
Page count: 11