EgoCom: A Multi-Person Multi-Modal Egocentric Communications Dataset

Cited by: 5
Authors
Northcutt, Curtis G. [1 ]
Zha, Shengxin [2 ]
Lovegrove, Steven [3 ]
Newcombe, Richard [3 ]
Affiliations
[1] MIT, Dept Elect & Comp Sci, Cambridge, MA 02139 USA
[2] Facebook AI, Menlo Pk, CA 94025 USA
[3] Oculus Res, Facebook Real Labs, Redmond, WA 98052 USA
Keywords
Task analysis; Artificial intelligence; Visualization; Synchronization; Natural languages; Computer vision; Education; Egocentric; multi-modal data; EgoCom; communication; turn-taking; human-centric; embodied intelligence; videos
DOI
10.1109/TPAMI.2020.3025105
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but our embodied human intelligence evolved with sensory input from the egocentric, first-person perspective. Towards embodied AI, we introduce the Egocentric Communications (EgoCom) dataset to advance the state of the art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning. EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. EgoCom includes 38.5 hours of synchronized embodied stereo audio and egocentric video, with 240,000 ground-truth, time-stamped word-level transcriptions and speaker labels from 34 diverse speakers. We study baseline performance on two novel applications that benefit from embodied data: (1) predicting turn-taking in conversations and (2) multi-speaker transcription. For (1), we investigate Bayesian baselines that predict turn-taking within 5 percent of human performance. For (2), we use simultaneous egocentric capture to combine Google speech-to-text outputs, improving global transcription by 79 percent relative to a single perspective. Both applications exploit EgoCom's synchronous multi-perspective data to augment performance on embodied AI tasks.
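The abstract's second application merges speech-to-text output from each wearer's perspective into one global transcript. Below is a minimal, hypothetical sketch of one way such a merge could work; it is not the authors' released code, and the data layout, the confidence-based greedy rule, and all names (`Word`, `merge_perspectives`, `window`) are illustrative assumptions.

```python
# Hypothetical sketch: combine word-level ASR hypotheses from several
# synchronized egocentric perspectives into a single global transcript.
# Assumes each perspective yields words with a shared-clock start time,
# a recognizer confidence, and the perspective/speaker label.

from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    start: float       # word onset in seconds, shared clock across devices
    text: str          # recognized word
    confidence: float  # ASR confidence in [0, 1]
    speaker: str       # label of the perspective that captured it


def merge_perspectives(perspectives: List[List[Word]],
                       window: float = 0.25) -> List[Word]:
    """Greedily keep, for each ~window-sized time slot, the hypothesis with
    the highest confidence across perspectives. The intuition: each wearer's
    microphone hears their own speech most clearly, so the most confident
    perspective tends to belong to the active speaker."""
    pooled = sorted((w for p in perspectives for w in p), key=lambda w: w.start)
    merged: List[Word] = []
    for w in pooled:
        if merged and abs(w.start - merged[-1].start) < window:
            # Same time slot: keep whichever hypothesis is more confident.
            if w.confidence > merged[-1].confidence:
                merged[-1] = w
        else:
            merged.append(w)
    return merged


if __name__ == "__main__":
    alice = [Word(0.0, "hello", 0.95, "alice"), Word(0.6, "there", 0.90, "alice")]
    bob = [Word(0.1, "yellow", 0.40, "bob"), Word(1.4, "hi", 0.92, "bob")]
    print([(w.text, w.speaker) for w in merge_perspectives([alice, bob])])
    # -> [('hello', 'alice'), ('there', 'alice'), ('hi', 'bob')]
```

Because every perspective is time-synchronized in EgoCom, a merge along these lines also yields speaker labels for free (the winning perspective identifies the talker), which is why multi-perspective capture can outperform any single microphone's transcript.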
Pages: 6783-6793
Page count: 11