PolarDB: Formula-Driven Dataset for Pre-Training Trajectory Encoders

Cited by: 0
Authors
Miyamoto, Sota [1]
Yagi, Takuma [3]
Makimoto, Yuto [1]
Ukai, Mahiro [1]
Ushiku, Yoshitaka [2]
Hashimoto, Atsushi [2]
Inoue, Nakamasa [1]
Affiliations
[1] Tokyo Institute of Technology, Tokyo, Japan
[2] OMRON SINIC X Corporation, Bunkyo, Japan
[3] National Institute of Advanced Industrial Science and Technology, Ibaraki, Japan
Source
2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), 2024
Keywords
Formula-driven supervised learning; Polar equations; Fine-grained action recognition; Cutting method recognition
DOI
10.1109/ICASSP48485.2024.10448448
Abstract
Formula-driven supervised learning (FDSL) is a growing research topic for finding simple mathematical formulas that generate synthetic data and labels for pre-training neural networks. Because it does not rely on real data, FDSL carries no risk of generating data with ethical implications such as gender or racial bias, as discussed in previous studies that used fractals and polygons to pre-train image encoders. While FDSL has been proposed for pre-training image encoders, it has not yet been considered for temporal trajectory data. In this paper, we introduce PolarDB, the first formula-driven dataset for pre-training trajectory encoders, with an application to fine-grained cutting-method recognition from hand trajectories. More specifically, we generate 270k trajectories across 432 categories on the basis of polar equations and use them to pre-train a Transformer-based trajectory encoder in an FDSL manner. In the experiments, we show that pre-training on PolarDB improves the accuracy of fine-grained cutting-method recognition on cooking videos from the EPIC-KITCHENS and Ego4D datasets, where the pre-trained trajectory encoder serves as a plug-in module for a video recognition network.
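As an illustration of the formula-driven generation described in the abstract, the minimal Python sketch below samples 2D trajectories from a family of polar (rose) curves and treats each formula parameter setting as its own synthetic category label. The curve family, parameter ranges, noise model, and function names (polar_trajectory, make_dataset) are illustrative assumptions, not the published PolarDB generation procedure.

```python
import numpy as np

def polar_trajectory(k, a, n_points=64, noise=0.01, rng=None):
    """Sample a 2D trajectory from a rose curve r = a * cos(k * theta).

    Assumption: the curve family and noise model are illustrative only,
    not the actual PolarDB settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    r = a * np.cos(k * theta)
    xy = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)  # (n_points, 2)
    return xy + rng.normal(scale=noise, size=xy.shape)

def make_dataset(n_classes=8, samples_per_class=100, seed=0):
    """Each formula parameter k defines one synthetic category (the FDSL label)."""
    rng = np.random.default_rng(seed)
    trajectories, labels = [], []
    for label in range(n_classes):
        k = label + 1                  # petal count acts as the class-defining parameter
        for _ in range(samples_per_class):
            a = rng.uniform(0.5, 1.5)  # per-sample scale variation within a class
            trajectories.append(polar_trajectory(k, a, rng=rng))
            labels.append(label)
    return np.stack(trajectories), np.array(labels)

if __name__ == "__main__":
    X, y = make_dataset()
    print(X.shape, y.shape)  # (800, 64, 2) (800,)
```

In the FDSL setting sketched here, a trajectory encoder would be pre-trained to classify these formula-defined categories before being plugged into a downstream video recognition network, as the paper does for cutting-method recognition.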
Pages: 5465-5469
Number of pages: 5