TNT: Text Normalization based Pre-training of Transformers for Content Moderation

Cited by: 0
Authors:
Tan, Fei [1 ]
Hu, Yifan [1 ]
Hu, Changwei [1 ]
Li, Keqian [1 ]
Yen, Kevin [1 ]
Affiliations:
[1] Yahoo Res, New York, NY 10003 USA
Keywords: (none listed)
DOI: (not available)
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
In this work, we present TNT (Text Normalization based pre-training of Transformers), a new language pre-training model for content moderation. Inspired by masking strategies and text normalization, TNT learns language representations by training transformers to reconstruct text corrupted by the four operation types typically seen in text manipulation: substitution, transposition, deletion, and insertion. Furthermore, the normalization involves predicting both operation types and token labels, letting TNT learn from a more challenging task than standard masked-word recovery. Experiments demonstrate that TNT outperforms strong baselines on hate speech classification. Additional text normalization experiments and case studies show that TNT is also a promising approach to misspelling correction.
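The paper itself includes no code, but the corruption scheme the abstract describes is easy to picture. Below is a minimal, hypothetical Python sketch of how training pairs for such an objective might be generated: a clean token sequence is noised with the four operation types, and an edit log records the operation type and original token so both can serve as prediction targets. The function name, signature, and edit-log format are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch of TNT-style training-pair generation; not the
# authors' code. Each edit-log record could be turned into the paper's
# two targets: the operation type and the token to recover.

OPS = ("substitute", "transpose", "delete", "insert")

def corrupt(tokens, noise_rate=0.15, vocab=None, seed=None):
    """Noise a token sequence with the four edit operations.

    Returns the corrupted sequence plus an edit log of
    (position, operation, original_token) records.
    """
    rng = random.Random(seed)
    vocab = list(vocab) if vocab else list(tokens)
    noisy, edits = [], []
    i = 0
    while i < len(tokens):
        if rng.random() >= noise_rate:
            noisy.append(tokens[i])          # keep the token unchanged
            i += 1
            continue
        op = rng.choice(OPS)
        if op == "transpose" and i + 1 >= len(tokens):
            op = "substitute"                # no right neighbour to swap
        if op == "substitute":
            edits.append((len(noisy), op, tokens[i]))
            noisy.append(rng.choice(vocab))  # replace with a random token
            i += 1
        elif op == "transpose":
            edits.append((len(noisy), op, tokens[i]))
            noisy.extend([tokens[i + 1], tokens[i]])  # swap neighbours
            i += 2
        elif op == "delete":
            edits.append((len(noisy), op, tokens[i]))  # token dropped
            i += 1
        else:  # "insert": add a spurious token; the original still follows
            edits.append((len(noisy), op, ""))
            noisy.append(rng.choice(vocab))
    return noisy, edits

noisy, edits = corrupt("you are not welcome here".split(),
                       noise_rate=0.4, seed=0)
print(noisy, edits)
```

Note that, unlike standard masked-word recovery, the model here cannot rely on a fixed mask token to locate corrupted positions: deletions and insertions change the sequence length, so detecting where an edit happened is itself part of the task.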
Pages: 4735-4741 (7 pages)
Related papers (showing 21-30 of 50):
  • [21] Pre-Training a Graph Recurrent Network for Text Understanding
    Wang, Yile
    Yang, Linyi
    Teng, Zhiyang
    Zhou, Ming
    Zhang, Yue
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (05) : 4061 - 4074
  • [22] Self-supervised Pre-training of Text Recognizers
    Kiss, Martin
    Hradis, Michal
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT IV, 2024, 14807 : 218 - 235
  • [23] Image-Text Pre-Training for Logo Recognition
    Hubenthal, Mark
    Kumar, Suren
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1145 - 1154
  • [24] MolXPT: Wrapping Molecules with Text for Generative Pre-training
    Liu, Zequn
    Zhang, Wei
    Xia, Yingce
    Wu, Lijun
    Xie, Shufang
    Qin, Tao
    Zhang, Ming
    Liu, Tie-Yan
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1606 - 1616
  • [25] PreSTU: Pre-Training for Scene-Text Understanding
    Kil, Jihyung
    Changpinyo, Soravit
    Chen, Xi
    Hu, Hexiang
    Goodman, Sebastian
    Chao, Wei-Lun
    Soricut, Radu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15224 - 15234
  • [26] Self-attention Based Text Matching Model with Generative Pre-training
    Zhang, Xiaolin
    Lei, Fengpei
    Yu, Shengji
    2021 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS DASC/PICOM/CBDCOM/CYBERSCITECH 2021, 2021, : 84 - 91
  • [27] Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
    Zhang, Haofei
    Duan, Jiarui
    Xue, Mengqi
    Song, Jie
    Sun, Li
    Song, Mingli
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8934 - 8943
  • [28] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1601 - 1610
  • [29] Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
    Noman, Mubashir
    Naseer, Muzammal
    Cholakkal, Hisham
Anwer, Rao Muhammad
    Khan, Salman
    Khan, Fahad Shahbaz
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 27811 - 27819
  • [30] Investigating of Disease Name Normalization Using Neural Network and Pre-Training
    Lou, Yinxia
    Qian, Tao
    Li, Fei
    Zhou, Junxiang
    Ji, Donghong
    Cheng, Ming
    IEEE ACCESS, 2020, 8 : 85729 - 85739