ConfliBERT: A Pre-trained Language Model for Political Conflict and Violence

Cited by: 0
Authors
Hu, Yibo [1 ]
Hosseini, MohammadSaleh [1 ]
Parolin, Erick Skorupa [1 ]
Osorio, Javier [2 ]
Khan, Latifur [1 ]
Brandt, Patrick T. [1 ]
D'Orazio, Vito J. [1 ]
Affiliations
[1] Univ Texas Dallas, Richardson, TX 75083 USA
[2] Univ Arizona, Tucson, AZ 85721 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Analyzing conflicts and political violence around the world is a persistent challenge in the political science and policy communities due in large part to the vast volumes of specialized text needed to monitor conflict and violence on a global scale. To help advance research in political science, we introduce ConfliBERT, a domain-specific pre-trained language model for conflict and political violence. We first gather a large domain-specific text corpus for language modeling from various sources. We then build ConfliBERT using two approaches: pre-training from scratch and continual pre-training. To evaluate ConfliBERT, we collect 12 datasets and implement 18 tasks to assess the models' practical application in conflict research. Finally, we evaluate several versions of ConfliBERT in multiple experiments. Results consistently show that ConfliBERT outperforms BERT when analyzing political violence and conflict. Our code is publicly available.
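The abstract describes two construction routes (pre-training from scratch and continual pre-training) followed by fine-tuning on 18 downstream conflict tasks. As a rough illustration only, the sketch below fine-tunes a ConfliBERT checkpoint for binary conflict-relevance classification with the Hugging Face transformers library; the checkpoint identifier and the toy sentences and labels are assumptions for illustration, not details taken from the paper.

    # Minimal sketch (not the authors' pipeline): fine-tune a ConfliBERT
    # checkpoint for binary conflict-relevance classification.
    # MODEL_ID is an assumed Hugging Face identifier; substitute the release you use.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "snowood1/ConfliBERT-scratch-uncased"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

    # Toy inputs: 1 = conflict-related, 0 = not conflict-related.
    texts = [
        "Protesters clashed with security forces in the capital on Tuesday.",
        "The central bank kept interest rates unchanged this quarter.",
    ]
    labels = torch.tensor([1, 0])

    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    loss = model(**enc, labels=labels).loss
    loss.backward()  # one illustrative gradient step; a full run would loop with an optimizer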
Pages: 5469-5482
Page count: 14
Related Papers
50 records in total
  • [1] Hyperbolic Pre-Trained Language Model
    Chen, Weize
    Han, Xu
    Lin, Yankai
    He, Kaichen
    Xie, Ruobing
    Zhou, Jie
    Liu, Zhiyuan
    Sun, Maosong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3101 - 3112
  • [2] PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter
    Kawintiranon, Kornraphop
    Singh, Lisa
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7360 - 7367
  • [3] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [4] Adder Encoder for Pre-trained Language Model
    Ding, Jianbang
    Zhang, Suiyun
    Li, Linlin
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 339 - 347
  • [5] SurgicBERTa: a pre-trained language model for procedural surgical language
    Bombieri, Marco
    Rospocher, Marco
    Ponzetto, Simone Paolo
    Fiorini, Paolo
    [J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01) : 69 - 81
  • [6] ViDeBERTa: A powerful pre-trained language model for Vietnamese
    Tran, Cong Dao
    Pham, Nhut Huy
    Nguyen, Anh
    Hy, Truong Son
    Vu, Tu
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1071 - 1078
  • [7] BERTweet: A pre-trained language model for English Tweets
    Nguyen, Dat Quoc
    Vu, Thanh
    Nguyen, Anh Tuan
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING: SYSTEM DEMONSTRATIONS, 2020, : 9 - 14
  • [8] Pre-trained Language Model for Biomedical Question Answering
    Yoon, Wonjin
    Lee, Jinhyuk
    Kim, Donghyeon
    Jeong, Minbyul
    Kang, Jaewoo
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 727 - 740
  • [9] Misspelling Correction with Pre-trained Contextual Language Model
    Hu, Yifei
    Ting, Xiaonan
    Ko, Youlim
    Rayz, Julia Taylor
    [J]. PROCEEDINGS OF 2020 IEEE 19TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC 2020), 2020, : 144 - 149
  • [10] Enhancing Language Generation with Effective Checkpoints of Pre-trained Language Model
    Park, Jeonghyeok
    Zhao, Hai
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2686 - 2694