SsciBERT: a pre-trained language model for social science texts

Cited by: 0
Authors
Si Shen
Jiangfeng Liu
Litao Lin
Ying Huang
Lin Zhang
Chang Liu
Yutong Feng
Dongbo Wang
Affiliations
[1] Nanjing University of Science and Technology, Group of Science and Technology Full Text Knowledge Mining, School of Economics & Management
[2] Nanjing Agricultural University, College of Information Management
[3] Wuhan University, Center for Science, Technology & Education Assessment (CSTEA)
[4] Wuhan University
Source
Scientometrics | 2023, Vol. 128
Keywords
Social science; Natural language processing; Pre-trained models; Text analysis; BERT
DOI: Not available
Abstract
The academic literature of the social sciences records human civilization and studies human social problems. As this literature grows at scale, researchers urgently need ways to quickly locate existing research on relevant issues. Previous studies, such as SciBERT, have shown that pre-training on domain-specific texts can improve performance on natural language processing tasks. However, no pre-trained language model has been available for the social sciences so far. In light of this, the present research proposes a pre-trained model based on abstracts published in Social Science Citation Index (SSCI) journals. The models, which are available on GitHub (https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT), show excellent performance on discipline classification, abstract structure-function recognition, and named entity recognition tasks on social science literature.
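As a rough illustration of how a checkpoint like this is typically used, the sketch below loads a BERT-style model with the Hugging Face transformers library and runs masked-token prediction. It is a minimal sketch, not part of the original record: the local directory name ./ssci-bert and the example sentence are placeholders, and the only pointer given by the paper is the GitHub link above.

# Minimal sketch: load a BERT-style checkpoint (e.g., the SSCI-BERT release) with
# Hugging Face transformers and sanity-check it with masked-token prediction.
# "./ssci-bert" is a placeholder for a local directory containing the released
# config, vocabulary, and weights; it is not an official path from the paper.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_path = "./ssci-bert"  # placeholder path, adjust to where the weights are stored

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForMaskedLM.from_pretrained(model_path)

# Fill-mask over a social-science-style sentence; prints candidate tokens and scores.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for candidate in fill("This study examines the [MASK] determinants of educational inequality."):
    print(candidate["token_str"], round(candidate["score"], 4))

For the downstream tasks reported in the abstract (discipline classification, structure-function recognition, named entity recognition), the same checkpoint would instead be loaded through the corresponding task heads (e.g., AutoModelForSequenceClassification or AutoModelForTokenClassification) and fine-tuned on labeled data.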
Pages: 1241-1263
Page count: 22
Related papers
50 items in total
  • [1] SsciBERT: a pre-trained language model for social science texts
    Shen, Si
    Liu, Jiangfeng
    Lin, Litao
    Huang, Ying
    Zhang, Lin
    Liu, Chang
    Feng, Yutong
    Wang, Dongbo
    [J]. SCIENTOMETRICS, 2023, 128 (02) : 1241 - 1263
  • [2] Hyperbolic Pre-Trained Language Model
    Chen, Weize
    Han, Xu
    Lin, Yankai
    He, Kaichen
    Xie, Ruobing
    Zhou, Jie
    Liu, Zhiyuan
    Sun, Maosong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3101 - 3112
  • [3] Devulgarization of Polish Texts Using Pre-trained Language Models
    Klamra, Cezary
    Wojdyga, Grzegorz
    Zurowski, Sebastian
    Rosalska, Paulina
    Kozlowska, Matylda
    Ogrodniczuk, Maciej
    [J]. COMPUTATIONAL SCIENCE, ICCS 2022, PT II, 2022, : 49 - 55
  • [4] RoBERTuito: a pre-trained language model for social media text in Spanish
    Manuel Perez, Juan
    Furman, Damian A.
    Alonso Alemany, Laura
    Luque, Franco
[J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7235 - 7243
  • [5] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [6] Adder Encoder for Pre-trained Language Model
    Ding, Jianbang
    Zhang, Suiyun
    Li, Linlin
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 339 - 347
  • [7] Intelligent Completion of Ancient Texts Based on Pre-trained Language Models
    Li, Jiajun
    Ming, Can
    Guo, Zhihao
    Qian, Tieyun
    Peng, Zhiyong
    Wang, Xiaoguang
    Li, Xuhui
    Li, Jing
    [J]. Data Analysis and Knowledge Discovery, 2024, 8 (05) : 59 - 67
  • [8] Surgicberta: a pre-trained language model for procedural surgical language
    Bombieri, Marco
    Rospocher, Marco
    Ponzetto, Simone Paolo
    Fiorini, Paolo
    [J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01) : 69 - 81
  • [9] ViDeBERTa: A powerful pre-trained language model for Vietnamese
    Tran, Cong Dao
    Pham, Nhut Huy
    Nguyen, Anh
    Hy, Truong Son
    Vu, Tu
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1071 - 1078
  • [10] CrisisTransformers: Pre-trained language models and sentence encoders for crisis-related social media texts
    Lamsal, Rabindra
    Read, Maria Rodriguez
    Karunasekera, Shanika
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 296