fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP

Cited by: 0
Authors
Geng, Zhichao [1]
Yan, Hang [1]
Qiu, Xipeng [1]
Huang, Xuanjing [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), part-of-speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT that uses only the first 8 layers; we also provide a 4-layer base model compressed from the 8-layer model. The joint model is trained and evaluated on 13 corpora across the four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and NER and SOTA performance in CWS and POS tagging. Moreover, fastHan transfers well, performing much better than popular segmentation tools on a corpus outside its training data. To better meet practical needs, users can further fine-tune fastHan with their own labeled data. In addition to its small size and strong performance, fastHan is user-friendly: implemented as a Python package, it isolates users from internal technical details and is convenient to use. The project is released on GitHub(1).
Pages: 99-106 (8 pages)