fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP

Authors
Geng, Zhichao [1]
Yan, Hang [1]
Qiu, Xipeng [1]
Huang, Xuanjing [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), part-of-speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT that uses only the first 8 layers of BERT. We also provide a 4-layer base model compressed from the 8-layer model. The joint model is trained and evaluated on 13 corpora covering the four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and NER and SOTA performance in CWS and POS tagging. fastHan also transfers well, performing much better than popular segmentation tools on a corpus not used in training. To better meet the needs of practical applications, users can further fine-tune fastHan on their own labeled data. In addition to its small size and strong performance, fastHan is user-friendly: implemented as a Python package, it shields users from internal technical details and is convenient to use. The project is released on GitHub.
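The layer pruning mentioned in the abstract can be illustrated with a minimal sketch. This is not fastHan's code: the toy layers, the names `make_layer`, `prune_encoder`, and `run_encoder`, and the 12-layer starting stack (the size of a standard BERT-base encoder) are all assumptions chosen for illustration. It only shows what "using the first 8 layers" means for an encoder stack; it does not model how the 4-layer model is compressed.

```python
from typing import Callable, List

# A "layer" here is any function mapping hidden states to hidden states.
Layer = Callable[[List[float]], List[float]]


def make_layer(index: int) -> Layer:
    """Build a toy stand-in for one encoder layer (hypothetical, for illustration)."""
    def layer(hidden: List[float]) -> List[float]:
        # Toy transformation: add the layer's 1-based position to every value.
        return [h + (index + 1) for h in hidden]
    return layer


def prune_encoder(layers: List[Layer], keep: int) -> List[Layer]:
    """Keep only the first `keep` layers of an encoder stack."""
    if not 0 < keep <= len(layers):
        raise ValueError("keep must be between 1 and the number of layers")
    return layers[:keep]


def run_encoder(layers: List[Layer], hidden: List[float]) -> List[float]:
    """Apply the layers in order, feeding each output to the next layer."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


# Assumption: start from a 12-layer stack, as in a standard BERT-base encoder.
full_encoder = [make_layer(i) for i in range(12)]
pruned_encoder = prune_encoder(full_encoder, keep=8)

full_out = run_encoder(full_encoder, [0.0])
pruned_out = run_encoder(pruned_encoder, [0.0])
print(len(pruned_encoder), pruned_out, full_out)
```

In this sketch the pruned stack simply stops after the eighth layer, so task-specific heads would read the eighth layer's hidden states instead of the twelfth's.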
Pages: 99-106
Page count: 8