Tibetan word segmentation method based on CNN-BiLSTM-CRF model

Citations: 0
Authors
Wang, Lili [2]
Yang, Hongwu [1, 2, 3]
Xing, Xiaotian [2]
Yan, Yajing [2]
Affiliations
[1] Northwest Normal Univ, Coll Educ Technol, Lanzhou 730070, Peoples R China
[2] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730070, Peoples R China
[3] Natl & Prov Joint Engn Lab Learning Anal Technol, Lanzhou 730070, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Convolutional neural network; Recurrent neural network; Conditional random field; Tibetan word segmentation;
DOI
10.1109/ialp48816.2019.9037661
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
We propose a Tibetan word segmentation method based on a CNN-BiLSTM-CRF model that takes only the characters of a sentence as input, so it requires neither large-scale corpus resources nor hand-crafted features for training. First, a convolutional neural network is used to train character vectors: each character is looked up in a character lookup table, and the resulting vectors are stacked to form a matrix C. Convolving C with multiple filter matrices and applying max pooling yields the character-level features of each Tibetan word. These features are then passed through a highway network into a BiLSTM-CRF model suited to Tibetan word segmentation, giving a segmentation model that is optimized using the character vectors and the CRF model. For Tibetan, a morphologically rich language, the model has fewer parameters and trains faster than the BiLSTM-CRF model, and it performs better at the character level. The experimental results show that character input is sufficient for language modeling. The model improves the robustness of Tibetan word segmentation and achieves an F-score of 95.17%.
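The feature-extraction steps described in the abstract (character lookup, convolution with several filters, max pooling, highway network) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the vector size, the filter widths, the tanh nonlinearity, the random weights, and the standard highway formula are all hypothetical choices, and the BiLSTM-CRF stage is omitted.

```python
import math
import random


def lookup(chars, table):
    """Stack the vector of each character into a matrix C (one row per character)."""
    return [table[ch] for ch in chars]


def conv_max_pool(C, filt):
    """Slide one filter (w rows x d columns) over C and keep the largest response."""
    w, d = len(filt), len(filt[0])
    # Zero-pad short inputs so that at least one window fits.
    rows = C + [[0.0] * d for _ in range(max(0, w - len(C)))]
    responses = []
    for start in range(len(rows) - w + 1):
        s = sum(rows[start + i][j] * filt[i][j] for i in range(w) for j in range(d))
        responses.append(math.tanh(s))  # assumed nonlinearity
    return max(responses)               # max pooling over positions


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def highway(y, W_h, b_h, W_t, b_t):
    """Return t * relu(W_h y + b_h) + (1 - t) * y with gate t = sigmoid(W_t y + b_t)."""
    n = len(y)
    out = []
    for i in range(n):
        h = max(0.0, sum(W_h[i][k] * y[k] for k in range(n)) + b_h[i])
        t = sigmoid(sum(W_t[i][k] * y[k] for k in range(n)) + b_t[i])
        out.append(t * h + (1.0 - t) * y[i])
    return out


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4                                   # assumed character-vector size
unit = "བོད"                               # example Tibetan syllable
table = {ch: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for ch in unit}
filters = [rand_matrix(rng, w, dim) for w in (1, 2, 3)]  # assumed filter widths

C = lookup(unit, table)                               # matrix C from the lookup table
features = [conv_max_pool(C, f) for f in filters]     # one feature per filter
n = len(features)
encoded = highway(features, rand_matrix(rng, n, n), [0.0] * n,
                  rand_matrix(rng, n, n), [0.0] * n)
print(len(C), len(features), len(encoded))
```

In this sketch each filter contributes one pooled feature, so the feature vector has a fixed length regardless of how many characters the unit contains; in a full model that vector would be the per-position input to the BiLSTM-CRF tagger.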
Pages: 319-324
Number of pages: 6