Tibetan word segmentation method based on CNN-BiLSTM-CRF model

Authors
Wang, Lili [2]
Yang, Hongwu [1, 2, 3]
Xing, Xiaotian [2]
Yan, Yajing [2]
Affiliations
[1] Northwest Normal Univ, Coll Educ Technol, Lanzhou 730070, Peoples R China
[2] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730070, Peoples R China
[3] Natl & Prov Joint Engn Lab Learning Anal Technol, Lanzhou 730070, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Convolutional neural network; Recurrent neural network; Conditional random field; Tibetan word segmentation;
DOI
10.1109/ialp48816.2019.9037661
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We propose a Tibetan word segmentation method based on a CNN-BiLSTM-CRF model that uses only the characters of a sentence as input, so the method needs neither large-scale corpus resources nor hand-crafted features for training. First, a convolutional neural network is used to train character vectors. The character vectors are retrieved from a character lookup table and stacked to form a matrix C. C is then convolved with multiple filter matrices, and max pooling yields the character-level features of each Tibetan word. The character vectors are passed through a highway network into a BiLSTM-CRF model suited to Tibetan word segmentation, giving a segmentation model that is optimized using the character vectors and the CRF model. For Tibetan, a morphologically rich language, fewer parameters and shorter training time make this model perform better than the BiLSTM-CRF model at the character level. The experimental results show that character input is sufficient for language modeling. The model improves the robustness of Tibetan word segmentation and achieves an F value of 95.17%.
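The character-level pipeline described in the abstract (lookup table, matrix C, multiple filters, max pooling, highway network) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the embedding size, filter widths, tanh nonlinearity, random weights, and the function names `conv_max_pool`, `highway`, and `char_features` are all assumptions made for the example, and the highway layer uses the standard formulation z = t * relu(W_h y + b_h) + (1 - t) * y.

```python
import math
import random


def conv_max_pool(C, filt):
    """Slide one filter over the character matrix C and max-pool over positions.

    C is a list of character vectors (one row per character); filt is a list
    of w rows with the same dimension. tanh is an assumed nonlinearity.
    """
    w, d = len(filt), len(filt[0])
    # Pad with zero rows if the input is shorter than the filter width.
    rows = C + [[0.0] * d] * max(0, w - len(C))
    best = -math.inf
    for start in range(len(rows) - w + 1):
        s = sum(filt[k][j] * rows[start + k][j]
                for k in range(w) for j in range(d))
        best = max(best, math.tanh(s))
    return best


def highway(y, W_h, b_h, W_t, b_t):
    """Highway layer: the transform gate t mixes relu(W_h y + b_h) with y itself."""
    n = len(y)
    out = []
    for i in range(n):
        h = max(0.0, sum(W_h[i][j] * y[j] for j in range(n)) + b_h[i])
        t = 1.0 / (1.0 + math.exp(-(sum(W_t[i][j] * y[j] for j in range(n)) + b_t[i])))
        out.append(t * h + (1.0 - t) * y[i])
    return out


def char_features(chars, lookup, filters, hw_params):
    """Lookup -> stack into C -> one pooled feature per filter -> highway."""
    C = [lookup[ch] for ch in chars]
    y = [conv_max_pool(C, f) for f in filters]
    return highway(y, *hw_params)


# Toy setup with hypothetical sizes and random (untrained) weights.
random.seed(0)
d = 4                                   # assumed character embedding size
text = "བོད་ཡིག"                          # sample Tibetan string, iterated by code point
lookup = {ch: [random.uniform(-0.5, 0.5) for _ in range(d)]
          for ch in sorted(set(text))}
filters = [[[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(w)]
           for w in (1, 2, 3)]          # assumed filter widths
n = len(filters)
W_h = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
W_t = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
hw_params = (W_h, [0.0] * n, W_t, [-2.0] * n)  # negative gate bias favors carrying y

features = char_features(text, lookup, filters, hw_params)
print(len(features))  # one feature per filter
```

In a full model, feature vectors of this kind would be fed to the BiLSTM-CRF tagger; that part is omitted here because the abstract gives no details on layer sizes or the tag set.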
Pages: 319-324
Page count: 6