Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks

Authors
Hellwig, Oliver [1, 2]
Nehrdich, Sebastian [3 ]
Affiliations
[1] Univ Dusseldorf, SFB 991, Dusseldorf, Germany
[2] Univ Zurich, IVS, Zurich, Switzerland
[3] Univ Hamburg, Ctr Buddhist Studies, Hamburg, Germany
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The paper introduces end-to-end neural network models that tokenize Sanskrit by jointly splitting compounds and resolving phonetic merges (Sandhi). Tokenization of Sanskrit depends on local phonetic and distant semantic features, which are incorporated using convolutional and recurrent elements. Unlike most previous systems, our models do not require feature engineering or external linguistic resources, but operate solely on parallel versions of raw and segmented text. The models discussed in this paper clearly improve over previous approaches to Sanskrit word segmentation. As they are language agnostic, we demonstrate that they also outperform the state of the art for the related task of German compound splitting.
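As a rough illustration of what "parallel versions of raw and segmented text" can provide as training signal, the sketch below turns one raw/segmented pair into one label per input character. It is not the authors' method: the function name `boundary_labels` and the 0/1 labelling scheme are assumptions made for this example, and it only covers the simple compound-splitting case in which the segmented text equals the raw text with spaces inserted. Resolving Sandhi changes characters as well, so it would need a richer label set than this.

```python
def boundary_labels(raw: str, segmented: str) -> list[int]:
    """Return one label per character of `raw`.

    Label 1 means a word boundary follows this character, 0 means none.
    Assumption (hypothetical simplification): `segmented` is `raw` with
    spaces inserted, i.e. no phonetic changes occur at the boundaries.
    """
    segmented = segmented.strip()
    if segmented.replace(" ", "") != raw:
        raise ValueError("segmented text must equal raw text with spaces inserted")

    labels = []
    for i, ch in enumerate(segmented):
        if ch == " ":
            continue  # spaces are boundaries, not input characters
        next_ch = segmented[i + 1] if i + 1 < len(segmented) else ""
        labels.append(1 if next_ch == " " else 0)
    return labels


# A German compound as the raw input, with its split form as the target.
print(boundary_labels("haustür", "haus tür"))  # [0, 0, 0, 1, 0, 0, 0]
```

A character-level sequence model of the kind the abstract describes could then be trained to predict such a label sequence from the raw characters alone.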
Pages: 2754-2763
Page count: 10