Character-Based N-gram Model for Uyghur Text Retrieval

Times cited: 0
Authors
Tohti, Turdi [1, 2]
Xu, Lirui [1]
Huang, Jimmy [2]
Musajan, Winira [1]
Hamdulla, Askar [1]
Affiliations
[1] Xinjiang University, School of Information Science and Engineering, Urumqi, People's Republic of China
[2] York University, Information Retrieval and Knowledge Management Research Lab, Toronto, ON, Canada
Funding
National Natural Science Foundation of China
Keywords
Uyghur; Information retrieval; Stemming; N-gram; Lucene
DOI
10.1007/978-3-319-97909-0_72
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Uyghur is a low-resource language, yet Uyghur information retrieval (IR) has become increasingly important in recent years. Although there are related research results and stem-based Uyghur IR systems, it remains difficult to obtain high-performance retrieval results because of the limitations of existing stemming methods. In this paper, we propose a character-based N-gram model and a corresponding smoothing algorithm for Uyghur IR. Using the open-source tool Lucene, we develop a full-text IR system based on the character N-gram model and conduct a series of experiments and comparative analyses. The experimental results show that the proposed method performs better than conventional Uyghur IR systems.
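As a generic illustration of why a character-based N-gram representation can stand in for stemming, the sketch below splits each word into overlapping character trigrams, so that inflected forms sharing a stem also share most of their grams. This is not the authors' model or smoothing algorithm, neither of which is specified in this record. The function names, the choice of N = 3, the "_" boundary marker, the Dice overlap score, and the Latin-script (ULY) example words are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 3, pad: str = "_") -> list[str]:
    """Return overlapping character n-grams of one word.

    Assumption for this sketch: the word is padded with a boundary marker
    so that its beginning and end produce their own grams.
    """
    padded = pad + word.lower() + pad
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def text_to_grams(text: str, n: int = 3) -> Counter:
    """Bag (multiset) of character n-grams for a whitespace-tokenized text."""
    grams: Counter = Counter()
    for word in text.split():
        grams.update(char_ngrams(word, n))
    return grams


def dice_overlap(a: Counter, b: Counter) -> float:
    """Dice coefficient over two n-gram multisets (illustrative similarity only)."""
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical example: "kitab" (book) vs. "kitablar" (books) and "mektep" (school).
    query = text_to_grams("kitab")
    for doc in ["kitablar", "mektep"]:
        print(doc, round(dice_overlap(query, text_to_grams(doc)), 3))
```

Running it prints `kitablar 0.615` and `mektep 0.0`: the plural shares four of the query's five trigrams without any stemmer, while the unrelated word shares none. In a Lucene-based system, the gram splitting itself could be handled by an analyzer such as Lucene's `NGramTokenizer`, with ranking left to a retrieval model rather than a raw overlap score.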
Pages: 678-688
Number of pages: 11