N-gram and local context analysis for Persian text retrieval

Authors
Aleahmad, Abolfazl [1 ]
Hakimian, Parsia [1 ]
Mahdikhani, Farzad [1 ]
Oroumchian, Farhad [1 ]
Affiliation
[1] Univ Tehran, Dept Elect & Comp Engn, Tehran 14174, Iran
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Persian is one of the languages of the Middle East, so a significant number of Persian documents are available on the Web. However, relatively few studies on the retrieval of Persian documents appear in the literature. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, namely Local Context Analysis, using different weighting schemes on a realistic corpus containing more than 160,000 news articles. We then compared our results with previous work reported on the Persian language. Our experimental results show that, among the assessed methods, the 4-gram-based vector space model with the Lnu.ltu weighting scheme has acceptable performance, and Local Context Analysis has the best performance reported for Persian text retrieval so far.
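As an illustration of the N-gram indexing units mentioned in the abstract, the following is a minimal sketch of character 4-gram extraction. It is not the authors' implementation: the function name `char_ngrams`, the choice to build N-grams within each whitespace-delimited token, and the rule of keeping tokens shorter than N whole are all assumptions made for this example.

```python
from typing import List


def char_ngrams(text: str, n: int = 4) -> List[str]:
    """Return overlapping character n-grams for each whitespace-delimited token.

    Assumption (not from the paper): n-grams do not cross word boundaries,
    and tokens with at most n characters are kept as a single unit.
    """
    grams: List[str] = []
    for token in text.split():
        if len(token) <= n:
            # Short tokens are kept whole so they are not lost from the index.
            grams.append(token)
        else:
            # Slide a window of width n over the token, one character at a time.
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


if __name__ == "__main__":
    print(char_ngrams("retrieval"))
    print(char_ngrams("کتابخانه"))
```

The resulting N-grams would then replace whole words as the terms of the vector space model; the weighting scheme (for example Lnu.ltu) is applied to those terms and is not shown here.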
Pages: 284-287
Page count: 4
Related papers
50 in total
  • [1] N-gram Analysis of a Mongolian Text
    Altangerel, Khuder
    Tsend, Ganbat
    Jalsan, Khash-Erdene
    [J]. IFOST 2008: PROCEEDING OF THE THIRD INTERNATIONAL FORUM ON STRATEGIC TECHNOLOGIES, 2008, : 258 - 259
  • [2] Character N-Gram Tokenization for European Language Text Retrieval
    Paul McNamee
    James Mayfield
    [J]. Information Retrieval, 2004, 7 : 73 - 97
  • [3] Character N-gram tokenization for European language text retrieval
    McNamee, P
    Mayfield, J
    [J]. INFORMATION RETRIEVAL, 2004, 7 (1-2): : 73 - 97
  • [4] Evaluation of N-Gram Conflation Approaches for Arabic Text Retrieval
    Ahmed, Farag
    Nuernberger, Andreas
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2009, 60 (07): : 1448 - 1465
  • [5] Character-Based N-gram Model for Uyghur Text Retrieval
    Tohti, Turdi
    Xu, Lirui
    Huang, Jimmy
    Musajan, Winira
    Hamdulla, Askar
    [J]. BIOMETRIC RECOGNITION, CCBR 2018, 2018, 10996 : 678 - 688
  • [6] N-gram over Context
    Kawamae, Noriaki
    [J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016, : 1045 - 1055
  • [7] N-GRAM ANALYSIS OF TEXT DOCUMENTS IN SERBIAN LANGUAGE
    Marovac, Ulfeta
    Pljaskovic, Aldina
    Crnisanin, Adela
    Kajan, Ejub
    [J]. 2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012, : 1385 - 1388
  • [8] Text mining with n-gram variables
    Schonlau, Matthias
    Guenther, Nick
    Sucholutsky, Ilia
    [J]. STATA JOURNAL, 2017, 17 (04): : 866 - 881
  • [9] SEARCHING FOR TEXT - SEND AN N-GRAM
    KIMBRELL, RE
    [J]. BYTE, 1988, 13 (05): : 297 - &
  • [10] Are n-gram Categories Helpful in Text Classification?
    Kruczek, Jakub
    Kruczek, Paulina
    Kuta, Marcin
    [J]. COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138 : 524 - 537