N-gram and local context analysis for Persian text retrieval

Cited by: 0
Authors
Aleahmad, Abolfazl [1]
Hakimian, Parsia [1]
Mahdikhani, Farzad [1]
Oroumchian, Farhad [1]
Affiliation
[1] Univ Tehran, Dept Elect & Comp Engn, Tehran 14174, Iran
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Persian is one of the languages of the Middle East, so a significant number of Persian documents are available on the Web. However, relatively few studies on the retrieval of Persian documents appear in the literature. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, Local Context Analysis, using different weighting schemes on a realistic corpus containing more than 160,000 news articles. We then compared our results with previous work reported on the Persian language. Our experimental results show that, among the assessed methods, the 4-gram-based vector space model with the Lnu.ltu weighting scheme has acceptable performance, and Local Context Analysis has the best performance reported for Persian text retrieval so far.
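To make the N-gram indexing idea in the abstract concrete, here is a minimal sketch of character 4-gram extraction. It is an illustration only, not the authors' implementation: the function names `char_ngrams` and `ngram_vector` are hypothetical, and the choices to build N-grams within whitespace-delimited words and to keep short words whole are assumptions that the abstract does not confirm. Raw counts stand in for the term weights; the Lnu.ltu scheme is not implemented here.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return overlapping character n-grams, built word by word.

    Assumption: n-grams do not cross word boundaries, and a word with
    n or fewer characters is kept as a single unit.
    """
    grams: list[str] = []
    for word in text.split():
        if len(word) <= n:
            grams.append(word)
        else:
            # Slide a window of width n across the word.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def ngram_vector(text: str, n: int = 4) -> Counter:
    """Map each n-gram to its raw frequency (an unweighted vector)."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    # Python strings are sequences of code points, so Persian text
    # is sliced the same way as Latin text.
    print(char_ngrams("کتابخانه ملی"))
    print(ngram_vector("text retrieval"))
```

In a vector space model, documents and queries would each be turned into such vectors, weighted, and compared by a similarity measure; overlapping N-grams let word variants share index terms without a stemmer.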
Pages: 284-287
Page count: 4