N-gram and local context analysis for Persian text retrieval

Authors
Aleahmad, Abolfazl [1]
Hakimian, Parsia [1]
Mahdikhani, Farzad [1]
Oroumchian, Farhad [1]
Affiliations
[1] Univ Tehran, Dept Elect & Comp Engn, Tehran 14174, Iran
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
Persian is one of the languages of the Middle East, so a significant number of Persian documents are available on the Web. However, relatively few studies on the retrieval of Persian documents appear in the literature. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, Local Context Analysis, using different weighting schemes on a realistic corpus containing more than 160,000 news articles. We then compared our results with previous work reported on the Persian language. Our experimental results show that, among the assessed methods, the 4-gram-based vector space model with the Lnu.ltu weighting scheme has acceptable performance, and Local Context Analysis has the best performance reported so far for Persian text retrieval.
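As a rough illustration of the 4-gram indexing and Lnu document weighting named in the abstract, the following minimal sketch splits text into character 4-grams and applies the commonly cited Lnu formulation (log-average term frequency, no idf, pivoted unique normalization). The abstract does not give the authors' exact formulas or settings, so the function names, the handling of short tokens, and the `pivot` and `slope` values are assumptions made for illustration only.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Split text on whitespace and return overlapping character n-grams per token."""
    grams = []
    for token in text.split():
        if len(token) <= n:
            # Assumption: tokens no longer than n are kept whole.
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def lnu_weights(grams, pivot, slope=0.25):
    """Lnu weights for one document.

    L: (1 + log tf) / (1 + log average tf)
    n: no idf factor
    u: pivoted unique normalization, (1 - slope) * pivot + slope * unique terms
    The slope default is an illustrative assumption, not a value from the paper.
    """
    tf = Counter(grams)
    if not tf:
        return {}
    avg_tf = sum(tf.values()) / len(tf)
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {
        gram: ((1.0 + math.log(freq)) / (1.0 + math.log(avg_tf))) / norm
        for gram, freq in tf.items()
    }
```

Here `pivot` would typically be set to the average number of unique terms per document in the collection; query-side ltu weights would additionally multiply in an idf factor, which is omitted from this sketch.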
Pages: 284-287
Page count: 4