Design and implementation of automatic indexing for information retrieval with Arabic documents

Authors
Hmeidi, I
Kanaan, G
Evens, M
Affiliations
[1] IIT, DEPT COMP SCI, CHICAGO, IL 60616
[2] JORDAN INST SCI & TECHNOL, IRBID, JORDAN
DOI
10.1002/(SICI)1097-4571(199710)48:10<867::AID-ASI3>3.0.CO;2-#
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval system from scratch to handle Arabic data. The system was implemented in the C language using the GCC compiler and runs on IBM/PCs and compatible microcomputers. We have implemented both automatic and manual indexing techniques for this corpus. A long series of experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Since automatic indexing is both cheaper and faster, our results suggest that we can achieve a wider coverage of the literature with less money and produce as good results as with manual indexing. We have also compared the retrieval results using words as index terms versus stems and roots, and confirmed the results obtained by Al-Kharashi and Abu-Salem with smaller corpora that root indexing is more effective than word indexing.
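
The abstract compares indexing methods by recall and precision. As a minimal sketch of how these two measures are usually computed for a single query (not the authors' C implementation), the snippet below uses a hypothetical function name, `precision_recall`, and made-up document IDs; nothing in it reproduces data from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of documents the system returned (hypothetical input).
    relevant:  IDs of documents judged relevant to the query (hypothetical input).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved

    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative IDs only; they are not taken from the 242-abstract corpus.
    returned_by_system = [3, 7, 12, 18]
    judged_relevant = [3, 12, 25]
    p, r = precision_recall(returned_by_system, judged_relevant)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function over the result lists produced by two indexing methods (for example word-based and root-based) for the same queries would give the per-query figures that such a comparison is built on.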
Pages: 867-881
Page count: 15