COMPARISON OF VSM, GVSM, AND LSI IN INFORMATION RETRIEVAL FOR INDONESIAN TEXT

Cited by: 0
Authors
Pardede, Jasman [1]
Husada, Milda Gustiana [1]
Affiliation
[1] Inst Teknol Nasional Itenas, Fac Ind Technol, Dept Informat, Bandung, Indonesia
Source
JURNAL TEKNOLOGI | 2016 / Vol. 78 / Issue 5-6
Keywords
VSM; GVSM; LSI; performance; multithread;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Subject classification code
08;
Abstract
The vector space model (VSM) is an information retrieval (IR) model that represents the query and the documents as n-dimensional vectors. The generalized vector space model (GVSM) is an extension of VSM that represents documents based on the similarity between the query and the minterm vector space of the document collection, where the minterm vectors are defined by the terms in the query. Retrieval can therefore be based on the meaning of the words in the query; conversely, a document may contain semantically equivalent information. Latent semantic indexing (LSI) is a method used in IR systems to retrieve documents based on the overall meaning of the user's query rather than on a word-by-word match. LSI uses a matrix algebra technique called Singular Value Decomposition (SVD). This study discusses the performance of VSM, GVSM, and LSI implemented in an IR system that retrieves Indonesian-language documents stored as .pdf, .doc, and .docx files, using the Nazief and Adriani stemming algorithm. Each method is implemented both with and without threads. Threads are used in preprocessing, namely in reading each document from the document collection and in stemming both the query and the documents. Retrieval performance is evaluated by response time, recall, precision, and F-measure. The results show that, for each method, execution is fastest for .docx files, followed by .doc and .pdf. For the same document collection, LSI has the fastest response time, followed by GVSM and then VSM. The average recall values for VSM, GVSM, and LSI are 82.86%, 89.68%, and 84.93%, respectively. The average precision values for VSM, GVSM, and LSI are 64.08%, 67.51%, and 62.08%, respectively. The average F-measure values for VSM, GVSM, and LSI are 71.95%, 76.63%, and 71.02%, respectively.
Implementing multithreading in preprocessing improves the average response time of VSM, GVSM, and LSI by about 30.422%, 26.282%, and 31.821%, respectively.
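
As an illustration of the VSM ranking and the evaluation measures named in the abstract, the following is a minimal sketch in Python. It is not the authors' implementation: the TF-IDF weighting variant, the function names, and the toy pre-stemmed Indonesian tokens are all assumptions made for the example, and the Nazief and Adriani stemmer, GVSM, LSI, and threading are not reproduced here.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for tokenized documents.

    The weighting tf * (ln(N/df) + 1) is an assumed variant, chosen so that
    no term weight collapses to zero in a tiny collection.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return document indices with non-zero similarity, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]

def precision_recall_f(retrieved, relevant):
    """Precision, recall, and F-measure (harmonic mean) for one query."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy collection of already-stemmed tokens (hypothetical data).
docs = [
    ["sistem", "temu", "balik", "informasi"],
    ["model", "ruang", "vektor", "dokumen"],
    ["informasi", "dokumen", "teks", "indonesia"],
]
query = ["informasi", "dokumen"]

ranking = rank(query, docs)
p, r, f = precision_recall_f(ranking, relevant=[2, 1])
print(ranking, round(p, 3), round(r, 3), round(f, 3))
```

In this toy run the document that contains both query terms is ranked first. Note that an F-measure averaged over queries, as reported in the abstract, generally differs from the F-measure computed from the averaged precision and recall.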
Pages: 51-56
Number of pages: 6