Document Similarity Detection using K-Means and Cosine Distance

Authors
Usino, Wendi [1]
Prabuwono, Anton Satria [1, 2]
Allehaibi, Khalid Hamed S. [3]
Bramantoro, Arif [1, 2]
Hasniaty, A. [4, 5]
Amaldi, Wahyu [1]
Affiliations
[1] Univ Budi Luhur, Fac Informat Technol, Jakarta, Indonesia
[2] Rabigh King Abdulaziz Univ, Fac Comp & Informat Technol, Rabigh, Saudi Arabia
[3] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[4] Univ Kebangsaan Malaysia, Inst Visual Informat, Bangi, Selangor, Malaysia
[5] Univ Hasanuddin, Fac Engn, Makassar, Indonesia
Keywords
K-means; cosine distance; cluster; document similarity; document frequency; inverse document frequency; preprocessing; vector space model
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in Indonesia. The evaluation found irregularities in the softcopies of doctoral dissertations, many of which were similar to texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind it is the lack of standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using the K-means and cosine distance algorithms. The process starts with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and the combined calculation of K-means and cosine distance on 17 documents used as test data. Overall, the results show a detection accuracy of 93.33% on these documents.
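The pipeline named in the abstract (preprocessing, a vector space model, then K-means combined with cosine distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the weighting formula, the initialization, or how the two algorithms are combined. The function names, the smoothed IDF formula, the fixed initial centroids, and the English toy documents are all assumptions made for illustration, and the Indonesian dictionary check is not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplified preprocessing (assumption): lowercase and keep letters only.
    # The dictionary check described in the abstract is NOT reproduced here.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    # Vector space model: one TF-IDF weighted vector per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (assumption) so terms present in every document keep weight.
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)


def kmeans_cosine(vectors, init_indices, max_iter=100):
    # K-means variant that assigns each vector to the centroid with the
    # smallest cosine distance. Initial centroids are chosen by index
    # (assumption) to keep the example deterministic.
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy documents (hypothetical, not the paper's 17 test documents).
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans_cosine(vectors, init_indices=[0, 2])
print(labels)
print(round(cosine_similarity(vectors[0], vectors[1]), 3))
```

In this sketch, documents that land in the same cluster and have a high pairwise cosine similarity would be the candidates to inspect for plagiarism; any cut-off value for "high" would have to be chosen empirically and is not given in the abstract.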
Pages: 165-170
Number of pages: 6