Low-Complexity and Secure Clustering-Based Similarity Detection for Private Files

Cited by: 0
Authors
Najem, Duaa Fadhel [1]
Taha, Nagham Abdulrasool [2]
Abduljabbar, Zaid Ameen [2, 3, 4]
Nyangaresi, Vincent Omollo [5, 6]
Ma, Junchao [3]
Honi, Dhafer G. [2, 7]
Affiliations
[1] Univ Basrah, Coll Comp Sci & Informat Technol, Dept Cyber Secur, Basrah 61004, Iraq
[2] Univ Basrah, Coll Educ Pure Sci, Dept Comp Sci, Basrah 61004, Iraq
[3] Shenzhen Technol Univ, Coll Big Data & Internet, Shenzhen 518118, Peoples R China
[4] Huazhong Univ Sci & Technol, Shenzhen Inst, Shenzhen 518000, Peoples R China
[5] Jaramogi Oginga Odinga Univ Sci & Technol, Dept Comp Sci & Software Engn, Bondo 40601, Kenya
[6] SIMATS, Saveetha Sch Engn, Dept Appl Elect, Chennai 600124, Tamilnadu, India
[7] Univ Debrecen, Dept IT, H-4002 Debrecen, Hungary
Source
TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS | 2024 / Vol. 13 / Issue 03
Keywords
File similarity; privacy; similarity detection;
DOI
10.18421/TEM133-61
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Detecting similarity between files is required in many practical applications, such as copyright protection, file management, plagiarism detection, and detecting duplicate submissions of scientific articles to multiple journals or conferences. Existing methods do not take file privacy into consideration, which prevents their use in many sensitive situations, for example when the files of two intellectual agencies, which are meant to be kept secure, must be compared to find similarities. Over the last few years, encryption protocols have been developed with the aim of detecting similar files without compromising privacy. However, existing protocols tend to leak important data and do not achieve low complexity. This paper addresses the problem of computing the similarity between two file collections belonging to two entities who wish to keep their contents private. We propose a clustering-based approach that achieves 90% accuracy while significantly reducing execution time. The protocols presented in this study are much more efficient than other secure protocols, which are slower at similarity detection for large file sets. Our system achieves a high level of security by using a vector space model to convert the files into vectors and by applying Paillier encryption to each element of a vector separately, to protect privacy. The Porter stemming algorithm is applied to the vocabulary set. Using a secure cosine similarity approach, a score is computed for similar files, and the index of the similarity scores, rather than the similar files themselves, is returned to the other party. The system is strengthened by clustering the files with the k-means technique, which makes it more efficient for large file sets.
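To make the encrypted cosine step in the abstract concrete, the following is a minimal sketch of how an additively homomorphic scheme such as Paillier lets one party compute a dot product over another party's element-wise encrypted vector. It is not the authors' protocol: the function names (`keygen`, `encrypt`, `decrypt`, `encrypted_dot`, `secure_cosine`), the fixed-point factor `SCALE`, the toy key size, and the choice of which party decrypts are all assumptions made for illustration. Stemming, k-means clustering, and returning only the index of the scores are omitted.

```python
import math
import random

SCALE = 1000  # assumed fixed-point factor for turning unit vectors into integers


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two known Mersenne primes.

    Illustration only: real deployments need large random primes.
    """
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a non-negative integer m < n under public key n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n


def encrypted_dot(n, enc_a, b_ints):
    """Compute Enc(sum(a_i * b_i)) from Enc(a_i) and plaintext b_i.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext
    to the power b_i multiplies its plaintext by b_i.
    """
    n2 = n * n
    acc = 1
    for c, b in zip(enc_a, b_ints):
        acc = acc * pow(c, b, n2) % n2
    return acc


def quantize(vec, scale=SCALE):
    """Normalize a non-zero vector to unit length and round to integers."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [round(scale * x / norm) for x in vec]


def secure_cosine(vec_a, vec_b):
    """Approximate cosine similarity without B seeing A's vector in the clear."""
    n, priv = keygen()                                   # party A
    enc_a = [encrypt(n, m) for m in quantize(vec_a)]     # A sends to B
    enc_dot = encrypted_dot(n, enc_a, quantize(vec_b))   # party B
    return decrypt(n, priv, enc_dot) / SCALE ** 2        # party A


if __name__ == "__main__":
    # Hypothetical term-frequency vectors over a shared 4-word vocabulary.
    print(secure_cosine([3, 0, 1, 2], [1, 1, 0, 2]))
```

Because each party normalizes its own vector before quantizing, the decrypted dot product divided by `SCALE ** 2` approximates the cosine similarity directly, with a small rounding error that shrinks as `SCALE` grows. The sketch assumes non-negative entries, which holds for term-frequency vectors.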
Pages: 2341-2349
Page count: 9