DNA Sequences Representation Derived from Discrete Wavelet Transformation for Text Similarity Recognition

Cited by: 2
Authors
Phan Hieu Ho [1]
Ngoc Anh Thi Nguyen [1]
Trung Hung Vo [1]
Affiliation
[1] Univ Danang, 41 Leduan St, Danang City, Vietnam
Keywords
Text similarity; Discrete Wavelet Transformation; Text analysis and mining; Plagiarism system; Euclidean measurement;
DOI
10.1007/978-3-319-76081-0_7
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing text similarity, that is, detecting duplicated documents, is considered the most important solution for plagiarism detection, a problem that has risen dramatically in the recent era of digital revolution. With the aim of contributing an efficient plagiarism detection system, we investigate a new approach to text similarity mining via a DNA sequence representation derived from the Discrete Wavelet Transformation (DWT). The contribution of the paper is threefold. First, we convert the raw source materials into a unique set of floating-point number series, called DeoxyriboNucleic Acid (DNA) sequences, using DWT. Second, the test documents given as input are converted into the same DNA-based structure. Lastly, a text similarity discovery algorithm is applied to the given input DNA strings by computing the Euclidean distance. The experimental results demonstrate the advantages of the proposed method, with very high precision in detecting text similarity on the standard PAN dataset (Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection).
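The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how characters are encoded, which wavelet is used, or how many decomposition levels are applied. The sketch assumes Unicode code points as the numeric encoding, a Haar wavelet, two decomposition levels, and zero-padding of the shorter series before the distance is computed. All function names (`text_to_signal`, `haar_dwt`, `signature`, `euclidean_distance`) are hypothetical.

```python
import math


def text_to_signal(text):
    """Map each character to its code point (an assumed encoding, not from the paper)."""
    return [float(ord(ch)) for ch in text]


def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample (assumed padding rule).
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    evens, odds = signal[0::2], signal[1::2]
    approx = [(a + b) / s for a, b in zip(evens, odds)]
    detail = [(a - b) / s for a, b in zip(evens, odds)]
    return approx, detail


def signature(text, levels=2):
    """Floating-point series for a text: approximation coefficients after `levels` Haar steps."""
    series = text_to_signal(text)
    for _level in range(levels):
        if len(series) < 2:
            break
        series, _detail = haar_dwt(series)
    return series


def euclidean_distance(u, v):
    """Euclidean distance; the shorter series is zero-padded (assumed convention)."""
    n = max(len(u), len(v))
    u = u + [0.0] * (n - len(u))
    v = v + [0.0] * (n - len(v))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown fox jumped over a lazy dog"
    print(euclidean_distance(signature(source), signature(suspect)))
```

Under these assumptions, a smaller distance suggests more similar texts, and identical texts give a distance of zero. A practical system would also need a decision threshold and a way to compare passages rather than whole documents, neither of which is described in the abstract.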
Pages: 75-85
Number of pages: 11