DNA Sequences Representation Derived from Discrete Wavelet Transformation for Text Similarity Recognition

Cited by: 2
Authors
Phan Hieu Ho [1]
Ngoc Anh Thi Nguyen [1]
Trung Hung Vo [1]
Affiliations
[1] Univ Danang, 41 Leduan St, Danang City, Vietnam
Keywords
Text similarity; Discrete Wavelet Transformation; Text analysis and mining; Plagiarism system; Euclidean measurement;
DOI
10.1007/978-3-319-76081-0_7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing text similarity, also known as duplicate-document detection, is considered the most important approach to plagiarism detection, a problem that has risen dramatically in the recent era of digital revolution. With the aim of contributing an efficient plagiarism detection system, we investigate a new approach to text similarity mining via a DNA sequence representation derived from the Discrete Wavelet Transformation (DWT). The contribution of the paper is threefold. First, we convert the raw source materials into a unique set of floating-point number series, called DeoxyriboNucleic Acid (DNA) sequences, using DWT. Second, the same DNA-based structure is built for the input test documents. Last, a text similarity discovery algorithm is run on the given input DNA strings by computing the Euclidean distance. The experimental results demonstrate the advantages of the proposed method, with very high precision in detecting text similarity on the standard PAN dataset (Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection).
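The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which wavelet, text encoding, or decomposition depth is used, so the Haar wavelet, the code-point encoding, the `levels` parameter, the zero-padding of unequal lengths, and all function names below are assumptions made purely for illustration.

```python
import math


def text_to_signal(text):
    """Map text to a numeric series (assumption: lowercase Unicode code points)."""
    return [float(ord(ch)) for ch in text.lower()]


def haar_dwt_level(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample (assumed policy).
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dna_sequence(text, levels=2):
    """Hypothetical 'DNA sequence': approximation coefficients after `levels` Haar steps."""
    approx = text_to_signal(text)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, _detail = haar_dwt_level(approx)
    return approx


def euclidean_distance(a, b):
    """Euclidean distance; the shorter series is zero-padded (assumed policy)."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    source = "Recognizing text similarity is important."
    suspect = "Recognizing text similarity is important!"
    print(euclidean_distance(dna_sequence(source), dna_sequence(suspect)))
```

In this sketch a distance of zero means the two coefficient series are identical, and smaller distances suggest more similar passages. Any decision threshold would have to be tuned on labelled data such as the PAN corpus.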
Pages: 75-85
Number of pages: 11