DNA Sequences Representation Derived from Discrete Wavelet Transformation for Text Similarity Recognition

Cited by: 2
Authors
Phan Hieu Ho [1]
Ngoc Anh Thi Nguyen [1]
Trung Hung Vo [1]
Affiliations
[1] Univ Danang, 41 Leduan St, Danang City, Vietnam
Keywords
Text similarity; Discrete Wavelet Transformation; Text analysis and mining; Plagiarism system; Euclidean measurement;
DOI
10.1007/978-3-319-76081-0_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing text similarity, that is, detecting duplicated documents, is considered the most important approach to plagiarism detection, a problem that has risen dramatically in the era of the digital revolution. Aiming to contribute an efficient plagiarism detection system, we investigate a new approach to text similarity mining via a DNA sequence representation derived from the Discrete Wavelet Transformation (DWT). The contribution of the paper is threefold. First, we convert the raw source materials into a unique set of floating-point number series, called DeoxyriboNucleic Acid (DNA) sequences, using DWT. Second, the same DNA-based structure is built for the test documents given as input. Lastly, a text similarity discovery algorithm is run on the given input DNA strings by computing the Euclidean distance. The experimental results demonstrate the advantages of the proposed method, with very high precision for detecting text similarity on the standard PAN dataset (Plagiarism Analysis, Authorship Identification, and Near-Duplicate detection).
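The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not state how text is turned into numbers, which wavelet is used, or how many decomposition levels are applied, so everything below is an assumption made for illustration only: words are mapped to their mean character code, a Haar wavelet is used, and two levels are applied. The function names (`text_to_series`, `haar_dwt`, `signature`, `euclidean_distance`) are hypothetical and are not taken from the paper.

```python
import math


def text_to_series(text):
    """Map each word to one number (assumption: mean character code)."""
    words = text.lower().split()
    return [sum(ord(c) for c in w) / len(w) for w in words]


def haar_dwt(series):
    """One level of the Haar DWT; returns (approximation, detail)."""
    values = list(series)
    if len(values) % 2:
        # Pad odd-length input by repeating the last value (assumption).
        values.append(values[-1])
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def signature(text, levels=2):
    """Compress a text into a short float series ("DNA sequence" in the paper's terms)."""
    series = text_to_series(text)
    for _ in range(levels):
        if len(series) < 2:
            break
        # Keep only the approximation coefficients at each level (assumption).
        series, _detail = haar_dwt(series)
    return series


def euclidean_distance(a, b):
    """Euclidean distance over the common prefix of two series (assumption)."""
    n = min(len(a), len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n], b[:n])))


source = signature("the quick brown fox jumps over the lazy dog")
suspect = signature("the quick brown fox jumps over the lazy dog")
other = signature("wavelets compress a signal into fewer coefficients")

print(euclidean_distance(source, suspect))  # identical texts give 0.0
print(euclidean_distance(source, other))    # different texts give a larger value
```

A smaller distance indicates more similar documents. In practice a threshold on the distance would decide whether a pair is flagged, and that threshold would have to be tuned on labelled data such as the PAN corpus.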
Pages: 75 - 85
Number of pages: 11
Related papers
50 in total
  • [41] Characterization and similarity analysis of DNA sequences grounded on a 2-D graphical representation
    Wang, Jun
    Zhang, Yi
    CHEMICAL PHYSICS LETTERS, 2006, 423 (1-3) : 50 - 53
  • [42] Analysis of similarity/dissimilarity of DNA sequences based on a class of 2D graphical representation
    Yao, Yu-Hua
    Dai, Qi
    Nan, Xu-Ying
    He, Ping-An
    Nie, Zuo-Ming
    Zhou, Song-Ping
    Zhang, Yao-Zhou
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2008, 29 (10) : 1632 - 1639
  • [43] Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation
    Randic, M
    Vracko, M
    Lers, N
    Plavsic, D
    CHEMICAL PHYSICS LETTERS, 2003, 371 (1-2) : 202 - 207
  • [44] Analysis of DNA sequences similarity based on a new 3-D graphical representation method
    Singh, Kshatrapal
    Kumar, Ashish
    Gupta, Manoj Kumar
    ROMANIAN JOURNAL OF INFORMATION TECHNOLOGY AND AUTOMATIC CONTROL-REVISTA ROMANA DE INFORMATICA SI AUTOMATICA, 2021, 31 (03) : 7 - 14
  • [45] Wavelet-based multifractal analysis of DNA sequences by using chaos-game representation
    韩佳静
    符维娟
    Chinese Physics B, 2010, 19 (01) : 22 - 29
  • [46] Wavelet-based multifractal analysis of DNA sequences by using chaos-game representation
    Han Jia-Jing
    Fu Wei-Juan
    CHINESE PHYSICS B, 2010, 19 (01)
  • [47] Geologic heterogeneity recognition using discrete wavelet transformation for subsurface flow solute transport simulations
    Mustapha, Hussein
    Chatterjee, Snehamoy
    Dimitrakopoulos, Roussos
    Graf, Thomas
    ADVANCES IN WATER RESOURCES, 2013, 54 : 22 - 37
  • [48] Markov model recognition and classification of DNA/protein sequences within large text databases
    Wren, JD
    Hildebrand, WH
    Chandrasekaran, S
    Melcher, U
    BIOINFORMATICS, 2005, 21 (21) : 4046 - 4053
  • [49] DNA Motif Recognition Modeling from Protein Sequences
    Wong, Ka-Chun
    ISCIENCE, 2018, 7 : 198 - +
  • [50] Discrete Wavelet Transform Coefficients for Emotion Recognition from EEG Signals
    Yohanes, Rendi E. J.
    Ser, Wee
    Huang, Guang-bin
    2012 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2012, : 2251 - 2254