DNA Sequences Representation Derived from Discrete Wavelet Transformation for Text Similarity Recognition

Cited by: 2
Authors
Phan Hieu Ho [1]
Ngoc Anh Thi Nguyen [1]
Trung Hung Vo [1]
Affiliations
[1] Univ Danang, 41 Leduan St, Danang City, Vietnam
Keywords
Text similarity; Discrete Wavelet Transformation; Text analysis and mining; Plagiarism system; Euclidean measurement
DOI
10.1007/978-3-319-76081-0_7
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing text similarity, also known as duplicate document detection, is considered the most important solution for plagiarism detection, a problem that has risen dramatically in the recent era of digital revolution. With the aim of contributing an efficient plagiarism detection system, we investigate a new approach to text similarity mining via DNA sequence representations derived from the Discrete Wavelet Transformation (DWT). The contribution of the paper is threefold. First, we convert the raw source materials into a unique set of floating-point number series, called DeoxyriboNucleic Acid (DNA) sequences, using DWT. Second, the test documents given as input are converted into the same DNA-based structure. Lastly, a text similarity discovery algorithm is applied to the input DNA strings by computing the Euclidean distance between them. The experimental results demonstrate the advantages of the proposed method, which achieves very high precision in detecting text similarity on the standard PAN dataset (Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection).
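The three steps named in the abstract (numeric series via DWT for source text, the same for test text, then Euclidean distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which wavelet, which character-to-number encoding, or which decomposition depth is used. The sketch assumes a Haar wavelet, Unicode code points as the raw signal, zero-padding to a fixed length, and keeps only the approximation coefficients; the names `text_to_signal`, `haar_dwt`, `euclidean`, and `dwt_distance` are hypothetical.

```python
import math


def text_to_signal(text, length):
    """Map characters to code points, truncating or zero-padding to `length`.

    Assumption: the paper's actual text-to-number encoding is not given.
    """
    values = [float(ord(ch)) for ch in text[:length]]
    return values + [0.0] * (length - len(values))


def haar_dwt(signal, levels):
    """Return Haar approximation coefficients after up to `levels` steps.

    Assumption: Haar is chosen only because it is the simplest wavelet.
    """
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            break  # stop when the series can no longer be halved
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2)
            for i in range(0, len(approx), 2)
        ]
    return approx


def euclidean(a, b):
    """Euclidean distance between two equal-length coefficient series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dwt_distance(doc_a, doc_b, length=64, levels=3):
    """Distance between two texts in the (assumed) Haar coefficient space."""
    series_a = haar_dwt(text_to_signal(doc_a, length), levels)
    series_b = haar_dwt(text_to_signal(doc_b, length), levels)
    return euclidean(series_a, series_b)


source = "Recognizing text similarity is important for plagiarism detection."
near = "Recognizing text similarity is important for plagiarism detecting."
far = "Completely unrelated words about weather and cooking recipes today."

near_distance = dwt_distance(source, near)
far_distance = dwt_distance(source, far)
```

Under these assumptions a smaller distance suggests more similar text, so `near_distance` comes out smaller than `far_distance`. Because the signal is positional, an inserted or deleted character shifts everything after it, so a practical system would need segmentation or alignment that this sketch does not attempt.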
Pages: 75-85
Page count: 11