DNA Sequences Representation Derived from Discrete Wavelet Transformation for Text Similarity Recognition

Cited by: 2

Authors:

Phan Hieu Ho [1]

Ngoc Anh Thi Nguyen [1]

Trung Hung Vo [1]

Affiliations:

[1] Univ Danang, 41 Leduan St, Danang City, Vietnam

Source:

MODERN APPROACHES FOR INTELLIGENT INFORMATION AND DATABASE SYSTEMS | 2018 / Vol. 769

Keywords:

Text similarity; Discrete Wavelet Transformation; Text analysis and mining; Plagiarism system; Euclidean measurement;

DOI:

10.1007/978-3-319-76081-0_7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recognizing text similarity, that is, detecting duplicated documents, is considered the most important solution for plagiarism detection, a problem that has risen dramatically in the recent era of digital revolution. With the aim of contributing an efficient plagiarism detection system, we investigate a new approach to text similarity mining via a DNA sequence representation derived from the Discrete Wavelet Transformation (DWT). The contribution of the paper is threefold. First, we convert the raw source materials into a unique set of floating-point number series, called DeoxyriboNucleic Acid (DNA) sequences, using DWT. Second, the same DNA-based structure is built for the test documents given as input. Lastly, a text similarity discovery algorithm is run on the given input DNA strings by computing the Euclidean distance. The experimental results demonstrate the advantages of the proposed method, with very high precision in detecting text similarity on the standard PAN dataset (Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection).
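
The three steps named in the abstract (encode the source, encode the test document, compare by Euclidean distance) can be illustrated with a minimal sketch. The abstract does not specify how characters are mapped to numbers, which wavelet is used, or how many decomposition levels are applied, so everything below is an assumption made for illustration: characters are mapped to Unicode code points, a Haar wavelet is used, and the function names (text_to_signal, haar_dwt, dwt_distance) and the length and levels parameters are hypothetical. It should not be read as the authors' implementation.

```python
import math


def text_to_signal(text, length=64):
    """Map characters to code points; truncate or zero-pad to `length`.

    Assumption: the paper's actual numerical encoding is not given in the
    abstract, so plain Unicode code points are used here.
    """
    codes = [float(ord(ch)) for ch in text[:length]]
    return codes + [0.0] * (length - len(codes))


def haar_dwt(signal, levels=3):
    """Return the Haar approximation coefficients after `levels` steps.

    Assumption: the Haar wavelet is chosen for simplicity. The signal
    length must be divisible by 2 ** levels.
    """
    approx = list(signal)
    for _ in range(levels):
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(approx), 2)
        ]
    return approx


def euclidean(a, b):
    """Euclidean distance between two equal-length coefficient series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dwt_distance(text_a, text_b, length=64, levels=3):
    """Distance between the DWT-based series of two texts (0.0 if identical)."""
    series_a = haar_dwt(text_to_signal(text_a, length), levels)
    series_b = haar_dwt(text_to_signal(text_b, length), levels)
    return euclidean(series_a, series_b)
```

In this sketch a smaller distance means the two coefficient series are closer; any threshold for flagging a pair as similar would have to be tuned on data and is not stated in the abstract.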

Pages: 75-85

Page count: 11

50 items in total

[21] On the similarity/dissimilarity of DNA sequences based on 4D graphical representation
Tang XiaoChan
Zhou PanPan
Qiu WenYuan
CHINESE SCIENCE BULLETIN, 2010, 55 (08) : 701 - 704
[22] On the similarity/dissimilarity of DNA sequences based on 4D graphical representation
TANG XiaoChan, ZHOU PanPan, QIU WenYuan (Department of Chemistry, State Key Laboratory of Applied Organic Chemistry, Lanzhou University, Lanzhou, China)
Chinese Science Bulletin, 2010, 55 (08) : 701 - 704
[23] Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison
Hoang, Tung
Yin, Changchuan
Yau, Stephen S. -T.
GENOMICS, 2016, 108 (3-4) : 134 - 142
[24] Accurate similarity transformation derived from the discrete Lotka-Volterra system for bidiagonal singular values
Nagata, Munehiro
Iwasaki, Masashi
Nakamura, Yoshimasa
CALCOLO, 2014, 51 (02) : 305 - 317
[25] Improved Algorithm for the Detection of Cancerous Cells Using Discrete Wavelet Transformation of Genomic Sequences
Mariapushpam, Inbamalar Tharcis
Rajagopal, Sivakumar
CURRENT BIOINFORMATICS, 2017, 12 (06) : 543 - 550
[26] The representation of space in mental models derived from text
Langston, W
Kramer, DC
Glenberg, AM
MEMORY & COGNITION, 1998, 26 (02) : 247 - 262
[27] The representation of space in mental models derived from text
William Langston
Douglas C. Kramer
Arthur M. Glenberg
Memory & Cognition, 1998, 26 : 247 - 262
[28] Similarity analysis for DNA sequences based on chaos game representation Case study The albumin
Stan, Cristina
Cristescu, Constantin P.
Scarlat, Eugen I.
JOURNAL OF THEORETICAL BIOLOGY, 2010, 267 (04) : 513 - 518
[29] Analysis of similarity/dissimilarity of DNA sequences based on a 3-D graphical representation
Yao, YH
Nan, XY
Wang, TM
CHEMICAL PHYSICS LETTERS, 2005, 411 (1-3) : 248 - 255
[30] Similarity/dissimilarity analysis of DNA sequences based on a 3D graphical representation
Huang, Hailan
Shi, Long
2009 3RD INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1-11, 2009, : 457 - +
