Reuse and plagiarism in Speech and Natural Language Processing publications

Cited by: 5
Authors
Mariani, Joseph [1]
Francopoulo, Gil [1, 2]
Paroubek, Patrick [1]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, Orsay, France
[2] Tagmatica, Paris, France
Keywords
Plagiarism; Detection; Text reuse; Natural Language Processing; Speech Processing; Scientometrics; Informetrics
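DOI
10.1007/s00799-017-0211-0
Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract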
The aim of this experiment is to present an easy way to compare fragments of texts in order to detect (supposed) results of copy and paste operations between articles in the domain of Natural Language Processing (NLP), including Speech Processing. The search space of the comparisons is a corpus labeled as NLP4NLP, which includes 34 different conferences and journals and gathers a large part of the NLP activity over the past 50 years. This study considers the similarity between the papers of each individual event and the complete set of papers in the whole corpus, according to four different types of relationship (self-reuse, self-plagiarism, reuse and plagiarism) and in both directions: a paper borrowing a fragment of text from another paper of the corpus (that we will call the source paper), or in the reverse direction, fragments of text from the source paper being borrowed and inserted in another paper of the corpus. The results show that self-reuse is rather a common practice, but that plagiarism seems to be very unusual, and that both stay within legal and ethical limits.
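The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Paper` record, the 7-word window, and the rule that the four relationship types split on shared authorship and on whether the source is cited are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical record; field names are illustrative, not from NLP4NLP."""
    paper_id: str
    authors: frozenset
    cited_ids: frozenset
    text: str


def word_ngrams(text, n=7):
    """Return the set of lowercased word n-grams (n is an assumed window size)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(borrower, source, n=7):
    """Fraction of the borrower's n-grams that also occur in the source."""
    borrowed = word_ngrams(borrower.text, n)
    if not borrowed:
        return 0.0
    return len(borrowed & word_ngrams(source.text, n)) / len(borrowed)


def relationship(borrower, source):
    """Label a pair, assuming the types split on shared authors and citation."""
    same_author = bool(borrower.authors & source.authors)
    cites_source = source.paper_id in borrower.cited_ids
    if same_author:
        return "self-reuse" if cites_source else "self-plagiarism"
    return "reuse" if cites_source else "plagiarism"
```

Calling `shared_fraction(a, b)` and then `shared_fraction(b, a)` covers both directions mentioned in the abstract: how much a paper borrows, and how much of it is borrowed by another paper. A real study would also need a threshold on the shared fraction and a way to decide which paper is the source, for example by publication date; neither is modeled here.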
Pages: 113-126
Number of pages: 14