Reuse and plagiarism in Speech and Natural Language Processing publications

Cited by: 5
Authors
Mariani, Joseph [1]
Francopoulo, Gil [1,2]
Paroubek, Patrick [1]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, Orsay, France
[2] Tagmatica, Paris, France
Keywords
Plagiarism; Detection; Text reuse; Natural Language Processing; Speech Processing; Scientometrics; Informetrics
DOI
10.1007/s00799-017-0211-0
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501
Abstract
The aim of this experiment is to present an easy way to compare text fragments in order to detect (presumed) copy-and-paste operations between articles in the domain of Natural Language Processing (NLP), including Speech Processing. The comparisons are run over a corpus named NLP4NLP, which covers 34 different conferences and journals and gathers a large part of the NLP activity over the past 50 years. This study measures the similarity between the papers of each individual event and the complete set of papers in the whole corpus, according to four types of relationship (self-reuse, self-plagiarism, reuse and plagiarism) and in both directions: a paper borrowing a text fragment from another paper of the corpus (which we call the source paper), or, in the reverse direction, fragments of text from a source paper being borrowed and inserted into another paper of the corpus. The results show that self-reuse is a fairly common practice, whereas plagiarism appears to be very rare, and that both remain within legal and ethical limits.
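The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that text fragments are compared as contiguous windows of n words (the window size `n=7` and the score `threshold=0.1` are hypothetical choices). It also assumes that the four relationship types are separated by two tests: whether the two papers share at least one author ("self-" versus not), and whether the borrowing paper cites the source ("reuse" versus "plagiarism"). The abstract names the four types but does not state these criteria, so treat them as an assumption. The names `Paper`, `shingles`, `overlap_score`, `classify` and `find_borrowings` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Minimal paper record; field names are hypothetical, not taken from NLP4NLP."""
    pid: str
    year: int
    authors: set[str]
    text: str
    cites: set[str] = field(default_factory=set)  # ids of papers this one cites


def shingles(text: str, n: int = 7) -> set[tuple[str, ...]]:
    """Set of contiguous n-word windows after lower-casing and trimming punctuation."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(borrower: str, source: str, n: int = 7) -> float:
    """Share of the borrowing text's windows that also occur in the source text."""
    windows = shingles(borrower, n)
    if not windows:
        return 0.0
    return len(windows & shingles(source, n)) / len(windows)


def classify(borrower: Paper, source: Paper) -> str:
    """Assumed criteria: shared author -> 'self-' prefix; cited source -> 'reuse'."""
    shared_author = bool(borrower.authors & source.authors)
    cited = source.pid in borrower.cites
    if shared_author:
        return "self-reuse" if cited else "self-plagiarism"
    return "reuse" if cited else "plagiarism"


def find_borrowings(papers: list[Paper], threshold: float = 0.1, n: int = 7):
    """Return (borrower, source, relation, score) for every pair at or above threshold.

    A source may not be newer than the paper that borrows from it (assumption).
    Grouping the result by borrower gives the "borrowing" direction; grouping it
    by source gives the "borrowed" direction.
    """
    found = []
    for b in papers:
        for s in papers:
            if b.pid == s.pid or s.year > b.year:
                continue
            score = overlap_score(b.text, s.text, n)
            if score >= threshold:
                found.append((b.pid, s.pid, classify(b, s), round(score, 2)))
    return found
```

The all-pairs loop keeps the sketch short. At the scale of a 50-year corpus, a real system would more likely index the word windows so that only papers sharing at least one window are compared.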
Pages: 113-126
Number of pages: 14
Related papers
50 in total
  • [41] Development of GUI for Text-to-Speech Recognition using Natural Language Processing
    Mukherjee, Partha
    Santra, Soumen
    Bhowmick, Subhajit
    Paul, Ananya
    Chatterjee, Pubali
    Deyasi, Arpan
    2018 2ND INTERNATIONAL CONFERENCE ON ELECTRONICS, MATERIALS ENGINEERING & NANO-TECHNOLOGY (IEMENTECH), 2018: 195-198
  • [42] Leveraging natural language processing models to automate speech-intelligibility scoring
    Herrmann, Bjoern
    SPEECH LANGUAGE AND HEARING, 2025, 28 (01)
  • [43] Plagiarism and ethics in scientific publications
    Solis Sanchez, Gonzalo
    Cano Garcinuno, Alfredo
    Anton Gamero, Montserrat
    Manrique de Lara, Laia Alsina
    Rey Galan, Corsino
    ANALES DE PEDIATRIA, 2019, 90 (01): 1-2
  • [44] Plagiarism, research publications and law
    Saha, R.
    CURRENT SCIENCE, 2017, 112 (12): 2375-2378
  • [45] Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools Comparison on the ArXiv Dataset
    Lopuszynski, Michal
    Bolikowski, Lukasz
    THEORY AND PRACTICE OF DIGITAL LIBRARIES - TPDL 2013 SELECTED WORKSHOPS, 2014, 416: 16-27
  • [46] Trends in Computational Science: Natural Language Processing and Network Analysis of 23 Years of ICCS Publications
    Luo, Lijing
    Kovalchuk, Sergey
    Krzhizhanovskaya, Valeria
    Paszynski, Maciej
    de Mulatier, Clelia
    Dongarra, Jack
    Sloot, Peter M. A.
    COMPUTATIONAL SCIENCE, ICCS 2024, PT II, 2024, 14833: 19-33
  • [47] Natural language processing
    Chowdhury, GG
    ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 2003, 37: 51-89
  • [48] Natural language processing
    Martinez, Angel R.
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2010, 2 (03): 352-357
  • [49] Natural language processing
    EDITORIAL: Automatische Sprachverarbeitung
    Hoepel-Man, Jakob, 1600, De Gruyter Oldenbourg (36)
  • [50] Natural language processing
    Anon
    1600, Knowledge Technology Inc. (15)