Reuse and plagiarism in Speech and Natural Language Processing publications

Cited by: 5
Authors
Mariani, Joseph [1]
Francopoulo, Gil [1, 2]
Paroubek, Patrick [1]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, Orsay, France
[2] Tagmatica, Paris, France
Keywords
Plagiarism; Detection; Text reuse; Natural Language Processing; Speech Processing; Scientometrics; Informetrics;
DOI
10.1007/s00799-017-0211-0
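Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205; 120501;
Abstract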
The aim of this experiment is to present an easy way to compare fragments of texts in order to detect (supposed) results of copy and paste operations between articles in the domain of Natural Language Processing (NLP), including Speech Processing. The search space of the comparisons is a corpus labeled as NLP4NLP, which includes 34 different conferences and journals and gathers a large part of the NLP activity over the past 50 years. This study considers the similarity between the papers of each individual event and the complete set of papers in the whole corpus, according to four different types of relationship (self-reuse, self-plagiarism, reuse and plagiarism) and in both directions: a paper borrowing a fragment of text from another paper of the corpus (that we will call the source paper), or in the reverse direction, fragments of text from the source paper being borrowed and inserted in another paper of the corpus. The results show that self-reuse is rather a common practice, but that plagiarism seems to be very unusual, and that both stay within legal and ethical limits.
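The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word n-gram window, the function names, and the mapping of the four relationship labels onto two yes/no questions (do the papers share an author, and does the borrowing paper cite the source) are all assumptions made here for illustration, since the abstract does not spell them out.

```python
import re


def word_ngrams(text, n=7):
    """Return the set of lowercase word n-grams in text (n is an assumed window size)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(borrower, source, n=7):
    """Share of the borrower's n-grams that also occur in the source.

    The measure is directional: swapping the arguments answers the
    reverse question (how much of the source reappears in the other paper).
    """
    borrowed = word_ngrams(borrower, n)
    if not borrowed:
        return 0.0
    return len(borrowed & word_ngrams(source, n)) / len(borrowed)


def relationship(shared_author, cites_source):
    """Hypothetical labelling of a detected overlap (assumed, not from the abstract)."""
    if shared_author:
        return "self-reuse" if cites_source else "self-plagiarism"
    return "reuse" if cites_source else "plagiarism"
```

A pair of papers would be flagged when `overlap_ratio` exceeds some threshold, which would have to be tuned on real data; the label from `relationship` then describes the kind of borrowing.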
Pages: 113-126
Page count: 14
Related papers
50 in total
  • [1] Measuring Innovation in Speech and Language Processing Publications
    Mariani, Joseph
    Francopoulo, Gil
    Paroubek, Patrick
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1890 - 1895
  • [2] Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition
    Teller, V
    COMPUTATIONAL LINGUISTICS, 2000, 26 (04) : 638 - 641
  • [3] The State of Profanity Obfuscation in Natural Language Processing Scientific Publications
    Nozza, Debora
    Hovy, Dirk
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3897 - 3909
  • [4] TextProc - a natural language processing framework and its use as plagiarism detection system
    Brezovnik, Janez
    Ojstersek, Milan
    INTERNATIONAL JOURNAL OF EDUCATION AND INFORMATION TECHNOLOGIES, 2011, 5 (03): : 293 - 300
  • [5] Translating Speech to Indian Sign Language Using Natural Language Processing
    Sharma, Purushottam
    Tulsian, Devesh
    Verma, Chaman
    Sharma, Pratibha
    Nancy, Nancy
    FUTURE INTERNET, 2022, 14 (09)
  • [6] Potential of natural language processing for metadata extraction from environmental scientific publications
    Blanchy, Guillaume
    Albrecht, Lukas
    Koestel, John
    Garre, Sarah
    SOIL, 2023, 9 (01) : 155 - 168
  • [7] Incident Management Optimization through the Reuse of Experiences and Natural Language Processing
    Vieira Bezerra, Glauber de Tarso
    Monteiro Pinheiro, Vladia Celia
    Albuquerque, Adriano Bessa
    2014 9TH INTERNATIONAL CONFERENCE ON THE QUALITY OF INFORMATION AND COMMUNICATIONS TECHNOLOGY (QUATIC), 2014, : 247 - 254
  • [8] Incident Management Optimization through the Reuse of Experiences and Natural Language Processing
    Bezerra, Glauber
    Pinheiro, Vladia
    Bessa, Adriano
    2014 9TH INTERNATIONAL CONFERENCE ON THE QUALITY OF INFORMATION AND COMMUNICATIONS TECHNOLOGY (QUATIC), 2014, : 58 - 65
  • [9] FarSpeech: Arabic Natural Language Processing for Live Arabic Speech
    Eldesouki, Mohamed
    Gopee, Naassih
    Ali, Ahmed
    Darwish, Kareem
    INTERSPEECH 2019, 2019, : 2372 - 2373
  • [10] Towards Natural Language Processing with Figures of Speech in Hindi Poetry
    Audichya, Milind Kumar
    Saini, Jatinderkumar R.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (03) : 128 - 133