Unveiling scientific articles from paper mills with provenance analysis

Authors
Cardenuto, Joao Phillipe [1]
Moreira, Daniel [2]
Rocha, Anderson [1]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Artificial Intelligence Lab Recod Ai, Campinas, SP, Brazil
[2] Loyola Univ Chicago, Dept Comp Sci, Chicago, IL USA
Source
PLOS ONE | 2024 / Vol. 19 / Issue 10
Funding
São Paulo Research Foundation (FAPESP), Brazil
DOI
10.1371/journal.pone.0312666
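Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract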
The increasing prevalence of fake publications created by paper mills poses a significant challenge to maintaining scientific integrity. While integrity analysts typically rely on textual and visual clues to identify fake articles, determining which papers merit further investigation can be akin to searching for a needle in a haystack, because these fake publications have unrelated authors and appear in unrelated venues. To address this challenge, we developed a new methodology for provenance analysis, which automatically tracks and groups suspicious figures and documents. Our approach groups manuscripts from the same paper mill by analyzing their figures and identifying duplicated and manipulated regions. These regions are linked and organized in a provenance graph, providing evidence of systematic production. We tested our solution on a paper mill dataset of hundreds of documents and on a larger version of the dataset that included thousands of documents deliberately selected to distract our method. Our approach successfully identified and linked systematically produced articles on both datasets by pinpointing the figures they reused and manipulated from one another. The proposed technique offers a promising way to identify fraudulent manuscripts and could be a valuable tool for supporting scientific integrity.
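The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that some upstream detector has already produced pairs of documents sharing a duplicated or manipulated figure region, and it simply treats those pairs as edges of an undirected graph whose connected components are candidate paper-mill groups. The function name `group_documents` and the document identifiers are hypothetical.

```python
from collections import defaultdict


def group_documents(matches):
    """Group documents linked by shared-figure matches.

    `matches` is an iterable of (doc_a, doc_b) pairs. Each pair is assumed
    to mean that a figure region from doc_a was found duplicated or
    manipulated in doc_b (the detector itself is out of scope here).
    Returns a sorted list of sorted document groups (connected components).
    """
    parent = {}

    def find(x):
        # Register unseen documents as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(list)
    for doc in parent:
        groups[find(doc)].append(doc)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical matches: doc1-doc2-doc3 form one chain, doc4-doc5 another.
example_matches = [("doc1", "doc2"), ("doc2", "doc3"), ("doc4", "doc5")]
print(group_documents(example_matches))
# → [['doc1', 'doc2', 'doc3'], ['doc4', 'doc5']]
```

Connected components are the simplest possible grouping rule. A provenance graph as described in the abstract would also record which regions were reused and how they were manipulated, which this sketch leaves out.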
Pages: 28