Unveiling scientific articles from paper mills with provenance analysis

Authors
Cardenuto, Joao Phillipe [1]
Moreira, Daniel [2]
Rocha, Anderson [1]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Artificial Intelligence Lab Recod Ai, Campinas, SP, Brazil
[2] Loyola Univ Chicago, Dept Comp Sci, Chicago, IL USA
Source
PLOS ONE | 2024 / Volume 19 / Issue 10
Funding
São Paulo Research Foundation (FAPESP), Brazil
DOI
10.1371/journal.pone.0312666
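Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09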
Abstract
The increasing prevalence of fake publications created by paper mills poses a significant challenge to maintaining scientific integrity. While integrity analysts typically rely on textual and visual clues to identify fake articles, determining which papers merit further investigation can be akin to searching for a needle in a haystack, as these fake publications have unrelated authors and are published in unrelated venues. To address this challenge, we developed a new methodology for provenance analysis, which automatically tracks and groups suspicious figures and documents. Our approach groups manuscripts from the same paper mill by analyzing their figures and identifying duplicated and manipulated regions. These regions are linked and organized in a provenance graph, providing evidence of systematic production. We tested our solution on a paper mill dataset of hundreds of documents and also on a larger version of the dataset that included thousands of documents intentionally selected to distract our method. Our approach successfully identified and linked systematically produced articles on both datasets by pinpointing the figures they reused and manipulated from one another. The proposed technique offers a promising solution for identifying fraudulent manuscripts, and it could be a valuable tool for supporting scientific integrity.
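The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that some upstream image-matching step (not shown) has already produced pairs of documents that share a duplicated or manipulated figure region, and it simply takes the connected components of that link graph as candidate groups. The function name `group_documents` and the input format are hypothetical.

```python
from collections import defaultdict


def group_documents(figure_matches):
    """Group documents that are linked, directly or transitively,
    by shared figure regions.

    figure_matches: iterable of (doc_a, doc_b) pairs. Each pair is assumed
    to come from a separate figure-matching step (hypothetical input).
    Returns a sorted list of groups, each a sorted list of document ids.
    """
    parent = {}

    def find(x):
        # Register unseen documents as their own root, then walk to the root
        # with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every matched pair is an edge in the link graph.
    for doc_a, doc_b in figure_matches:
        union(doc_a, doc_b)

    # Collect documents by their root to obtain connected components.
    groups = defaultdict(set)
    for doc in parent:
        groups[find(doc)].add(doc)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    matches = [("A", "B"), ("B", "C"), ("D", "E")]
    print(group_documents(matches))
```

In this toy input, documents A, B, and C fall into one group because B shares a region with both A and C, while D and E form a second group. A real provenance graph would also record which regions were matched and in what direction content was reused, which this sketch omits.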
Pages: 28