Automated detection of discourse segment and experimental types from the text of cancer pathway results sections

Cited by: 9
Authors
Burns, Gully A. P. C. [1]
Dasigi, Pradeep [2]
de Waard, Anita [3]
Hovy, Eduard H. [2]
Affiliations
[1] Univ Southern Calif, Inst Informat Sci, Viterbi Sch Engn, Marina Del Rey, CA 90292 USA
[2] Carnegie Mellon Univ, Language Technol Inst, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
[3] Elsevier Res Data Serv, Jericho, VT 05465 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2016
Keywords
ONTOLOGY; ARTICLES; MINT;
DOI
10.1093/database/baw122
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure used by scientists to construct an argument from the experimental data available within an article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles' Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data's meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation, etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic.
Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases.
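For readers unfamiliar with the F-score reported above, the following is a minimal sketch of how a per-category F1 and an unweighted (macro) average could be computed over the seven discourse categories. The function names, the toy gold and predicted labels, and the choice of macro averaging are all assumptions made for illustration; the abstract does not state which averaging scheme the authors used, and none of the numbers below come from the study.

```python
from collections import Counter

# The seven discourse segment categories named in the abstract.
LABELS = ["fact", "hypothesis", "problem", "goal",
          "method", "result", "implication"]


def per_label_f1(gold, predicted, labels=LABELS):
    """Return {label: F1} for parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return scores


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of per-label F1 (an assumed averaging scheme)."""
    scores = per_label_f1(gold, predicted, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical toy annotations, not data from the paper.
toy_gold = ["fact", "goal", "method", "result", "result", "implication"]
toy_pred = ["fact", "goal", "method", "result", "implication", "implication"]

if __name__ == "__main__":
    for label, score in per_label_f1(toy_gold, toy_pred).items():
        print(f"{label:12s} F1 = {score:.2f}")
    print(f"macro F1     = {macro_f1(toy_gold, toy_pred):.2f}")
```

Categories that never appear in the toy data score 0.0 here, which pulls the macro average down; a micro-averaged or support-weighted score would treat them differently.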
Pages: 12