Compound Classification and Consideration of Correlation with Chemical Descriptors from Articles on Antioxidant Capacity Using Natural Language Processing

Cited by: 0
Authors
Matsumoto, Yuto [1]
Gotoh, Hiroaki [1]
Affiliations
[1] Yokohama Natl Univ, Dept Chem & Life Sci, Yokohama 2408501, Japan
Keywords
SOAC VALUES; EXTRACTION; QUERCETIN
DOI
10.1021/acs.jcim.3c01826
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
In recent times, there has been a substantial increase in the number of articles focusing on antioxidants. However, the development of a comprehensive estimator for antioxidant capacity remains elusive due to the challenge of integrating information from these articles. Furthermore, the complexity of the antioxidant mechanism, which involves a multitude of factors, makes it difficult to establish a simple equation or correlation. Hence, there is a pressing need for a model that can effectively interpret the collective knowledge from these articles, especially from a chemistry perspective. In this research, we employed natural language processing techniques, specifically Word2Vec, to analyze articles related to antioxidant capacity. We extracted representation vectors of compound names from these documents and organized them into 10 distinct clusters. In our investigation of two of these clusters, we unveiled that the majority of the compounds in question were flavonoids and flavonoid glycosides. To establish a link between the descriptors and clusters, we utilized kernel density estimation and generated scatter plots to visualize their similarity. These visualizations clearly indicated a strong relationship between the descriptors and clusters, affirming that a tangible connection exists between word vectors and compound descriptors through a document analysis conducted with natural language processing techniques. This study represents a pioneering approach that utilizes document analysis to shed light on the field of antioxidant capacity research, marking a significant advancement in this domain.
Pages: 119-127
Number of pages: 9
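
The abstract mentions using kernel density estimation to compare descriptor distributions across clusters of compound names. The following is a minimal sketch of that one step only, not the authors' implementation: it uses the Python standard library, a fixed Gaussian bandwidth chosen for illustration, and hypothetical toy descriptor values and cluster labels (the abstract does not state which descriptors or settings were used).

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`
    using a Gaussian kernel with the given bandwidth."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def overlap(f, g, lo, hi, steps=2000):
    """Approximate the integral of min(f, g) over [lo, hi]
    with a midpoint Riemann sum. Values near 1 mean the two
    distributions are similar; values near 0 mean they are separated."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(f(x), g(x))
    return total * dx


# Hypothetical toy data (assumption): one descriptor value per compound,
# grouped by the cluster label assigned to that compound's word vector.
descriptor_by_cluster = {
    0: [1.2, 1.5, 1.7, 1.4, 1.6],
    1: [-0.8, -0.5, -1.1, -0.7, -0.9],
}

# One density estimate per cluster; bandwidth 0.3 is an arbitrary choice.
densities = {
    label: gaussian_kde(values, bandwidth=0.3)
    for label, values in descriptor_by_cluster.items()
}

cluster_overlap = overlap(densities[0], densities[1], -4.0, 4.0)
self_overlap = overlap(densities[0], densities[0], -4.0, 4.0)
print(f"overlap between cluster 0 and cluster 1: {cluster_overlap:.3f}")
```

In this sketch, a small overlap between two clusters' densities would suggest that the descriptor separates them, which is the kind of relationship the abstract reports observing between word-vector clusters and compound descriptors.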