Text mining of CHO bioprocess bibliome: Topic modeling and document classification

Cited by: 0
Authors
Wang, Qinghua [1 ,2 ]
Olshin, Jonathan [1 ,2 ]
Vijay-Shanker, K. [1 ]
Wu, Cathy H. [1 ,2 ,3 ]
Affiliations
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
[2] Univ Delaware, Ctr Bioinformat & Computat Biol, Newark, DE 19716 USA
[3] Georgetown Univ, Dept Biochem & Mol & Cellular Biol, Prot Informat Resource, Med Ctr, Washington, DC USA
Source
PLOS ONE | 2023 / Volume 18 / Issue 04
Funding
U.S. National Science Foundation;
Keywords
DOI
10.1371/journal.pone.0274042
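Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code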
07 ; 0710 ; 09 ;
Abstract
Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To qualitatively and quantitatively understand the CHO literature, we conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and the computationally generated topics, and reveal the machine-generated topic-specific characteristics. To identify relevant CHO bioprocessing papers from new scientific literature, we developed supervised models using Logistic Regression to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results and yields insights on new CHO bioprocessing papers.
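
The classification idea in the abstract (logistic regression over top-term features, with the learned weights serving as an explanation) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy abstracts, the choice of document frequency to select "top terms", the binary presence features, and the plain stochastic gradient descent trainer are all assumptions made for illustration.

```python
import math
from collections import Counter

# Invented toy abstracts, NOT taken from the CHO bibliome.
# Label 1 = glycosylation-related, 0 = other bioprocessing (hypothetical labels).
DOCS = [
    ("glycosylation of antibody produced in cho cells", 1),
    ("sialylation and glycan profile of recombinant protein", 1),
    ("glycan analysis shows altered glycosylation in cho", 1),
    ("fed batch culture improves cho cell growth", 0),
    ("bioreactor scale up and media optimization for cho", 0),
    ("cell growth and viability in perfusion culture", 0),
]

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def select_top_terms(docs, k):
    """Assumed selection rule: keep the k terms with the highest document frequency."""
    df = Counter()
    for text, _ in docs:
        df.update(set(tokenize(text)))
    # Ties are broken alphabetically so the vocabulary is deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def featurize(text, vocab):
    """Binary presence/absence vector over the selected terms."""
    present = set(tokenize(text))
    return [1.0 if term in present else 0.0 for term in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted by per-example gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    """Probability that a text belongs to the positive (label 1) class."""
    x = featurize(text, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

VOCAB = select_top_terms(DOCS, k=10)
X = [featurize(text, VOCAB) for text, _ in DOCS]
Y = [label for _, label in DOCS]
W, B = train(X, Y)
WEIGHTS = dict(zip(VOCAB, W))

# Explanation: list the terms from most positive to most negative weight.
for term, weight in sorted(WEIGHTS.items(), key=lambda kv: -kv[1]):
    print(f"{term:15s} {weight:+.2f}")
```

Because each feature is a single readable term, the sign and size of its weight indicate which words push a paper toward or away from a category, which is the sense in which top-term features support explainability.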
Pages: 12