Selecting Documents Relevant for Chemistry as a Classification Problem

Cited by: 0
Authors
Zhu, Zhemin [1]
Akhondi, Saber A. [1]
Nandal, Umesh [1]
Doornenbal, Marius [1]
Gregory, Michelle [1]
Affiliations
[1] Elsevier, Radarweg 29, NL-1043 NX Amsterdam, Netherlands
Keywords
Natural language processing; Document classification; Machine learning; Cheminformatics; Information
DOI
10.1007/978-3-319-58694-6_31
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a first version of a system for selecting chemical publications for inclusion in a chemistry information database. This database, Reaxys (https://www.elsevier.com/solutions/reaxys), is a portal for retrieving structured chemistry information from published journals and patents. The task poses three challenges: (i) training and input data are highly imbalanced; (ii) high recall (>= 95%) is desired; and (iii) the data offered for selection is massive in volume yet incomplete. Our system handles the imbalance with undersampling and achieves relatively high recall by using chemical named entities as features. Experiments on a real-world data set of 15,822 documents show that chemical named entity features boost recall by 8% over the n-gram features widely used in general document classification. To foster research on this challenging topic, part of the data set compiled for this paper is available on request.
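The abstract names undersampling as the remedy for class imbalance and recall as the target metric, but it does not say which undersampling variant, sampling ratio, or classifier the authors used. The following is a minimal sketch of plain random undersampling and a recall computation, under those stated gaps; the function names `undersample` and `recall`, the `ratio` and `seed` parameters, and the toy labels are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def undersample(docs, labels, ratio=1.0, seed=0):
    """Randomly drop examples from the larger classes.

    After sampling, every non-minority class keeps at most
    `ratio` times as many examples as the minority class.
    Hypothetical helper; the paper's exact procedure is not given.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    keep_per_class = int(counts[minority] * ratio)

    # Group documents by class label.
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)

    sampled = []
    for label, items in by_class.items():
        # Only the larger classes are reduced; the minority is kept whole.
        if label != minority and len(items) > keep_per_class:
            items = rng.sample(items, keep_per_class)
        sampled.extend((doc, label) for doc in items)

    rng.shuffle(sampled)
    return [d for d, _ in sampled], [l for _, l in sampled]


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive documents that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Toy imbalanced set: 5 relevant (1) vs. 95 irrelevant (0) documents.
    docs = [f"doc{i}" for i in range(100)]
    labels = [1] * 5 + [0] * 95
    balanced_docs, balanced_labels = undersample(docs, labels)
    print(Counter(balanced_labels))
```

Undersampling would be applied to the training split only; recall should still be measured on data with the original class distribution, since that is the setting in which the >= 95% target matters.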
Pages: 198-201
Page count: 4