Automatic Classification of Algorithm Citation Functions in Scientific Literature

Cited by: 24
Authors
Tuarob, Suppawong [1]
Kang, Sung Woo [2]
Wettayakorn, Poom [1]
Pornprasit, Chanatip [1]
Sachati, Tanakitti [1]
Hassan, Saeed-Ul [3]
Haddawy, Peter [1]
Affiliations
[1] Mahidol Univ, Fac Informat & Commun Technol, Salaya Phutthamonthon 73170, Thailand
[2] Inha Univ, Coll Engn, Incheon 22212, South Korea
[3] Informat Technol Univ, Lahore, Pakistan
Keywords
Feature extraction; Machine learning algorithms; Metadata; Clustering algorithms; Approximation algorithms; Machine learning; Computer science; Algorithm citation; ensemble machine learning; scholarly big data; algorithmic evolution; MEDIA;
DOI
10.1109/TKDE.2019.2913376
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Computer sciences and related disciplines evolve around developing, evaluating, and applying algorithms. Typically, an algorithm is not developed from scratch, but uses and builds upon existing ones, which often are proposed and published in scholarly articles. The ability to capture this evolution relationship among these algorithms in scientific literature would not only allow us to understand how a particular algorithm is composed, but also shed light on large-scale analysis of algorithmic evolution through different temporal spans and thematic scales. We propose to capture such evolution relationship between two algorithms by investigating the knowledge represented in citation contexts, where authors explain how cited algorithms are used in their works. A set of heterogeneous ensemble machine-learning methods is proposed, where the combination of two base classifiers trained with heterogeneous feature types is used to automatically identify the algorithm usage relationship. The proposed heterogeneous ensemble methods achieve the best average F1 of 0.749 and 0.905 for fine-grained and binary algorithm citation function classification, respectively. The success of this study will allow us to generate a large-scale algorithm citation network from a collection of scholarly documents representing multiple time spans, venues, and fields of study. Such a network will be used as an instrument not only to answer critical questions in algorithm search, such as identifying the most influential and generalizable algorithms, but also to study the evolution of algorithmic development and trends over time.
Pages: 1881-1896
Page count: 16
Related papers
50 in total
  • [41] Using Citation Bias to Guide Better Sampling of Scientific Literature
    Fu, Yuanxi
    Yuan, Jasmine
    Schneider, Jodi
    [J]. 18TH INTERNATIONAL CONFERENCE ON SCIENTOMETRICS & INFORMETRICS (ISSI2021), 2021, : 419 - 424
  • [42] A Classification Scheme for Algorithm Citation Function in Scholarly Works
    Tuarob, Suppawong
    Mitra, Prasenjit
    Giles, C. Lee
    [J]. JCDL'13: PROCEEDINGS OF THE 13TH ACM/IEEE-CS JOINT CONFERENCE ON DIGITAL LIBRARIES, 2013, : 367 - 368
  • [43] Automatic document classification of biological literature
    Chen, David
    Muller, Hans-Michael
    Sternberg, Paul W.
    [J]. BMC BIOINFORMATICS, 2006, 7 (1)
  • [44] Automatic document classification of biological literature
    David Chen
    Hans-Michael Müller
    Paul W Sternberg
    [J]. BMC Bioinformatics, 7
  • [45] Automatic figure classification in bioscience literature
    Kim, Daehyun
    Ramesh, Balaji Polepalli
    Yu, Hong
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2011, 44 (05) : 848 - 858
  • [46] Automatic video classification: A survey of the literature
    Brezeale, Darin
    Cook, Diane J.
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (03) : 416 - 430
  • [47] An Automatic Method for the Analysis of Scientific Literature Collection
    Custura, Adrian-Mihai
    Bajenaru, Lidia
    Pop, Florin
    [J]. STUDIES IN INFORMATICS AND CONTROL, 2018, 27 (04) : 423 - 430
  • [48] An Algorithm for Title Classification on Scientific News
    Shao, Wujie
    Zhu, Hongjian
    Yan, Yunyang
    Zhu, Quanyin
    [J]. PROCEEDINGS OF THE 2016 2ND WORKSHOP ON ADVANCED RESEARCH AND TECHNOLOGY IN INDUSTRY APPLICATIONS, 2016, 81 : 1883 - 1887
  • [49] Automatic Music Classification With genetic algorithm
    Pandey, Sunita S.
    Mishra, Ravi
    Ramesh, P.
    Mullasseri, Sileesh
    Sahoo, Nihar Ranjan
    Jadav, Ravindra
    Habeeb, Jasmin
    Unni, Anjana P.
    Verma, Sudhir
    Badrinarayan, S.
    [J]. CURRENT SCIENCE, 2019, 117 (03) : 354 - 354
  • [50] Classification of Scientific Networks Using Aggregated Journal-Journal Citation Relations in the Journal Citation Reports
    Chen, C. -M.
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2008, 59 (14) : 2296 - 2304