A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications

Cited by: 0
Authors
Qiuzi Zhang
Qikai Cheng
Yong Huang
Wei Lu
Affiliations
[1] School of Information Management, Wuhan University
DOI
Not available
Chinese Library Classification (CLC) number
G254 [Document indexing and cataloging];
Abstract
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.
Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.
Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.
Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.
Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.
Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
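The iterative loop described in the abstract (seed entities, pattern construction, score-based pattern selection) can be illustrated with a minimal sketch. The abstract does not specify the form of the triples or the scoring formula, so everything below is an assumption made for illustration: sentences are reduced to hypothetical (subject, predicate, object) triples, a pattern is the (subject, predicate) pair, and a pattern's score is the share of its object fillers that are already known entities. The function name `bootstrap`, the `min_score` threshold, and the toy data are likewise hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def bootstrap(triples, seeds, iterations=3, min_score=0.5):
    """Toy bootstrapping loop over (subject, predicate, object) triples.

    Assumption: the object slot holds a candidate dataset name and the
    (subject, predicate) pair acts as the extraction pattern.
    """
    entities = set(seeds)
    patterns = set()

    # Group the object-slot fillers by their (subject, predicate) context.
    matches = defaultdict(set)
    for subj, pred, obj in triples:
        matches[(subj, pred)].add(obj)

    for _ in range(iterations):
        # Assumed score: fraction of a pattern's fillers that are already known.
        new_patterns = {
            p
            for p, objs in matches.items()
            if p not in patterns and len(objs & entities) / len(objs) >= min_score
        }
        if not new_patterns:
            break  # nothing new was learned, so stop early
        patterns |= new_patterns
        # Every filler of an accepted pattern becomes a known entity.
        for p in new_patterns:
            entities |= matches[p]
    return patterns, entities


# Hypothetical toy input: five triples and a single seed entity.
triples = [
    ("we", "evaluate on", "MNIST"),
    ("we", "evaluate on", "CIFAR-10"),
    ("experiments", "use", "CIFAR-10"),
    ("experiments", "use", "ImageNet"),
    ("we", "propose", "a new model"),
]
patterns, entities = bootstrap(triples, {"MNIST"})
```

In this toy run the seed first admits the pattern ("we", "evaluate on"), which brings in a second entity, which in turn admits ("experiments", "use"); the pattern ("we", "propose") never reaches the threshold. A real system would need a more careful score to limit semantic drift across iterations.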
Pages: 69-85
Number of pages: 17
Related papers
50 in total
  • [1] A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications
    Qiuzi Zhang
    Qikai Cheng
    Yong Huang
    Wei Lu
    Journal of Data and Information Science, 2016, (01) : 69 - 85
  • [2] Adaptive Neural-Fuzzy Inference System for Classification of Rail Quality Data with Bootstrapping-Based Over-Sampling
    Yang, Y. Y.
    Mahfouf, M.
    Panoutsos, G.
    Zhang, Q.
    Thornton, S.
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 2205 - 2212
  • [3] Extraction of data deposition statements from the literature: a method for automatically tracking research results
    Neveol, Aurelie
    Wilbur, W. John
    Lu, Zhiyong
    BIOINFORMATICS, 2011, 27 (23) : 3306 - 3312
  • [4] Fast bootstrapping-based estimation of confidence intervals of expression levels and differential expression from RNA-Seq data
    Mandric, Igor
    Temate-Tiagueu, Yvette
    Shcheglova, Tatiana
    Al Seesi, Sahar
    Zelikovsky, Alex
    Mandoiu, Ion I.
    BIOINFORMATICS, 2017, 33 (20) : 3302 - 3304
  • [5] Domain adaptation of web data extraction based on bootstrapping method
    Liu, Dong-Lan
    Liu, Xin
    Ma, Lei
    Yu, Hao
    Zhao, Yong
    Lv, Guo-Dong
    PROCEEDINGS OF THE 2ND ANNUAL INTERNATIONAL CONFERENCE ON ELECTRONICS, ELECTRICAL ENGINEERING AND INFORMATION SCIENCE (EEEIS 2016), 2016, 117 : 372 - 385
  • [6] An improved, SSH-based method to automatically identify mesoscale eddies in the ocean
    WANG Xin
    DU Yunyan
    ZHOU Chenghu
    FAN Xing
    YI Jiawei
    热带海洋学报, 2013, 32 (02) : 15 - 23
  • [7] A BERT-based sequential deep neural architecture to identify contribution statements and extract phrases for triplets from scientific publications
    Gupta, Komal
    Ahmad, Ammaar
    Ghosal, Tirthankar
    Ekbal, Asif
    INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2024, 25 (04) : 1 - 28
  • [8] A new method for automatically constructing concept maps based on data mining techniques
    Bai, Shih-Ming
    Chen, Shyi-Ming
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 3078 - 3083
  • [9] Anonymization method based on sparse coding for power usage data
    Harada, Keiya
    Ohno, Yuta
    Nakamura, Yuichi
    Nishi, Hiroaki
    2018 IEEE 16TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2018, : 571 - 576
  • [10] Set-Based Testing of Proteomic Features with Large Portions of Missing Values - a Bootstrapping Approach with Optimal Data Usage
    Birgit, Debrabant
    HUMAN HEREDITY, 2021, 85 (02) : 74 - 74