A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications

Authors
Qiuzi Zhang
Qikai Cheng
Yong Huang
Wei Lu
Affiliation
[1] School of Information Management, Wuhan University
DOI
Not available
Chinese Library Classification (CLC) number
G254 [Document indexing and cataloging]
Abstract
Purpose: This study proposes a bootstrapping-based method for automatically extracting data-usage statements from academic texts.
Design/methodology/approach: Starting from seed entities, the method iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed, scored, and added to the pattern list according to their scores. Three seed-selection strategies are also proposed.
Findings: The method's performance is verified through experiments on real data collected from computer science journals. The results show that the method achieves satisfactory extraction precision and that the learned patterns are extensible.
Research limitations: Although the triple representation of sentences is effective and efficient for extracting data-usage statements, it cannot handle complex sentences. Additional features that address complex sentences should therefore be explored in future work.
Practical implications: Extracting data-usage statements benefits data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.
Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
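The bootstrapping loop summarized in the abstract (seed entities, then alternating pattern learning and statement extraction, with scored patterns added to a list) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are assumed to be already reduced to (subject, predicate, object) triples, a candidate pattern is assumed to be just the predicate, and patterns are scored with the RlogF metric known from Riloff-style bootstrapping, which is a swapped-in choice because the abstract does not give the paper's scoring function. The names `bootstrap`, `rlogf`, `EXAMPLE_TRIPLES`, `top_k`, and `min_score`, and the toy data, are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sentence representation: (subject, predicate, object).
Triple = tuple[str, str, str]

# Hypothetical toy corpus, already reduced to triples.
EXAMPLE_TRIPLES: list[Triple] = [
    ("we", "use", "MNIST"),
    ("we", "use", "ImageNet"),
    ("we", "use", "CIFAR-10"),
    ("model", "trained on", "ImageNet"),
    ("model", "trained on", "MNIST"),
    ("model", "trained on", "COCO"),
    ("we", "propose", "a new loss"),
    ("authors", "cite", "ImageNet"),
    ("authors", "cite", "prior work"),
    ("authors", "cite", "a survey"),
]


def rlogf(known_hits: int, total_hits: int) -> float:
    """RlogF pattern score (Riloff-style, an assumed stand-in for the paper's
    score): reliability (known_hits / total_hits) times log2(known_hits)."""
    if known_hits == 0:
        return 0.0
    return (known_hits / total_hits) * math.log2(known_hits)


def bootstrap(
    triples: list[Triple],
    seeds: set[str],
    iterations: int = 3,
    top_k: int = 2,
    min_score: float = 0.5,
) -> tuple[list[str], set[str], list[Triple]]:
    """Hypothetical sketch: grow a pattern list and an entity set from seeds."""
    entities = set(seeds)
    patterns: list[str] = []  # accepted patterns, in acceptance order
    for _ in range(iterations):
        # Candidate pattern = predicate; collect the objects it occurs with.
        objects_by_pattern: dict[str, list[str]] = defaultdict(list)
        for _, predicate, obj in triples:
            objects_by_pattern[predicate].append(obj)
        # Score each not-yet-accepted candidate against the current entities.
        scored = []
        for pattern, objs in objects_by_pattern.items():
            if pattern in patterns:
                continue
            known = sum(1 for o in objs if o in entities)
            score = rlogf(known, len(objs))
            if score >= min_score:
                scored.append((score, pattern))
        if not scored:
            break  # no candidate is reliable enough; stop early
        # Keep the top_k best patterns (ties broken alphabetically).
        scored.sort(key=lambda sp: (-sp[0], sp[1]))
        new_patterns = [p for _, p in scored[:top_k]]
        patterns.extend(new_patterns)
        # Objects matched by newly accepted patterns become new entities.
        for _, predicate, obj in triples:
            if predicate in new_patterns:
                entities.add(obj)
    # Triples matched by any accepted pattern are the extracted statements.
    statements = [t for t in triples if t[1] in patterns]
    return patterns, entities, statements


if __name__ == "__main__":
    pats, ents, stmts = bootstrap(EXAMPLE_TRIPLES, {"MNIST", "ImageNet"})
    print(pats)
    print(sorted(ents))
    print(len(stmts))
```

On this toy data, the predicates `use` and `trained on` are accepted in the first iteration and pull in `CIFAR-10` and `COCO` as new dataset names, while `cite` scores 0 (only one of its objects is a known dataset, and log2(1) = 0), so citation mentions are not mistaken for data usage. RlogF is used here because it rewards patterns that are both reliable and frequent; the paper's actual scoring and its three seed-selection strategies may differ.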
Pages: 69–85
Number of pages: 17