A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications

Authors
Qiuzi Zhang
Qikai Cheng
Yong Huang
Wei Lu
Affiliation
[1] School of Information Management, Wuhan University
DOI
Not available
Chinese Library Classification (CLC) number
G254 [Document Indexing and Cataloging]
Abstract
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.
Design/methodology/approach: The extraction method starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.
Findings: The performance of the method is verified by experiments on real data collected from computer science journals. The results show that the method achieves satisfactory performance in terms of extraction precision and the extensibility of the obtained patterns.
Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it cannot handle complex sentences. Additional features that can address complex sentences should therefore be explored in the future.
Practical implications: Data-usage statement extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.
Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
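The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, the pattern form, or the seed-selection strategies, so everything below is an assumption made for illustration. Sentences are assumed to be pre-parsed into (subject, predicate, object) triples, a pattern is assumed to be a (subject, predicate) pair, and a pattern's score is assumed to be the fraction of its objects that are already known dataset names. The names `bootstrap`, `min_score`, and `top_k` are hypothetical.

```python
from collections import Counter


def bootstrap(triples, seeds, iterations=3, min_score=0.5, top_k=2):
    """Toy bootstrapping loop over (subject, predicate, object) triples.

    Assumed scoring (not taken from the paper): a candidate pattern's score
    is the share of its matches whose object is an already known entity.
    """
    entities = set(seeds)
    patterns = set()
    for _ in range(iterations):
        hits, total = Counter(), Counter()
        for subj, pred, obj in triples:
            pat = (subj, pred)
            total[pat] += 1
            if obj in entities:
                hits[pat] += 1
        # Score only patterns that are new and matched at least one known entity.
        scored = {p: hits[p] / total[p]
                  for p in total if p not in patterns and hits[p] > 0}
        # Keep the top_k best-scoring candidates that reach the threshold.
        best = sorted(scored, key=lambda p: (-scored[p], p))[:top_k]
        new_patterns = {p for p in best if scored[p] >= min_score}
        if not new_patterns:
            break  # nothing new was learned, so stop early
        patterns |= new_patterns
        # Accepted patterns contribute the entities they match to the next round.
        for subj, pred, obj in triples:
            if (subj, pred) in patterns:
                entities.add(obj)
    statements = [t for t in triples if (t[0], t[1]) in patterns]
    return patterns, entities, statements


# Hypothetical toy input: six triples and a single seed entity.
triples = [
    ("we", "evaluate on", "MNIST"),
    ("we", "evaluate on", "CIFAR-10"),
    ("we", "train on", "CIFAR-10"),
    ("we", "train on", "ImageNet"),
    ("we", "propose", "a new model"),
    ("we", "thank", "the reviewers"),
]
patterns, entities, statements = bootstrap(triples, seeds={"MNIST"})
```

On this toy input the seed "MNIST" first yields the pattern ("we", "evaluate on"), which adds "CIFAR-10" as an entity, which in turn yields ("we", "train on") in the next round. The single-pair pattern form also shows the limitation the abstract names: a sentence that does not reduce cleanly to one triple would be missed.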
Pages: 69-85
Page count: 17