Ensemble System for Identification of Cited Text Spans: Based on Two Steps of Feature Selection

Cited by: 0
Authors
Xu, Jin [1]
Zhang, Chengzhi [1]
Ma, Shutian [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Dept Informat Management, Nanjing 210094, Peoples R China
Keywords
Cited text spans; Feature selection; Negative sampling; Text classification; Ensemble system; Classifier
DOI
10.1007/978-3-030-31624-2_8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The CL-SciSumm Shared Task proposed a novel approach: generating a scientific summary based on the cited text spans (CTS) in the target paper. This mechanism first requires identifying the CTS in the reference paper that correspond to each citation sentence (citance). CTS identification has therefore attracted the attention of many scholars, since the identified sentences are ultimately aggregated for summary generation. Prior studies viewed this task as a text classification problem, in which feature selection is a key step for modeling the linkage between CTS and citance. However, most studies have built features arbitrarily and applied them directly to model training, so the effectiveness of the features has rarely been evaluated. Performance variation caused by different classifiers is barely taken into consideration either. To further improve the performance of CTS identification, this paper builds an ensemble system based on two steps of feature selection. In the first step, we construct a set of features and perform correlation analysis to select those that are more highly correlated with CTS. The second step assigns each of several basic classifiers (SVM, Decision Tree and Logistic Regression) its best-performing feature set. Experimental results demonstrate that our proposed systems can surpass the previous best-performing one.
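The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the correlation measure, the selection threshold, or the way classifier outputs are combined, so Pearson correlation, a threshold of 0.3, and majority voting are assumptions made for illustration. The feature names and toy values are hypothetical as well.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_by_correlation(features, labels, threshold=0.3):
    """Step 1 (assumed form): keep features whose |correlation| with the label meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, labels)) >= threshold
    ]


def majority_vote(predictions):
    """Assumed combination rule: label 1 when more than half of the classifiers predict 1."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


# Hypothetical toy data: one row per (citance, candidate sentence) pair,
# label 1 meaning the candidate sentence is a cited text span.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "jaccard_similarity": [0.8, 0.7, 0.1, 0.2, 0.9, 0.3],  # hypothetical feature
    "sentence_length": [10, 20, 10, 20, 15, 15],           # hypothetical feature
}

selected = select_by_correlation(features, labels)

# Hypothetical outputs of three base classifiers on three candidate sentences.
ensemble_labels = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

In the second step described in the abstract, each base classifier would be trained on the subset of the selected features on which it performs best. That search is omitted here because the abstract does not specify how the per-classifier feature sets are chosen.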
Pages: 95-107
Page count: 13