FEATURE SELECTION AND CLASSIFICATION INTEGRATED METHOD FOR IDENTIFYING CITED TEXT SPANS FOR CITANCES ON IMBALANCED DATA

Cited by: 0
Authors
Yeh, Jen-Yuan [1]
Tsai, Cheng-Jung [2]
Hsu, Tien-Yu [3]
Lin, Jung-Yi [4]
Cheng, Pei-Cheng [5]
Affiliations
[1] Natl Museum Nat Sci, Visitor Serv, Dept Operat, Collect & Informat Management, Taichung 40453, Taiwan
[2] Natl Changhua Univ Educ, Grad Inst Stat & Informat Sci, Changhua 50007, Taiwan
[3] Natl Museum Nat Sci, Dept Sci Educ, Taichung 40453, Taiwan
[4] Hon Hai Precis Ind Co Ltd Foxconn, IP Affairs Div, Taipei 11492, Taiwan
[5] Chien Hsin Univ Sci & Technol, Dept Informat Management, Taoyuan 32097, Taiwan
Keywords
Citation analysis; cited text spans identification; feature selection; classification; class imbalance; performance evaluation; scientific paper summarization
DOI
10.22452/mjcs.vol34no4.3
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies in scientific paper summarization have explored a new form of structured summary for a reference paper, in which all cited and citing sentences are grouped together by facet. This involves three main tasks: (1) identifying the cited text spans for citances (i.e., citing sentences), (2) classifying their discourse facets, and (3) generating a structured summary from the cited text spans and citances. This paper focuses on the first task and treats it as binary classification, distinguishing relevant pairs of citances and reference sentences from irrelevant pairs. We propose a new method that integrates feature selection and classification techniques to improve classification performance. The method investigates combinations of six feature selection methods (χ² statistic, Information Gain, Gain Ratio, Relief-F, Significance Attribute Evaluation, and Symmetrical Uncertainty) and five classification algorithms (k-Nearest Neighbors, Decision Tree, Support Vector Machine, Naive Bayes, and Random Forest). Additionally, to address class imbalance in the training data, we apply SMOTE (Synthetic Minority Over-sampling Technique), which biases training towards the minority class by generating synthetic minority samples. Experiments on the CL-SciSumm corpora assess the effect of applying feature selection to classification. The results show that feature selection significantly improves the F1 score, and that our method is competitive with the state-of-the-art methods in the CL-SciSumm evaluations.
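The three concrete techniques named in the abstract (a filter-style feature score such as the χ² statistic, SMOTE oversampling, and the F1 metric) can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: it assumes binary features for the χ² score, numeric feature vectors for SMOTE, and label 1 for a relevant citance/reference-sentence pair; every function name here is hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi2_binary(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-squared statistic of a binary feature vs. a binary label (2x2 table).

    Simplifying assumption: features are 0/1; higher scores mean a stronger
    association with the label, so features can be ranked by this value.
    """
    a = sum(1 for f, y in zip(feature, labels) if f == 1 and y == 1)
    b = sum(1 for f, y in zip(feature, labels) if f == 1 and y == 0)
    c = sum(1 for f, y in zip(feature, labels) if f == 0 and y == 1)
    d = sum(1 for f, y in zip(feature, labels) if f == 0 and y == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([bv + gap * (ov - bv) for bv, ov in zip(base, other)])
    return synthetic


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = relevant pair)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy minority-class vectors (hypothetical features of relevant pairs).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(minority, n_synthetic=2, k=2))
    print(chi2_binary([1, 1, 0, 0], [1, 1, 0, 0]))  # 4.0: perfect association gives chi2 = N
    print(f1_score([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```

In keeping with the abstract, synthetic samples would be generated from training pairs only, so the test pairs keep their natural imbalance when F1 is measured. This SMOTE variant draws the base sample at random for each synthetic point; the original formulation instead iterates over every minority sample a fixed number of times.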
Pages: 355-373
Page count: 19
Related papers
50 in total
  • [1] An Embedded Feature Selection Method for Imbalanced Data Classification
    Liu, Haoyue
    Zhou, MengChu
    Liu, Qing
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2019, 6 (03) : 703 - 715
  • [2] A Classification Method Based on Feature Selection for Imbalanced Data
    Liu, Yi
    Wang, Yanzhen
    Ren, Xiaoguang
    Zhou, Hao
    Diao, Xingchun
    [J]. IEEE ACCESS, 2019, 7 : 81794 - 81807
  • [3] Optimal Feature Selection for Imbalanced Text Classification
    Khurana, Anshu
    Verma, Om Prakash
    [J]. IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 135 - 147
  • [4] On Identifying Cited Texts for Citances and Classifying Their Discourse Facets by Classification Techniques
    Yeh, Jen-Yuan
    Hsu, Tien-Yu
    Tsai, Cheng-Jung
    Cheng, Pei-Cheng
    Lin, Jung-Yi
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2019, 35 (01) : 61 - 86
  • [5] Comparison of metrics for feature selection in imbalanced text classification
    Ogura, Hiroshi
    Amano, Hiromi
    Kondo, Masato
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 4978 - 4989
  • [6] Ensemble System for Identification of Cited Text Spans: Based on Two Steps of Feature Selection
    Xu, Jin
    Zhang, Chengzhi
    Ma, Shutian
    [J]. INFORMATION RETRIEVAL (CCIR 2019), 2019, 11772 : 95 - 107
  • [7] Imbalanced Data Classification Based on Feature Selection Techniques
    Ksieniewicz, Pawel
    Wozniak, Michal
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2018), PT II, 2018, 11315 : 296 - 303
  • [8] FISA: Feature-based instance selection for imbalanced text classification
    Sun, Aixin
    Lim, Ee-Peng
    Benatallah, Boualem
    Hassan, Mahbub
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 250 - 254
  • [9] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    [J]. 2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017