Efficient feature integration with Wikipedia-based semantic feature extraction for Turkish text summarization

Cited by: 8
Authors
Guran, Aysun [1]
Bayazit, Nilgun Guler [2]
Gurbuz, Mustafa Zahid [1]
Affiliations
[1] Dogus Univ, Dept Comp Engn, Istanbul, Turkey
[2] Yildiz Tech Univ, Dept Engn Math, Istanbul, Turkey
Keywords
Turkish text summarization; latent semantic analysis; analytical hierarchical process; artificial bee colony algorithm; Turkish Wikipedia;
DOI
10.3906/elk-1201-15
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study presents a novel hybrid Turkish text summarization system that combines structural and semantic features. The system uses 5 structural features, 1 of which is newly proposed, and 3 semantic features whose values are extracted from Turkish Wikipedia links. The features are combined using weights calculated by 2 novel approaches. The first approach uses an analytical hierarchical process, which depends on a series of expert judgments based on pairwise comparisons of the features. The second approach uses the artificial bee colony algorithm to determine the feature weights automatically. To confirm the significance of the proposed hybrid system, its performance is evaluated on a new Turkish corpus that contains 110 documents and 3 human-generated extractive summary corpora. The experimental results show that combining all of the features yields better performance than using any single feature on its own.
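As a rough illustration of the first weighting approach, the sketch below derives feature weights from a pairwise comparison matrix and uses them to score sentences. It is not the authors' implementation: the abstract does not give the comparison values, the feature values, or the exact weight-derivation method, so the 3-feature matrix, the sentence feature values, the row geometric-mean approximation, and the names ahp_weights and score_sentences are all assumptions made for the example.

```python
import numpy as np


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important feature i is judged to be
    than feature j; a consistent matrix has pairwise[j][i] == 1 / pairwise[i][j].
    """
    a = np.asarray(pairwise, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("pairwise comparison matrix must be square")
    if np.any(a <= 0):
        raise ValueError("pairwise comparisons must be positive")
    # Geometric mean of each row, normalised so the weights sum to 1.
    geometric_means = np.exp(np.log(a).mean(axis=1))
    return geometric_means / geometric_means.sum()


def score_sentences(features, weights):
    """Score each sentence as the weighted sum of its feature values."""
    return np.asarray(features, dtype=float) @ np.asarray(weights, dtype=float)


# Hypothetical expert judgments for 3 illustrative features (not from the paper).
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical feature values: one row per sentence, one column per feature.
features = [
    [0.9, 0.2, 0.4],
    [0.1, 0.8, 0.3],
    [0.5, 0.5, 0.9],
]
scores = score_sentences(features, weights)

# Indices of the 2 highest-scoring sentences, best first.
top_sentences = np.argsort(scores)[::-1][:2]
print(weights, scores, top_sentences)
```

An extractive summary would then be assembled from the highest-scoring sentences. The second approach in the abstract replaces the expert-derived weights with weights found by an artificial bee colony search, which would leave score_sentences unchanged and only swap the source of the weights argument.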
Pages: 1411-1425
Number of pages: 15