An innovative approach of Bangla text summarization by introducing pronoun replacement and improved sentence ranking

Cited by: 1
Authors
Haque M.M. [1]
Pervin S. [1]
Begum Z. [2]
Affiliations
[1] Computer Science and Engineering, University of Dhaka, Dhaka
[2] Institute of Information Technology, University of Dhaka, Dhaka
Source
Keywords
Bangla news document; Cosine similarity; Dangling pronoun; Pronoun replacement; Sentence frequency;
DOI
10.3745/JIPS.04.0038
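Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics];
Discipline classification code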
020208 ; 070103 ; 0714 ;
Abstract
This paper proposes an automatic method for summarizing Bangla news documents. In the proposed approach, pronoun replacement is applied for the first time to minimize dangling pronouns in the summary. After pronoun replacement, sentences are ranked using term frequency, sentence frequency, numerical figures, and title words. If two sentences have at least 60% cosine similarity, the frequency of the larger sentence is increased and the smaller sentence is removed to eliminate redundancy. Moreover, the first sentence is always included in the summary if it contains any title word. In Bangla text, numerical figures can be written in both words and digits, in a variety of forms. All of these forms are identified to assess the importance of sentences. The approach uses a rule-based system together with a hidden Markov model and a Markov chain model. To derive the rules, we analyzed 3,000 Bangla news documents and studied several Bangla grammar books. A series of experiments was performed on 200 Bangla news documents and 600 summaries (3 summaries per document). The evaluation results demonstrate the effectiveness of the proposed technique over the four latest methods. © 2017 KIPS.
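The redundancy step described in the abstract (remove the smaller of two sentences whose cosine similarity is at least 60%, and raise the frequency of the larger one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bag-of-words vectors, the function names `cosine_similarity` and `remove_redundant`, and the choice to add the removed sentence's count to the kept sentence are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between two token lists using raw term counts."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_redundant(sentences, threshold=0.6):
    """Drop the shorter sentence of each near-duplicate pair.

    `sentences` is a list of token lists. Returns (tokens, frequency)
    pairs for the sentences that are kept. Assumption: the kept
    sentence's frequency grows by the removed sentence's frequency.
    """
    freq = [1] * len(sentences)
    removed = set()
    for i in range(len(sentences)):
        if i in removed:
            continue
        for j in range(i + 1, len(sentences)):
            if j in removed:
                continue
            if cosine_similarity(sentences[i], sentences[j]) >= threshold:
                # Keep the larger sentence; ties keep the earlier one.
                if len(sentences[j]) > len(sentences[i]):
                    keep, drop = j, i
                else:
                    keep, drop = i, j
                freq[keep] += freq[drop]
                removed.add(drop)
                if drop == i:
                    break  # sentence i is gone, stop comparing it
    return [(sentences[k], freq[k])
            for k in range(len(sentences)) if k not in removed]


example = [["a", "b", "c", "d"], ["a", "b", "c"], ["x", "y"]]
result = remove_redundant(example)
```

In this toy input the first two token lists have a cosine similarity of about 0.87, so the shorter one is removed and the longer one's frequency becomes 2; the third is unrelated and is kept with frequency 1. Tokenization and stemming of Bangla text are left out of the sketch entirely.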
Pages: 752-777
Number of pages: 25
Related papers
22 in total
  • [1] Automated Bangla Text Summarization by Sentence Scoring and Ranking
    Efat, Md. Iftekharul Alam
    Ibrahim, Mohammad
    Kayesh, Humayun
    2013 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2013,
  • [2] Pronoun Replacement Approach for Enhancing Arabic Text Summarization
    Migdadi, Aya
    Smadi, Lara
    Mustafa, Ahmad
    2022 13TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2022, : 309 - 314
  • [3] Improving the Quality of Text Summarization using Pronoun Replacement Technique
    Urolagin, Siddhaling
    Satish, Likitha
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 1991 - 1995
  • [4] Categorized Text Document Summarization in the Kannada Language by Sentence Ranking
    Jayashree, R.
    Murthy, Srikanta K.
    Anami, Basavaraj S.
    2012 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA), 2012, : 776 - 781
  • [5] Automated Text Summarization: Sentence Refinement Approach
    Jusoh, Shaidah
    Masoud, Abdulsalam M.
    Alfawareh, Hejab M.
    DIGITAL INFORMATION PROCESSING AND COMMUNICATIONS, PT 2, 2011, 189 : 207 - +
  • [6] An approach to sentence-selection-based text summarization
    Chen, F
    Han, KS
    Chen, GL
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 489 - 493
  • [7] Enhancement of Keyphrase-Based Approach of Automatic Bangla Text Summarization
    Haque, Md. Majharul
    Pervin, Suraiya
    Begum, Zerina
    PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 42 - 46
  • [8] Word-sentence co-ranking for automatic extractive text summarization
    Fang, Changjian
    Mu, Dejun
    Deng, Zhenghong
    Wu, Zhiang
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 72 : 189 - 195
  • [9] A Query Specific Graph Based Approach to Multi-document Text Summarization: Simultaneous Cluster and Sentence Ranking
    Pandit, Sandip R.
    Potey, M. A.
    2013 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND RESEARCH ADVANCEMENT (ICMIRA 2013), 2013, : 213 - 217
  • [10] Automatic Bangla Text Summarization Using Term Frequency and Semantic Similarity Approach
    Sarkar, Avik
    Hossen, Md Sharif
    2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018,