Automatic Keyword and Sentence-Based Text Summarization for Software Bug Reports

Cited by: 14
Authors
Jindal, Shubhra Goyal [1]
Kaur, Arvinder [1]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat & Commun Technol, New Delhi 110078, India
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Text summarization; rapid automatic keyword extraction; fuzzy c-means; hierarchical clustering; bug reports; rule engine; MODEL;
DOI
10.1109/ACCESS.2020.2985222
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Text summarization is a process that efficiently retrieves the relevant information from documents. The objective of the proposed unsupervised approach is to summarize bug reports (software artefacts) with complete content and diversified information. The approach uses Rapid Automatic Keyword Extraction (RAKE) and the term frequency-inverse document frequency (TF-IDF) method to extract meaningful keywords and key phrases, each with a relevance score. For sentence extraction, fuzzy C-means clustering is used to extract, from each cluster, the sentences whose degree of membership is above a set threshold. A rule engine is used for sentence selection. The rules are generated from domain knowledge and from the information extracted by the keywords and by the sentences selected through clustering. The proposed method generates a cohesive and coherent summary on Apache bug reports. To remove redundancy and re-rank the generated summary, hierarchical clustering is applied, which enriches the extracted summary. The approach is evaluated on the newly constructed Apache Project Bug Report Corpus (APBRC) and the existing Bug Report Corpus (BRC). The results are compared using the performance metrics precision, recall, pyramid precision and F-score. The experimental results show that the proposed approach attains a significant improvement over baseline approaches such as BRC and LRCA. It also attains a significant improvement over existing state-of-the-art unsupervised approaches such as Hurried, centroid and others. It extracts significant keyword phrases and sentences from each cluster to achieve full coverage and a coherent summary. On the APBRC corpus, the approach attains average values of 78.22%, 82.18%, 80.10% and 81.66% for precision, recall, F-score and pyramid precision, respectively.
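The sentence-extraction step described in the abstract (fuzzy C-means followed by a membership threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuzzy_c_means` and `select_by_membership`, the fuzzifier `m=2.0`, the threshold `0.6`, and the toy two-dimensional vectors standing in for TF-IDF sentence vectors are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means; returns (memberships, centers).

    memberships[i][c] is the degree to which point i belongs to cluster c;
    each row sums to 1. All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for c in range(n_clusters):
            weights = [row[c] ** m for row in U]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ic proportional to dist(i, c) ** (-2 / (m - 1)).
        new_U = []
        for p in points:
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_U.append([v / total for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:
            break
    return U, centers


def select_by_membership(U, threshold):
    """Group sentence indices by strongest cluster, keeping those at or above threshold."""
    selected = {}
    for i, row in enumerate(U):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            selected.setdefault(best, []).append(i)
    return selected


# Toy stand-ins for sentence vectors (hypothetical data, not from the paper).
sentences = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
U, centers = fuzzy_c_means(sentences, n_clusters=2)
picked = select_by_membership(U, threshold=0.6)
```

Unlike hard clustering, every sentence keeps a graded membership in every cluster, so the threshold controls how many sentences each cluster contributes to the candidate summary.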
Pages: 65352-65370
Page count: 19