Open information extraction as an intermediate semantic structure for Persian text summarization

Cited by: 3
Authors
Rahat, Mahmoud [1]
Talebpour, Alireza [1]
Affiliations
[1] Shahid Beheshti Univ, Fac Comp Sci & Engn, Tehran, Iran
Keywords
Text summarization; Extractive summary; Open information extraction; Persian (Farsi) text processing
DOI
10.1007/s00799-018-0244-z
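Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract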
Semantic applications typically exploit structures such as dependency parse trees, phrase chunking, semantic role labeling or open information extraction (Open IE). In this paper, we introduce a novel application of Open IE as an intermediate layer for text summarization. Text summarization is an important method for providing relevant information in large digital libraries. Open IE refers to the process of extracting machine-understandable structural propositions from text. We use these propositions as building blocks to shorten sentences and generate a summary of the text. The proposed system offers a new form of summarization that can break up the structure of a sentence and extract its most significant sub-sentential elements. Other advantages include the ability to identify and eliminate less important parts of a sentence (such as adverbs, adjectives, appositions or dependent clauses) as well as duplicated pieces of sentences, which in turn frees space for more sentences in the summary and so improves its coverage and coherence. The proposed system is localized for the Persian language; however, it can be adapted to other languages. Experiments on the standard Pasokh data set with a standard comparison tool showed promising results for the proposed approach. We used summaries produced by the system in a real-world application in the virtual library of Shahid Beheshti University and received good feedback from users.
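The idea of using propositions as summary building blocks can be illustrated with a minimal sketch. It is not the authors' system: the `Proposition` class, the word-frequency scoring, the Jaccard duplicate check and the `dup_threshold` value are all assumptions made for illustration, and the example uses English text rather than Persian. It only shows how short propositions, once extracted, might be ranked, de-duplicated and packed into a fixed word budget.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """Hypothetical Open IE triple; the paper's data model may differ."""
    subject: str
    relation: str
    obj: str

    def text(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"

    def words(self) -> frozenset:
        return frozenset(self.text().lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Word-set overlap, used here as an assumed duplicate test."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(props, max_words, dup_threshold=0.6):
    """Greedily pick high-scoring, non-duplicate propositions within a word budget."""
    # Assumed scoring: mean count of propositions containing each word.
    freq = Counter(w for p in props for w in p.words())

    def score(p):
        ws = p.words()
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    chosen, used = [], 0
    # sorted() is stable, so equally scored propositions keep input order.
    for p in sorted(props, key=score, reverse=True):
        n = len(p.text().split())
        if used + n > max_words:
            continue  # would exceed the budget
        if any(jaccard(p.words(), c.words()) >= dup_threshold for c in chosen):
            continue  # too similar to something already selected
        chosen.append(p)
        used += n
    # Present the selected propositions in their original document order.
    chosen.sort(key=props.index)
    return [p.text() for p in chosen]


props = [
    Proposition("the library", "offers", "summaries"),
    Proposition("the library", "offers", "summaries"),  # duplicate content
    Proposition("users", "gave", "feedback"),
]
summary = summarize(props, max_words=10)
```

Because the duplicate proposition is dropped, the budget has room for a second, different proposition, which mirrors the coverage argument made in the abstract.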
Pages: 339-352
Number of pages: 14
Related papers
50 in total
  • [1] Structured Text Summarization via Open Domain Information Extraction
    Hao, Zengguang
    Xu, Binxia
    Zheng, Shiyuan
    Gao, Yang
    [J]. PROCEEDINGS OF THE 2018 IEEE 22ND INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2018, : 701 - 706
  • [2] AN APPROACH FOR COMBINING SEMANTIC INFORMATION AND PROXIMITY INFORMATION FOR TEXT SUMMARIZATION
    Jeong, Hogyeong
    Yun, Yeogirl
    [J]. KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2011, : 427 - 432
  • [3] Text Summarization towards Scientific Information Extraction
    Keller, Abigail
    Furst, Jacob
    Raicu, Daniela
    Hastings, Peter
    Tchoua, Roselyne
    [J]. 2022 IEEE 18TH INTERNATIONAL CONFERENCE ON E-SCIENCE (ESCIENCE 2022), 2022, : 225 - 235
  • [4] A Topic Information Fusion and Semantic Relevance for Text Summarization
    You, Fucheng
    Zhao, Shuai
    Chen, Jingjing
    [J]. IEEE ACCESS, 2020, 8 : 178946 - 178953
  • [5] Enhancing Biomedical Text Summarization Using Semantic Relation Extraction
    Shang, Yue
    Li, Yanpeng
    Lin, Hongfei
    Yang, Zhihao
    [J]. PLOS ONE, 2011, 6 (08):
  • [6] Modeling user knowledge and semantic structure for information extraction from text
    Moertl, PM
    [J]. ICCM - 2001: PROCEEDINGS OF THE 2001 FOURTH INTERNATIONAL CONFERENCE ON COGNITIVE MODELING, 2001, : 283 - 284
  • [7] AHP TECHNIQUES FOR PERSIAN TEXT SUMMARIZATION
    Tofighy, Seyyed Mohsen
    Raj, Ram Gopal
    Javadi, Hamid Haj Seyyed
    [J]. MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2013, 26 (01) : 1 - 8
  • [8] Automatic Persian Text Summarization Using Linguistic Features from Text Structure Analysis
    Heidary, Ebrahim
    Parvin, Hamid
    Nejatian, Samad
    Bagherifard, Karamollah
    Rezaie, Vahideh
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 69 (03) : 2845 - 2861
  • [9] Parsa: An open information extraction system for Persian
    Rahat, Mahmoud
    Talebpour, Alireza
    [J]. DIGITAL SCHOLARSHIP IN THE HUMANITIES, 2018, 33 (04) : 874 - 893
  • [10] Information-content based sentence extraction for text summarization
    Mallett, D
    Elding, J
    Nascimento, MA
    [J]. ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS, 2004, : 214 - 218