Mining causality from texts for question answering system

Cited by: 14
Authors
Pechsiri, Chaveevan [1]
Kawtrakul, Asanee [1]
Affiliation
[1] Kasetsart Univ, Bangkok, Thailand
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2007, Vol. E90D, No. 10
Keywords
elementary discourse unit (EDU); multiple EDU causality extraction; causative antecedent; effective consequence; causality boundary identification; verb-pair rules extraction;
DOI
10.1093/ietisy/e90-d.10.1523
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This research aims to develop automatic knowledge mining of causality from texts to support an automatic question answering (QA) system in answering 'why' questions, which are among the most crucial forms of questions. The outcome of this research will assist people in diagnosing problems in areas such as plant diseases, health, and industry. While previous works have extracted causality knowledge within only one or two adjacent EDUs (Elementary Discourse Units), this research focuses on mining causality knowledge that exists within multiple EDUs, taking multiple causes and multiple effects into consideration, where adjacency between cause and effect is not required. There are two main problems: how to identify the interesting causality events in documents, and how to identify the boundaries of the causative unit and the effective unit in terms of multiple EDUs. In addition, there are at least three main problems involved in boundary identification: the implicit boundary delimiter, the nonadjacent cause-consequence, and the effect surrounded by causes. This research proposes using verb-pair rules, learnt with the Naive Bayes classifier (NB) and the Support Vector Machine (SVM) and compared between the two, to identify causality EDUs in the Thai agricultural and health news domains. The boundary identification problems are solved by utilizing verb-pair rules, Centering Theory, and a cue phrase set. The reason for emphasizing verbs in causality extraction is that they explicitly express, in a certain way, the consequent events of cause and effect, e.g. 'Aphids suck the sap from rice leaves. Then leaves will shrink. Later, they will become yellow and dry.' The results of the proposed methodology show that the verb-pair rules extracted from NB outperform those extracted from SVM when the corpus contains a high occurrence of each verb, while the results from SVM are better than those from NB when the corpus contains fewer occurrences of each verb.
The verb-pair rules extracted from NB for causality extraction have the highest precision (0.88), with a recall of 0.75, on the plant disease corpus, whereas those extracted from SVM have the highest precision (0.89), with a recall of 0.76, on bird flu news. For boundary determination, our methodology performs very well, with approximately 96% accuracy. In addition, the causality results extracted in this research can be generalized as laws in the Inductive-Statistical theory of Hempel's explanation theory, which will be useful for QA and reasoning.
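The idea of scoring a verb pair with Naive Bayes can be illustrated with a minimal sketch. This is not the authors' implementation: the toy training pairs, the function names (`train_nb`, `log_posterior`, `is_causal`), and the add-one smoothing are assumptions made for illustration, and the paper's actual features, corpora, and SVM counterpart are not reproduced here. Only the 'suck' / 'shrink' / 'dry' verbs come from the example sentence in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy data: (cause_verb, effect_verb, is_causal).
# Illustrative only; these pairs are not taken from the paper's corpora.
TRAIN = [
    ("suck", "shrink", True),
    ("suck", "dry", True),
    ("infect", "die", True),
    ("destroy", "shrink", True),
    ("grow", "sell", False),
    ("plant", "grow", False),
    ("sell", "buy", False),
    ("plant", "sell", False),
]


def train_nb(samples):
    """Collect class counts and per-class verb counts for each slot."""
    class_counts = Counter()
    cause_counts = {True: Counter(), False: Counter()}
    effect_counts = {True: Counter(), False: Counter()}
    for cause, effect, label in samples:
        class_counts[label] += 1
        cause_counts[label][cause] += 1
        effect_counts[label][effect] += 1
    cause_vocab = {c for c, _, _ in samples}
    effect_vocab = {e for _, e, _ in samples}
    return class_counts, cause_counts, effect_counts, cause_vocab, effect_vocab


def log_posterior(model, cause, effect, label):
    """Unnormalized log P(label | cause verb, effect verb) under Naive Bayes."""
    class_counts, cause_counts, effect_counts, cause_vocab, effect_vocab = model
    n = class_counts[label]
    total = sum(class_counts.values())
    # Add-one smoothing; the extra +1 reserves mass for unseen verbs.
    p_cause = (cause_counts[label][cause] + 1) / (n + len(cause_vocab) + 1)
    p_effect = (effect_counts[label][effect] + 1) / (n + len(effect_vocab) + 1)
    return math.log(n / total) + math.log(p_cause) + math.log(p_effect)


def is_causal(model, cause, effect):
    """Accept the verb pair as a causality rule if the causal class scores higher."""
    return (log_posterior(model, cause, effect, True)
            > log_posterior(model, cause, effect, False))
```

With `model = train_nb(TRAIN)`, calling `is_causal(model, "suck", "shrink")` checks whether the pair would be kept as a candidate verb-pair rule under these toy counts. With so little data the decision is driven almost entirely by how often each verb was seen per class, which is consistent with the abstract's remark that NB does better when each verb occurs frequently in the corpus.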
Pages: 1523 - 1533
Number of pages: 11