IndIE: A Multilingual Open Information Extraction Tool For Indic Languages

Authors
Mishra, Ritwik [1]
Singh, Simranjeet [2]
Shah, Rajiv Ratn [1]
Kumaraguru, Ponnurangam [3, 4]
Bhattacharyya, Pushpak
Affiliations
[1] IIIT, Delhi, India
[2] NSUT, Delhi, India
[3] IIIT, Hyderabad, Telangana, India
[4] Indian Inst Technol, Bombay, Maharashtra, India
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Open Information Extraction (OIE) is the process of extracting informative facts from open-domain natural language text. This work proposes IndIE, a multilingual OIE tool that performs chunking, creates a Merged-phrase Dependency Tree (MDT), and generates triples using hand-crafted rules. A fine-tuned transformer-based chunker is observed to outperform traditional chunking methods. A benchmark called Hindi-BenchIE has also been developed for automatically evaluating Hindi triples. IndIE has been automatically evaluated against the golden triples of 112 Hindi sentences. Compared to other multilingual methods, IndIE generates more meaningful triples, with an F1-score of 0.51, and its triples are more fine-grained than those of the other methods. It is conjectured that IndIE can also generate meaningful triples for Urdu, Tamil, and Telugu sentences, because the developed chunker is shown to generalize across various natural languages and the triple generation rules are based on dependency relations that are common to these Indic languages.
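The pipeline named in the abstract (chunking, a merged-phrase dependency tree, rule-based triple generation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Token` structure, the `build_mdt` and `extract_triples` helpers, the hard-coded chunk ids and dependency labels, and the single subject/object rule are hypothetical and are not IndIE's actual chunker, data structures, or rule set, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    text: str
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head
    chunk: int    # id of the chunk (phrase) the token belongs to; assumed given


def build_mdt(tokens):
    """Collapse each chunk into one phrase node and keep only cross-chunk edges."""
    by_idx = {t.idx: t for t in tokens}
    words = {}
    for t in tokens:
        words.setdefault(t.chunk, []).append(t.text)
    phrases = {chunk: " ".join(ws) for chunk, ws in words.items()}

    edges = []  # (head_chunk, dependent_chunk, deprel)
    for t in tokens:
        if t.head == 0:
            continue  # the root has no incoming edge
        head_chunk = by_idx[t.head].chunk
        if head_chunk != t.chunk:
            edges.append((head_chunk, t.chunk, t.deprel))
    return phrases, edges


def extract_triples(phrases, edges):
    """Hypothetical rule: a phrase with nsubj and obj children yields one triple."""
    children = {}
    for head, dep, rel in edges:
        children.setdefault(head, {})[rel] = dep
    triples = []
    for head, rels in children.items():
        if "nsubj" in rels and "obj" in rels:
            triples.append(
                (phrases[rels["nsubj"]], phrases[head], phrases[rels["obj"]])
            )
    return triples


# Hindi for "Ram ate an apple"; chunk ids and relations are hand-assigned here.
tokens = [
    Token(1, "राम", 4, "nsubj", 0),
    Token(2, "ने", 1, "case", 0),
    Token(3, "सेब", 4, "obj", 1),
    Token(4, "खाया", 0, "root", 2),
]
phrases, edges = build_mdt(tokens)
triples = extract_triples(phrases, edges)
print(triples)  # [('राम ने', 'खाया', 'सेब')]
```

In this sketch the case marker stays inside the subject phrase because it shares a chunk with its noun, so only phrase-to-phrase relations reach the rule. A real system would obtain chunk ids from a trained chunker and dependency relations from a parser rather than from hand-written input.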
Pages: 312-326
Page count: 15