Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure

Cited by: 2
Authors
Atreya, Arjun, V [1]
Kankaria, Ashish [1]
Bhattacharyya, Pushpak [1]
Ramakrishnan, Ganesh [1]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Bombay 400076, Maharashtra, India
Keywords
Query expansion; resource scarce languages; multilingual retrieval;
DOI
10.1145/2997643
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Retrievals in response to queries to search engines in resource-scarce languages often produce no results, which annoys the user. In such cases, at least partially relevant documents must be retrieved. We propose a novel multilingual framework, MultiStructPRF, which expands the query with related terms by (i) using a resource-rich assisting language and (ii) giving varied importance to the expansion terms depending on their position of occurrence in the document. Our system uses the help of an assisting language to expand the query in order to improve system recall. We propose a systematic expansion model for weighting the expansion terms coming from different parts of the document. To combine the expansion terms from query language and assisting language, we propose a heuristics-based fusion model. Our experimental results show an improvement over other PRF techniques in both precision and recall for multiple resource-scarce languages like Marathi, Bengali, Odia, Finnish, and the like. We study the effect of different assisting languages on precision and recall for multiple query languages. Our experiments reveal an interesting fact: Precision is positively correlated with the typological closeness of query language and assisting language, whereas recall is positively correlated with the resource richness of the assisting language.
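The abstract names three steps: collecting expansion terms from feedback documents, weighting them by where they occur in the document, and fusing the query-language and assisting-language term lists. The following is only a minimal sketch of that general idea, not the MultiStructPRF implementation. The field names, the weights in FIELD_WEIGHTS, the linear interpolation in fuse, and all function names are assumptions made for illustration; the abstract does not specify them. The sketch also assumes the assisting-language terms have already been mapped back into the query language.

```python
from collections import Counter

# Hypothetical weights: a term seen in a title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def expansion_scores(feedback_docs, query_terms, field_weights=FIELD_WEIGHTS):
    """Score candidate expansion terms from pseudo-relevant documents.

    Each document is a dict mapping a field name to its text. A term's
    score is the sum of the weights of the fields it occurs in.
    """
    scores = Counter()
    for doc in feedback_docs:
        for field, text in doc.items():
            weight = field_weights.get(field, 0.0)
            for term in text.lower().split():
                if term not in query_terms:
                    scores[term] += weight
    return scores


def fuse(query_lang_scores, assist_lang_scores, alpha=0.5):
    """Combine two score maps by linear interpolation (an assumed rule)."""
    fused = Counter()
    for term, score in query_lang_scores.items():
        fused[term] += alpha * score
    for term, score in assist_lang_scores.items():
        fused[term] += (1.0 - alpha) * score
    return fused


def expand(query_terms, fused_scores, k=3):
    """Append the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(fused_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    query = ["rainfall"]
    docs_query_lang = [{"title": "monsoon rainfall", "body": "rainfall flood"}]
    docs_assist_lang = [{"title": "flood warning", "body": "monsoon flood"}]
    fused = fuse(
        expansion_scores(docs_query_lang, query),
        expansion_scores(docs_assist_lang, query),
    )
    print(expand(query, fused, k=2))
```

In this toy setup, the parameter alpha controls how much the query-language feedback counts relative to the assisting language; the paper itself describes a heuristics-based fusion model whose details are not given in the abstract.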
Pages: 17
Related papers
17 in total
  • [1] Transliteration for resource-scarce languages
    Chinnakotla M.K.
    Damani O.P.
    Satoskar A.
    [J]. ACM Transactions on Asian Language Information Processing, 2010, 9 (04):
  • [2] NLP Web Services for Resource-Scarce Languages
    Puttkammer, M. J.
    Eiselen, E. R.
    Hocking, J.
    Koen, F. J.
    [J]. 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2018): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2018, : 43 - 49
  • [3] Automatic diacritic restoration for resource-scarce languages
    De Pauw, Guy
    Wagacha, Peter W.
    de Schryver, Gilles-Maurice
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 170 - +
  • [4] ASR Corpus Design for Resource-Scarce Languages
    Barnard, Etienne
    Davel, Marelie
    van Heerden, Charl
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2823 - 2826
  • [5] Developing Core Technologies for Resource-Scarce Nguni Languages
    du Toit, Jakobus S.
    Puttkammer, Martin J.
    [J]. INFORMATION, 2021, 12 (12)
  • [6] Small-Vocabulary Speech Recognition for Resource-Scarce Languages
    Qiao, Fang
    Sherwani, Jahanzeb
    Rosenfeld, Roni
    [J]. PROCEEDINGS OF THE FIRST ACM SYMPOSIUM ON COMPUTING FOR DEVELOPMENT (ACM DEV 2010), 2010,
  • [7] Viability of Neural Networks for Core Technologies for Resource-Scarce Languages
    Loubser, Melinda
    Puttkammer, Martin J.
    [J]. INFORMATION, 2020, 11 (01)
  • [8] Tower of Babel: A Crowdsourcing Game Building Sentiment Lexicons for Resource-scarce Languages
    Hong, Yoonsung
    Kwak, Haewoon
    Baek, Youngmin
    Moon, Sue
    [J]. PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'13 COMPANION), 2013, : 549 - 556
  • [9] Metaphor Annotation in Sesotho Text Corpus Towards the Representation of Resource-Scarce Languages in NLP
    Mahloane, Malefu Justina
    Trausan-Matu, Stefan
    [J]. 2015 20TH INTERNATIONAL CONFERENCE ON CONTROL SYSTEMS AND COMPUTER SCIENCE, 2015, : 405 - 410
  • [10] Multilingual sentiment analysis: from formal to informal and scarce resource languages
    Lo, Siaw Ling
    Cambria, Erik
    Chiong, Raymond
    Cornforth, David
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2017, 48 (04) : 499 - 527