Building a text collection for Urdu information retrieval

Cited by: 5
Authors
Rasheed, Imran [1]
Banka, Haider [1]
Khan, Hamaid M. [2]
Affiliations
[1] Indian Institute of Technology (ISM), Department of Computer Science and Engineering, Dhanbad, Bihar, India
[2] Fatih Sultan Mehmet Vakif University, Aluteam, Istanbul, Turkey
Keywords
Assessors agreement; relevance judgment; text collection construction and evaluation; Urdu corpus; Urdu information retrieval; CORPUS
DOI
10.4218/etrij.2019-0458
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
Urdu is widely spoken in the Indian subcontinent and has over 300 million speakers worldwide. However, linguistic advancements for Urdu remain rare compared with those for European and other Asian languages. Therefore, following Text Retrieval Conference (TREC) standards, we constructed an extensive text collection of 85,304 documents from diverse categories, covering over 52 topics, with relevance judgment sets built at a pool depth of 100. We also present several applications that demonstrate the effectiveness of the collection. Although the collection is primarily intended for text retrieval, with suitable modifications it can also be used for named entity recognition, text summarization, and other linguistic applications. It is the most extensive collection currently available for Urdu, and it will be freely available for future research and academic education.
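The phrase "relevance judgment sets at 100 pool depth" refers to TREC-style pooling: for each topic, the top-k documents returned by every participating retrieval run are merged into one pool, and only the pooled documents are shown to assessors for judgment. The following is a minimal Python sketch of depth-k pooling, assuming each run is represented as a dictionary mapping topic IDs to ranked lists of document IDs; the function name `build_pool` and the toy run data are hypothetical illustrations, not code or data from the paper.

```python
from typing import Dict, List, Set

# Hypothetical representation: run name -> topic id -> ranked document ids (best first).
Runs = Dict[str, Dict[str, List[str]]]


def build_pool(runs: Runs, depth: int = 100) -> Dict[str, Set[str]]:
    """Return, for each topic, the union of the top-`depth` documents of every run."""
    pool: Dict[str, Set[str]] = {}
    for ranking_by_topic in runs.values():
        for topic, ranking in ranking_by_topic.items():
            # Only the first `depth` ranks of each run contribute to the judged pool;
            # the set removes documents retrieved by several runs.
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


if __name__ == "__main__":
    # Toy example with two hypothetical runs and a pool depth of 2.
    toy_runs: Runs = {
        "run_a": {"T1": ["d3", "d1", "d7"]},
        "run_b": {"T1": ["d1", "d9", "d2"]},
    }
    print(sorted(build_pool(toy_runs, depth=2)["T1"]))  # ['d1', 'd3', 'd9']
```

In standard TREC practice, documents that never enter the pool are left unjudged and are treated as nonrelevant during evaluation, so a deeper pool (such as 100) trades more assessor effort for more complete judgments.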
Pages: 856-868
Page count: 13
Related papers (50 in total)
  • [1] Building a text collection for Urdu information retrieval (vol 43, pg 856, 2021)
    Rasheed, Imran
    Banka, Haider
    Khan, Hamaid M.
    Daud, Ali
    [J]. ETRI JOURNAL, 2022, 44 (01): 168
  • [2] CURE: Collection for Urdu Information Retrieval Evaluation and Ranking
    Iqbal, Muntaha
    Tahir, Bilal
    Mehmood, Muhammad Amir
    [J]. 2021 INTERNATIONAL CONFERENCE ON DIGITAL FUTURES AND TRANSFORMATIVE TECHNOLOGIES (ICODT2), 2021
  • [3] Corrigendum to: Building a text collection for Urdu information retrieval (ETRI Journal, (2021), 43, 5, (856-868), 10.4218/etrij.2019-0458)
    Rasheed, Imran
    Banka, Haider
    Khan, Hamaid M.
    Daud, Ali
    [J]. ETRI Journal, 2022, 44 (01)
  • [4] INSTRUMENTS FOR THE COLLECTION OF TEXT DATA-RETRIEVAL INFORMATION
    LEONTEVA, NN
    REZNITSKAYA, DL
    [J]. NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1983, (02): 9-16
  • [5] Query Expansion in Information Retrieval for Urdu Language
    Rasheed, Imran
    Banka, Haider
    [J]. 2018 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2018: 171-176
  • [6] Context-aware Urdu Information Retrieval System
    Shoaib, Umar
    Fiaz, Laiba
    Chakraborty, Chinmay
    Rauf, Hafiz Tayyab
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (03)
  • [7] Text Information Retrieval in Tetun
    de Jesus, Gabriel
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III, 2023, 13982: 429-435
  • [8] Text databases and information retrieval
    [J]. ACM Comput Surv, 1 (133)
  • [9] Text Analysis and Information Retrieval of Text Data
    Gupta, Honey
    Kottwani, Aveena
    Gogia, Soniya
    Chaudhari, Sheetal
    [J]. PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016: 788-792
  • [10] Text mining and information retrieval
    Forest, Dominic
    Da Sylva, Lyne
    [J]. CANADIAN JOURNAL OF INFORMATION AND LIBRARY SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION ET DE BIBLIOTHECONOMIE, 2011, 35 (03): 217-227