SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools

Cited by: 0
Authors
Bhardwaj, Shubham [1]
Gantayat, Neelamadhav [1]
Chaturvedi, Nikhil [1]
Garg, Rahul [1]
Agarwal, Sumeet [1]
Affiliations
[1] Indian Inst Technol, IBM Res, New Delhi, India
Keywords
Sanskrit; Sandhi; Morphophonology; HATHA YOGA;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Sanskrit is an ancient Indian language. Several important texts that remain of interest to people all over the world were written in Sanskrit. Sanskrit grammar has a precise and complete specification, given in the text Astadhyayi by Panini. This has led to the development of a number of Sanskrit computational linguistics tools for processing and analyzing Sanskrit texts. Unfortunately, there has been no effort to standardize and critically validate these tools. In this paper, we develop a Sanskrit benchmark called SandhiKosh to evaluate the completeness and accuracy of Sanskrit Sandhi tools. We present the results of this benchmark on three of the most prominent Sanskrit tools and demonstrate that these tools have substantial scope for improvement. The benchmark will be freely available to researchers worldwide, and we hope it will help everyone working in this area evaluate and validate their tools.
Pages: 4494-4500
Page count: 7