Lexicalized and statistical parsing of natural language text in Tamil using hybrid language models

Authors
Selvam, M. [1]
Natarajan, A.M. [2]
Thangarajan, R. [1]
Affiliations
[1] Department of Information Technology, Kongu Engineering College, Perundurai, Erode, Tamilnadu 638052, India
[2] Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu 638401, India
Source
2008 / Volume: WSEAS / Issue: 07
Keywords
Forestry - Statistics - Syntactics - Context free grammars - Natural language processing systems
DOI
Not available
Abstract
Parsing is an important process in Natural Language Processing (NLP) and Computational Linguistics, used to understand the syntax and semantics of natural language (NL) sentences that conform to a grammar. A parser is a computational system that processes an input sentence according to the productions of the grammar and builds one or more constituent structures that conform to it. The interpretation of natural language text also depends on context. Language models need syntactic and semantic coverage to interpret natural language sentences well in both small- and large-vocabulary tasks. Although statistical parsing with trigram language models gives better performance through trigram probabilities and a large vocabulary size, it has disadvantages: it lacks syntactic support and handles free word order and long-term relationships poorly. Grammar-based structural parsing addresses these problems to some extent, but it is very tedious for a large-vocabulary corpus. To overcome these disadvantages, a structural component must be incorporated into the statistical approach, which results in hybrid language models such as phrase structure and dependency structure language models. To add the structural component, balance the vocabulary size, and handle the challenging features of the Tamil language, Lexicalized and Statistical Parsing (LSP) is employed with the assistance of hybrid language models. This paper focuses on lexicalized and statistical parsing of natural language text in Tamil, with a comparative analysis of phrase and dependency language models. For the development of the hybrid language models, a new part-of-speech (POS) tag set with more than 500 tags and a dependency tag set with 31 tags have been developed for Tamil, giving wider coverage. Phrase and dependency structure treebanks have been developed from 3261 Tamil sentences covering 51026 words.
Hybrid language models were developed using these treebanks, employed in LSP, and evaluated against gold standards. LSP with hybrid language models provides better results and covers all the challenging features of the Tamil language.
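The trigram language model that the abstract contrasts with hybrid models can be illustrated with a minimal sketch. This is not the authors' implementation. It is an unsmoothed maximum-likelihood estimate over a toy English corpus, and the function names `train_trigram_counts` and `trigram_prob`, the padding symbols, and the corpus are all assumptions made for illustration. It shows why such a model conditions only on the two preceding words, which is the limitation behind the poor handling of free word order and long-term relationships.

```python
from collections import Counter

def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts (hypothetical helper)."""
    tri = Counter()
    ctx = Counter()
    for tokens in sentences:
        # Pad so the first real word also has a two-word context.
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            ctx[context] += 1
            tri[context + (padded[i],)] += 1
    return tri, ctx

def trigram_prob(tri, ctx, w1, w2, w3):
    """Unsmoothed MLE: P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    context_count = ctx[(w1, w2)]
    if context_count == 0:
        return 0.0  # unseen context; a real model would smooth or back off
    return tri[(w1, w2, w3)] / context_count

# Toy corpus, assumed purely for illustration (not the paper's Tamil treebank).
toy_corpus = [
    ["the", "dog", "barks"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
]
tri_counts, ctx_counts = train_trigram_counts(toy_corpus)
p = trigram_prob(tri_counts, ctx_counts, "the", "dog", "barks")
print(p)  # 0.5
```

Any word order absent from the training data receives probability zero here, which hints at why a purely statistical trigram model struggles with a free-word-order language and why a structural component is attractive.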