An Empirical Investigation of Word Class-Based Features for Natural Language Understanding

Cited by: 4
Authors
Celikyilmaz, Asli [1]
Sarikaya, Ruhi [1]
Jeong, Minwoo [1]
Deoras, Anoop [2, 3]
Affiliations
[1] Microsoft Corp, Intent Sci Team, Redmond, WA 98052 USA
[2] Microsoft Corp, Conversat Understanding Sci, Redmond, WA 98052 USA
[3] Netflix, Algorithms Res & Engn Grp, Los Gatos, CA 95032 USA
Keywords
Class-based features; conditional random fields; exponential models; natural language understanding; regularization; shrinkage features; NETWORKS
DOI
10.1109/TASLP.2015.2511925
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many studies show that using class-based features improves performance on natural language processing (NLP) tasks such as syntactic part-of-speech tagging, dependency parsing, sentiment analysis, and slot filling in natural language understanding (NLU), but little has been reported on the underlying reasons for these improvements. In this paper, we investigate the effects of word class-based features for the exponential family of models, focusing specifically on NLU tasks, and demonstrate that the performance improvements could be attributed to the regularization effect of the class-based features on the underlying model. Our hypothesis is based on the empirical observation that shrinking the sum of parameter magnitudes in an exponential model tends to improve performance. We show on several semantic tagging tasks that there is a positive correlation between the model size reduction achieved by adding the class-based features and model performance on a held-out dataset. We also demonstrate that class-based features extracted from different data sources using alternative word clustering methods can individually contribute to the performance gain. Since the proposed features are generated in an unsupervised manner without significant computational overhead, the improvements in performance largely come for free, and we show that such features provide gains for a wide range of tasks, from semantic classification and slot tagging in NLU to named entity recognition (NER).
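As an illustration of the two ideas the abstract relies on, the following is a minimal sketch, not the authors' implementation. It shows how class-based features might be emitted alongside word-identity features for a token, and how the "sum of parameter magnitudes" of a model could be measured as an L1 norm. The `WORD_CLASS` mapping, the cluster IDs, the feature-name format, and both function names are hypothetical and chosen only for this example.

```python
from typing import Dict, List

# Hypothetical word-to-class mapping, e.g. the output of an unsupervised
# word clustering step. The cluster IDs are invented for illustration.
WORD_CLASS: Dict[str, str] = {
    "boston": "C17",
    "seattle": "C17",
    "monday": "C42",
    "friday": "C42",
}


def token_features(tokens: List[str], i: int) -> List[str]:
    """Return word-identity features plus class-based features for token i."""
    word = tokens[i].lower()
    feats = [f"w={word}"]
    cls = WORD_CLASS.get(word)
    if cls is not None:
        # Class feature shared by every word in the same cluster.
        feats.append(f"c={cls}")
    if i > 0:
        prev = tokens[i - 1].lower()
        feats.append(f"w-1={prev}")
        prev_cls = WORD_CLASS.get(prev)
        if prev_cls is not None:
            feats.append(f"c-1={prev_cls}")
    return feats


def parameter_magnitude_sum(weights: Dict[str, float]) -> float:
    """Sum of absolute parameter values (the L1 norm of the weights)."""
    return sum(abs(v) for v in weights.values())
```

In this sketch, "boston" and "seattle" both fire the feature `c=C17`, so a model can place weight on one shared class parameter rather than on a separate parameter for each word. Comparing `parameter_magnitude_sum` for models trained with and without the class features is one way the size reduction described in the abstract could be checked.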
Pages: 994-1005
Page count: 12