Word n-gram attention models for sentence similarity and inference

Cited by: 29
Authors
Lopez-Gazpio, I. [1]
Maritxalar, M. [1]
Lapata, M. [2]
Agirre, E. [1]
Affiliations
[1] Univ Basque Country, UPV EHU, Comp Sci Fac, IXA NLP Grp, Manuel Lardizabal 1, Donostia San Sebastian 20018, Basque Country, Spain
[2] Univ Edinburgh, Sch Informat, Inst Language Cognit & Computat, 10 Crichton St, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
Attention models; Deep learning; Natural language understanding; Natural Language Inference; Semantic textual similarity
DOI
10.1016/j.eswa.2019.04.054
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic Textual Similarity and Natural Language Inference are two popular natural language understanding tasks used to benchmark sentence representation models, in which two sentences are paired. In such tasks, sentences are represented as bags of words, sequences, trees, or convolutions, while the attention model operates over individual word pairs. In this article we introduce the use of word n-grams in the attention model. Our results on five datasets show an error reduction of up to 41% relative to the word-based attention model. The improvements are especially pronounced in low-data regimes and, in the case of natural language inference, on the recently released hard subset of Natural Language Inference datasets. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 1-11
Number of pages: 11
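As a rough illustration of the idea described in the abstract, the sketch below contrasts attention over individual word pairs with attention over word n-grams. It is a minimal, hypothetical reconstruction and not the authors' architecture: the embeddings are random stand-ins for pretrained vectors, each n-gram is represented by the mean of its word vectors, and alignment scores are plain dot products normalised with a softmax. All function names and parameters (ngram_vectors, ngram_attention, and so on) are invented for this example.

import numpy as np

def ngrams(tokens, n):
    # All contiguous word n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_vectors(tokens, emb, n):
    # Hypothetical n-gram representation: the mean of the word vectors
    # (an assumption, not necessarily how the paper composes n-grams).
    return np.stack([np.mean([emb[w] for w in g], axis=0) for g in ngrams(tokens, n)])

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ngram_attention(sent_a, sent_b, emb, n=2):
    # Soft-align the n-grams of sent_a against the n-grams of sent_b.
    A = ngram_vectors(sent_a, emb, n)   # (#n-grams in A, d)
    B = ngram_vectors(sent_b, emb, n)   # (#n-grams in B, d)
    scores = A @ B.T                    # unnormalised alignment scores
    attn = softmax(scores, axis=1)      # each n-gram of A attends over all n-grams of B
    attended_b = attn @ B               # soft-aligned summary of B for each n-gram of A
    return attn, attended_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sent_a = "a man is playing a guitar".split()
    sent_b = "someone plays an acoustic guitar".split()
    emb = {w: rng.normal(size=50) for w in set(sent_a) | set(sent_b)}  # stand-in embeddings
    attn, _ = ngram_attention(sent_a, sent_b, emb, n=2)
    print(attn.shape)   # (5, 4): five bigrams of A aligned over four bigrams of B

Running the example aligns the five bigrams of the first sentence against the four bigrams of the second; in the paper's setting the aligned phrase representations would feed a downstream similarity or inference classifier, whereas here they are only printed.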
Related papers
50 records in total (entries [41]-[50] shown below)
  • [41] Improving n-gram models by incorporating enhanced distributions
    O'Boyle, P
    Ming, J
    McMahon, J
    Smith, FJ
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 168 - 171
  • [42] Joint training of interpolated exponential n-gram models
    Sethy, Abhinav
    Chen, Stanley
    Arisoy, Ebru
    Ramabhadran, Bhuvana
    Audkhasi, Kartik
    Narayanan, Shrikanth
    Vozila, Paul
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 25 - 30
  • [43] Predicting Domain Generation Algorithms with N-Gram Models
    Mu, ZiCheng
    [J]. 2022 INTERNATIONAL CONFERENCE ON BIG DATA, INFORMATION AND COMPUTER NETWORK (BDICN 2022), 2022, : 31 - 38
  • [44] Word-wise Explanation Method For Deep Learning Models Using Character N-gram Input
    Aksayli, N. Deniz
    Islek, Irem
    Karaman, Cagla Cig
    Gungor, Onur
    [J]. 29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [45] New word clustering method for building n-gram language models in continuous speech recognition systems
    Bahrani, Mohammad
    Sameti, Hossein
    Hafezi, Nazila
    Momtazi, Saeedeh
    [J]. NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027 : 286 - 293
  • [46] N-gram Insight
    Prans, George
    [J]. AMERICAN SCIENTIST, 2011, 99 (05) : 356 - 357
  • [47] Self-Organizing n-gram Model for Automatic Word Spacing
    Park, Seong-Bae
    Tae, Yoon-Shik
    Park, Se-Young
    [J]. COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 633 - 640
  • [48] Learning Chinese word representation better by cascade morphological n-gram
    Xiong, Zongyang
    Qin, Ke
    Yang, Haobo
    Luo, Guangchun
    [J]. NEURAL COMPUTING AND APPLICATIONS, 2021, 33 : 3757 - 3768
  • [49] Exploration of N-gram Features for the Domain Adaptation of Chinese Word Segmentation
    Guo, Zhen
    Zhang, Yujie
    Su, Chen
    Xu, Jinan
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, 2012, 333 : 121 - 131
  • [50] A fast and flexible architecture for very large word n-gram datasets
    Flor, Michael
    [J]. NATURAL LANGUAGE ENGINEERING, 2013, 19 (01) : 61 - 93