Exploring Numeracy in Word Embeddings

Citations: 0
Authors
Naik, Aakanksha [1]
Ravichander, Abhilasha [1]
Rose, Carolyn [1]
Hovy, Eduard [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
NUMBERS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Word embeddings are now pervasive across NLP subfields as the de facto method of forming text representations. In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of mathematical meaning for numbers, which is important for language understanding. Numbers are ubiquitous and frequently appear in text. Inspired by cognitive studies on how humans perceive numbers, we develop an analysis framework to test how well word embeddings capture two essential properties of numbers: magnitude (e.g., 3 < 4) and numeration (e.g., 3 = three). Our experiments reveal that most models capture an approximate notion of magnitude, but are inadequate at capturing numeration. We hope that our observations provide a starting point for the development of methods which better capture numeracy in NLP systems.
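To make the two properties named in the abstract concrete, here is a minimal sketch of what a magnitude check and a numeration check over an embedding table might look like. It is not the authors' analysis framework, whose details are not given in this record. The function names (`magnitude_check`, `numeration_check`), the use of cosine similarity, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def magnitude_check(emb, target, near, far):
    """True if `target` is more similar to the numerically closer
    token `near` than to the numerically distant token `far`."""
    return cosine(emb[target], emb[near]) > cosine(emb[target], emb[far])


def numeration_check(emb, digit, word, distractors):
    """True if the digit form is most similar to its own word form
    among the word form plus a list of distractor number words."""
    candidates = [word] + list(distractors)
    best = max(candidates, key=lambda w: cosine(emb[digit], emb[w]))
    return best == word


# Hypothetical toy vectors, invented for this sketch. Real experiments
# would load vectors from a trained embedding model instead.
toy = {
    "3": [1.0, 0.1],
    "4": [0.9, 0.2],
    "1000": [0.1, 1.0],
    "three": [0.95, 0.15],
    "nine": [0.3, 0.9],
}

magnitude_ok = magnitude_check(toy, "3", "4", "1000")
numeration_ok = numeration_check(toy, "3", "three", ["nine"])
```

With real embeddings, the same two checks could be run over many number triples and digit/word pairs and reported as accuracies. The abstract suggests the magnitude-style check would tend to score higher than the numeration-style one.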
Pages: 3374-3380
Page count: 7