Linguistic Information in Word Embeddings

Cited by: 0
Authors
Basirat, Ali [1 ]
Tang, Marc [1 ]
Affiliations
[1] Uppsala Univ, Dept Linguist & Philol, Uppsala, Sweden
Keywords
Neural network; Nominal classification; Swedish; Word embedding; Dimensionality
DOI
10.1007/978-3-030-05453-3_23
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent grammatical, semantic, and mixed types of nominal categories, respectively. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments with a single-layer feed-forward neural network, classifying the semantic feature required significantly fewer neurons than the grammatical one, although it also produced higher entropy in the classification output despite its high accuracy. Furthermore, the count/mass distinction posed difficulties for the model even when the number of neurons was tuned almost to its maximum.
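The probing setup the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' code: the embeddings, labels, and hyperparameters are synthetic stand-ins, and the network is a generic single-hidden-layer feed-forward classifier whose hidden size plays the role of the tuned neuron count, with mean output entropy as the uncertainty measure the abstract contrasts with accuracy.

```python
# Hedged sketch (assumption: not the paper's implementation) of probing
# word embeddings for a binary nominal feature, e.g. uter vs. neuter,
# with a single-hidden-layer feed-forward network.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "embeddings": two Gaussian clusters standing in for the two
# classes of nouns in a real embedding space.
dim, n_per_class = 50, 200
X = np.vstack([rng.normal(-0.5, 1.0, (n_per_class, dim)),
               rng.normal(+0.5, 1.0, (n_per_class, dim))])
y = np.array([0] * n_per_class + [1] * n_per_class)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden size is the quantity tuned per feature in the paper's setup
# (fewer neurons sufficed for the semantic distinction).
hidden = 8
W1 = rng.normal(0, 0.1, (dim, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, hidden);        b2 = 0.0

lr = 0.5
for _ in range(300):                      # plain batch gradient descent
    H = np.tanh(X @ W1 + b1)              # hidden activations
    p = sigmoid(H @ W2 + b2)              # P(class = 1)
    g = (p - y) / len(y)                  # cross-entropy gradient wrt logits
    W2 -= lr * (H.T @ g); b2 -= lr * g.sum()
    gH = np.outer(g, W2) * (1 - H**2)     # backprop through tanh
    W1 -= lr * (X.T @ gH); b1 -= lr * gH.sum(axis=0)

acc = float(((p > 0.5) == y).mean())
# Mean entropy (bits) of the output distribution: a classifier can be
# accurate yet uncertain, which is the contrast the abstract reports.
eps = 1e-12
ent = float((-(p * np.log2(p + eps)
               + (1 - p) * np.log2(1 - p + eps))).mean())
print(f"accuracy={acc:.2f}  mean output entropy={ent:.2f} bits")
```

Varying `hidden` and tracking both accuracy and mean entropy reproduces the kind of comparison the abstract makes across the three nominal distinctions.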
Pages: 492-513
Page count: 22