Examining the effect of whitening on static and contextualized word embeddings

Cited by: 4
Authors
Sasaki, Shota [1 ,2 ]
Heinzerling, Benjamin [1 ,2 ]
Suzuki, Jun [1 ,2 ]
Inui, Kentaro [1 ,2 ]
Affiliations
[1] RIKEN, Sendai, Miyagi 9808579, Japan
[2] Tohoku Univ, Sendai, Miyagi 9808579, Japan
Keywords
Static word embeddings; Contextualized word embeddings; Whitening; Frequency bias;
DOI
10.1016/j.ipm.2023.103272
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification
0812
Abstract
Static word embeddings (SWE) and contextualized word embeddings (CWE) are the foundation of modern natural language processing. However, these embeddings suffer from spatial bias in the form of anisotropy, which has been shown to reduce their performance. One method to alleviate anisotropy is the "whitening" transformation. Whitening is a standard method in signal processing and other areas; however, its effect on SWE and CWE is not well understood. In this study, we conduct experiments to elucidate the effect of whitening on SWE and CWE. The results indicate that whitening predominantly removes word frequency bias in SWE, and biases other than word frequency bias in CWE.
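The whitening transformation the abstract refers to can be sketched as follows. This is a minimal illustration of ZCA whitening applied to a toy embedding matrix, not the authors' implementation; the variable names and random toy data are assumptions for demonstration only.

```python
import numpy as np

def whiten(E):
    """ZCA-whiten an (n_words, dim) embedding matrix: after the transform,
    the embeddings have zero mean and an (approximately) identity covariance,
    i.e. the anisotropy of the space is removed."""
    X = E - E.mean(axis=0)                      # center the embeddings
    cov = X.T @ X / X.shape[0]                  # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)            # eigendecomposition of covariance
    eps = 1e-12                                 # guard against tiny eigenvalues
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA whitening matrix
    return X @ W

rng = np.random.default_rng(0)
# Toy "anisotropic" embeddings: an isotropic cloud stretched by a random map.
E = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))
Ew = whiten(E)
print(np.allclose(Ew.T @ Ew / Ew.shape[0], np.eye(8), atol=1e-4))
```

The ZCA variant (rotating back with `vecs.T`) keeps the whitened vectors as close as possible to the originals; a plain PCA whitening would drop that final rotation.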
Pages: 10