Examining the effect of whitening on static and contextualized word embeddings

Cited by: 4
Authors
Sasaki, Shota [1 ,2 ]
Heinzerling, Benjamin [1 ,2 ]
Suzuki, Jun [1 ,2 ]
Inui, Kentaro [1 ,2 ]
Institutions
[1] RIKEN, Sendai, Miyagi 9808579, Japan
[2] Tohoku Univ, Sendai, Miyagi 9808579, Japan
Keywords
Static word embeddings; Contextualized word embeddings; Whitening; Frequency bias;
DOI
10.1016/j.ipm.2023.103272
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Code
0812 ;
Abstract
Static word embeddings (SWE) and contextualized word embeddings (CWE) are the foundation of modern natural language processing. However, these embeddings suffer from spatial bias in the form of anisotropy, which has been demonstrated to reduce their performance. A method to alleviate the anisotropy is the "whitening" transformation. Whitening is a standard method in signal processing and other areas; however, its effect on SWE and CWE is not well understood. In this study, we conduct an experiment to elucidate the effect of whitening on SWE and CWE. The results indicate that whitening predominantly removes the word frequency bias in SWE, and biases other than the word frequency bias in CWE.
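The abstract refers to the standard whitening transformation: centering an embedding matrix and rescaling it so that its covariance becomes the identity, which removes anisotropy by construction. As a minimal sketch (not the paper's exact implementation), PCA whitening of a `(num_words, dim)` embedding matrix with NumPy looks like this; the small epsilon is an assumption added for numerical stability:

```python
import numpy as np

def whiten(embeddings: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """PCA-whiten a (num_words, dim) embedding matrix: center it,
    then rotate and rescale so the covariance becomes the identity."""
    centered = embeddings - embeddings.mean(axis=0)
    # Eigendecomposition of the (dim, dim) covariance matrix
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal axis by 1/sqrt(eigenvalue)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps))
    return centered @ W

# Toy example: correlated "embeddings" to stand in for SWE/CWE vectors
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))
E_white = whiten(E)
# After whitening, the empirical covariance is (approximately) identity
print(np.allclose(np.cov(E_white, rowvar=False), np.eye(8), atol=1e-2))
```

Because the whitened covariance is isotropic, any direction correlated with word frequency is flattened along with the rest; the paper's finding is that this removes mainly the frequency bias in SWE but other biases in CWE.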
Pages: 10
Related Papers
50 items total
  • [41] Predicting Quality and Popularity of a Movie From Plot Summary and Character Description Using Contextualized Word Embeddings
    Lee, Jung-Hoon
    Kim, You-Jin
    Cheong, Yun-Gyung
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 214 - 220
  • [42] Could this be next for corpus linguistics? Methods of semi-automatic data annotation with contextualized word embeddings
    Fonteyn, Lauren
    Manjavacas, Enrique
    Haket, Nina
    Dorst, Aletta G.
    Kruijt, Eva
    LINGUISTICS VANGUARD, 2024, 10 (01): : 587 - 602
  • [43] Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection
    Alshattnawi, Sawsan
    Shatnawi, Amani
    AlSobeh, Anas M. R.
    Magableh, Aws A.
    APPLIED SCIENCES-BASEL, 2024, 14 (06):
  • [44] Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
    Guo, Wei
    Caliskan, Aylin
    AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2021, : 122 - 133
  • [45] Network embeddings from distributional thesauri for improving static word representations
    Jana, Abhik
    Haldar, Siddhant
    Goyal, Pawan
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 187
  • [46] Effect of dimensionality change on the bias of word embeddings
    Rai, Rohit Raj
    Awekar, Amit
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 601 - 602
  • [47] Enhancing Entity Linking with Contextualized Entity Embeddings
    Xu, Zhenran
    Chen, Yulin
    Shi, Senbao
    Hu, Baotian
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 228 - 239
  • [48] Malware Detection through Contextualized Vector Embeddings
    Pandya, Vinay
    Di Troia, Fabio
    2023 SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC, 2023,
  • [49] Contextualized Diachronic Word Representations
    Jawahar, Ganesh
    Seddah, Djame
    1ST INTERNATIONAL WORKSHOP ON COMPUTATIONAL APPROACHES TO HISTORICAL LANGUAGE CHANGE, 2019, : 35 - 47
  • [50] Conceptual Combination in Large Language Models: Uncovering Implicit Relational Interpretations in Compound Words With Contextualized Word Embeddings
    Ciapparelli, Marco
    Zarbo, Calogero
    Marelli, Marco
    COGNITIVE SCIENCE, 2025, 49 (03)