Examining the effect of whitening on static and contextualized word embeddings

Cited by: 4
Authors
Sasaki, Shota [1 ,2 ]
Heinzerling, Benjamin [1 ,2 ]
Suzuki, Jun [1 ,2 ]
Inui, Kentaro [1 ,2 ]
Institutions
[1] RIKEN, Sendai, Miyagi 9808579, Japan
[2] Tohoku Univ, Sendai, Miyagi 9808579, Japan
Keywords
Static word embeddings; Contextualized word embeddings; Whitening; Frequency bias;
DOI
10.1016/j.ipm.2023.103272
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Code
0812 ;
Abstract
Static word embeddings (SWE) and contextualized word embeddings (CWE) are the foundation of modern natural language processing. However, these embeddings suffer from spatial bias in the form of anisotropy, which has been demonstrated to reduce their performance. A method to alleviate the anisotropy is the "whitening" transformation. Whitening is a standard method in signal processing and other areas; however, its effect on SWE and CWE is not well understood. In this study, we conduct an experiment to elucidate the effect of whitening on SWE and CWE. The results indicate that whitening predominantly removes the word frequency bias in SWE, and biases other than the word frequency bias in CWE.
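The abstract refers to the standard whitening transformation: centering an embedding matrix and rescaling it so that its covariance becomes the identity, which removes anisotropy by construction. As a minimal sketch (not the paper's exact implementation), PCA whitening of a `(num_words, dim)` embedding matrix with NumPy looks like this; the small epsilon is an assumption added for numerical stability:

```python
import numpy as np

def whiten(embeddings: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """PCA-whiten a (num_words, dim) embedding matrix: center it,
    then rotate and rescale so the covariance becomes the identity."""
    centered = embeddings - embeddings.mean(axis=0)
    # Eigendecomposition of the (dim, dim) covariance matrix
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal axis by 1/sqrt(eigenvalue)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps))
    return centered @ W

# Toy example: correlated "embeddings" to stand in for SWE/CWE vectors
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))
E_white = whiten(E)
# After whitening, the empirical covariance is (approximately) identity
print(np.allclose(np.cov(E_white, rowvar=False), np.eye(8), atol=1e-2))
```

Because the whitened covariance is isotropic, any direction correlated with word frequency is flattened along with the rest; the paper's finding is that this removes mainly the frequency bias in SWE but other biases in CWE.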
Pages: 10
Related Papers
50 items total
  • [41] Predicting Quality and Popularity of a Movie From Plot Summary and Character Description Using Contextualized Word Embeddings
    Lee, Jung-Hoon
    Kim, You-Jin
    Cheong, Yun-Gyung
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 214 - 220
  • [42] Could this be next for corpus linguistics? Methods of semi-automatic data annotation with contextualized word embeddings
    Fonteyn, Lauren
    Manjavacas, Enrique
    Haket, Nina
    Dorst, Aletta G.
    Kruijt, Eva
    LINGUISTICS VANGUARD, 2024, 10 (01): : 587 - 602
  • [43] Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection
    Alshattnawi, Sawsan
    Shatnawi, Amani
    AlSobeh, Anas M. R.
    Magableh, Aws A.
    APPLIED SCIENCES-BASEL, 2024, 14 (06):
  • [44] Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
    Guo, Wei
    Caliskan, Aylin
    AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2021, : 122 - 133
  • [45] Network embeddings from distributional thesauri for improving static word representations
    Jana, Abhik
    Haldar, Siddhant
    Goyal, Pawan
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 187
  • [46] Effect of dimensionality change on the bias of word embeddings
    Rai, Rohit Raj
    Awekar, Amit
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 601 - 602
  • [47] Enhancing Entity Linking with Contextualized Entity Embeddings
    Xu, Zhenran
    Chen, Yulin
    Shi, Senbao
    Hu, Baotian
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 228 - 239
  • [48] Malware Detection through Contextualized Vector Embeddings
    Pandya, Vinay
    Di Troia, Fabio
    2023 SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC, 2023,
  • [49] Contextualized Diachronic Word Representations
    Jawahar, Ganesh
    Seddah, Djame
    1ST INTERNATIONAL WORKSHOP ON COMPUTATIONAL APPROACHES TO HISTORICAL LANGUAGE CHANGE, 2019, : 35 - 47
  • [50] Conceptual Combination in Large Language Models: Uncovering Implicit Relational Interpretations in Compound Words With Contextualized Word Embeddings
    Ciapparelli, Marco
    Zarbo, Calogero
    Marelli, Marco
    COGNITIVE SCIENCE, 2025, 49 (03)