Examining the effect of whitening on static and contextualized word embeddings

Cited: 4
Authors
Sasaki, Shota [1,2]
Heinzerling, Benjamin [1,2]
Suzuki, Jun [1,2]
Inui, Kentaro [1,2]
Affiliations
[1] RIKEN, Sendai, Miyagi 9808579, Japan
[2] Tohoku Univ, Sendai, Miyagi 9808579, Japan
Keywords
Static word embeddings; Contextualized word embeddings; Whitening; Frequency bias
DOI
10.1016/j.ipm.2023.103272
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Static word embeddings (SWE) and contextualized word embeddings (CWE) are the foundation of modern natural language processing. However, these embeddings suffer from spatial bias in the form of anisotropy, which has been demonstrated to reduce their performance. One method to alleviate anisotropy is the "whitening" transformation. Whitening is a standard method in signal processing and other areas; however, its effect on SWE and CWE is not well understood. In this study, we conduct an experiment to elucidate the effect of whitening on SWE and CWE. The results indicate that whitening predominantly removes the word frequency bias in SWE, and biases other than the word frequency bias in CWE.
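The whitening transformation discussed in the abstract can be sketched as follows. This is an illustrative PCA-whitening sketch in NumPy, not the authors' implementation: it centers a set of embedding vectors and rescales them along the principal components of their sample covariance so that the transformed set has identity covariance (i.e., is isotropic).

```python
import numpy as np

def whiten(embeddings):
    """PCA whitening: center the vectors, then rescale along the
    principal components of the sample covariance so the transformed
    set has (approximately) identity covariance."""
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # W = U * Lambda^{-1/2}; then cov(centered @ W) = I
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals))
    return centered @ W

# Toy anisotropic "embeddings": an isotropic cloud stretched by a
# random linear map, standing in for a real embedding matrix.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))
white = whiten(vecs)
print(np.allclose(np.cov(white, rowvar=False), np.eye(8), atol=1e-6))  # True
```

Because the whitening matrix is built from the same sample covariance it is applied to, the output covariance matches the identity up to floating-point error; on real embedding matrices the effect is to remove dominant directions such as those correlated with word frequency.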
Pages: 10
Related Papers
50 records total
  • [31] What Does This Word Mean? Explaining Contextualized Embeddings with Natural Language Definition
    Chang, Ting-Yun
    Chen, Yun-Nung
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6064 - 6070
  • [32] Static detection of malicious PowerShell based on word embeddings
    Mimura, Mamoru
    Tajiri, Yui
    INTERNET OF THINGS, 2021, 15
  • [33] Visually Analyzing Contextualized Embeddings
    Berger, Matthew
    2020 IEEE VISUALIZATION CONFERENCE - SHORT PAPERS (VIS 2020), 2020, : 276 - 280
  • [34] SimAlign: High Quality Word Alignments Without Parallel Training Data Using Static and Contextualized Embeddings
    Sabet, Masoud Jalili
    Dufter, Philipp
    Yvon, Francois
    Schuetze, Hinrich
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1627 - 1643
  • [35] Incremental Sense Weight Training for In-Depth Interpretation of Contextualized Word Embeddings (Student Abstract)
    Jiang, Xinyi
    Yang, Zhengzhe
    Choi, Jinho D.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13823 - 13824
  • [36] Effect of Text Color on Word Embeddings
    Ikoma, Masaya
    Iwana, Brian Kenji
    Uchida, Seiichi
    DOCUMENT ANALYSIS SYSTEMS, 2020, 12116 : 341 - 355
  • [37] Opinion Mining with Deep Contextualized Embeddings
    Han, Wen-Bin
    Kando, Noriko
    NAACL HLT 2019: THE 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2019, : 35 - 42
  • [38] CyBERT: Contextualized Embeddings for the Cybersecurity Domain
    Ranade, Priyanka
    Piplai, Aritran
    Joshi, Anupam
    Finin, Tim
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3334 - 3342
  • [39] CEDR: Contextualized Embeddings for Document Ranking
    MacAvaney, Sean
    Yates, Andrew
    Cohan, Arman
    Goharian, Nazli
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1101 - 1104
  • [40] Contextualized Query Embeddings for Conversational Search
    Lin, Sheng-Chieh
    Yang, Jheng-Hong
    Lin, Jimmy
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 1004 - 1015