Cross-Lingual Word Embeddings

Cited by: 0
Authors
Søgaard A. [1]
Vulić I. [2]
Ruder S. [3]
Faruqui M. [4]
Affiliations
[1] University of Copenhagen, Denmark
[2] University of Cambridge, United Kingdom
[3] DeepMind, United Kingdom
[4] Google Assistant, United States
Keywords
cross-lingual learning; machine learning; natural language processing; semantics
DOI
10.2200/S00920ED2V01Y201904HLT042
Abstract
The majority of natural language processing (NLP) is English language processing, and while there is good language technology support for (standard varieties of) English, support for Albanian, Burmese, or Cebuano, and most other languages, remains limited. Being able to bridge this digital divide is important for scientific and democratic reasons but also represents an enormous growth potential. A key challenge in achieving this is learning to align basic meaning-bearing units of different languages. In this book, the authors survey and discuss recent and historical work on supervised and unsupervised learning of such alignments. Specifically, the book focuses on so-called cross-lingual word embeddings. The survey is intended to be systematic, using consistent notation and putting the available methods in comparable form, making it easy to compare wildly different approaches. In so doing, the authors establish previously unreported relations between these methods and are able to present a fast-growing literature in a very compact way. Furthermore, the authors discuss how best to evaluate cross-lingual word embedding methods and survey the resources available for students and researchers interested in this topic.
Table of Contents: Preface / Introduction / Monolingual Word Embedding Models / Cross-Lingual Word Embedding Models: Typology / A Brief History of Cross-Lingual Word Representations / Word-Level Alignment Models / Sentence-Level Alignment Methods / Document-Level Alignment Models / From Bilingual to Multilingual Training / Unsupervised Learning of Cross-Lingual Word Embeddings / Applications and Evaluation / Useful Data and Software / General Challenges and Future Directions / Bibliography / Authors' Biographies.
Copyright © 2019 by Morgan & Claypool.
Pages: 1 / 132
Page count: 131
Related papers
50 items in total
  • [41] Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
    Zhang, Mozhi
    Fujinuma, Yoshinari
    Paul, Michael J.
    Boyd-Graber, Jordan
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 2214 - 2220
  • [42] Manipuri-English Cross-lingual Word Embeddings using a Temporally Aligned Comparable Corpus
    Laitonjam, Lenin
    Singh, Sanasam Ranbir
    [J]. 2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 195 - 199
  • [43] Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
    Zhao, Jieyu
    Mukherjee, Subhabrata
    Hosseini, Saghar
    Chang, Kai-Wei
    Awadallah, Ahmed Hassan
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 2896 - 2907
  • [44] A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 789 - 798
  • [45] Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
    Wang, Haozhou
    Henderson, James
    Merlo, Paola
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4419 - 4430
  • [46] A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity
    Fujinuma, Yoshinari
    Boyd-Graber, Jordan
    Paul, Michael J.
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 4952 - 4962
  • [47] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    [J]. 36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688
  • [48] Multilingual Offensive Language Identification with Cross-lingual Embeddings
    Ranasinghe, Tharindu
    Zampieri, Marcos
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5838 - 5844
  • [49] Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing
    Schuster, Tal
    Ram, Ori
    Barzilay, Regina
    Globerson, Amir
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 1599 - 1613
  • [50] Data Augmentation with Unsupervised Machine Translation Improves the Structural Similarity of Cross-lingual Word Embeddings
    Nishikawa, Sosuke
    Ri, Ryokan
    Tsuruoka, Yoshimasa
    [J]. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021, : 163 - 173