A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models

Cited by: 74
Authors
Naseem, Usman [1]
Razzak, Imran [2]
Khan, Shah Khalid [3]
Prasad, Mukesh [4]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW, Australia
[2] Deakin Univ, Sch Informat Technol, Burwood, Australia
[3] RMIT Univ, Sch Engn, Melbourne, Vic, Australia
[4] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
Keywords
Text mining; natural language processing; word representation; language models; LOGISTIC-REGRESSION; SENTIMENT; EMBEDDINGS; CLASSIFICATION; FRAMEWORK; CONTEXT
DOI
10.1145/3434237
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Word representation has long been an important research area in natural language processing (NLP). Understanding complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and their expressive power, from classical models to modern state-of-the-art (SOTA) word representation language models (LMs). We describe a variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations that capture its semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. Finally, this survey briefly discusses the commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.
Pages: 35
Related papers
50 in total
  • [1] From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough
    Mars, Mourad
    APPLIED SCIENCES-BASEL, 2022, 12 (17):
  • [2] MODELS OF VISUAL WORD RECOGNITION - SAMPLING THE STATE-OF-THE-ART
    JACOBS, AM
    GRAINGER, J
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 1994, 20 (06) : 1311 - 1334
  • [3] Sentiment analysis in tweets: an assessment study from classical to modern word representation models
    Barreto, Sergio
    Moura, Ricardo
    Carvalho, Jonnathan
    Paes, Aline
    Plastino, Alexandre
    DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (01) : 318 - 380
  • [4] Sentiment analysis in tweets: an assessment study from classical to modern word representation models
    Sérgio Barreto
    Ricardo Moura
    Jonnathan Carvalho
    Aline Paes
    Alexandre Plastino
    Data Mining and Knowledge Discovery, 2023, 37 : 318 - 380
  • [5] Similarity Analysis of Contextual Word Representation Models
    Wu, John M.
    Belinkov, Yonatan
    Sajjad, Hassan
    Durrani, Nadir
    Dalvi, Fahim
    Glass, James
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020 : 4638 - 4655
  • [6] Representation of a Polysemous Word in Models of Mental Lexicon
    Kozhara, Olesya V.
    TOMSK STATE UNIVERSITY JOURNAL, 2019, (447) : 38 - 46
  • [7] A Survey on Distributed Word Representation
    Sun F.
    Guo J.-F.
    Lan Y.-Y.
    Xu J.
    Cheng X.-Q.
    Jisuanji Xuebao/Chinese Journal of Computers, 2019, 42 (07) : 1605 - 1625
  • [8] Learning Sense Representation from Word Representation for Unsupervised Word Sense Disambiguation
    Wang, Jie
    Fu, Zhenxin
    Li, Moxin
    Zhang, Haisong
    Zhao, Dongyan
    Yan, Rui
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13947 - 13948
  • [9] From Word Types to Tokens and Back: A Survey of Approaches to Word Meaning Representation and Interpretation
    Apidianaki, Marianna
    COMPUTATIONAL LINGUISTICS, 2023, 49 (02) : 465 - 523
  • [10] State-of-the-Art Vietnamese Word Segmentation
    Cong, Song Nguyen Duc
    Ngo, Quoc Hung
    Jiamthapthaksin, Rachsuda
    PROCEEDINGS OF 2016 2ND INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH) - INFORMATION SCIENCE FOR GREEN SOCIETY AND ENVIRONMENT, 2016 : 119 - 124