Dissecting word embeddings and language models in natural language processing

Cited by: 9
Authors
Verma, Vivek Kumar [1]
Pandey, Mrigank [1]
Jain, Tarun [2]
Tiwari, Pradeep Kumar [3]
Affiliations
[1] Manipal Univ Jaipur, Dept Informat Technol, Jaipur 303007, Rajasthan, India
[2] Manipal Univ Jaipur, Dept Comp Sci & Engn, Jaipur 303007, Rajasthan, India
[3] Manipal Univ Jaipur, Dept Comp Applicat, Jaipur 303007, Rajasthan, India
Keywords
Natural language processing; Language models; Word embedding
DOI
10.1080/09720529.2021.1968108
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Natural language processing (NLP) is an area of artificial intelligence that deals with the understanding, interpretation and development of human language so that computers can carry out tasks such as sentiment analysis, document summarization, conversational agents, machine translation and speech recognition. From conversational agents called chatbots, deployed on websites to interact with consumers and understand their needs, to summarized content delivered through smartphone apps, NLP has had major achievements in transforming a digital world that is increasingly gearing towards artificial intelligence. One area that has seen remarkable growth in recent times is language modelling, a statistical technique for computing the probability of tokens or words in a given sentence. In this paper, we present an overview of various representations with respect to language modelling, from neural word embeddings such as Word2Vec and GloVe to deep contextualized pre-trained embeddings such as ULMFiT, ELMo, OpenAI GPT and BERT.
Pages: 1509-1515
Number of pages: 7
Related papers
50 in total
  • [1] Word Embeddings for Latvian Natural Language Processing Tools
    Znotins, Arturs
    [J]. HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, 2016, 289 : 167 - 173
  • [2] A comparison of word embeddings for the biomedical natural language processing
    Wang, Yanshan
    Liu, Sijia
    Afzal, Naveed
    Rastegar-Mojarad, Majid
    Wang, Liwei
    Shen, Feichen
    Kingsbury, Paul
    Liu, Hongfang
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 87 : 12 - 20
  • [3] Word embeddings for biomedical natural language processing: A survey
    Chiu, Billy
    Baker, Simon
    [J]. LANGUAGE AND LINGUISTICS COMPASS, 2020, 14 (12):
  • [4] Domain specific word embeddings for natural language processing in radiology
    Chen, Timothy L.
    Emerling, Max
    Chaudhari, Gunvant R.
    Chillakuru, Yeshwant R.
    Seo, Youngho
    Vu, Thienkhai H.
    Sohn, Jae Ho
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 113
  • [5] Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing
    Alawad, Mohammed
    Tourassi, Georgia
    [J]. 2019 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2019), 2019 : 134 - 139
  • [6] Word Embeddings for Code-Mixed Language Processing
    Pratapa, Adithya
    Choudhury, Monojit
    Sitaram, Sunayana
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018 : 3067 - 3072
  • [7] Continuous-Space Language Processing: Beyond Word Embeddings
    Ostendorf, Mari
    [J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2016, 2016, 9918 : 3 - 15
  • [8] Word Embeddings for the Polish Language
    Rogalski, Marek
    Szczepaniak, Piotr S.
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2016, 2016, 9692 : 126 - 135
  • [9] SECNLP: A survey of embeddings in clinical natural language processing
    Kalyan, Katikapalli Subramanyam
    Sangeetha, S.
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2020, 101
  • [10] Definition Modeling: Learning to Define Word Embeddings in Natural Language
    Noraset, Thanapon
    Liang, Chen
    Birnbaum, Larry
    Downey, Doug
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017 : 3259 - 3266