Dissecting word embeddings and language models in natural language processing

Cited by: 9

Authors:

Verma, Vivek Kumar [1]

Pandey, Mrigank [1]

Jain, Tarun [2]

Tiwari, Pradeep Kumar [3]

Affiliations:

[1] Manipal Univ Jaipur, Dept Informat Technol, Jaipur 303007, Rajasthan, India

[2] Manipal Univ Jaipur, Dept Comp Sci & Engn, Jaipur 303007, Rajasthan, India

[3] Manipal Univ Jaipur, Dept Comp Applicat, Jaipur 303007, Rajasthan, India

Source:

JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY | 2021 / Vol. 24 / No. 5

Keywords:

Natural language processing; Language models; Word embedding

DOI:

10.1080/09720529.2021.1968108

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics]

Subject classification code:

070104

Abstract:

Natural language processing (NLP) is an area of artificial intelligence concerned with enabling computers to understand, interpret and generate human language in order to carry out tasks such as sentiment analysis, document summarization, building conversational agents, machine translation and speech recognition. From conversational agents known as chatbots, which are deployed on websites to interact with consumers digitally and understand their needs, to summarized content delivered through smartphone apps, NLP has achieved major successes in transforming a digital world that is increasingly oriented towards artificial intelligence. One area that has seen remarkable growth in recent times is language modelling, a statistical technique for computing the probability of tokens or words in a given sentence. In this paper, we attempt to present an overview of various representations used in language modelling, from neural word embeddings such as Word2Vec and GloVe to deep contextualized pre-trained embeddings such as ULMFiT, ELMo, OpenAI GPT and BERT.
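The abstract defines language modelling as computing the probability of words in a sentence. As a minimal illustration that is not taken from the paper, a count-based bigram model applies the chain rule with a first-order Markov assumption, P(w_1, ..., w_n) ≈ ∏ P(w_i | w_{i-1}), estimating each factor from corpus counts. The toy corpus, the `<s>`/`</s>` boundary markers and the function names below are hypothetical choices for this sketch. No smoothing is applied, so any unseen bigram drives the sentence probability to zero; the neural and contextualized models the paper surveys replace these raw counts with learned representations.

```python
from collections import Counter
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigram(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count how often each token appears as a history, and each (prev, curr) bigram."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        tokens = [BOS] + sentence + [EOS]
        for prev, curr in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return history_counts, bigram_counts


def bigram_prob(prev: str, curr: str, history_counts: Counter, bigram_counts: Counter) -> float:
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev was never seen as a history."""
    history = history_counts[prev]
    if history == 0:
        return 0.0
    return bigram_counts[(prev, curr)] / history


def sentence_probability(sentence: List[str], history_counts: Counter, bigram_counts: Counter) -> float:
    """Chain rule under the bigram assumption: multiply P(w_i | w_{i-1}) over the padded sentence."""
    tokens = [BOS] + sentence + [EOS]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, curr, history_counts, bigram_counts)
    return prob


if __name__ == "__main__":
    toy_corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    hist, bigrams = train_bigram(toy_corpus)
    print(sentence_probability(["the", "cat", "sat"], hist, bigrams))  # 0.5
    print(sentence_probability(["the", "dog", "ran"], hist, bigrams))  # 0.0 (unseen bigram)
```

In the toy corpus, "the" is followed by "cat" in one of its two occurrences, which is why the first sentence scores 0.5; every other factor is 1.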

Pages: 1509-1515

Number of pages: 7

Related articles (50 in total):

[1] Word Embeddings for Latvian Natural Language Processing Tools
Znotins, Arturs
[J]. HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, 2016, 289 : 167 - 173
[2] A comparison of word embeddings for the biomedical natural language processing
Wang, Yanshan
Liu, Sijia
Afzal, Naveed
Rastegar-Mojarad, Majid
Wang, Liwei
Shen, Feichen
Kingsbury, Paul
Liu, Hongfang
[J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 87 : 12 - 20
[3] Word embeddings for biomedical natural language processing: A survey
Chiu, Billy
Baker, Simon
[J]. LANGUAGE AND LINGUISTICS COMPASS, 2020, 14 (12):
[4] Domain specific word embeddings for natural language processing in radiology
Chen, Timothy L.
Emerling, Max
Chaudhari, Gunvant R.
Chillakuru, Yeshwant R.
Seo, Youngho
Vu, Thienkhai H.
Sohn, Jae Ho
[J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 113
[5] Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing
Alawad, Mohammed
Tourassi, Georgia
[J]. 2019 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2019), 2019, : 134 - 139
[6] Word Embeddings for Code-Mixed Language Processing
Pratapa, Adithya
Choudhury, Monojit
Sitaram, Sunayana
[J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3067 - 3072
[7] Continuous-Space Language Processing: Beyond Word Embeddings
Ostendorf, Mari
[J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2016, 2016, 9918 : 3 - 15
[8] Word Embeddings for the Polish Language
Rogalski, Marek
Szczepaniak, Piotr S.
[J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2016, 2016, 9692 : 126 - 135
[9] SECNLP: A survey of embeddings in clinical natural language processing
Kalyan, Katikapalli Subramanyam
Sangeetha, S.
[J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2020, 101
[10] Definition Modeling: Learning to Define Word Embeddings in Natural Language
Noraset, Thanapon
Liang, Chen
Birnbaum, Larry
Downey, Doug
[J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3259 - 3266
