Data science in light of natural language processing: An overview

Cited by: 13
Authors
Zeroual, Imad [1]
Lakhouaja, Abdelhak [1]
Affiliation
[1] Mohamed First Univ, Fac Sci, Av Med 6 BP 717, Oujda 60000, Morocco
Keywords
Data science; Natural language processing; Data-driven approaches; Corpora; Machine learning
DOI
10.1016/j.procs.2018.01.101
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The work of data scientists essentially falls into three areas: collecting data, analyzing data, and inferring information from data. Each of these tasks requires specialized personnel, takes time, and costs money. The next, and most demanding, step is turning data into products, which is why this field attracts the attention of many research groups in academia as well as in industry. In recent decades, data-driven approaches have emerged and gained popularity because they require much less human effort. Natural Language Processing (NLP) is among the fields most strongly influenced by data: the growth of available data underlies the performance improvements of most NLP applications, such as machine translation and automatic speech recognition. Consequently, many NLP applications are moving from rule-based systems and knowledge-based methods to data-driven approaches. However, data collected according to undefined design criteria, or stored in technically unsuitable formats, will be useless. Such data will also be neglected if the collection is too small to perform the required analysis and infer accurate information. The chief purpose of this overview is to shed some light on the vital role of data in various fields and to give a better understanding of data in the light of NLP. Specifically, it describes what happens to data during its life-cycle: the building, processing, analyzing, and exploring phases. (C) 2018 The Authors. Published by Elsevier B.V.
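To make the four life-cycle phases named in the abstract more concrete, here is a minimal Python sketch on a toy corpus. It is an illustration under stated assumptions, not the authors' method: the function names `build`, `process`, `analyze`, and `explore`, the regex tokenizer, the sample sentences, and the `MIN_TOKENS` size threshold are all hypothetical. The size check mirrors the abstract's point that data is neglected when there is too little of it for the required analysis.

```python
import re
from collections import Counter

# Hypothetical minimum corpus size (in tokens); the threshold is an assumption.
MIN_TOKENS = 10

def build(raw_texts):
    """Building phase: collect raw documents, dropping empty entries."""
    return [t.strip() for t in raw_texts if t.strip()]

def process(corpus):
    """Processing phase: lowercase each document and split it into word tokens."""
    return [re.findall(r"[a-z']+", doc.lower()) for doc in corpus]

def analyze(tokenized):
    """Analyzing phase: count token frequencies across the whole corpus."""
    counts = Counter()
    for tokens in tokenized:
        counts.update(tokens)
    return counts

def explore(counts, k=3):
    """Exploring phase: report the k most frequent tokens."""
    return counts.most_common(k)

# Hypothetical sample data; the empty string is discarded during building.
raw = [
    "Data drives NLP.",
    "NLP systems learn from data.",
    "",
    "More data usually helps NLP systems.",
]

corpus = build(raw)
tokenized = process(corpus)
n_tokens = sum(len(tokens) for tokens in tokenized)

# Only analyze the corpus if it meets the (assumed) minimum size.
if n_tokens < MIN_TOKENS:
    print("Corpus too small for the required analysis:", n_tokens, "tokens")
else:
    print(explore(analyze(tokenized)))
```

Each phase is kept as a separate function so that a real pipeline could swap in a proper tokenizer or a larger corpus without changing the overall build, process, analyze, explore structure.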
Pages: 82-91
Number of pages: 10
Related papers (50 in total)
  • [21] Natural Language Processing: An Overview of Models, Transformers and Applied Practices
    Canchila, Santiago
    Meneses-Eraso, Carlos
    Casanoves-Boix, Javier
    Cortes-Pellicer, Pascual
    Castello-Sirvent, Fernando
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2024, 21 (03)
  • [22] Multi-Task Learning in Natural Language Processing: An Overview
    Chen, Shijie
    Zhang, Yu
    Yang, Qiang
    ACM Computing Surveys, 2024, 56 (12)
  • [23] Overview of Character-Based Models for Natural Language Processing
    Adel, Heike
    Asgari, Ehsaneddin
    Schuetze, Hinrich
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 3 - 16
  • [24] Data augmentation techniques in natural language processing
    Pellicer, Lucas Francisco Amaral Orosco
    Ferreira, Taynan Maier
    Costa, Anna Helena Reali
    APPLIED SOFT COMPUTING, 2023, 132
  • [25] The EarthCARE mission: science data processing chain overview
    Eisinger, Michael
    Marnas, Fabien
    Wallace, Kotska
    Kubota, Takuji
    Tomiyama, Nobuhiro
    Ohno, Yuichi
    Tanaka, Toshiyuki
    Tomita, Eichi
    Wehr, Tobias
    Bernaerts, Dirk
    ATMOSPHERIC MEASUREMENT TECHNIQUES, 2024, 17 (02) : 839 - 862
  • [26] Landsat 7 Science Data Processing: a systems overview
    Schweiss, RJ
    Daniel, NE
    Derrick, DK
    ALGORITHMS FOR MULTISPECTRAL, HYPERSPECTRAL, AND ULTRASPECTRAL IMAGERY VI, 2000, 4049 : 300 - 309
  • [27] Natural language processing algorithms for domain-specific data extraction in material science: Reseractor
    Gupta, Antrakrate
    Mittal, Divyansh
    Goel, Ojsi
    Jha, Shikhar Krishn
    JOURNAL OF MATERIALS SCIENCE, 2024, 59 (30) : 13856 - 13872
  • [28] Processing natural language without natural language processing
    Brill, E
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 360 - 369
  • [29] Connectionist natural language processing: readings from connection science
    Sharkey, Noel
    Machine Translation, 10 (04): 321 - 327
  • [30] An overview of natural language processing techniques in text-to-speech systems
    Külekci, MO
    Oflazer, K
    PROCEEDINGS OF THE IEEE 12TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 2004: 454 - 457