Comparative Analysis of NLP-Based Models for Company Classification

Times cited: 0
Authors
Rizinski, Maryan [1, 2]
Jankov, Andrej [2]
Sankaradas, Vignesh [1]
Pinsky, Eugene [1]
Mishkovski, Igor [2]
Trajanov, Dimitar [1, 2]
Affiliations
[1] Boston Univ, Metropolitan Coll, Dept Comp Sci, Boston, MA 02215 USA
[2] Ss Cyril & Methodius Univ, Fac Comp Sci & Engn, Skopje 1000, North Macedonia
Keywords
company classification; industry classification; natural language processing; machine learning; deep learning; finance; fintech; INDUSTRY CLASSIFICATION; SCHEMES
DOI
10.3390/info15020077
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches rely heavily on laborious manual work by domain experts, which makes the resulting assignments slow, costly, and vendor-specific. We therefore investigate how recent advances in natural language processing (NLP) can automate the company classification process. In particular, we employ and evaluate several NLP-based models, including zero-shot learning, One-vs-Rest classification, multi-class classifiers, and ChatGPT-aided classification, and we compare them comprehensively to assess their effectiveness on the company classification task. The evaluation uses the Wharton Research Data Services (WRDS) dataset, which consists of textual descriptions of publicly traded companies. Our findings reveal that the RoBERTa and One-vs-Rest classifiers outperform the other methods, achieving F1 scores of 0.81 and 0.80 on the WRDS dataset, respectively. These results demonstrate that deep learning algorithms have the potential to automate, standardize, and continuously update classification systems efficiently and cost-effectively. In addition, we introduce several improvements to the multi-class classification techniques: (1) in the zero-shot approach, we use TF-IDF to enhance sector representation, which improves accuracy compared with standard zero-shot classifiers; (2) we use ChatGPT for dataset generation, which shows potential in scenarios where datasets of company descriptions are lacking; and (3) we employ K-Fold to reduce noise in the WRDS dataset and then conduct experiments to assess the impact of noise reduction on the company classification results.
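The abstract does not detail how TF-IDF enhances the sector representation. As one possible reading, the following minimal Python sketch (an assumption, not the authors' implementation) pools the training descriptions of each sector into a single document, scores terms by TF-IDF across sectors, and appends the top-scoring terms to the sector name. The result is a richer candidate label that could be passed to a zero-shot classifier. The functions `sector_keywords` and `enriched_label`, the toy descriptions, and the stopword list are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def sector_keywords(descriptions, labels, top_k=3, stopwords=frozenset()):
    """Hypothetical helper: return the top_k TF-IDF terms per sector.

    Each sector's descriptions are concatenated into one "document", so IDF
    is computed across sectors rather than across individual companies.
    """
    # One term-frequency Counter per sector.
    sector_tf = {}
    for text, label in zip(descriptions, labels):
        tokens = [t for t in tokenize(text) if t not in stopwords]
        sector_tf.setdefault(label, Counter()).update(tokens)

    n_sectors = len(sector_tf)
    # Document frequency: number of sector-documents containing each term.
    df = Counter()
    for tf in sector_tf.values():
        df.update(tf.keys())

    keywords = {}
    for label, tf in sector_tf.items():
        total = sum(tf.values())
        scores = {
            term: (count / total) * math.log(n_sectors / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        # Terms present in every sector have IDF 0 and are discarded.
        keywords[label] = [term for term, score in ranked[:top_k] if score > 0]
    return keywords


def enriched_label(sector, keywords):
    """Hypothetical helper: sector name plus its keywords, as zero-shot label text."""
    return f"{sector}: " + ", ".join(keywords[sector])


# Toy, made-up company descriptions (not from the WRDS dataset).
descriptions = [
    "The company develops software and cloud computing platforms.",
    "Provides cloud software for enterprise customers.",
    "The company operates hospitals and clinics.",
    "Develops drugs and operates clinics for patients.",
]
labels = [
    "Information Technology",
    "Information Technology",
    "Health Care",
    "Health Care",
]
stopwords = frozenset({"the", "and", "for", "company"})

keywords = sector_keywords(descriptions, labels, top_k=3, stopwords=stopwords)
print(keywords["Information Technology"])  # ['cloud', 'software', 'computing']
print(enriched_label("Health Care", keywords))  # Health Care: clinics, operates, drugs
```

A term shared by every sector (here, "develops") receives an IDF of zero and is dropped, so the enriched labels emphasize vocabulary that distinguishes one sector from another. A real pipeline would likely need a proper tokenizer and stopword list, and the number of keywords per sector would have to be tuned.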
Pages: 32