Automatic Taxonomy Classification by Pretrained Language Model

Cited: 0
Authors
Kuwana, Ayato [1 ]
Oba, Atsushi [1 ]
Sawai, Ranto [1 ]
Paik, Incheon [1 ]
Affiliations
[1] Univ Aizu, Grad Dept Comp Sci & Informat Syst, Aizuwakamatsu, Fukushima 9658580, Japan
Keywords
ontology; automation; natural language processing (NLP); pretrained model;
DOI
10.3390/electronics10212656
CLC Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
In recent years, automatic ontology generation has received significant attention in information science as a means of systemizing vast amounts of online data. In our initial attempt at neural ontology generation, we proposed a recurrent neural network (RNN)-based method. Since then, advances in natural language processing (NLP) have made it possible to update that architecture: in particular, transfer learning from language models pretrained on large unlabeled corpora has produced breakthroughs across NLP tasks. Inspired by these achievements, we propose a novel ontology-generation workflow comprising two-stage learning. Our results show that our best method improves accuracy by over 12.5%. As an application example, we applied our model to the Stanford Question Answering Dataset (SQuAD) to demonstrate ontology generation on real-world data. The results show that the model generates a good ontology, with some exceptions in this real-world setting, indicating future research directions for improving its quality.
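The abstract describes the approach only at a high level. As a rough, hypothetical sketch of the general technique it names (fine-tuning a pretrained language model for taxonomy classification), the snippet below fine-tunes a BERT-style encoder to decide whether a concept pair forms a valid is-a edge. The base model, hyperparameters, and toy data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: fine-tune a pretrained encoder to classify
# whether (child, parent) is a valid taxonomic (is-a) relation.
# All names and settings below are assumptions for illustration.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT-style encoder works here

class TermPairDataset(Dataset):
    """Concept pairs labeled 1 if (child, parent) is a valid is-a edge."""
    def __init__(self, pairs, labels, tokenizer):
        # Encode each pair as a two-segment input: "[CLS] child [SEP] parent [SEP]".
        self.enc = tokenizer([c for c, _ in pairs], [p for _, p in pairs],
                             truncation=True, padding=True, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

def train(pairs, labels, epochs=3, lr=2e-5):
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=2)  # binary relation classification head
    loader = DataLoader(TermPairDataset(pairs, labels, tokenizer),
                        batch_size=16, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optim.zero_grad()
            loss = model(**batch).loss  # cross-entropy over the two labels
            loss.backward()
            optim.step()
    return tokenizer, model

if __name__ == "__main__":
    # Toy data: ("dog", "animal") is a valid taxonomy edge; ("dog", "car") is not.
    train([("dog", "animal"), ("dog", "car")], [1, 0], epochs=1)
```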
Pages: 16
Related Papers
50 records in total
  • [1] Low-resource Taxonomy Enrichment with Pretrained Language Models
    Takeoka, Kunihiro
    Akimoto, Kosuke
    Oyamada, Masafumi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2747 - 2758
  • [2] Pretrained Language Models for Sequential Sentence Classification
    Cohan, Arman
    Beltagy, Iz
    King, Daniel
    Dalvi, Bhavana
    Weld, Daniel S.
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 3693 - 3699
  • [3] A study of Turkish emotion classification with pretrained language models
    Ucan, Alaettin
    Dorterler, Murat
    Akcapinar Sezer, Ebru
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (06) : 857 - 865
  • [4] Topic Classification for Political Texts with Pretrained Language Models
    Wang, Yu
    POLITICAL ANALYSIS, 2023, 31 (04) : 662 - 668
  • [5] Your fairness may vary: Pretrained language model fairness in toxic text classification
    Baldini, Ioana
    Wei, Dennis
    Ramamurthy, Karthikeyan Natesan
    Yurochkin, Mikhail
    Singh, Moninder
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2245 - 2262
  • [6] Distilling a Pretrained Language Model to a Multilingual ASR Model
    Choi, Kwanghee
    Park, Hyung-Min
    INTERSPEECH 2022, 2022, : 2203 - 2207
  • [7] Automatic Classification of Documents in a Natural Language: A Conceptual Model
    Lyfenko, N. D.
    AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS, 2014, 48 (03) : 158 - 166
  • [8] Constructing Chinese taxonomy trees from understanding and generative pretrained language models
    Guo, Jianyu
    Chen, Jingnan
    Ren, Li
    Zhou, Huanlai
    Xu, Wenbo
    Jia, Haitao
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [9] When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification
    Li, Xuedong
    Yuan, Walter
    Peng, Dezhong
    Mei, Qiaozhu
    Wang, Yue
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 21 (SUPPL 9)