Transfer Joint Embedding for Cross-Domain Named Entity Recognition

Cited by: 26
Authors
Pan, Sinno Jialin [1]
Toh, Zhiqiang [1]
Su, Jian [1]
Affiliations
[1] Inst Infocomm Res, Data Analyt Dept, Singapore 138632, Singapore
Keywords
Algorithms; Experimentation; Named entity recognition; Transfer learning; Multiclass classification
DOI
10.1145/2457465.2457467
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Named Entity Recognition (NER) is a fundamental task in information extraction from unstructured text. Most previous machine-learning-based NER systems are domain-specific: they may perform well on some domains (e.g., Newswire) but tend to adapt poorly to related but different domains (e.g., Weblog). Recently, transfer learning techniques have been applied to NER. However, most transfer learning approaches to NER are developed for binary classification, while NER is by nature a multiclass classification problem. Therefore, one has to first reduce the NER task to multiple binary classification tasks and solve them independently. In this article, we propose a new transfer learning method, named Transfer Joint Embedding (TJE), for cross-domain multiclass classification, which can fully exploit the relationships between classes (labels) and reduce the domain difference in data distributions for transfer learning. More specifically, we aim to embed both labels (outputs) and high-dimensional features (inputs) from different domains (e.g., a source domain and a target domain) into a unified low-dimensional latent space, where 1) each label is represented by a prototype and the intrinsic relationships between labels can be measured by Euclidean distance; 2) the distance in data distributions between the source and target domains can be reduced; 3) the source domain labeled data are closer to their corresponding label prototypes than to the others. After the latent space is learned, classification on the target domain data can be done with the simple nearest neighbor rule in the latent space. Furthermore, in order to scale up TJE, we propose an efficient algorithm based on stochastic gradient descent (SGD). Finally, we apply the proposed TJE method to NER across different domains on the ACE 2005 dataset, which is a benchmark in Natural Language Processing (NLP). Experimental results demonstrate the effectiveness of TJE and show that TJE can outperform state-of-the-art transfer learning approaches to NER.
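The prediction step described in the abstract (nearest label prototype in a shared latent space) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projection `W`, the function names, and the toy data are all hypothetical, and `mean_discrepancy` (squared distance between embedded domain means) is only one simple stand-in for a distribution distance, since the abstract does not say which measure TJE uses. Learning `W` and the prototypes is not shown.

```python
import numpy as np

def embed(X, W):
    """Project feature rows of X (n x d) into the latent space (n x k).
    A linear map is an assumption made for this sketch."""
    return X @ W

def predict(X, W, prototypes):
    """Assign each row of X to the label whose prototype is nearest
    in Euclidean distance within the latent space."""
    Z = embed(X, W)  # shape (n, k)
    # Squared distance from every embedded point to every prototype: (n, c)
    d2 = ((Z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def mean_discrepancy(Xs, Xt, W):
    """Squared distance between the source and target means after embedding.
    A stand-in for a distribution distance, not necessarily the paper's."""
    diff = embed(Xs, W).mean(axis=0) - embed(Xt, W).mean(axis=0)
    return float(diff @ diff)

# Toy data: identity projection, two label prototypes, three target points.
W = np.eye(2)
prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])
X_target = np.array([[1.0, 0.5], [9.0, 11.0], [-1.0, 2.0]])

labels = predict(X_target, W, prototypes)
print(labels)  # [0 1 0]
```

In this toy setup the first and third points fall nearest the prototype at the origin and the second falls nearest the prototype at (10, 10), so they receive labels 0, 1, and 0.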
Pages: 27
Related papers
50 in total
  • [31] PDALN: Progressive Domain Adaptation over a Pre-trained Model for Low-Resource Cross-Domain Named Entity Recognition
    Zhang, Tao
    Xia, Congying
    Yu, Philip S.
    Liu, Zhiwei
    Zhao, Shu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5441 - 5451
  • [32] Stylometric Fake News Detection Based on Natural Language Processing Using Named Entity Recognition: In-Domain and Cross-Domain Analysis
    Tsai, Chih-Ming
    ELECTRONICS, 2023, 12 (17)
  • [33] Transfer Learning for Domain-Specific Named Entity Recognition in German
    Torge, Sunna
    Hahn, Waldemar
    Jaekel, Rene
    2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20), 2020, : 321 - 327
  • [34] Embedding Transfer with Enhanced Correlation Modeling for Cross-Domain Recommendation
    Cao, Shilei
    Lin, Yujie
    Zhang, Xianli
    Chen, Yufu
    Zhu, Zhen
    Chen, Yuxin
    Qian, Buyue
    Wang, Feng
    Li, Zang
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 73 - 81
  • [35] Cross-Domain Recognition by Identifying Joint Subspaces of Source Domain and Target Domain
    Lin, Yuewei
    Chen, Jing
    Cao, Yu
    Zhou, Youjie
    Zhang, Lingfeng
    Tang, Yuan Yan
    Wang, Song
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (04) : 1090 - 1101
  • [36] Cross-Lingual Transfer Learning for Medical Named Entity Recognition
    Ding, Pengjie
    Wang, Lei
    Liang, Yaobo
    Lu, Wei
    Li, Linfeng
    Wang, Chun
    Tang, Buzhou
    Yan, Jun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT I, 2020, 12112 : 403 - 418
  • [37] Cross-Domain Transfer Learning for Complex Emotion Recognition
    Nagarajan, Bhalaji
    Oruganti, V. Ramana Murthy
    PROCEEDINGS OF 2019 IEEE REGION 10 SYMPOSIUM (TENSYMP), 2019, : 649 - 653
  • [38] Adversarial transfer learning for cross-domain visual recognition
    Wang, Shanshan
    Zhang, Lei
    Fu, Jingru
    KNOWLEDGE-BASED SYSTEMS, 2020, 204
  • [39] Cross-lingual Transfer Learning for Japanese Named Entity Recognition
    Johnson, Andrew
    Karanasou, Penny
    Gaspers, Judith
    Klakow, Dietrich
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 2 (INDUSTRY PAPERS), 2019, : 182 - 189
  • [40] Enhanced character embedding for Chinese named entity recognition
    Jia, Bingjing
    Wu, Zhongli
    Wu, Bin
    Liu, Yutong
    Zhou, Pengpeng
    MEASUREMENT & CONTROL, 2020, 53 (9-10) : 1669 - 1681