Named Entity Recognition for Code Review Comments

Cited by: 0
Authors
Kachanov, V. V. [1, 2]
Khitrova, A. S. [1, 3]
Markov, S. I. [1 ]
Affiliations
[1] Russian Acad Sci, Inst Syst Programming, Moscow 109004, Russia
[2] Moscow Inst Phys & Technol, Dolgoprudnyi 141701, Moscow Oblast, Russia
[3] Lomonosov Moscow State Univ, Moscow 119991, Russia
Keywords
machine learning; named entity recognition; dataset
DOI
10.1134/S0361768824700233
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This paper addresses the problem of named entity recognition in source code review comments. It provides a comparative analysis of existing approaches and proposes several improvements to raise recognition quality: methods for handling data imbalance, improved tokenization of the input data, the use of large volumes of unlabeled data, and additional binary classifiers. To assess quality, a new dataset of 3000 user code reviews was collected and manually labeled. The proposed improvements are shown to significantly increase performance on quality metrics calculated both at the token level (+22%) and at the entity level (+13%).
Pages: 511-523
Number of pages: 13
Related papers
50 in total
  • [1] A review of biomedical named entity recognition
    Chang, Lu
    Zhang, Ruihuan
    Lv, Jia
    Zhou, Weiguang
    Bai, Yunli
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2022, 22 (03) : 893 - 900
  • [2] A review of Chinese named entity recognition
    Cheng, Jieren
    Liu, Jingxin
    Xu, Xinbin
    Xia, Dongwan
    Liu, Le
    Sheng, Victor S.
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (06) : 2012 - 2030
  • [3] Named Entity Recognition for Code Mixed Social Media Sentences
    Sharma, Yashvardhan
    Bhargava, Rupal
    Tadikonda, Bapiraju Vamsi
    INTERNATIONAL JOURNAL OF SOFTWARE SCIENCE AND COMPUTATIONAL INTELLIGENCE-IJSSCI, 2021, 13 (02) : 23 - 36
  • [4] A review on cyber security named entity recognition
    Gao, Chen
    Zhang, Xuan
    Han, Mengting
    Liu, Hui
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2021, 22 (09) : 1153 - 1168
  • [5] Review of Multimodal Named Entity Recognition Studies
    Han P.
    Chen W.
    Data Analysis and Knowledge Discovery, 2024, 8 (04) : 50 - 63
  • [6] Code-Switched Named Entity Recognition with Embedding Attention
    Wang, Changhan
    Cho, Kyunghyun
    Kiela, Douwe
    COMPUTATIONAL APPROACHES TO LINGUISTIC CODE-SWITCHING, 2018 : 154 - 158
  • [7] A Systematic Review on Biomedical Named Entity Recognition
    Kanimozhi, U.
    Manjula, D.
    DATA SCIENCE ANALYTICS AND APPLICATIONS, DASAA 2017, 2018, 804 : 19 - 37
  • [8] Biomedical Named Entity Recognition in Eight Languages with Zero Code Changes
    Kocaman, Veysel
    Pirge, Gursev
    Polat, Bunyamin
    Talby, David
    CEUR Workshop Proceedings, 2022, 3202
  • [9] Python source code vulnerability detection with named entity recognition
    Ehrenberg, Melanie
    Sarkani, Shahram
    Mazzuchi, Thomas A.
    COMPUTERS & SECURITY, 2024, 140
  • [10] Gazetteer Enhanced Named Entity Recognition for Code-Mixed Web Queries
    Fetahu, Besnik
    Fang, Anjie
    Rokhlenko, Oleg
    Malmasi, Shervin
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021 : 1677 - 1681