Named Entity Recognition for Code Review Comments

Authors
Kachanov, V. V. [1, 2]
Khitrova, A. S. [1, 3]
Markov, S. I. [1]
Affiliations
[1] Russian Acad Sci, Inst Syst Programming, Moscow 109004, Russia
[2] Moscow Inst Phys & Technol, Dolgoprudnyi 141701, Moscow Oblast, Russia
[3] Lomonosov Moscow State Univ, Moscow 119991, Russia
Keywords
machine learning; named entity recognition; dataset
DOI
10.1134/S0361768824700233
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper addresses named entity recognition in source code review comments. It provides a comparative analysis of existing approaches and proposes improvements: methods for handling data imbalance, improved tokenization of input data, the use of large volumes of unlabeled data, and additional binary classifiers. For evaluation, a new dataset of 3000 user code reviews was collected and manually labeled. The proposed improvements are shown to significantly increase quality metrics computed both at the token level (+22%) and at the entity level (+13%).
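The abstract distinguishes token-level from entity-level quality metrics. The following is a minimal sketch of how the two can differ, assuming BIO-tagged sequences and F1 as the metric; the record does not state which tagging scheme or metric the paper uses, and the labels `FUNC` and `VAR` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            if tag == "O":
                start, label = None, None
            else:
                start, label = i, tag[2:]
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from true positives and the predicted/gold counts."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Score each non-O token independently."""
    tp = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    return f1(tp, sum(p != "O" for p in pred), sum(g != "O" for g in gold))


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Score whole spans: boundaries and label must match exactly."""
    g, p = extract_spans(gold), extract_spans(pred)
    return f1(len(g & p), len(p), len(g))


# Hypothetical example: the prediction truncates a two-token FUNC entity.
gold = ["O", "B-FUNC", "I-FUNC", "O", "B-VAR"]
pred = ["O", "B-FUNC", "O", "O", "B-VAR"]
print(token_f1(gold, pred), entity_f1(gold, pred))
```

In this example the truncated entity still earns partial credit at the token level but counts as a full miss at the entity level, which is one reason the two kinds of metric can move by different amounts.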
Pages: 511-523
Page count: 13