Vocabulary Normalization Improves IR-Based Concept Location

Cited by: 0
Authors
Binkley, Dave [1]
Lawrie, Dawn [1]
Uehlinger, Christopher [1]
Affiliations
[1] Loyola Univ Maryland, Dept Comp Sci, Baltimore, MD 21210 USA
Keywords
vocabulary normalization; information retrieval; concept location
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Tool support is crucial to modern software development, evolution, and maintenance. Early tools reused the static analysis performed by the compiler. These were followed by dynamic analysis tools and, more recently, tools that exploit natural language. This latter class has the advantage that it can incorporate not only the code but also artifacts from all phases of software construction and its subsequent evolution. Unfortunately, the natural language found in source code often uses a vocabulary different from that used in other software artifacts and thus increases the vocabulary mismatch problem. This problem exists because many natural-language tools imported from Information Retrieval (IR) and Natural Language Processing (NLP) implicitly assume the use of a single natural language vocabulary. Vocabulary normalization, which goes well beyond simple identifier splitting, brings the vocabulary of the source into line with other artifacts. Consequently, it is expected to improve the performance of existing and future IR- and NLP-based tools. As a case study, an experiment with an LSI-based feature locator is replicated. Normalization universally improves performance. For the tersest queries, this improvement is over 180% (p < 0.0001).
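The abstract does not describe the normalization algorithm itself, so the following is only a minimal sketch of the general idea: split an identifier into its parts, then expand abbreviations so the resulting words match the vocabulary of other artifacts. The function names and the abbreviation table are hypothetical and invented for illustration; they are not taken from the paper.

```python
import re

# Hypothetical abbreviation table, invented for this illustration only.
ABBREVIATIONS = {"str": "string", "len": "length", "cnt": "count", "idx": "index"}


def split_identifier(identifier):
    """Split an identifier on underscores and camelCase boundaries."""
    parts = []
    for chunk in re.split(r"[_\W]+", identifier):
        # Acronym runs (HTML), capitalized or lowercase words (Parser, idx), digits.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [part.lower() for part in parts]


def normalize(identifier):
    """Split an identifier, then expand any abbreviation found in the table."""
    return [ABBREVIATIONS.get(word, word) for word in split_identifier(identifier)]


print(normalize("strLen"))          # ['string', 'length']
print(normalize("HTMLParser_idx"))  # ['html', 'parser', 'index']
```

Splitting alone would leave `str` and `len` as they are, which still would not match a query containing "string length"; the expansion step is what brings the source vocabulary closer to that of the other artifacts.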
Pages: 588-591
Page count: 4