Vocabulary Normalization Improves IR-Based Concept Location

Times cited: 0
Authors
Binkley, Dave [1]
Lawrie, Dawn [1]
Uehlinger, Christopher [1]
Affiliation
[1] Loyola Univ Maryland, Dept Comp Sci, Baltimore, MD 21210 USA
Keywords
vocabulary normalization; information retrieval; concept location
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Tool support is crucial to modern software development, evolution, and maintenance. Early tools reused the static analysis performed by the compiler. These were followed by dynamic analysis tools and, more recently, by tools that exploit natural language. This latter class has the advantage that it can incorporate not only the code but also artifacts from every phase of software construction and subsequent evolution. Unfortunately, the natural language found in source code often uses a vocabulary different from that of other software artifacts, which aggravates the vocabulary mismatch problem. The problem arises because many natural-language tools imported from Information Retrieval (IR) and Natural Language Processing (NLP) implicitly assume a single natural-language vocabulary. Vocabulary normalization, which goes well beyond simple identifier splitting, brings the vocabulary of the source code into line with that of other artifacts. Consequently, it is expected to improve the performance of existing and future IR- and NLP-based tools. As a case study, an experiment with a feature locator based on Latent Semantic Indexing (LSI) is replicated. Normalization universally improves performance; for the tersest queries, the improvement exceeds 180% (p < 0.0001).
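To illustrate why splitting alone is not enough, here is a minimal sketch of the vocabulary mismatch and how normalization can close it. It is not the authors' normalization technique or their LSI-based locator, neither of which is described in this record. Instead it uses a plain bag-of-words cosine similarity, and the abbreviation dictionary `ABBREVIATIONS` together with the helper functions (`split_identifier`, `raw_terms`, `split_terms`, `normalized_terms`, `cosine`) are hypothetical names chosen for this example. A realistic normalizer would infer expansions from comments, documentation, or a corpus rather than from a hard-coded table.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table (an assumption for illustration only);
# a real normalizer would infer expansions rather than hard-code them.
ABBREVIATIONS = {"max": "maximum", "cnt": "count", "num": "number", "str": "string"}

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"  # identifier-like tokens in code or prose


def split_identifier(identifier):
    """Split on underscores/digits, then on camelCase and acronym boundaries."""
    words = []
    for part in re.split(r"[_\d]+", identifier):
        # An uppercase run not followed by lowercase (an acronym such as "HTTP"),
        # or an optionally capitalized lowercase run ("get", "Max").
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]


def raw_terms(text):
    """No normalization: each identifier is one opaque lowercase term."""
    return [t.lower() for t in re.findall(IDENT, text)]


def split_terms(text):
    """Identifier splitting only."""
    return [w for t in re.findall(IDENT, text) for w in split_identifier(t)]


def normalized_terms(text):
    """Splitting plus abbreviation expansion (a simplified normalization)."""
    return [ABBREVIATIONS.get(w, w) for w in split_terms(text)]


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = "maximum count"
    method = "int getMaxCnt() { return maxCnt; }"
    for name, terms in [
        ("raw", raw_terms),
        ("split only", split_terms),
        ("normalized", normalized_terms),
    ]:
        print(f"{name:>10}: {cosine(terms(query), terms(method)):.3f}")
```

Running the script prints a score of 0.000 for both the raw and split-only variants and 0.853 for the normalized one. Splitting `getMaxCnt` yields `get`, `max`, and `cnt`, which still share nothing with the query words `maximum` and `count`. Only after expansion do the two vocabularies overlap. An LSI-based locator, as in the replicated experiment, would apply a truncated SVD to a term-document matrix built from such terms, but it still depends on the code and the queries sharing vocabulary.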
Pages: 588–591
Number of pages: 4