Post2Vec: Learning Distributed Representations of Stack Overflow Posts

Cited by: 13
Authors
Xu, Bowen [1]
Hoang, Thong [1]
Sharma, Abhishek [1]
Yang, Chengran [1]
Xia, Xin [2]
Lo, David [1]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore
[2] Huawei, Software Engn Applicat Technol Lab, Shenzhen 518129, Guangdong, Peoples R China
Keywords
Task analysis; Feature extraction; Semantics; Computer architecture; Encoding; Deep learning; Computational modeling; TAG RECOMMENDATION SYSTEM; NEURAL-NETWORKS; INFORMATION; BACKPROPAGATION
DOI
10.1109/TSE.2021.3093761
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step in those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines the effectiveness of the solutions in performing the respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose Post2Vec, a specialized deep learning architecture that extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec's deep learning architecture, we first investigate its end-to-end effectiveness in the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25 percent improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10, 7, and 10 percent in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw/Post2Vec.
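The abstract reports tag recommendation results in terms of F1-score@5. As a rough illustration of what such a metric measures, the following sketch computes F1-score@k for a ranked list of recommended tags under one common definition (precision@k as hits divided by k, recall@k as hits divided by the number of true tags). This definition, the function names `f1_at_k` and `mean_f1_at_k`, and the sample tags are assumptions made for illustration; the paper's exact formulation may differ and is not taken from this record.

```python
from typing import Iterable, Sequence, Set


def f1_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """F1-score@k for a single post (assumed definition, see lead-in)."""
    if k <= 0 or not true_tags:
        return 0.0
    # Only the k highest-ranked recommendations are scored.
    top_k = list(ranked_tags)[:k]
    hits = len(set(top_k) & true_tags)
    precision = hits / k               # fraction of the k slots that are correct
    recall = hits / len(true_tags)     # fraction of true tags that were recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(
    predictions: Iterable[Sequence[str]],
    ground_truth: Iterable[Set[str]],
    k: int = 5,
) -> float:
    """Average F1-score@k over a collection of posts."""
    scores = [f1_at_k(p, t, k) for p, t in zip(predictions, ground_truth)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical post with two true tags and five recommended tags.
    ranked = ["python", "pandas", "numpy", "csv", "excel"]
    truth = {"python", "pandas"}
    print(f1_at_k(ranked, truth, k=5))
```

Under this definition, a post with few true tags cannot reach a perfect score at k = 5 even when every true tag is recovered, which is one reason some papers cap the denominators differently.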
Pages: 3423-3441
Page count: 19