Post2Vec: Learning Distributed Representations of Stack Overflow Posts

Cited by: 13
Authors
Xu, Bowen [1]
Thong Hoang [1]
Sharma, Abhishek [1]
Yang, Chengran [1]
Xia, Xin [2]
Lo, David [1]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore
[2] Huawei, Software Engn Applicat Technol Lab, Shenzhen 518129, Guangdong, Peoples R China
Keywords
Task analysis; Feature extraction; Semantics; Computer architecture; Encoding; Deep learning; Computational modeling; Tag recommendation system; Neural networks; Information; Backpropagation
DOI
10.1109/TSE.2021.3093761
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step in those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines how effectively the solutions perform their respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose Post2Vec, a specialized deep learning architecture that extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec's deep learning architecture, we first investigate its end-to-end effectiveness in the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25 percent improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10, 7, and 10 percent in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw/Post2Vec.
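For readers unfamiliar with the F1-score@5 metric mentioned in the abstract, the following is a minimal sketch of one common way to compute F1-score@k for a single post in tag recommendation. The function name `f1_at_k`, the example tags, and the exact precision and recall definitions are assumptions made for illustration; the paper may define the metric differently (for example, by normalizing recall differently when a post has more than k tags).

```python
def f1_at_k(ranked_tags, true_tags, k=5):
    """Illustrative F1-score@k for one post (assumed formulation).

    ranked_tags: recommended tags, best first.
    true_tags:   the tags the post actually carries.
    """
    true_set = set(true_tags)
    if not true_set or k <= 0:
        return 0.0
    # Count how many of the top-k recommendations are correct.
    hits = len(set(ranked_tags[:k]) & true_set)
    if hits == 0:
        return 0.0
    precision = hits / k              # share of the k recommendations that are correct
    recall = hits / len(true_set)     # share of the true tags that were recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of the 5 recommended tags match the 3 true tags.
score = f1_at_k(["python", "pandas", "numpy", "java", "c"],
                ["python", "pandas", "dataframe"], k=5)
```

In practice such a per-post score would typically be averaged over all posts in a test set.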
Pages: 3423-3441
Number of pages: 19