Post2Vec: Learning Distributed Representations of Stack Overflow Posts

Cited by: 13
Authors
Xu, Bowen [1 ]
Thong Hoang [1 ]
Sharma, Abhishek [1 ]
Yang, Chengran [1 ]
Xia, Xin [2 ]
Lo, David [1 ]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore
[2] Huawei, Software Engn Applicat Technol Lab, Shenzhen 518129, Guangdong, Peoples R China
Keywords
Task analysis; Feature extraction; Semantics; Computer architecture; Encoding; Deep learning; Computational modeling; Tag recommendation system; Neural networks; Information; Backpropagation
DOI
10.1109/TSE.2021.3093761
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step in those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines the effectiveness of the solutions in performing the respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose Post2Vec, a specialized deep learning architecture that extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec's deep learning architecture, we first investigate its end-to-end effectiveness in the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25 percent improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10, 7, and 10 percent in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw/Post2Vec.
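The abstract reports tag recommendation results as F1-score@5. As a rough illustration only, the sketch below computes one common form of F1@k for a single post, assuming precision@k is the fraction of the top-k recommended tags that are correct and recall@k is the fraction of the post's true tags that appear in the top k. The function name `f1_at_k` and the sample tags are hypothetical, and the paper's exact metric definition may differ from this simplified variant.

```python
from typing import Sequence, Set


def f1_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """Illustrative F1@k for one post (assumed definition, not taken from the paper).

    ranked_tags: recommended tags, best first, assumed to contain no duplicates.
    true_tags:   the tags actually assigned to the post.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not true_tags:
        raise ValueError("true_tags must not be empty")

    top_k = list(ranked_tags[:k])
    # Number of recommended tags within the top k that are correct.
    hits = sum(1 for tag in top_k if tag in true_tags)
    if hits == 0:
        return 0.0

    precision = hits / k               # assumed: correct tags / k
    recall = hits / len(true_tags)     # assumed: correct tags / number of true tags
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    recommended = ["python", "pandas", "numpy", "java", "c"]
    actual = {"python", "pandas", "dataframe"}
    print(f1_at_k(recommended, actual, k=5))
```

In this toy case two of the five recommended tags are correct, so precision@5 is 2/5 and recall@5 is 2/3. A corpus-level score would typically average the per-post values over all test posts.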
Pages: 3423-3441
Page count: 19