Post2Vec: Learning Distributed Representations of Stack Overflow Posts

Cited by: 13
Authors
Xu, Bowen [1]
Hoang, Thong [1]
Sharma, Abhishek [1]
Yang, Chengran [1]
Xia, Xin [2]
Lo, David [1]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore
[2] Huawei, Software Engn Applicat Technol Lab, Shenzhen 518129, Guangdong, Peoples R China
Keywords
Task analysis; Feature extraction; Semantics; Computer architecture; Encoding; Deep learning; Computational modeling; TAG RECOMMENDATION SYSTEM; NEURAL-NETWORKS; INFORMATION; BACKPROPAGATION
DOI
10.1109/TSE.2021.3093761
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step in those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines how effectively the solutions perform their respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose Post2Vec, a specialized deep learning architecture that extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec's deep learning architecture, we first investigate its end-to-end effectiveness on the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25 percent improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10, 7, and 10 percent in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw/Post2Vec.
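The abstract reports tag recommendation results as F1-score@5 but does not define the metric. The following is a minimal sketch of one common way to compute it, assuming precision@k is the fraction of the top-k recommended tags that are correct and recall@k is the fraction of a post's true tags found in the top-k. The function names `f1_at_k` and `mean_f1_at_k` are hypothetical, and the exact definition used in the paper may differ (for example, in how recall is normalized when a post has more than k tags).

```python
from typing import Iterable, Sequence, Set


def f1_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """F1-score@k for one post: harmonic mean of precision@k and recall@k.

    Assumed definitions (not taken from the paper):
      precision@k = |top-k ∩ true| / k
      recall@k    = |top-k ∩ true| / |true|
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not true_tags:
        return 0.0
    top_k = list(ranked_tags)[:k]
    hits = sum(1 for tag in top_k if tag in true_tags)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(true_tags)
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(
    all_ranked: Iterable[Sequence[str]],
    all_true: Iterable[Set[str]],
    k: int = 5,
) -> float:
    """Average the per-post F1-score@k over a collection of posts."""
    scores = [f1_at_k(r, t, k) for r, t in zip(all_ranked, all_true)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Illustrative data only: two of the five recommended tags are correct.
    ranked = ["python", "pandas", "numpy", "java", "c++"]
    truth = {"python", "pandas", "dataframe"}
    print(f1_at_k(ranked, truth, k=5))
```

Under these assumed definitions, the example post has precision@5 = 2/5 and recall@5 = 2/3, so its F1-score@5 is 0.5.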
Pages: 3423-3441
Page count: 19