Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?

Cited by: 5
Authors
Jiang, Ling [1, 3]
Yuan, Hengchen [1 ]
Tang, Qiyi [2 ]
Nie, Sen [2 ]
Wu, Shi [2 ]
Zhang, Yuqun [1, 3, 4]
Affiliations
[1] Southern Univ Sci & Technol, Shenzhen, Peoples R China
[2] Tencent Secur Keen Lab, Shanghai, Peoples R China
[3] Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China
[4] Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Software Composition Analysis; Code Clone Detection; Mining Software Repositories; CODE; CENTRALITY; PAGERANK; REUSE;
DOI
10.1145/3597926.3598143
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Existing software composition analysis (SCA) techniques for the C/C++ ecosystem tend to identify reused components through feature matching between the target software project and collected third-party libraries (TPLs). However, feature duplication caused by internal code clones can lead to inaccurate SCA results. To mitigate this issue, Centris, a state-of-the-art SCA technique for the C/C++ ecosystem, adopts function-level code clone detection to derive TPL dependencies and eliminate redundant features before performing SCA tasks. Although Centris was shown to be effective in its original paper, the accuracy of the derived TPL dependencies was not evaluated there. Additionally, the dataset used to evaluate the impact of TPL dependencies on SCA was limited. To further investigate the efficacy and limitations of Centris, we first construct two large-scale ground-truth datasets for evaluating the accuracy of the derived TPL dependencies and of the SCA results, respectively. We then extensively evaluate Centris; the results suggest that the accuracy of the TPL dependencies derived by Centris may not generalize well to our evaluation dataset. We further infer that the key factors degrading its performance may be inaccurate function birth times and threshold-based recall. In addition, the impact of the Centris-derived TPL dependencies on SCA can be somewhat limited. Inspired by our findings, we propose TPLite, which uses function-level origin TPL detection and graph-based dependency recall to enhance the accuracy of TPL reuse detection in the C/C++ ecosystem. Our evaluation results indicate that, compared with Centris, TPLite increases the precision of deriving TPL dependencies from 35.71% to 88.33% and the recall from 49.44% to 62.65%. Moreover, TPLite increases the precision from 21.08% to 75.90% and the recall from 57.62% to 64.17% compared with the SOTA academic SCA tool B2SFinder, and even outperforms the well-adopted commercial SCA tool BDBA, increasing the precision from 72.46% to 75.90% and the recall from 58.55% to 64.17%.
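As a reading aid for the precision and recall figures in the abstract, the following is a minimal sketch of how such metrics can be computed for derived TPL dependencies. It assumes each dependency is a (dependent, dependency) pair and that a derived pair counts as correct only on an exact match with the ground truth. The function name `precision_recall` and the toy data are hypothetical; this is not the paper's dataset or evaluation script.

```python
def precision_recall(derived, truth):
    """Compare derived dependency pairs against ground-truth pairs.

    Both arguments are iterables of (dependent, dependency) tuples.
    Exact-match comparison is an assumption made for this sketch.
    """
    derived, truth = set(derived), set(truth)
    true_positives = len(derived & truth)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(derived) if derived else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth = {
    ("app", "zlib"),
    ("app", "libpng"),
    ("libpng", "zlib"),
    ("app", "openssl"),
}
derived = {
    ("app", "zlib"),
    ("app", "libpng"),
    ("app", "curl"),  # a false positive in this toy example
}

p, r = precision_recall(derived, truth)
print(f"precision={p:.2%} recall={r:.2%}")
```

In this toy case two of the three derived pairs are correct and two of the four true pairs are found, so precision is two thirds and recall is one half.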
Pages: 1383-1395
Page count: 13