Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?

Cited by: 5
Authors
Jiang, Ling [1, 3]
Yuan, Hengchen [1]
Tang, Qiyi [2]
Nie, Sen [2]
Wu, Shi [2]
Zhang, Yuqun [1, 3, 4]
Affiliations
[1] Southern Univ Sci & Technol, Shenzhen, Peoples R China
[2] Tencent Secur Keen Lab, Shanghai, Peoples R China
[3] Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China
[4] Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Software Composition Analysis; Code Clone Detection; Mining Software Repositories; CODE; CENTRALITY; PAGERANK; REUSE;
DOI
10.1145/3597926.3598143
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Existing software composition analysis (SCA) techniques for the C/C++ ecosystem tend to identify reused components through feature matching between the target software project and collected third-party libraries (TPLs). However, feature duplication caused by internal code clones can lead to inaccurate SCA results. To mitigate this issue, Centris, a state-of-the-art SCA technique for the C/C++ ecosystem, adopts function-level code clone detection to derive TPL dependencies and eliminate redundant features before performing SCA tasks. Although Centris was shown to be effective in its original paper, the accuracy of the derived TPL dependencies was not evaluated. Additionally, the dataset used to evaluate the impact of TPL dependencies on SCA is limited. To further investigate the efficacy and limitations of Centris, we first construct two large-scale ground-truth datasets for evaluating the accuracy of the derived TPL dependencies and of the SCA results, respectively. We then extensively evaluate Centris; the results suggest that the accuracy of the TPL dependencies derived by Centris may not generalize well to our evaluation dataset. We further infer that the key factors degrading performance are likely the inaccurate function birth times and the threshold-based recall. In addition, the impact of the Centris-derived TPL dependencies on SCA appears somewhat limited. Inspired by our findings, we propose TPLite, which uses function-level origin TPL detection and graph-based dependency recall to improve the accuracy of TPL reuse detection in the C/C++ ecosystem. Our evaluation results indicate that, compared with Centris, TPLite increases the precision of deriving TPL dependencies from 35.71% to 88.33% and the recall from 49.44% to 62.65%. Moreover, TPLite increases the precision from 21.08% to 75.90% and the recall from 57.62% to 64.17% compared with the state-of-the-art academic SCA tool B2SFinder, and even outperforms the widely adopted commercial SCA tool BDBA, increasing the precision from 72.46% to 75.90% and the recall from 58.55% to 64.17%.
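The abstract names two ingredients of dependency derivation, function birth time and threshold-based recall, and reports results as precision and recall. The following is a minimal Python sketch of how such a pipeline could look, not the implementation of Centris or TPLite. The data layout (`libs` mapping a library name to its function hashes and birth times), the function names, and the default threshold of 0.1 are all assumptions made for illustration.

```python
from collections import defaultdict


def derive_origins(libs):
    """Attribute each function hash to the library with the earliest birth time.

    libs: {library_name: {function_hash: birth_time}} (hypothetical layout).
    """
    best = {}
    for lib, funcs in libs.items():
        for fhash, birth in funcs.items():
            if fhash not in best or birth < best[fhash][1]:
                best[fhash] = (lib, birth)
    return {fhash: lib for fhash, (lib, _) in best.items()}


def derive_dependencies(libs, threshold=0.1):
    """Return a set of (user, reused_library) pairs.

    A pair is reported when the share of the reused library's own functions
    found in the user reaches the threshold (an illustrative value).
    """
    owned = defaultdict(set)
    for fhash, lib in derive_origins(libs).items():
        owned[lib].add(fhash)

    deps = set()
    for user, funcs in libs.items():
        for lib, own in owned.items():
            if lib == user:
                continue
            shared = own.intersection(funcs)  # iterating a dict yields its keys
            if len(shared) / len(own) >= threshold:
                deps.add((user, lib))
    return deps


def precision_recall(predicted, truth):
    """Precision and recall of predicted dependency pairs against ground truth."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```

In this sketch, an incorrect birth time shifts a function to the wrong owner, and a high threshold drops libraries that are only partially reused. These are the two failure modes the abstract attributes to Centris, shown here only in toy form.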
Pages: 1383-1395
Page count: 13