Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?

Cited by: 5

Authors:

Jiang, Ling ^{[1,3]}

Yuan, Hengchen ^{[1]}

Tang, Qiyi ^{[2]}

Nie, Sen ^{[2]}

Wu, Shi ^{[2]}

Zhang, Yuqun ^{[1,3,4]}

Affiliations:

[1] Southern Univ Sci & Technol, Shenzhen, Peoples R China

[2] Tencent Secur Keen Lab, Shanghai, Peoples R China

[3] Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China

[4] Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen, Peoples R China

Source:

PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Software Composition Analysis; Code Clone Detection; Mining Software Repositories; CODE; CENTRALITY; PAGERANK; REUSE;

DOI:

10.1145/3597926.3598143

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Existing software composition analysis (SCA) techniques for the C/C++ ecosystem tend to identify the reused components through feature matching between target software project and collected third-party libraries (TPLs). However, feature duplication caused by internal code clone can cause inaccurate SCA results. To mitigate this issue, Centris, a state-of-the-art SCA technique for the C/C++ ecosystem, was proposed to adopt function-level code clone detection to derive the TPL dependencies for eliminating the redundant features before performing SCA tasks. Although Centris has been shown effective in the original paper, the accuracy of the derived TPL dependencies is not evaluated. Additionally, the dataset to evaluate the impact of TPL dependency on SCA is limited. To further investigate the efficacy and limitations of Centris, we first construct two large-scale ground-truth datasets for evaluating the accuracy of deriving TPL dependency and SCA results respectively. Then we extensively evaluate Centris where the evaluation results suggest that the accuracy of TPL dependencies derived by Centris may not well generalize to our evaluation dataset. We further infer the key factors that degrade the performance can be the inaccurate function birth time and the threshold-based recall. In addition, the impact on SCA from the TPL dependencies derived by Centris can be somewhat limited. Inspired by our findings, we propose TPLite with function-level origin TPL detection and graph-based dependency recall to enhance the accuracy of TPL reuse detection in the C/C++ ecosystem. Our evaluation results indicate that TPLite effectively increases the precision from 35.71% to 88.33% and the recall from 49.44% to 62.65% of deriving TPL dependencies compared with Centris. Moreover, TPLite increases the precision from 21.08% to 75.90% and the recall from 57.62% to 64.17% compared with the SOTA academic SCA tool B2SFinder and even outperforms the well-adopted commercial SCA tool BDBA, i.e., increasing the precision from 72.46% to 75.90% and the recall from 58.55% to 64.17%.


Pages: 1383-1395

Number of pages: 13

30 items in total

[1] Towards Understanding Third-party Library Dependency in C/C++ Ecosystem
Tang, Wei
Xu, Zhengzi
Liu, Chengwei
Wu, Jiahui
Yang, Shouguo
Li, Yi
Luo, Ping
Liu, Yang
[J]. PROCEEDINGS OF THE 37TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2022, 2022,
[2] Towards Understanding Third-party Library Dependency in C/C++ Ecosystem
Tang, Wei
Xu, Zhengzi
Liu, Chengwei
Wu, Jiahui
Yang, Shouguo
Li, Yi
Luo, Ping
Liu, Yang
[J]. arXiv, 2022,
[3] OSSFP: Precise and Scalable C/C++ Third-Party Library Detection using Fingerprinting Functions
Wu, Jiahui
Xu, Zhengzi
Tang, Wei
Zhang, Lyuye
Wu, Yueming
Liu, Chengyue
Sun, Kairan
Zhao, Lida
Liu, Yang
[J]. 2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 270 - 282
[4] Large-Scale Third-Party Library Detection in Android Markets
Li, Menghao
Wang, Pei
Wang, Wei
Wang, Shuai
Wu, Dinghao
Liu, Jian
Xue, Rui
Huo, Wei
Zou, Wei
[J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2020, 46 (09) : 981 - 1003
[5] An Interactive Reverse Engineering Environment for Large-Scale C++ Code
Telea, Alexandru
Voinea, Lucian
[J]. SOFTVIS 2008: PROCEEDINGS OF THE 4TH ACM SYMPOSIUM ON SOFTWARE VISUALIZATION, 2008, : 67 - 76
[6] Large-scale semi-automated migration of legacy C/C++ test code
Schuts, Mathijs T. W.
Aarssen, Rodin T. A.
Tielemans, Paul M.
Vinju, Jurgen J.
[J]. SOFTWARE-PRACTICE & EXPERIENCE, 2022, 52 (07) : 1543 - 1580
[7] Compiler-Assisted Instrumentation Selection for Large-Scale C++ Codes
Kreutzer, Sebastian
Iwainsky, Christian
Lehr, Jan-Patrick
Bischof, Christian
[J]. HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2022 INTERNATIONAL WORKSHOPS, 2022, 13387 : 5 - 19
[8] Automated Fortran-C++ Bindings for Large-Scale Scientific Applications
Johnson, Seth R.
Prokopenko, Andrey
Evans, Katherine J.
[J]. COMPUTING IN SCIENCE & ENGINEERING, 2020, 22 (05) : 84 - 93
[9] Dealing with Popularity Bias in Recommender Systems for Third-party Libraries: How far Are We?
Nguyen, Phuong T.
Rubei, Riccardo
Di Rocco, Juri
Di Sipio, Claudio
Di Ruscio, Davide
Di Penta, Massimiliano
[J]. 2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 12 - 24
[10] A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware
Zhao, Binbin
Ji, Shouling
Xu, Jiacheng
Tian, Yuan
Wei, Qiuyang
Wang, Qinying
Lyu, Chenyang
Zhang, Xuhong
Lin, Changting
Wu, Jingzheng
Beyah, Raheem
[J]. PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022, 2022, : 442 - 454
