ON THE DISTRIBUTION OF SOURCE CODE FILE SIZES

Authors
Herraiz, Israel [1]
German, Daniel M. [2]
Hassan, Ahmed E. [3]
Affiliations
[1] Tech Univ Madrid, Madrid, Spain
[2] Univ Victoria, Victoria, BC, Canada
[3] Queens Univ, Kingston, ON, Canada
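Keywords
Mining software repositories; Software size estimation; Open source
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract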
Source code size is an estimator of software effort. Size is also often used to calibrate models and equations that estimate the cost of software. The distribution of source code file sizes has been reported in the literature to be lognormal. In this paper, we measure the size of a large collection of software (the Debian GNU/Linux distribution, version 5.0.2), and we find that the statistical distribution of its source code file sizes follows a double Pareto distribution. This means that large files occur more often than the lognormal distribution predicts; therefore, the previously proposed models underestimate the cost of software.
Pages: 5-14
Number of pages: 10