Logging requirement for continuous auditing of responsible machine learning-based applications

Cited: 0
Authors
Patrick Loic Foalem [1 ]
Leuson Da Silva [1 ]
Foutse Khomh [1 ]
Heng Li [1 ]
Ettore Merlo [1 ]
Affiliations
[1] Polytechnique Montreal, Department of Computer Engineering and Software Engineering
Keywords
Empirical; GitHub repository; Machine learning; Responsible ML; Logging; Auditing; Transparency; Fairness; Accountability;
DOI
10.1007/s10664-025-10656-8
Abstract
Machine learning (ML) is increasingly used across various industries to automate decision-making processes. However, concerns about the ethical and legal compliance of ML models have arisen due to their lack of transparency, fairness, and accountability. Monitoring, particularly through logging, is a widely used technique in traditional software systems that could be leveraged to assist in auditing ML-based applications. Logs provide a record of an application’s behavior, which can be used for continuous auditing, debugging, and analyzing both the behavior and performance of the application. In this study, we investigate the logging practices of ML practitioners to capture responsible ML-related information in ML applications. We analyzed 85 ML projects hosted on GitHub, leveraging 20 responsible ML libraries that span principles such as privacy, transparency & explainability, fairness, and security & safety. Our analysis revealed important differences in the implementation of responsible AI principles. For example, out of 5,733 function calls analyzed, privacy accounted for 89.3% (5,120 calls), while fairness represented only 2.1% (118 calls), highlighting the uneven emphasis on these principles across projects. Furthermore, our manual analysis of 44,877 issue discussions revealed that only 8.1% of the sampled issues addressed responsible AI principles, with transparency & explainability being the most frequently discussed principle (32.2% of all issues related to responsible AI principles). Additionally, a survey conducted with ML practitioners provided direct insights into their perspectives, informing our exploration of ways to enhance logging practices for more effective responsible-ML auditing. We discovered that while privacy, model interpretability & explainability, fairness, and security & safety are commonly considered, there is a gap in how metrics associated with these principles are logged. Specifically, crucial fairness metrics like group and individual fairness, privacy metrics such as epsilon and delta, and explainability metrics like SHAP values are not captured by current logging practices. The insights from this study highlight the need for ML practitioners and logging tool developers to adopt enhanced logging strategies that incorporate a broader range of responsible AI metrics. This adjustment will facilitate the development of auditable and ethically responsible ML applications, ensuring they meet emerging regulatory and societal expectations. These insights offer actionable guidance for improving the accountability and trustworthiness of ML systems.
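The gap the abstract describes points to a concrete practice: emitting fairness, privacy, and explainability metrics as structured log records alongside conventional operational logs. The sketch below is illustrative only and is not taken from the paper; it uses Python's standard logging module, and every metric name and value (demographic parity difference, epsilon/delta, top SHAP attributions) is a hypothetical placeholder assumed to be computed elsewhere with tools such as fairlearn, a differential-privacy accountant, or shap.

```python
import json
import logging

# Minimal sketch (not from the paper): a structured audit logger for the
# responsible-ML metrics the study found missing from current practice.
# All metric names and values are hypothetical placeholders assumed to be
# computed elsewhere (e.g., with fairlearn, a DP accountant, or shap).
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s %(message)s")
audit_log = logging.getLogger("responsible_ml_audit")


def log_responsible_ml_metrics(run_id: str, fairness: dict,
                               privacy: dict, explainability: dict) -> None:
    """Emit one structured audit record per model evaluation run."""
    record = {
        "run_id": run_id,
        "fairness": fairness,              # e.g., group/individual fairness scores
        "privacy": privacy,                # e.g., differential-privacy epsilon/delta
        "explainability": explainability,  # e.g., top SHAP feature attributions
    }
    audit_log.info(json.dumps(record))


# Hypothetical usage with placeholder values:
log_responsible_ml_metrics(
    run_id="eval-2024-06-01-42",
    fairness={"demographic_parity_difference": 0.04,
              "individual_fairness_consistency": 0.93},
    privacy={"epsilon": 1.5, "delta": 1e-5},
    explainability={"top_shap_features": {"age": 0.31, "income": 0.22}},
)
```

Serializing each record as JSON keeps the audit trail machine-parseable, which is what continuous auditing across many runs would require.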
Related Papers (50 in total)
  • [21] Detecting the impact of subject characteristics on machine learning-based diagnostic applications
    Neto, Elias Chaibub
    Pratap, Abhishek
    Perumal, Thanneer M.
    Tummalacherla, Meghasyam
    Snyder, Phil
    Bot, Brian M.
    Trister, Andrew D.
    Friend, Stephen H.
    Mangravite, Lara
    Omberg, Larsson
    NPJ DIGITAL MEDICINE, 2019, 2 (1)
  • [22] Conceptual Mappings of Conventional Software and Machine Learning-based Applications Development
    Angel, Shannon
    Namin, Akbar Siami
2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022 : 1223 - 1230
  • [23] A DT Machine Learning-Based Satellite Orbit Prediction for IoT Applications
    Xu X.
    Wen H.
    Song H.
    Zhao Y.
IEEE Internet of Things Magazine, 2023, 6 (02) : 96 - 100
  • [24] Efficient Encoding and Decoding of Voxelized Models for Machine Learning-Based Applications
    Strnad, Damjan
    Kohek, Stefan
    Zalik, Borut
    Vasa, Libor
    Nerat, Andrej
    IEEE ACCESS, 2025, 13 : 5551 - 5561
  • [25] Bayesian and machine learning-based fault detection and diagnostics for marine applications
    Cheliotis, Michail
    Lazakis, Iraklis
    Cheliotis, Angelos
    SHIPS AND OFFSHORE STRUCTURES, 2022, 17 (12) : 2686 - 2698
  • [26] Detecting the impact of subject characteristics on machine learning-based diagnostic applications
    Elias Chaibub Neto
    Abhishek Pratap
    Thanneer M. Perumal
    Meghasyam Tummalacherla
    Phil Snyder
    Brian M. Bot
    Andrew D. Trister
    Stephen H. Friend
    Lara Mangravite
    Larsson Omberg
    npj Digital Medicine, 2
  • [27] A review of machine learning-based human activity recognition for diverse applications
    Kulsoom, Farzana
    Narejo, Sanam
    Mehmood, Zahid
    Chaudhry, Hassan Nazeer
    Butt, Aisha
    Bashir, Ali Kashif
NEURAL COMPUTING & APPLICATIONS, 2022, 34 (21) : 18289 - 18324
  • [28] A review of machine learning-based human activity recognition for diverse applications
    Farzana Kulsoom
    Sanam Narejo
    Zahid Mehmood
    Hassan Nazeer Chaudhry
    Ayesha Butt
    Ali Kashif Bashir
    Neural Computing and Applications, 2022, 34 : 18289 - 18324
  • [29] Evolution of Machine Learning in Tuberculosis Diagnosis: A Review of Deep Learning-Based Medical Applications
    Singh, Manisha
    Pujar, Gurubasavaraj Veeranna
    Kumar, Sethu Arun
    Bhagyalalitha, Meduri
    Akshatha, Handattu Shankaranarayana
    Abuhaija, Belal
    Alsoud, Anas Ratib
    Abualigah, Laith
    Beeraka, Narasimha M.
    Gandomi, Amir H.
    ELECTRONICS, 2022, 11 (17)
  • [30] Continuous Defect Prediction in CI/CD Pipelines: A Machine Learning-Based Framework
    Giorgio, Lazzarinetti
    Nicola, Massarenti
    Fabio, Sgro
    Andrea, Salafia
    AIXIA 2021 - ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13196 : 591 - 606