Balancing Privacy and Utility in Cross-Company Defect Prediction

Cited by: 100
Authors
Peters, Fayola [1]
Menzies, Tim [1]
Gong, Liang [2]
Zhang, Hongyu [2]
Affiliations
[1] W Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
[2] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Privacy; classification; defect prediction; STATIC CODE ATTRIBUTES; K-ANONYMITY; MODEL;
DOI
10.1109/TSE.2013.6
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Background: Cross-company defect prediction (CCDP) is a field of study where an organization lacking enough local data can use data from other organizations for building defect predictors. To support CCDP, data must be shared. Such shared data must be privatized, but that privatization could severely damage the utility of the data. Aim: To enable effective defect prediction from shared data while preserving privacy. Method: We explore privatization algorithms that maintain class boundaries in a dataset. CLIFF is an instance pruner that deletes irrelevant examples. MORPH is a data mutator that moves the data a random distance, taking care not to cross class boundaries. CLIFF+MORPH are tested in a CCDP study among 10 defect datasets from the PROMISE data repository. Results: We find: 1) The CLIFFed+MORPHed algorithms provide more privacy than the state-of-the-art privacy algorithms; 2) in terms of utility measured by defect prediction, we find that CLIFF+MORPH performs significantly better. Conclusions: For the OO defect data studied here, data can be privatized and shared without a significant degradation in utility. To the best of our knowledge, this is the first published result where privatization does not compromise defect prediction.
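The abstract describes MORPH only as a mutator that moves each instance a random distance without crossing class boundaries. The following is a minimal sketch of one way such a mutator could work, not the authors' implementation. It assumes that the boundary is approximated by each instance's nearest neighbor of a different class, and that keeping the shift below half of that distance keeps the instance closer to its original position than to that neighbor. The function names, the bounds `lo=0.15` and `hi=0.35`, and the sample data are illustrative assumptions that do not come from this record. CLIFF (the instance pruner) is not shown.

```python
import math
import random


def nearest_unlike_neighbor(i, rows, labels):
    """Return the row closest to rows[i] whose class label differs (or None)."""
    best, best_dist = None, math.inf
    for row, label in zip(rows, labels):
        if label == labels[i]:
            continue  # same class: not a boundary witness
        dist = math.dist(rows[i], row)
        if dist < best_dist:
            best, best_dist = row, dist
    return best


def morph_like(rows, labels, lo=0.15, hi=0.35, seed=0):
    """Shift each row by a random fraction of its offset from the nearest unlike row.

    Hypothetical sketch: lo and hi are assumed bounds; hi must stay below 0.5 so
    the shifted row remains closer to its original than to the unlike neighbor.
    """
    rng = random.Random(seed)
    mutated = []
    for i, x in enumerate(rows):
        z = nearest_unlike_neighbor(i, rows, labels)
        if z is None:  # only one class present: nothing to measure against
            mutated.append(list(x))
            continue
        r = rng.uniform(lo, hi)          # random fraction of the distance
        sign = rng.choice((-1.0, 1.0))   # move toward or away from z
        mutated.append([xi + sign * r * (xi - zi) for xi, zi in zip(x, z)])
    return mutated


# Toy data (made up for illustration): two attributes, two classes.
rows = [[1.0, 2.0], [1.5, 1.8], [8.0, 9.0], [9.0, 8.5]]
labels = [0, 0, 1, 1]
private_rows = morph_like(rows, labels)
```

Each row in `private_rows` differs from its original, so the exact attribute values are no longer shared, while the shift is bounded relative to the nearest instance of the other class. This bound alone does not guarantee that a learner's decision boundary is unchanged; it is only a local approximation of "not crossing class boundaries".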
Pages: 1054-1068
Page count: 15