Peer reviewed
  • On the Multiple Sources and...
    Li, Zhiqiang; Jing, Xiao-Yuan; Zhu, Xiaoke; Zhang, Hongyu; Xu, Baowen; Ying, Shi

    IEEE Transactions on Software Engineering, 04/2019, Volume: 45, Issue: 4
    Journal Article

    Heterogeneous defect prediction (HDP) refers to predicting the defect-proneness of software modules in a target project using heterogeneous metric data from other projects. Existing HDP methods mainly focus on predicting target instances with a single source. In practice, plenty of external projects exist, and multiple sources can generally provide more information than a single project. It is therefore worth investigating whether HDP performance can be improved by employing multiple sources. However, a precondition for conducting HDP is that the external sources are available, and due to privacy concerns most companies are unwilling to share their data. To facilitate data sharing, it is essential to study how to protect the privacy of data owners before they release their data. In this paper, we study these two issues in HDP. Specifically, to utilize multiple sources effectively, we propose a multi-source selection based manifold discriminant alignment (MSMDA) approach. To protect the privacy of data owners, we design a sparse representation based double obfuscation algorithm and apply it to HDP. In a case study of 28 projects, our results show that MSMDA achieves better performance than a range of baseline methods, with improvements of 3.4-15.3 percent in g-measure and 3.0-19.1 percent in AUC.
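    The two evaluation measures named in the abstract can be made concrete with a small sketch. The abstract does not give their formulas, so this assumes the definitions commonly used in defect-prediction studies: pd is the recall on defective modules, pf is the false-alarm rate on clean modules, g-measure is the harmonic mean of pd and 1 - pf, and AUC is the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one (ties counted as one half). The helper names `g_measure` and `auc` are hypothetical, and label 1 is assumed to mean "defective".

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (tp, fn, fp, tn), assuming label 1 = defective, 0 = clean."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def g_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of pd (recall) and 1 - pf (assumed common definition)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    pd = tp / (tp + fn) if tp + fn else 0.0  # probability of detection
    pf = fp / (fp + tn) if fp + tn else 0.0  # probability of false alarm
    spec = 1.0 - pf
    if pd + spec == 0:
        return 0.0
    return 2 * pd * spec / (pd + spec)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (defective, clean) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0]
    print(g_measure(labels, [1, 0, 0, 1, 0]))         # pd = 0.5, pf = 1/3
    print(auc(labels, [0.9, 0.4, 0.35, 0.8, 0.1]))    # 5 of 6 pairs ranked correctly
```

    With the toy labels above, pd = 0.5 and 1 - pf = 2/3, giving a g-measure of 4/7 (about 0.571), and the AUC is 5/6 (about 0.833). The quadratic pairwise AUC loop favors readability over speed; a rank-sum formulation would scale better to large projects.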