  • Cheng, Yiran; Cheng, Bo; Jin, Pengxiang; Sun, Yongqian; Nie, Xiaohui; Zhao, Nengwen; Zhang, Shenglin; Pei, Dan

    2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), 2022-Oct.
    Conference Proceeding

    Using large-scale multi-dimensional data for root cause analysis (MDRCA) is vitally important for online software services. It helps operators quickly narrow down the scope of anomalies and failures and localize the root cause at a finer granularity. However, most existing MDRCA algorithms can only solve low-dimensional problems. On high-dimensional data, the complexity of these algorithms increases significantly, and some algorithms no longer work at all. Intuitively, passing only a subset of attributes rather than the full attribute set can improve the performance of these MDRCA algorithms. However, selecting that subset is challenging due to data imbalance and novel root cause attributes. To better understand the problem of root-cause-oriented attribute selection (RCOAS), we conduct a preliminary study based on real-world data. We find that several straightforward rules can filter out some attributes. In addition, we reveal that existing approaches do not fit the requirements of RCOAS. Motivated by the study, we propose an RCOAS approach, RC-LIR, to select a subset of attributes for downstream algorithms. RC-LIR first performs rule-based selection. It then improves a feature selection algorithm with two strategies, i.e., scaling up imbalanced data and considering the redundant cost. Experiments on 1000 real-world fault cases demonstrate that RC-LIR can achieve an F1-score of 0.88, outperforming the baseline approaches by at least 0.15. Furthermore, our experiments with four widely adopted MDRCA algorithms show that integrating RC-LIR can lead to more effective and efficient MDRCA.
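
    To make the reported F1-score concrete, the following is a minimal sketch of how an attribute selection result could be scored against the true root cause attributes of a single fault case. It assumes a set-based F1 computed per case; the abstract does not state how the paper aggregates the metric, and the function name `attribute_f1` and the attribute names are hypothetical.

```python
def attribute_f1(selected, root_cause):
    """Set-based F1 between selected attributes and true root cause attributes.

    Assumption: one fault case, attributes compared by name only.
    """
    selected, root_cause = set(selected), set(root_cause)
    hits = len(selected & root_cause)
    if hits == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = hits / len(selected)    # share of selected attributes that are correct
    recall = hits / len(root_cause)     # share of root cause attributes that were kept
    return 2 * precision * recall / (precision + recall)


# Hypothetical fault case: four attributes passed downstream, two are true root causes.
selected = ["datacenter", "isp", "device_type", "app_version"]
root_cause = ["datacenter", "isp"]
score = attribute_f1(selected, root_cause)
print(f"F1 = {score:.3f}")
```

    In this example every root cause attribute is retained (recall 1.0), but half of the selected attributes are unnecessary (precision 0.5), so the score penalizes the redundant attributes that a downstream MDRCA algorithm would still have to search over.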