Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes all three semi-supervised assumptions, i.e., the smoothness, cluster, and manifold assumptions, into account together during boosting. In this paper, we propose a novel cost functional consisting of the margin cost on labeled data and a regularization penalty on unlabeled data based on these three fundamental semi-supervised assumptions. Minimizing the proposed cost functional with a greedy yet stagewise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. Extensive experiments demonstrate that our algorithm yields favorable results on benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms. Finally, we discuss relevant issues and relate our algorithm to previous work.
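The abstract does not spell out the functional itself, but its general shape can be sketched as follows: a margin cost over labeled examples plus a similarity-weighted penalty that couples predictions on unlabeled examples. The symbols c, λ, and w_ij below are illustrative placeholders, not the paper's definitions.

```latex
% Illustrative shape of a semi-supervised boosting cost functional (not the paper's exact form):
% a margin cost on labeled data plus a pairwise smoothness penalty on unlabeled data.
\[
  C(F) \;=\; \sum_{i \in \mathcal{L}} c\bigl( y_i F(x_i) \bigr)
  \;+\; \lambda \sum_{i, j \in \mathcal{U}} w_{ij}\, \bigl\lVert F(x_i) - F(x_j) \bigr\rVert^{2}
\]
```

A greedy stagewise procedure then adds, at each round, the base learner and step size that most decrease C(F), in the same way gradient boosting minimizes a purely supervised loss.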
Automatic fault localization is essential for software engineering. However, fault localization suffers from the interactions among multiple faults. Our previous research revealed that the fault-coupling effect is responsible for the weakened fault localization performance in multiple-fault programs. On the basis of this finding, we propose a Test Case Restoration Method based on the Genetic Algorithm (TRGA) to search for potential coupling test cases and conduct a restoration process that eliminates the coupling effect. The major contributions of the current study are as follows: (1) the construction of a fitness function to measure the possibility of failed test cases becoming coupling test cases; (2) the development of TRGA, which searches for potential coupling test cases; and (3) an evaluation of TRGA efficiency across 14 open-source programs, three spectrum-based fault localization techniques, and two parallel debugging techniques. The results revealed that TRGA outperformed the original fault localization techniques in 74.28% and 78.57% of the scenarios in the best and worst cases, respectively. On average, the percentage improvement was 4.43% for the best case and 2% for the worst case. A detailed discussion of TRGA parameter settings is also provided.
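The following Python sketch illustrates the kind of genetic search TRGA performs, assuming an externally supplied fitness function that scores how likely the selected failed test cases are to be coupling test cases; the encoding, operators, and parameter values here (pop_size, mutation_rate, etc.) are illustrative and not taken from the paper.

```python
import random

def search_coupling_tests(failed_tests, fitness, pop_size=20, generations=50,
                          crossover_rate=0.8, mutation_rate=0.1):
    """Hypothetical TRGA-style genetic search over subsets of failed test cases.

    'fitness' stands in for the paper's measure of how likely the selected
    failed test cases are to be coupling test cases.
    """
    def random_candidate():
        # A candidate is a bit vector selecting a subset of failed test cases.
        return [random.random() < 0.5 for _ in failed_tests]

    def crossover(a, b):
        point = random.randrange(1, len(a)) if len(a) > 1 else 0
        return a[:point] + b[point:]

    def mutate(candidate):
        return [(not bit) if random.random() < mutation_rate else bit
                for bit in candidate]

    population = [random_candidate() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Tournament selection, then crossover and mutation.
        parents = [max(random.sample(population, 2), key=fitness)
                   for _ in range(pop_size)]
        children = []
        for i in range(pop_size):
            a, b = parents[i], parents[(i + 1) % pop_size]
            child = crossover(a, b) if random.random() < crossover_rate else a[:]
            children.append(mutate(child))
        population = children
        best = max(population + [best], key=fitness)
    return best
```

The selected candidates would then feed the restoration process described in the abstract before fault localization is rerun.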
• A framework named TRGA is constructed for multiple-fault localization.
• A fitness function is constructed to identify potential coupling test cases.
• The parameter selection of TRGA is reported.
• The efficiency of TRGA is validated on 14 open-source programs.
This paper investigates the negative effects of multiple faults on spectrum-based fault localization (SBFL). Previous research validated that the occurrence of multiple faults can have a significant negative impact on fault localization. However, very little current research addresses the degree of this impact through a systematic analysis, and the fundamental causes underlying it have not been investigated and are not fully understood. We conducted experiments on fourteen real-life open-source programs to explore and possibly solve these problems. Our results indicate that: 1) although multiple faults generally do have a negative impact on fault localization, different fault localization techniques display various levels of robustness against it; 2) restoring pass/fail fault interactions has only a modest effect on this impact; 3) our investigation of twelve Fault Localization Interactions (FLI) shows that a dominant FLI-1 interaction in multiple-fault programs is likely responsible for the negative impact; 4) restoring FLI-1 can significantly improve the performance of both SBFL and parallel debugging techniques; and 5) this paper practically validates the revised Kendall Tau distance as an efficient measure to help locate the test cases that have triggered FLI-1. Based on the revised Kendall Tau distance, a fast search algorithm is suggested to locate FLI-1 test cases. We expect this paper to provide some insight into the fundamental causes of the negative impact of multiple faults on fault localization and to drive the development of more efficient fault localization techniques that better identify and handle multiple faults.
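Since the revised Kendall Tau distance is the quantity used to flag FLI-1 test cases, the sketch below computes the standard (unrevised) Kendall Tau distance between two suspiciousness rankings, e.g., the rankings obtained with and without a candidate test case; the paper's revision of the distance is not reproduced here.

```python
def kendall_tau_distance(rank_a, rank_b):
    """Count discordant pairs between two rankings of the same statements.

    rank_a and rank_b map each statement to its position in the respective
    suspiciousness ranking (smaller position = more suspicious).
    """
    statements = list(rank_a)
    distance = 0
    for i in range(len(statements)):
        for j in range(i + 1, len(statements)):
            s, t = statements[i], statements[j]
            # Discordant if the two rankings order statements s and t differently.
            if (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t]) < 0:
                distance += 1
    return distance

# Example: a large distance between the rankings with and without a test case
# suggests that the test case strongly perturbs the localization result.
d = kendall_tau_distance({"s1": 0, "s2": 1, "s3": 2},
                         {"s1": 2, "s2": 1, "s3": 0})
```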
Automatic fault localization is essential to intelligent software systems. Most fault localization techniques assume that the test oracle is perfect before debugging, which rarely holds in practice. In fact, the test suite often contains a number of unlabelled test cases, which have been shown to be useful in fault localization. However, due to execution diversity, not all unlabelled test cases are suitable for fault localization; selecting inappropriate unlabelled test cases can even weaken fault localization efficiency.
To solve the problem of filtering unlabelled test cases, this work aims to construct a feasible framework to select suitable unlabelled test cases for better fault localization.
To address this issue, an entropy-based framework, Efilter, is constructed to filter unlabelled test cases. In Efilter, a Statement-based entropy and a Testsuite-based entropy are defined to measure the localization uncertainty of a given test suite (an illustrative entropy sketch follows this abstract). An unlabelled test case whose Statement-based entropy or Testsuite-based entropy is lower than the corresponding threshold is selected. Further, feature integration strategies for both the Statement-based entropy and the Testsuite-based entropy are given to calculate the suspiciousness of statements.
The efficiency of Efilter is evaluated across six open-source programs and three spectrum-based fault localization techniques. The results reveal that, in terms of the average EXAM score, Efilter improves fault localization efficiency by 18.8% with the Statement-based entropy and by 16.5% with the Testsuite-based entropy compared with the strategy without Efilter.
Our results indicate that Efilter with both the Statement-based entropy and the Testsuite-based entropy can improve fault localization in scenarios lacking test oracles, serving as an enhancement for fault localization in practice.
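As a rough illustration of how entropy can quantify localization uncertainty, the sketch below normalizes a suspiciousness vector into a distribution, computes its Shannon entropy, and keeps only those unlabelled test cases whose inclusion keeps the entropy below a threshold. The actual definitions of the Statement-based and Testsuite-based entropies and their thresholds in Efilter are not given in the abstract above, so entropy_with and threshold are hypothetical stand-ins.

```python
import math

def suspiciousness_entropy(suspiciousness):
    """Shannon entropy of the normalized suspiciousness scores.

    A flatter distribution (higher entropy) means the ranking is less certain
    about which statement is faulty.
    """
    total = sum(suspiciousness)
    if total == 0:
        return 0.0
    probs = [s / total for s in suspiciousness]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_unlabelled(test_cases, entropy_with, threshold):
    """Keep unlabelled test cases whose inclusion keeps the entropy below threshold.

    entropy_with(t) is assumed to recompute the suite's suspiciousness entropy
    after adding test case t; it stands in for Efilter's actual criteria.
    """
    return [t for t in test_cases if entropy_with(t) < threshold]
```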
The aim of this study was to investigate the association between the serum albumin to serum creatinine ratio (sACR) and the prognosis of heart failure (HF). In this single-center prospective cohort study, a total of 2625 patients with HF were enrolled between March 2012 and June 2017. All patients were divided into three groups according to the tertiles of sACR. Of the 2625 patients, the mean age was 57.0 ± 14.3 years. During a median follow-up of 23 months, 666 end point events occurred. Prognosis analysis indicated that the lowest sACR tertile was significantly associated with a higher mortality risk of HF (hazard ratio [HR] = 1.920, 95% confidence interval [CI] = 1.585-2.326, p < 0.001) compared with the highest tertile. After adjusting for covariates including age, gender, diabetes, systolic blood pressure (SBP), diastolic blood pressure, heart rate, total cholesterol, triglycerides, HDL-C, LDL-C, white blood cell count, hemoglobin, glycosylated hemoglobin, and β-blocker use, the HR for the mortality risk of HF was 1.513 (95% CI = 1.070-2.139, p = 0.019). Subgroup analysis indicated that the mortality risk of HF decreased statistically significantly with rising sACR in patients without β-blocker use and in patients with serum creatinine less than 97 μmol/L. However, stratification by age, sex, history of hypertension, diabetes, and smoking, level of glycosylated hemoglobin, and albumin had no obvious effect on the association between sACR and the prognosis of HF. Additionally, patients with lower sACR displayed a reduced left ventricular ejection fraction and an increased left ventricular end-diastolic diameter. The discriminative power of sACR alone and in combination with age, gender, SBP, heart rate, and glycosylated hemoglobin was excellent, with C statistics of 0.655 and 0.889, respectively. Lower sACR was an independent risk factor for the mortality risk of HF.
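For readers who want to reproduce the style of analysis, the sketch below shows one conventional workflow with synthetic placeholder data: compute sACR, split it into tertiles, and fit a Cox proportional hazards model with lifelines. The variable names, adjustment set, and data are illustrative only and do not come from the study cohort.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic placeholder data standing in for the HF cohort (not study data).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "albumin": rng.normal(40, 5, n),       # g/L
    "creatinine": rng.normal(90, 20, n),   # umol/L
    "age": rng.normal(57, 14, n),
    "time": rng.exponential(23, n),        # follow-up in months
    "event": rng.integers(0, 2, n),        # 1 = end point event
})

# Serum albumin to serum creatinine ratio and its tertiles, as in the study design.
df["sACR"] = df["albumin"] / df["creatinine"]
df["sACR_tertile"] = pd.qcut(df["sACR"], 3, labels=[1, 2, 3]).astype(int)

# Cox proportional hazards model adjusted only for age here; the study adjusted
# for a much larger covariate set.
cph = CoxPHFitter()
cph.fit(df[["time", "event", "sACR_tertile", "age"]],
        duration_col="time", event_col="event")
cph.print_summary()  # hazard ratio per tertile increase in sACR
```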
Aims
Apolipoproteins have been reported to be involved in many cardiovascular diseases. The aim of our study was to investigate the prognostic value of the apolipoprotein B (ApoB) to apolipoprotein A‐I (ApoA‐I) ratio (ApoB/ApoA‐I) in patients with heart failure (HF).
Methods and results
We randomly assigned 2400 HF patients to a training cohort (n = 1400) and a validation cohort (n = 1000). Using a receiver operating characteristic curve, we identified the optimal cut‐off value of the ApoB/ApoA‐I ratio in the training cohort as 0.69, which was further validated in the validation cohort (an illustrative cut‐off sketch follows this abstract). A propensity score matching (PSM) analysis was conducted to eliminate the imbalance in the baseline characteristics of the high and low ApoB/ApoA‐I groups, generating a PSM cohort of 2242 HF patients. We also validated our results with an independent cohort (n = 838). Univariate and multivariate analyses were conducted to explore the independent prognostic value of ApoB/ApoA‐I in the training cohort (n = 1400), the validation cohort (n = 1000), the PSM cohort (n = 2242), and the independent cohort (n = 838). Patients with a high ApoB/ApoA‐I ratio had significantly poorer prognosis than those with a low ApoB/ApoA‐I ratio in the training, validation, PSM, and independent cohorts (P < 0.05). Multivariate analysis indicated that ApoB/ApoA‐I was an independent prognostic factor for HF in the training cohort (hazard ratio [HR] = 1.637, 95% confidence interval [CI] = 1.201–2.231, P = 0.002), the validation cohort (HR = 1.54, 95% CI = 1.051–2.257, P = 0.027), the PSM cohort (HR = 1.645, 95% CI = 1.273–2.125, P < 0.001), and the independent cohort (HR = 1.987, 95% CI = 1.251–3.155, P = 0.004).
Conclusions
The serum ApoB/ApoA‐I ratio is an independent predictor of prognosis in HF patients.
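The cut-off sketch referenced in the methods is shown here: one common way to derive an ROC-based cut-off is to maximize Youden's J on the training cohort and then apply the resulting threshold to the validation cohort. Whether the study used Youden's J or another criterion is not stated, so the function and column names below are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_cutoff(event, ratio):
    """Return the ApoB/ApoA-I threshold maximizing Youden's J (tpr - fpr)."""
    fpr, tpr, thresholds = roc_curve(event, ratio)
    return thresholds[np.argmax(tpr - fpr)]

# Hypothetical usage: derive the cut-off on the training cohort, then dichotomize
# the validation cohort with it (the study reports a cut-off of 0.69).
# cutoff = optimal_cutoff(train["event"], train["apob_apoa1_ratio"])
# validation["high_ratio"] = validation["apob_apoa1_ratio"] > cutoff
```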
Imbalanced data are a major factor degrading the performance of software defect models. Software defect datasets are imbalanced in nature, i.e., the number of non-defect-prone modules is far larger than that of defect-prone ones, which biases classifiers toward the majority class. In this paper, we propose a novel credibility-based imbalance boosting (CIB) method to address the class-imbalance problem in software defect proneness prediction. The method measures the credibility of synthetic samples based on their distribution by introducing a credit factor for every synthetic sample, and it proposes a weight updating scheme that makes the base classifiers focus on real samples and on synthetic samples with high credibility. Experiments are performed on 11 NASA datasets and nine PROMISE datasets, comparing CIB with MAHAKIL, AdaC2, AdaBoost, SMOTE, RUS, and no sampling in terms of four performance measures, i.e., area under the curve (AUC), F1, AGF, and Matthews correlation coefficient (MCC). The Wilcoxon signed-rank test and Cliff's δ are used to perform the statistical tests and calculate effect sizes, respectively. The experimental results show that CIB is a more promising alternative for addressing the class-imbalance problem in software defect proneness prediction than previous methods.
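The sketch below illustrates the two ideas named in the abstract with placeholder definitions: a credit factor for synthetic samples and a boosting weight update scaled by that credit. The concrete credibility measure and update rule in CIB are not reproduced here; proximity to real minority samples is used only as a stand-in.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def credit_factors(synthetic_X, real_minority_X, k=5):
    """Hypothetical credit factor in (0, 1]: synthetic samples close to real
    minority-class samples get credit near 1, distant ones get less."""
    nn = NearestNeighbors(n_neighbors=k).fit(real_minority_X)
    dist, _ = nn.kneighbors(synthetic_X)
    return 1.0 / (1.0 + dist.mean(axis=1))

def update_weights(weights, misclassified, alpha, credit):
    """AdaBoost-style reweighting: misclassified samples gain weight, but the
    gain is scaled by credit so low-credibility synthetic samples gain less.

    'misclassified' is a 0/1 array; real samples are given credit 1.0.
    """
    new_w = weights * np.exp(alpha * misclassified * credit)
    return new_w / new_w.sum()
```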
Soil microorganisms are essential for crop growth and production as part of soil health. However, our current knowledge of microbial communities in tobacco soils and the factors that influence them is limited.
In this study, we characterized the bacterial and fungal communities in tobacco soils and compared their responses to regional differences and rootstock disease.
The results showed that the diversity and composition of the bacterial and fungal communities responded more strongly to regional differences than to rootstock disease, while the niche breadth of bacteria was more sensitive to regional differences than that of fungi. Likewise, the core bacterial and fungal taxa shared by the three regions accounted for only 21.73% and 20.62% of all OTUs, respectively, much lower than the taxa shared by RD and NRD in each region, which ranged from 44.87% to 62.14%. The differences in the topological characteristics, connectivity, and stability of the microbial networks across regions also confirmed the strong response of the microbial communities to region. However, rootstock disease had a more direct effect on the fungal communities than regional differences did.
This study provides insight into the interactions among microbial communities, regional differences, and rootstock disease, with important implications for maintaining soil health and improving tobacco yield and quality.
With the increasing integration of large complex systems such as aircraft, satellites, and railway systems, the coupling relationships between components within a system have become increasingly complex, so local disturbances or faults may propagate and produce global effects on the system. This poses new challenges for the safety analysis and risk assessment of complex systems. To objectively analyze and evaluate the inherent risks of complex systems with coupled, correlated components, this paper proposes a novel risk assessment and analysis method for correlation in complex systems based on multi-dimensional theory. First, a formal description and a coupling-degree analysis method for the hierarchical structure of complex systems are established. Then, considering the three safety risk factors of fault propagation probability, potential severity, and fault propagation time, a multi-dimensional safety risk theory is proposed to evaluate the risk that each element within the system poses to the overall system. Furthermore, critical safety elements are identified based on Pareto rules, the As Low As Reasonably Practicable (ALARP) principle, and safety risk entropy to support preventive measures. Finally, an application to an avionics system is provided to demonstrate the effectiveness of the proposed method.
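To make the use of the three risk factors and the Pareto rule concrete, the sketch below combines fault propagation probability, potential severity, and fault propagation time into a single per-element score and then selects the elements that account for most of the total risk. The aggregation formula and the 80% Pareto fraction are illustrative choices, not the paper's multi-dimensional model.

```python
def risk_scores(elements):
    """Score each element from (propagation_probability, severity, propagation_time).

    The product below treats a shorter propagation time as higher risk; the
    paper's actual multi-dimensional aggregation may differ.
    """
    return {name: p * s / max(t, 1e-9)
            for name, (p, s, t) in elements.items()}

def critical_elements(scores, pareto_fraction=0.8):
    """Pareto-style selection: smallest set of elements covering the given
    fraction of the total risk, examined from highest score downwards."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(scores.values())
    picked, cumulative = [], 0.0
    for name, score in ranked:
        picked.append(name)
        cumulative += score
        if cumulative >= pareto_fraction * total:
            break
    return picked

# Hypothetical usage on three avionics elements (probability, severity, time in s):
scores = risk_scores({"bus": (0.3, 8, 2.0), "sensor": (0.1, 5, 5.0), "cpu": (0.05, 9, 1.0)})
print(critical_elements(scores))
```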
The modeling method for agents and service-oriented architecture (SOA) in avionics systems describes agents and SOA with models. To our knowledge, however, current modeling methods cannot describe the behavior of agents and SOA accurately and do not fit well with existing avionics system models. This paper addresses these problems by presenting a modeling method based on the Architecture Analysis and Design Language (AADL). In this method, the working states of agents are described by mode components, with the working process triggered by agent inputs, and services are described by process components. The application of the software system is described by system components that contain several process components. Moreover, different modes of the system are used to describe different applications, and application transitions are triggered by specific application requests. The software architecture of an avionics system is modeled with the proposed method; this case demonstrates that the proposed method can accurately describe how agents and SOA work and fits well with existing avionics system models.