Predictive models leverage the relationships between environmental factors and river health to predict river health at unmonitored sites. Such models should generalize to unseen data. Among various machine learning models, heterogeneous ensembles are known to generalize well owing to their structural diversity. The present study compares the generalizability of heterogeneous ensembles with that of homogeneous ensembles and single models. The models classified five grades (very good to very poor) of river health indices (RHIs) for three taxa (benthic macroinvertebrates, fish, and diatoms) given various environmental factors (water quality, hydrology, meteorology, land cover, and stream properties) as inputs. The data were monitored at 2915 sites in the four major river watersheds in South Korea during the 2016–2021 period. The results indicated better generalizability for the heterogeneous and homogeneous ensembles than for single models. Moreover, heterogeneous ensembles tended to show higher generalizability than homogeneous ensembles, although the differences were marginal. Weighted soft voting was the most generalizable of the heterogeneous ensembles, with losses ranging from 0.49 to 0.59 across the three taxa. Weighted soft voting also delivered acceptable classification performance on the test set, with accuracies ranging from 0.42 to 0.52 across the taxa. The relative contributions of the environmental factors to RHI predictions and the directions of their effects agreed with established knowledge, confirming the reliability of the predictions. However, as heterogeneous ensembles have rarely been applied to RHI prediction, the extent to which they improve the generalizability of prediction must be investigated in future studies.
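As a concrete sketch of the weighted soft-voting combiner the study found most generalizable, the snippet below averages class-probability matrices under normalized per-model weights and takes the argmax as the predicted grade. The three base models, their probabilities, and the weights are toy values for illustration, not the study's fitted models.

```python
import numpy as np

def weighted_soft_vote(probas, weights):
    """Combine class-probability matrices from several base models.

    probas  : list of (n_samples, n_classes) arrays, one per model
    weights : per-model weights (e.g. validation scores), any scale
    Returns the weighted-average probabilities and the argmax class.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalize the weights
    avg = sum(wi * p for wi, p in zip(w, probas))
    return avg, avg.argmax(axis=1)

# Toy example: three base models, one sample, five RHI grades
p1 = np.array([[0.1, 0.2, 0.4, 0.2, 0.1]])
p2 = np.array([[0.0, 0.1, 0.3, 0.5, 0.1]])
p3 = np.array([[0.2, 0.2, 0.3, 0.2, 0.1]])
avg, grade = weighted_soft_vote([p1, p2, p3], weights=[0.5, 0.3, 0.2])
```

Because the weights are normalized, the averaged probabilities still sum to one per sample, so the output remains a valid probability distribution over the five grades.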
•Grades of river health indices were predicted using machine learning models.
•Generalizability was evaluated based on variance–bias decomposition.
•Ensemble models showed better generalizability than single models.
•Weighted soft voting was the most generalizable of the ensembles.
•Important environmental factors were identified using a post-hoc method.
Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy, as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels, including corrupted ones, and hence can hardly generalize to clean datasets. In this article, we focus on the problem of learning with noisy labels and introduce a compression inductive bias into network architectures to alleviate this overfitting problem. More precisely, we revisit a classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint through its feature-dropping mechanism, while Nested Dropout further learns feature representations ordered by feature importance. Moreover, the models trained with compression regularization are further combined with co-teaching for a performance boost. Theoretically, we conduct a bias-variance decomposition of the objective function under compression regularization and analyze it for both a single model and co-teaching. This decomposition provides three insights: 1) it shows that overfitting is indeed an issue in learning with noisy labels; 2) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; and 3) it explains the performance boost brought by incorporating compression regularization into co-teaching. Experiments show that our simple approach can achieve comparable or even better performance than state-of-the-art methods on benchmarks with real-world label noise, including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/.
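The ordered-representation mechanism of Nested Dropout can be sketched as a mask that keeps a geometrically distributed prefix of the feature vector and zeroes everything after it, so earlier units are kept more often and absorb the most important information. The `rho` parameter and dimensionality below are illustrative; a real implementation would apply such a mask inside the network during training.

```python
import numpy as np

def nested_dropout_mask(dim, rho=0.9, rng=None):
    """Sample a nested-dropout mask: keep a random prefix of the
    feature vector and drop everything after it.

    dim : feature dimension
    rho : probability of extending the kept prefix by one unit
          (the prefix length follows a truncated geometric law)
    """
    if rng is None:
        rng = np.random.default_rng()
    b = min(rng.geometric(1.0 - rho), dim)   # prefix length in [1, dim]
    mask = np.zeros(dim)
    mask[:b] = 1.0
    return mask

rng = np.random.default_rng(0)
m = nested_dropout_mask(8, rho=0.9, rng=rng)
```

Whatever prefix length is drawn, the mask is always a run of ones followed by a run of zeros, which is exactly the nesting property that induces the importance ordering.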
Neural network ensemble is a learning paradigm where many neural networks are jointly used to solve a problem. In this paper, the relationship between the ensemble and its component neural networks is analyzed in the context of both regression and classification, which reveals that it may be better to ensemble many instead of all of the neural networks at hand. This result is interesting because at present, most approaches ensemble all the available neural networks for prediction. Then, to show that the appropriate neural networks for composing an ensemble can be effectively selected from a set of available neural networks, an approach named GASEN is presented. GASEN first trains a number of neural networks. It then assigns random weights to those networks and employs a genetic algorithm to evolve the weights so that they can characterize, to some extent, the fitness of the neural networks for constituting an ensemble. Finally, it selects some neural networks based on the evolved weights to make up the ensemble. A large empirical study shows that, compared with popular ensemble approaches such as Bagging and Boosting, GASEN can generate neural network ensembles with far smaller sizes but stronger generalization ability. Furthermore, to understand the working mechanism of GASEN, the bias-variance decomposition of the error is provided in this paper, which shows that the success of GASEN may lie in its ability to significantly reduce both the bias and the variance.
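The selection step GASEN performs after weight evolution can be sketched as follows: keep only the networks whose evolved weight exceeds a threshold (1/N is a common choice) and average the survivors with equal weights. The "evolved" weights and toy predictions below are written by hand for illustration; in GASEN itself they come from the genetic algorithm.

```python
import numpy as np

def gasen_select(predictions, evolved_weights, threshold=None):
    """GASEN-style selection: keep networks whose evolved weight
    exceeds the threshold (default 1/N) and average their outputs.

    predictions     : (n_models, n_samples) array of regression outputs
    evolved_weights : weights produced by the genetic algorithm
    """
    preds = np.asarray(predictions, dtype=float)
    w = np.asarray(evolved_weights, dtype=float)
    if threshold is None:
        threshold = 1.0 / len(w)
    keep = w > threshold                      # boolean selection mask
    return preds[keep].mean(axis=0), keep

# Five toy "networks"; three received weight above 1/5 = 0.2
preds = np.array([[1.0], [1.2], [0.9], [5.0], [1.1]])
weights = [0.30, 0.25, 0.24, 0.01, 0.20]
ensemble_pred, keep = gasen_select(preds, weights)
```

Note how the outlier network (prediction 5.0, weight 0.01) is dropped entirely rather than merely down-weighted, which is why the resulting ensembles can be far smaller than the pool of trained networks.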
In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used. Most of the literature on this topic is based on what we call the "Fixed-X" assumption, where covariate values are assumed to be nonrandom. By contrast, it is often more reasonable to take a "Random-X" view, where the covariate values are independently drawn for both training and prediction. To study the applicability of covariance penalties in this setting, we propose a decomposition of Random-X prediction error in which the randomness in the covariates contributes to both the bias and variance components. This decomposition is general, but we concentrate on the fundamental case of ordinary least-squares (OLS) regression. We prove that in this setting the move from Fixed-X to Random-X prediction results in an increase in both bias and variance. When the covariates are normally distributed and the linear model is unbiased, all terms in this decomposition are explicitly computable, which yields an extension of Mallows' Cp that we call RCp. RCp also holds asymptotically for certain classes of nonnormal covariates. When the noise variance is unknown, plugging in the usual unbiased estimate leads to an approach that we call R̂Cp, which is closely related to Sp and to generalized cross-validation (GCV). For excess bias, we propose an estimate based on the "shortcut formula" for ordinary cross-validation (OCV), resulting in an approach we call RCp+. Theoretical arguments and numerical simulations suggest that RCp+ is typically superior to OCV, though the difference is small. We further examine the Random-X error of other popular estimators. The surprising result we get for ridge regression is that, in the heavily regularized regime, Random-X variance is smaller than Fixed-X variance, which can lead to smaller overall Random-X error. Supplementary materials for this article are available online.
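For reference, the classical Fixed-X Mallows' Cp that RCp extends can be computed directly from an OLS fit via Cp = RSS/σ² − n + 2p. The sketch below implements only this baseline formula; the paper's Random-X correction terms are not reproduced, and the toy data are arbitrary.

```python
import numpy as np

def mallows_cp(X, y, sigma2):
    """Classical Fixed-X Mallows' Cp for an OLS fit:
    Cp = RSS / sigma2 - n + 2p.  The Random-X variant (RCp) adds
    correction terms for covariate randomness not shown here.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    rss = float(np.sum((y - X @ beta) ** 2))      # residual sum of squares
    return rss / sigma2 - n + 2 * p

# Toy data where y is an exact linear function of X: RSS = 0,
# so Cp reduces to 2p - n = 4 - 6 = -2.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = 2.0 + 3.0 * np.arange(6.0)
cp = mallows_cp(X, y, sigma2=1.0)
```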
One of the deadliest illnesses in the world, melanoma can spread to many body sites if it is not detected early. Consequently, automated diagnostic tools that help doctors, and even laypeople, identify such an illness have driven major advances in the medical field. The aim of this work is to create a hybrid method for analyzing suspicious lesions and detecting melanoma skin cancer. The performance of current approaches relies heavily on fine-tuned settings and architectures, and studies of dynamic data augmentation remain limited even in machine learning and computer vision research. This work proposes dynamic training/testing augmentations that can greatly enhance effectiveness. Traditional search strategies, which require training new models for each augmentation, consume more GPU time than the proposed framework: the ensemble optimization algorithm (EOA) accelerates the search because it does not require model training for every new augmentation technique. The effectiveness of this technique is evaluated against single and ensemble models on the ISIC dataset. Moreover, EfficientNet, a compact network design, serves as the system's backbone. This approach yields stronger results, and the research also reports the discovered augmentation strategy, whose search would otherwise demand an exceptional amount of resources; other researchers may therefore reuse these augmentation strategies to enhance performance.
Genetic programming (GP) is a common method for performing symbolic regression that relies on the use of ephemeral random constants in order to adequately scale predictions. Suitable values for these constants must be drawn from appropriate, but typically unknown, distributions for the problem being modeled. While rarely used with GP, Z-score standardization of feature and response spaces often significantly improves the predictive performance of GP by removing scale issues and reducing error due to bias. However, in some cases it is also associated with erratic error due to variance. This article demonstrates that this variance component increases in the presence of gaps at the boundaries of the training data explanatory variable intervals. An initial solution to this problem is proposed that augments training data with pseudo instances located at the boundaries of the intervals. When applied to benchmark problems, particularly with small training samples, this solution reduces error due to variance and, therefore, total error. Augmentation is shown to also stabilize error in larger problems; however, results suggest that standardized GP works well on such problems with little need for training data augmentation.
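The two ingredients of the article's recipe, Z-score standardization and boundary pseudo-instances, can be sketched in a few lines. How the pseudo responses are assigned here (copied from the boundary instances themselves) is an assumption for illustration and may differ from the article's construction.

```python
import numpy as np

def zscore(a):
    """Z-score standardize columns (assumes nonzero column std)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def augment_boundaries(X, y):
    """Append pseudo instances at the boundaries of each explanatory
    variable's training interval, filling the edge gaps that inflate
    the variance component.  Pseudo responses are copied from the
    boundary instances (an illustrative assumption).
    """
    lo, hi = X.argmin(axis=0), X.argmax(axis=0)
    X_new, y_new = [X], [y]
    for j in range(X.shape[1]):
        X_new.append(X[[lo[j], hi[j]]])
        y_new.append(y[[lo[j], hi[j]]])
    return np.vstack(X_new), np.concatenate(y_new)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
Xs = zscore(X)                       # zero mean, unit std per column
Xa, ya = augment_boundaries(X, y)    # 4 real + 2 boundary pseudo rows
```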
When choosing the optimal complexity of a method for constructing decision functions, an important tool is the decomposition of the quality criterion into bias and variance. It is generally assumed (and in practice this is most often true) that with increasing complexity of the method, the bias component monotonically decreases and the variance component increases. The conducted research shows that in some cases this behavior is violated. In this paper, we obtain an expression for the variance component of the kNN method for the linear regression problem in the formulation where the "explanatory" features are random variables. In contrast to the well-known result obtained for non-random "explanatory" variables, in the considered case the variance may increase with the growth of k.
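The random-explanatory-variable setting can be probed numerically: the sketch below estimates the bias² and variance of kNN regression at a query point by redrawing both the feature values and the noisy responses for every training set. The target function, sample sizes, and noise level are arbitrary illustrative choices, not the paper's analytical setup.

```python
import numpy as np

def knn_predict(Xtr, ytr, x0, k):
    """1-D kNN regression: average the responses of the k training
    points nearest to the query x0."""
    idx = np.argsort(np.abs(Xtr - x0))[:k]
    return ytr[idx].mean()

def bias_variance_at(x0, f, k, n=50, reps=2000, noise=0.1, seed=0):
    """Monte Carlo bias^2 and variance of kNN at x0 when the
    "explanatory" feature is itself random (redrawn every rep)."""
    rng = np.random.default_rng(seed)
    preds = np.empty(reps)
    for r in range(reps):
        Xtr = rng.uniform(-1, 1, n)              # random feature values
        ytr = f(Xtr) + rng.normal(0, noise, n)   # noisy responses
        preds[r] = knn_predict(Xtr, ytr, x0, k)
    bias2 = (preds.mean() - f(x0)) ** 2
    var = preds.var()
    return bias2, var

bias2, var = bias_variance_at(0.0, f=lambda x: 2.0 * x, k=5)
```

Repeating the call for a range of k values would trace out the variance curve whose non-monotone behavior the paper analyzes.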
An error function can be used to select between candidate models but it does not provide a thorough understanding of the behavior of a model. A greater understanding of an algorithm can be obtained by performing a bias-variance decomposition. Splitting the error into bias and variance is effective for understanding a deterministic algorithm such as k-nearest neighbor, which provides the same predictions when performed multiple times using the same data. However, simply splitting the error into bias and variance is not sufficient for nondeterministic algorithms, such as genetic programming (GP), which potentially produces a different model each time it is run, even when using the same data. This article presents an extended bias-variance decomposition that decomposes error into bias, external variance (error attributable to limited sampling of the problem), and internal variance (error due to random actions performed in the algorithm itself). This decomposition is applied to GP to expose the three components of error, providing a unique insight into the role of maximum tree depth, number of generations, size/complexity of function set, and data standardization in influencing predictive performance. The proposed tool can be used to inform targeted improvements for reducing specific components of model error.
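The three-component decomposition can be illustrated with a toy nondeterministic learner whose prediction carries a fixed bias, a per-dataset effect (standing in for external variance) and a per-run effect (standing in for GP's internal randomness). The ANOVA-style identity, total variance = external + internal, then holds exactly on the simulated predictions; the effect sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
true_y = 1.0
D, R = 200, 50                  # D training sets, R runs per set

# p[d, r]: prediction of the model evolved on dataset d in run r.
dataset_effect = rng.normal(0.0, 0.3, size=(D, 1))   # sampling noise
run_effect = rng.normal(0.0, 0.2, size=(D, R))       # algorithmic noise
p = true_y + 0.1 + dataset_effect + run_effect       # 0.1 = built-in bias

bias2 = (p.mean() - true_y) ** 2
external_var = p.mean(axis=1).var()      # variance across datasets
internal_var = p.var(axis=1).mean()      # mean variance across runs
total_var = p.var()                      # pooled variance over all d, r
```

For a deterministic algorithm the per-run effect vanishes, internal variance collapses to zero, and the decomposition reduces to the ordinary bias-variance split.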
NIR spectroscopy is a non-destructive characterization tool for blend uniformity (BU) assessment. However, NIR spectra of powder blends often contain overlapping physical and chemical information about the samples. Deconvoluting the information related to chemical properties from that associated with physical effects is one of the major objectives of this work. We achieve this aim in two ways: first, we identify the various sources of variability that might affect the BU results; second, we leverage sophisticated machine-learning-based data analytics. To accomplish these objectives, calibration samples of amlodipine as an active pharmaceutical ingredient (API), with concentrations ranging between 67 and 133% w/w (dose ~ 3.6% w/w), in powder blends containing excipients were prepared using a gravimetric approach and assessed using NIR spectroscopic analysis, followed by HPLC measurements. The bias in the NIR results was investigated by employing data quality metrics (DQM) and bias-variance decomposition (BVD). To overcome the bias, clustered regression (non-parametric and linear) was applied. We assessed the model's performance by employing hold-out and k-fold internal cross-validation (CV). NIR-based blend homogeneity with low mean absolute error and an interval estimate of 0.674 (mean) ± 0.218 (standard deviation) w/w was established. Additionally, bootstrapping-based CV was leveraged as part of the NIR method lifecycle management, demonstrating a mean absolute error (MAE) of BU ± 3.5% w/w and BU ± 1.5% w/w for model generalizability and model transferability, respectively. A workflow integrating machine learning into NIR spectral analysis was established and implemented.
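A bootstrapping-based CV loop of the kind mentioned for lifecycle management can be sketched generically: fit on each bootstrap resample and score MAE on the out-of-bag points the resample missed. The linear toy calibration below merely stands in for an NIR model; it is not the article's pipeline.

```python
import numpy as np

def bootstrap_mae(X, y, fit, predict, n_boot=200, seed=0):
    """Bootstrap CV sketch: resample with replacement, fit on each
    bootstrap sample, and score MAE on its out-of-bag points."""
    rng = np.random.default_rng(seed)
    n = len(y)
    maes = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # bootstrap indices
        oob = np.setdiff1d(np.arange(n), idx)        # out-of-bag points
        if oob.size == 0:
            continue
        model = fit(X[idx], y[idx])
        maes.append(np.mean(np.abs(predict(model, X[oob]) - y[oob])))
    return float(np.mean(maes))

# Toy "calibration": least-squares slope and intercept
fit = lambda X, y: np.linalg.lstsq(
    np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0]
predict = lambda b, X: b[0] + b[1] * X

X = np.arange(10.0)
y = 2.0 * X + 1.0            # exact linear relation: OOB MAE is ~0
mae = bootstrap_mae(X, y, fit, predict)
```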
Graphical abstract: Impact of various data learning approaches on NIR spectral data.