Boosting has been shown to improve the performance of classifiers in many situations, including when data is imbalanced. There are, however, two possible implementations of boosting, and it is unclear which should be used. Boosting by reweighting is typically used, but can only be applied to base learners which are designed to handle example weights. On the other hand, boosting by resampling can be applied to any base learner. In this work, we empirically evaluate the differences between these two boosting implementations using imbalanced training data. Using 10 boosting algorithms, 4 learners and 15 datasets, we find that boosting by resampling performs as well as, or significantly better than, boosting by reweighting (which is often the default boosting implementation). We therefore conclude that, in general, boosting by resampling is preferred over boosting by reweighting.
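The distinction between the two implementations can be illustrated with a minimal sketch. It assumes an AdaBoost.M1-style weight update, which the abstract does not specify, and the function names `update_weights` and `resample_by_weight` are hypothetical. In boosting by reweighting, the updated weights would be passed directly to a weight-aware base learner; in boosting by resampling, a new training set is drawn with probability proportional to the weights, so any base learner can be used.

```python
import random


def update_weights(weights, misclassified, error):
    """AdaBoost.M1-style update (an assumption, not the paper's stated method).

    Correctly classified examples are scaled down by beta = error / (1 - error),
    then all weights are renormalized to sum to 1. Requires 0 < error < 1.
    """
    beta = error / (1.0 - error)
    scaled = [w if miss else w * beta for w, miss in zip(weights, misclassified)]
    total = sum(scaled)
    return [w / total for w in scaled]


def resample_by_weight(examples, weights, seed=0):
    """Boosting by resampling: draw a same-size training set, with replacement,
    where each example is picked with probability proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(examples, weights=weights, k=len(examples))


examples = ["a", "b", "c", "d"]
# Example "a" was misclassified by the previous round's model (weighted error 0.25).
weights = update_weights([0.25] * 4, [True, False, False, False], error=0.25)

# Reweighting would hand `weights` to a learner that accepts example weights.
# Resampling instead builds an unweighted training set that reflects them:
resampled = resample_by_weight(examples, weights)
```

After the update, the misclassified example carries half of the total weight, so it is expected to appear more often in the resampled training set.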
We report on the measurement of the \(^{7}\)Be(\(n, p\))\(^{7}\)Li cross section from thermal to approximately 325 keV neutron energy, performed in the high-flux experimental area (EAR2) of the n_TOF facility at CERN. This reaction plays a key role in the lithium yield of the Big Bang Nucleosynthesis (BBN) for standard cosmology. The only two previous time-of-flight measurements performed on this reaction did not cover the energy window of interest for BBN, and showed a large discrepancy between each other. The measurement was performed with a Si-telescope, and a high-purity sample produced by implantation of a \(^{7}\)Be ion beam at the ISOLDE facility at CERN. While a significantly higher cross section is found at low energy, relative to current evaluations, in the region of BBN interest the present results are consistent with the values inferred from the time-reversal \(^{7}\)Li(\(p, n\))\(^{7}\)Be reaction, thus yielding only a relatively minor improvement on the so-called Cosmological Lithium Problem (CLiP). The relevance of these results to the near-threshold neutron production in the p+\(^{7}\)Li reaction is also discussed.
Two common challenges data mining and machine learning practitioners face in many application domains are unequal classification costs and class imbalance. Most traditional data mining techniques attempt to maximize overall accuracy rather than minimize cost. When data is imbalanced, such techniques result in models that highly favor the overrepresented class, the class which typically carries a lower cost of misclassification. Two techniques that have been used to address both of these issues are cost-sensitive learning and data sampling. In this work, we investigate the performance of two cost-sensitive learning techniques and four data sampling techniques for minimizing classification costs when data is imbalanced. We present a comprehensive suite of experiments, utilizing 15 datasets with 10 cost ratios, which have been carefully designed to ensure conclusive, significant and reliable results.
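As an illustration of minimizing cost rather than maximizing accuracy, the sketch below applies the standard minimum-expected-cost decision rule for a binary problem. It is not one of the techniques evaluated in the abstract, which does not name them; the function names and the assumption that correct decisions cost nothing are hypothetical choices made for the example.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold above which predicting 'positive' has the lower
    expected cost, assuming correct predictions cost nothing.

    Predicting positive costs (1 - p) * cost_fp in expectation, predicting
    negative costs p * cost_fn, so positive is cheaper when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_min_cost(p_positive, cost_fp, cost_fn):
    """Return True (positive) when that decision minimizes expected cost."""
    return p_positive >= cost_threshold(cost_fp, cost_fn)


# With a 1:9 cost ratio, a 20% positive-class probability is already enough
# to predict the (costly to miss) positive class.
flag_unequal = predict_min_cost(0.2, cost_fp=1.0, cost_fn=9.0)

# With equal costs, the rule reduces to the usual 0.5 accuracy threshold.
flag_equal = predict_min_cost(0.2, cost_fp=1.0, cost_fn=1.0)
```

The higher the cost of missing the minority class relative to a false alarm, the lower the threshold falls, which counteracts a model's tendency to favor the overrepresented class.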
Learning with limited minority class data Khoshgoftaar, T.M.; Seiffert, C.; Van Hulse, J. ...
Sixth International Conference on Machine Learning and Applications (ICMLA 2007),
2007-Dec.
Conference Proceeding
A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often-used 'even distribution' is not optimal when dealing with such rare events.
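The question of how many abundant-class examples to keep can be made concrete with a small sketch of random undersampling to a chosen class distribution. This is an assumed illustration, not the study's procedure; `undersample_majority` and its parameters are hypothetical names.

```python
import random


def undersample_majority(labels, minority_label, minority_fraction, seed=0):
    """Return sorted indices that keep every minority example and a random
    subset of the other examples, sized so that the minority class makes up
    roughly `minority_fraction` of the result."""
    if not 0.0 < minority_fraction < 1.0:
        raise ValueError("minority_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Number of majority examples needed to reach the requested distribution,
    # capped at how many are actually available.
    n_keep = round(len(minority) * (1.0 - minority_fraction) / minority_fraction)
    n_keep = min(n_keep, len(majority))
    kept = rng.sample(majority, n_keep)
    return sorted(minority + kept)


# 10 rare positives among 1000 examples (1% minority class).
labels = [1] * 10 + [0] * 990

even = undersample_majority(labels, minority_label=1, minority_fraction=0.5)
skewed = undersample_majority(labels, minority_label=1, minority_fraction=0.25)
```

Here `even` corresponds to the 'even distribution' mentioned above (10 positives, 10 negatives), while `skewed` keeps three negatives per positive; comparing classifiers trained on such alternatives is the kind of experiment the abstract describes.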
Mining Data with Rare Events: A Case Study Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J. ...
19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007),
2007-Oct., Volume: 2
Conference Proceeding
The performance of classification models can be negatively impacted if the data on which they are trained contains very rare events. While recent research has investigated the issue of class imbalance, few, if any, studies address issues related to the handling of extreme imbalance (rare events), where the minority class can account for as little as 0.1% of the training data. This work investigates the effect of dataset size and class distribution on classification performance when examples from the minority class are rare. In addition, we compare the performance improvement achieved by acquiring additional examples to that of applying data sampling. Our results demonstrate that data sampling is very effective at alleviating the problem of rare events.
The resonance ionization laser ion source (RILIS) is the principal ion source of the ISOLDE radioactive beam facility based at CERN. Using the method of in-source resonance ionization spectroscopy, an optimal three-step, three-resonance photo-ionization scheme has been developed for chromium. The scheme uses an ionizing transition to one of the 14 newly observed autoionizing states. This work increases the range of ISOLDE-RILIS ionized beams to 32 chemical elements. Details of the spectroscopic studies are described and the new ionization scheme is summarized.