Recent outbreaks of listeria, salmonella, and other pathogens have reinforced the need for more rigorous testing of food products. Millions are spent each year on food testing. Certifying the safety of food is a challenging task with traditional testing methods. Current methods require long incubation times before the first results are observed, and they still cover only a small fraction of the food that is sold. Long analysis times also lead to loss of consumables: 18.9 billion pounds of produce are lost each year to spoilage. A fast and effective method is needed to reduce the time required to test the safety of food. The goal is to provide accurate sample classification as quickly as possible, allowing pathogen-free product to be shipped to market with the shortest possible delay. An autonomous electrochemical sensor was combined with a multi-class Probabilistic Neural Network (PNN) system to classify four organisms (E. coli #25922, E. coli #11775, S. epidermidis #12228, or C. albicans #10231). We used an evolutionary kernel optimization algorithm to optimize the kernel parameters and trained the system on data sampled from the four organisms. The trained and optimized model was validated on a set containing several samples that were not used to train the network. The network correctly classified unknown samples in less time than the industry standard of 24 hours, offering a potential benefit to the agriculture industry.
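The classification step of a PNN can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, the fixed kernel width `sigma`, and the equal class priors are assumptions made for illustration only, and in the described system the kernel parameters would be tuned by the evolutionary optimizer rather than set by hand.

```python
import math

def pnn_classify(x, training, sigma=0.3):
    """Classify feature vector x with a basic Probabilistic Neural Network.

    training: dict mapping class label -> list of training feature vectors.
    sigma: Gaussian kernel width (assumed fixed here; normally optimized).
    Returns (best_label, posteriors), assuming equal class priors.
    """
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Average kernel response = Parzen-window density estimate (unnormalized)
        scores[label] = total / len(patterns)
    norm = sum(scores.values())
    if norm == 0.0:
        # All kernels underflowed: fall back to a uniform posterior
        posteriors = {label: 1.0 / len(scores) for label in scores}
    else:
        posteriors = {label: s / norm for label, s in scores.items()}
    best_label = max(posteriors, key=posteriors.get)
    return best_label, posteriors

# Hypothetical two-feature sensor readings; the values are made up.
toy_training = {
    "E. coli #25922": [(1.0, 0.2), (1.1, 0.3)],
    "E. coli #11775": [(2.0, 0.9), (2.1, 1.0)],
    "S. epidermidis #12228": [(0.2, 1.5), (0.3, 1.6)],
    "C. albicans #10231": [(3.0, 2.5), (3.1, 2.4)],
}
toy_label, toy_posteriors = pnn_classify((1.05, 0.25), toy_training)
```

Each class contributes one summation unit, and the class with the largest normalized kernel response is reported along with a posterior estimate for every class.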
The objective of this research is to develop a complex adaptive piecewise linear regression/probabilistic neural network (PNN) intelligent system for the rapid detection and classification of Escherichia coli (E. coli). Rapid detection and classification of E. coli is important because current methods require a long period of analysis before a classification can be determined. This paper describes the design and preliminary evaluation of an Intelligent Decision Support System (IDSS) intended to test the following hypothesis: an IDSS for the rapid collection and classification of E. coli can be designed and preliminarily evaluated, and it will significantly decrease detection and classification times for E. coli bacteria, thereby addressing the food spoilage problem. The research provides a preliminary answer to the question: what percentage performance improvement can be realized against the 16 to 48 hours required by conventional multistep methods for detecting microorganisms (using E. coli data as a baseline)? Against the 16-hour figure, the improvement in time to detection is 6.7% ((16 - 15)/15 × 100% = 6.7%); against the 48-hour figure, it is 220% ((48 - 15)/15 × 100% = 220%). Both percentages are expressed relative to the 15-hour detection time that appears in the formulas.
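The arithmetic can be checked with a short sketch. The 15-hour detection time is read off the formulas in the abstract; the second calculation, which divides by the conventional time instead, is an added comparison and not a figure reported by the authors.

```python
NEW_HOURS = 15.0  # detection time implied by the abstract's formulas

def improvement_vs_new(baseline_hours, new_hours=NEW_HOURS):
    """Time saved as a percentage of the new detection time (the abstract's formula)."""
    return (baseline_hours - new_hours) / new_hours * 100.0

def reduction_vs_baseline(baseline_hours, new_hours=NEW_HOURS):
    """Time saved as a percentage of the conventional time (added for comparison)."""
    return (baseline_hours - new_hours) / baseline_hours * 100.0

for baseline in (16.0, 48.0):
    print(
        f"{baseline:.0f} h baseline: "
        f"{improvement_vs_new(baseline):.1f}% relative to 15 h, "
        f"{reduction_vs_baseline(baseline):.2f}% relative to the baseline"
    )
```

The choice of denominator matters: a saving expressed relative to the shorter time can exceed 100%, whereas a reduction expressed relative to the original time cannot.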
In previous work, we applied an advanced genetic algorithm method for feature subset selection, combined with noise perturbation, in an attempt to overcome the over-fitting that is typical with microarray datasets. The method was applied to a dataset from Moffitt Cancer Center, and the clinical outcome to be predicted was cancer recurrence in less than 5 years. By its nature, the method yields multiple gene signatures, each as small as possible, and these signatures often share one or more genes. The question is how to combine the predictions from multiple predictors. In the previous work, we produced an ensemble prediction by a simple majority vote rule and observed that performance on a validation set was considerably worse than on the learning set. Our conclusion was that the training and validation sets were not equally representative of the same population and therefore could not provide reliable gene signatures. Here we report on an effort to apply a more sophisticated ensemble method, the Generalized Regression Neural Network (GRNN) Oracle; this did not lead us to reverse our original conclusion.
Breast cancer screening refers to the screening of asymptomatic, generally healthy women for breast cancer in order to identify those who should receive a follow-up check. Early screening can detect non-invasive ductal carcinoma in situ (called “pre breast cancer”), which almost never forms a lump and is generally undetectable except by mammography. This paper describes the design and preliminary evaluation of a PNN/GRNN ensemble pre-screener in the context of a possible pre-screening protocol, which may, if required, include other data. The results show that the ensemble technique provides almost a 20% AUC increase over the average standalone PNN and almost 10% over the best-performing PNN.
New advances in medicine have led to a disparity between the information available about patients and the ability of clinicians to use it. Lack of training and incompatibility with clinical techniques have made the complex adaptive systems approach difficult to use. To avoid these obstacles, we used statistical learning theory as an inline preprocessing step between existing data collection methods and the clinical analysis of data. Clinicians would be able to use this system without any changes to their techniques, while improving accuracy. We used data from CT scans of patients with metastatic carcinoma to predict prognosis. Specifically, we used RECIST, the standard for evaluating response to treatment, together with new qualitative and quantitative features. An Evolutionary Programming trained Support Vector Machine (EP-SVM) was used to preprocess the data for two traditional survival analysis techniques: Cox proportional hazards models and Kaplan-Meier curves. This was compared with Logistic Regression (LR) and with the use of cutoff points. Analyses were also done to compare different inputs and different radiologists. The EP-SVM significantly outperformed both LR and the cutoff method, and it allowed us both to combine data from multiple sources intelligently and to identify the most predictive features without requiring changes in clinical methods.
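As an illustration of the downstream survival analysis, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It assumes that a classifier such as the EP-SVM has already assigned each patient to a group; the follow-up times and event flags below are invented for the example and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing the estimate
        at_risk -= leaving
    return curve

# Hypothetical follow-up data (months) for two classifier-assigned groups.
good_prognosis = kaplan_meier([6, 12, 18, 24], [0, 1, 0, 1])
poor_prognosis = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Plotting one curve per predicted group is a common way to inspect whether a preprocessing classifier separates patients with different outcomes.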
“Ensemble processing” combines the results (outputs) of several different models, each “looking at” a disease from a different perspective. A number of methods are available to support ensemble processing: (1) averaging, (2) weighted averaging, (3) AdaBoost, and (4) other processing methods that use gate variables in forming a “tree structure”. Gate variables are used here as an integral part of the Expectation operation in a maximum likelihood estimator. This paper presents the application of a “Generalized Regression Neural Network Predictive Model,” called the “GRNN oracle,” which takes advantage of the correlations (synergies) that exist between the outputs of intelligent predictive input models by combining them (at the variance level), for both clinical and microarray lung cancer data, to improve cancer recurrence modeling and predictive performance compared with any one output taken alone. The hypothesis is: given a validation data set of sufficient sample size, the GRNN oracle will provide a synergistic combination of output data that is superior in predictive performance (as measured by an ROC analysis) to each of the input intelligent models taken individually. This paper discusses the results of our work in evaluating the validity of this hypothesis.
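The idea of combining model outputs with a GRNN can be sketched as follows. This shows only the generic GRNN (kernel-weighted average) formula applied to a vector of base-model outputs; it is not the authors' oracle, and the three base models, their output values, and the kernel width `sigma` are hypothetical.

```python
import math

def grnn_predict(x, train_inputs, train_targets, sigma=0.2):
    """Generic GRNN prediction: a Gaussian-kernel-weighted average of targets.

    x: vector of base-model outputs for a new case.
    train_inputs: base-model output vectors for cases with known outcome.
    train_targets: known outcomes (e.g. 1 = recurrence, 0 = no recurrence).
    """
    weights = []
    for p in train_inputs:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Query is far from every stored pattern: fall back to the mean target
        return sum(train_targets) / len(train_targets)
    return sum(w * y for w, y in zip(weights, train_targets)) / total

# Hypothetical outputs of three base models for four patients with known outcome.
base_outputs = [(0.9, 0.8, 0.7), (0.2, 0.1, 0.3), (0.8, 0.9, 0.6), (0.1, 0.3, 0.2)]
outcomes = [1, 0, 1, 0]
combined = grnn_predict((0.85, 0.8, 0.7), base_outputs, outcomes)
```

Because the combiner operates on the joint vector of base-model outputs, it can exploit agreement and disagreement patterns between models that a plain average would ignore.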
PLS first creates uncorrelated latent variables that are linear combinations of the original input vectors Xi, with the weights of each combination proportional to the covariance (between the inputs and the response). Second, a least squares regression is performed on the subset of extracted latent variables. This process yields a biased but lower-variance estimate of the regression coefficients compared with the Ordinary Least Squares regression approach. Classical Principal Component Analysis (PCA), linear PLS and kernel ridge regression (KRR) techniques are well-known shrinkage estimators designed to deal with multicollinearity, which can be a serious problem. Multicollinearity can dramatically influence the effectiveness of a regression model by changing the values and signs of the estimated regression coefficients given different but similar data samples, leading to a regression model that represents the training data reasonably well but generalizes poorly to validation and test data. We explain how to address these problems and then perform a hypothesis-driven preliminary PLS research study and sensitivity analysis on a microarray colon cancer data set; no combinatorial analysis is performed, because PLS eliminates the unnecessary variables. The research studies and preliminary results are described in the results section.
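The two steps described above can be sketched for the simplest case: a single response and a single latent variable. This is an illustrative reduction, not the procedure used in the study; the toy data are invented and deliberately contain two perfectly collinear inputs, a case in which ordinary least squares has no unique solution.

```python
import math

def pls1_one_component(X, y):
    """One-component PLS regression for a single response.

    Step 1: build a latent variable t = Xc @ w, with weights w proportional
            to the covariance between each centered input column and y.
    Step 2: regress y on t by least squares and map back to the inputs.
    Returns (coefficients, intercept) in the original input space.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]

    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # Inputs carry no covariance with y: predict the mean
        return [0.0] * m, y_mean
    w = [v / norm for v in w]

    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    coef = [q * wj for wj in w]
    intercept = y_mean - sum(c * mu for c, mu in zip(coef, x_mean))
    return coef, intercept

# Toy data: the second column is exactly twice the first (perfect collinearity).
X_toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y_toy = [3.0, 6.0, 9.0, 12.0]
coef_toy, intercept_toy = pls1_one_component(X_toy, y_toy)
pred_toy = [sum(c * v for c, v in zip(coef_toy, row)) + intercept_toy for row in X_toy]
```

In this example the single latent variable still produces a stable coefficient vector, because the regression is carried out on the latent variable and not on the collinear inputs directly.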
The objective of this research is to develop a prototype Clinical Decision Support System (CDSS) to aid pathologists in correctly discriminating between reactive mesothelial cells and malignant epithelial cells. Currently, it is very difficult to discriminate visually between cells that are malignant and cells that are merely reactive to antigens present in the effusion. Features have been identified that can correctly discriminate between benign epithelial cells and malignant epithelial cells with a validation AZ of ∼0.934 and a training AZ of ∼0.937. Using these features, the system trained on visually known cases was shown to find discriminating information in the feature subset of the atypical cases, as seen by examining the probabilities generated when the system was presented with atypical cells. While these results are preliminary, they demonstrate that an intelligent CDSS designed using newly developed and/or revised statistical learning theory (SLT) algorithms has the potential to discriminate between reactive mesothelial cells and malignant epithelial cells and to be used as a second-opinion diagnostic aid by physicians, as they deem appropriate.
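For readers unfamiliar with the area-under-the-ROC-curve measure, the following sketch computes the empirical (nonparametric) area from classifier scores. It is an assumption that this matches how the reported values were obtained; an AZ value is often derived from a fitted binormal ROC curve instead, and the scores and labels below are invented.

```python
def roc_auc(scores, labels):
    """Empirical ROC area: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical malignancy probabilities (label 1 = malignant, 0 = benign).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two classes.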
An accurate prognostic model of a cancer patient after treatment can be useful in deciding the next course of treatment or in assessing the efficacy of that treatment. Gene expression microarray data have been used to predict survival times [1] or to classify a patient as having a good or poor prognosis [2] by predicting whether the patient belongs to the class that will have a recurrence of cancer before or after a certain period, typically 3 or 5 years. Microarrays typically contain thousands of gene expression probes, whereas a typical study may contain only a few hundred patients or fewer. Typical regression techniques fail to generalize, suffering from the ‘Curse of Dimensionality’, which results in an over-fitted model that performs very well on the training data and very poorly on validation data. Various feature selection/reduction methods have been used to reduce the dimensionality of the data and improve or facilitate a solution [3]. Gene expression is known to be modulated by the expression of other genes, forming a so-called gene network or pathway. Furthermore, several networks may affect the aggressiveness of the cancer simultaneously [4]. While past studies have selected features based on statistical methods alone [5] or have simply included ‘known cancer genes’, none to our knowledge have used classification models built from ensembles of models based on multiple known gene networks. Based on the data presented in Shedden et al. [6], this study uses a General Regression Neural Network (GRNN) Oracle ensemble that combines several partial least squares (PLS) models trained to predict recurrence times from 12 different gene networks. We hypothesize that it is possible to correctly classify recurrence by combining the results of the gene network models.