Schaffer and Land [14] described a method whereby a machine intelligence (MI) process can “know what it doesn’t know.” In this paper, the concept is illustrated by three examples: the GRNN oracle ensemble method, which combines multiple SVM classifiers to detect Alzheimer’s-type dementia from features automatically extracted from a speech sample; an Evolutionary Programming and Adaptive Boosting hybrid; and a Generalized Regression Neural Network hybrid for classifying breast cancer. The authors assert that the method (1) is applicable quite directly to a great many other learning classifier systems, and (2) provides an intuitive approach to comparing the performance of different classifiers on a given task, using the size of the “area of uncertainty” as the performance metric. This paper supports these assertions by describing the steps needed to apply the method to a previously published study of benign / malignant breast cancer prediction. It then illustrates how this “area of uncertainty” may be computed (a work in progress), using the GRNN oracle results and a resultant Bayesian network from the Alzheimer’s speech research study.
A useful feature for any predictive classifier is the ability to know when its predictions are unreliable. We present a general approach that should be applicable to any learning classifier system that has been trained on a set of known cases. The basic idea is simple: if the classifier knows that its predictions are wrong for some of the training cases, it can assess whether any new case lies in the vicinity of one of these “trouble-makers.” The challenge is to quantify the degree of uncertainty and to define a region of unreliability around each trouble-maker case. We provide specific algorithms to address these challenges and illustrate their use with a GRNN oracle ensemble classifier that predicts the presence of Alzheimer’s disease from features extracted automatically from a sample of a person’s speech. One aspect of the challenge is that the distribution of training cases in the domain feature space is often quite poor, because training data sets are often feature-rich but case-poor. We show how the t-SNE algorithm can ameliorate this problem. We also provide an algorithm that defines a region of uncertainty by linear interpolation of the error estimates among only those training cases that are “close enough.” No human input is needed.
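As a rough illustration of the “close enough” idea, the following Python sketch estimates the local error at a new case from nearby training cases only. It is not the published algorithm: the function name `local_error_estimate`, the `radius` cutoff, and the linearly decaying weights are assumptions made for this example, and the coordinates are assumed to already lie in a low-dimensional space (for instance, a t-SNE embedding computed beforehand).

```python
import math


def local_error_estimate(query, train_points, train_errors, radius):
    """Distance-weighted average of training error estimates near `query`.

    Hypothetical sketch: only training cases closer than `radius` contribute,
    and each weight decays linearly from 1 (at distance 0) to 0 (at `radius`).
    Returns None when no training case is close enough to say anything.
    """
    if len(train_points) != len(train_errors):
        raise ValueError("train_points and train_errors must have equal length")
    if radius <= 0:
        raise ValueError("radius must be positive")

    weighted_sum = 0.0
    weight_total = 0.0
    for point, error in zip(train_points, train_errors):
        distance = math.dist(query, point)
        if distance < radius:
            weight = 1.0 - distance / radius  # linear falloff with distance
            weighted_sum += weight * error
            weight_total += weight

    if weight_total == 0.0:
        return None  # no "close enough" training case
    return weighted_sum / weight_total


# Toy data (made up): error 1.0 marks a "trouble-maker" training case.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
errors = [1.0, 0.0, 0.0]
estimate_near = local_error_estimate((0.5, 0.0), points, errors, radius=2.0)
estimate_far = local_error_estimate((10.0, 10.0), points, errors, radius=2.0)
```

In this sketch a caller would flag a new case as unreliable when the returned estimate exceeds a chosen threshold. How to treat a `None` result (a case far from all training data) is left to the caller.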
Bayesian networks (BNs) have classically been designed by two methods: the expert approach (ask an expert for nodes and links) and the data-driven approach (infer them from data). An unexpected by-product of previous Alzheimer's / dementia research (presented at CAS2015) was a third approach, in which the results of a hybrid design were used to configure a BN. A complex adaptive systems approach (e.g., a GA-SVM-oracle hybrid) can sift through the combinatorics of feature subset selection, yielding a modest set of only the most influential features. Then, using known likelihoods of demographics associated with dementia, and assuming a direct and independent influence of dementia upon speech features, the BN is specified. The conditional probabilities needed can be estimated with far fewer data than in the traditional data-driven BN approach. Although BNs have advantages (intuitive interpretation and graceful handling of missing data), they also have challenges. We report initial implementation results suggesting that two challenges remain for BNs: the need to reduce continuous variables to discrete categories, and the need to estimate a substantial number of conditional probabilities. We suggest some ways forward in the application of BNs, with the objective of improving / refining Alzheimer's / dementia detection using speech.
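The structure described here, with dementia influencing each speech feature directly and independently, can be sketched as a small Bayes-rule calculation. The Python below is a minimal illustration under stated assumptions, not the authors' network: the feature names, cut points, and probabilities are invented for the example, and the demographic information is folded into a single prior rather than modeled as separate nodes. It also shows the two practical points raised in the abstract: continuous values are first reduced to discrete categories, and a missing feature is simply left out of the product.

```python
import bisect


def discretize(value, cut_points, labels):
    """Map a continuous value to a category using sorted cut points."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect.bisect_right(cut_points, value)]


def posterior_dementia(prior, likelihoods, evidence):
    """P(dementia | evidence), assuming features are independent given dementia.

    likelihoods[feature][state][category] holds P(category | dementia=state).
    A feature whose evidence is None is treated as missing and skipped.
    """
    joint_true = prior
    joint_false = 1.0 - prior
    for feature, category in evidence.items():
        if category is None:
            continue  # missing data: the feature is marginalized out
        joint_true *= likelihoods[feature][True][category]
        joint_false *= likelihoods[feature][False][category]
    total = joint_true + joint_false
    if total == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return joint_true / total


# Hypothetical numbers for illustration only.
likelihoods = {
    "pause_rate": {
        True: {"low": 0.2, "high": 0.8},
        False: {"low": 0.7, "high": 0.3},
    },
    "word_variety": {
        True: {"low": 0.6, "high": 0.4},
        False: {"low": 0.3, "high": 0.7},
    },
}
pause_category = discretize(0.35, [0.3], ["low", "high"])
posterior = posterior_dementia(
    prior=0.1,
    likelihoods=likelihoods,
    evidence={"pause_rate": pause_category, "word_variety": None},
)
```

Each feature added to such a model needs one conditional probability table per dementia state, which is why the number of probabilities to estimate grows with both the feature count and the number of categories per feature.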
The GRNN oracle is an optimal estimator that provides the maximum likelihood unbiased estimate by combining a series of intelligent processing results, where the estimates with the smallest variance are weighted most highly. It is known that if the individual predictors in the ensemble are too similar, the oracle cannot provide much improvement. We have newly observed that if the predictions are characterized by class inhomogeneities, the oracle can be limited in its ability to compensate. For some training cases, all models might provide incorrect predictions; let us call these cases “trouble makers.” To address this problem, the oracle theory was mathematically extended to provide estimates of the sensitivity of its predictions. These sensitivities provide a basis for declaring certain predictions untrustworthy, giving the oracle the information it needs to flag them. This paper addresses that theoretical development and applies these extensions to toy problems, with the future objective of applying them to real problems of detecting dementia / Alzheimer's in speech patterns.
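The weighting principle stated here, where lower-variance estimates count for more, can be illustrated with plain inverse-variance weighting. The Python sketch below assumes independent, unbiased estimates with known fixed variances. It is a simplified stand-in for the idea and does not reproduce how the GRNN oracle itself estimates or applies its weights; the function name and the toy numbers are assumptions made for this example.

```python
def inverse_variance_combine(predictions, variances):
    """Combine independent unbiased estimates, weighting each by 1/variance.

    Returns the combined estimate and its variance. Under the assumption of
    independent Gaussian errors, this is the maximum likelihood combination.
    """
    if len(predictions) != len(variances):
        raise ValueError("predictions and variances must have equal length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]
    weight_total = sum(weights)
    combined = sum(w * p for w, p in zip(weights, predictions)) / weight_total
    combined_variance = 1.0 / weight_total
    return combined, combined_variance


# Toy example (made-up numbers): the first model is more precise,
# so the combined estimate is pulled toward its prediction.
combined, combined_variance = inverse_variance_combine([0.2, 0.8], [0.01, 0.04])
```

Note that the combination always lies between the individual predictions. When every model is wrong in the same direction, as with the “trouble makers” described above, no choice of weights can recover the correct answer, which is the limitation the sensitivity estimates are meant to expose.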
We hereby report the design and implementation of an Autonomous Microbial Cell Culture and Classification (AMC(3)) system for rapid detection of food pathogens. Traditional food testing methods require multistep procedures and long incubation periods, and are thus prone to human error. AMC(3) introduces a "one click approach" to the detection and classification of pathogenic bacteria. Once the cultured materials are prepared, all operations are automatic. AMC(3) is an integrated sensor array platform in a microbial fuel cell system composed of a multi-potentiostat, an automated data collection system (a Python program and a Yocto Maxi-coupler electromechanical relay module) and a powerful classification program. The classification scheme consists of a Probabilistic Neural Network (PNN), Support Vector Machines (SVM) and a General Regression Neural Network (GRNN) oracle-based system. Differential Pulse Voltammetry (DPV) is performed on standard samples or unknown samples. Then, using preset feature extraction and quality control, accepted data are analyzed by the intelligent classification system. In a typical use, thirty-two extracted features were analyzed to correctly classify the following pathogens: Escherichia coli ATCC#25922, Escherichia coli ATCC#11775, and Staphylococcus epidermidis ATCC#12228. An accuracy of 85.4% was recorded for unknown samples, within a shorter time period than the industry standard of 24 hours.