Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen monitoring. With funding from the American Rescue Plan Act of 2021, the FDA's genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analysis. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to support real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, of which 2,517 were sequenced and submitted to NCBI under the umbrella BioProject PRJNA757291. All sequence records were released with explicit quality control (QC) tags communicating our confidence in the quality of the data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and the subsequent diversification of this lineage through the end of the sampling period. The project achieved two important goals for the FDA's GenomeTrakr program: first, contributing timely genomic data to the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance of other pathogens of interest to the FDA.
This paper has two primary objectives. First, it summarizes the genomic and contextual data collected during a COVID-19 pandemic response project that used the FDA's laboratory network, traditionally employed for sequencing foodborne pathogens, to sequence SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next-generation sequencing (NGS) data collected for culture-free surveillance of pathogens in environmental samples.
Symmetry detection has been shown to improve various machine learning tasks. In the context of continuous symmetry detection, current state-of-the-art experiments are limited to the detection of affine transformations. Under the manifold assumption, we outline a framework for discovering continuous symmetry in data beyond the affine transformation group. We also provide a similar framework for discovering discrete symmetry. We experimentally compare our method to an existing method known as LieGAN and show that our method is competitive at detecting affine symmetries for large sample sizes and superior to LieGAN for small sample sizes. We also show that our method is able to detect continuous symmetries beyond the affine group and is generally more computationally efficient than LieGAN.
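The criterion behind detecting a continuous symmetry under the manifold assumption can be illustrated with a toy check: if the data lie on a level set {f = 0} with nonvanishing gradient, a vector field X generates a flow that preserves that set when the directional derivative X(f) = X · ∇f vanishes on the data. The sketch below assumes the level-set functions are known in closed form (a unit circle and the curve y = sin x); it is not the authors' framework, which must also learn such functions from samples, and every function and field name is a hypothetical example. The generator (1, cos x) on the sine curve is not affine, which is the kind of symmetry the abstract targets.

```python
import math
import random


def max_violation(field, grad_f, points):
    """Largest |X(f)| = |X . grad f| over the sample; ~0 means X is tangent to {f = 0}."""
    worst = 0.0
    for x, y in points:
        vx, vy = field(x, y)
        gx, gy = grad_f(x, y)
        worst = max(worst, abs(vx * gx + vy * gy))
    return worst


# Gradients of two hypothetical level-set functions.
def circle_grad(x, y):  # f(x, y) = x^2 + y^2 - 1
    return (2.0 * x, 2.0 * y)


def curve_grad(x, y):  # g(x, y) = y - sin(x)
    return (-math.cos(x), 1.0)


# Candidate infinitesimal generators.
def rotation(x, y):  # affine (linear) generator
    return (-y, x)


def along_curve(x, y):  # non-affine generator
    return (1.0, math.cos(x))


def x_shift(x, y):  # translation; not a symmetry of either set
    return (1.0, 0.0)


rng = random.Random(0)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(200)]
circle = [(math.cos(t), math.sin(t)) for t in angles]
xs = [rng.uniform(-3.0, 3.0) for _ in range(200)]
curve = [(x, math.sin(x)) for x in xs]

rot_score = max_violation(rotation, circle_grad, circle)
circle_shift_score = max_violation(x_shift, circle_grad, circle)
curve_score = max_violation(along_curve, curve_grad, curve)
curve_shift_score = max_violation(x_shift, curve_grad, curve)
print(rot_score, circle_shift_score, curve_score, curve_shift_score)
```

Rotation and the field (1, cos x) score essentially zero on their respective sets, while the translation does not, so only the first two would be reported as symmetries by this test.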
Meta learning of bounds on the Bayes classifier error
Moon, Kevin R.; Hero, Alfred O.; Delouille, Veronique
2015 IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE),
2015-Aug.
Conference Proceeding
Open access
Meta learning uses information from base learners (e.g., classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given feature space, if known, can be used to aid in choosing a classifier, as well as in feature selection and model selection for the base classifiers and the meta classifier. Recent work on f-divergence functional estimation has led to the development of simple, rapidly converging estimators that can be used to estimate various bounds on the Bayes error. We estimate multiple bounds on the Bayes error using an estimator that applies meta learning to slowly converging plug-in estimators to obtain the parametric convergence rate. We compare the estimated bounds empirically on simulated data and then estimate the tighter bounds on features extracted from an image patch analysis of sunspot continuum and magnetogram images.
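To make "bounds on the Bayes error from an f-divergence functional" concrete, the sketch below evaluates one classic pair of two-class bounds built from the Bhattacharyya coefficient BC = ∫ sqrt(f0 f1): lower = (1/2)(1 - sqrt(1 - 4 p0 p1 BC^2)) and upper = sqrt(p0 p1) BC. It assumes two equal-variance Gaussian classes with equal priors so that BC and the exact Bayes error have closed forms; in the paper's setting the functional would instead be estimated from samples, and the bounds used there may differ from these. The function names are hypothetical.

```python
import math


def bhattacharyya_bounds(bc, p0=0.5):
    """Lower and upper bounds on the two-class Bayes error from the Bhattacharyya coefficient."""
    p1 = 1.0 - p0
    upper = math.sqrt(p0 * p1) * bc
    lower = 0.5 * (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * p0 * p1 * bc * bc)))
    return lower, upper


def gaussian_bc(delta, sigma=1.0):
    # Closed-form Bhattacharyya coefficient for N(0, sigma^2) vs N(delta, sigma^2).
    return math.exp(-delta * delta / (8.0 * sigma * sigma))


def gaussian_bayes_error(delta, sigma=1.0):
    # Exact Bayes error with equal priors: Phi(-|delta| / (2 sigma)).
    z = -abs(delta) / (2.0 * sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for delta in (0.0, 1.0, 2.0, 4.0):
    lo, hi = bhattacharyya_bounds(gaussian_bc(delta))
    exact = gaussian_bayes_error(delta)
    print(f"delta={delta}: {lo:.4f} <= {exact:.4f} <= {hi:.4f}")
```

The gap between the two bounds widens as the classes separate, which is one reason to estimate several bounds and keep the tighter ones.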
Many states are implementing direct writing assessments to measure student achievement. While much literature has investigated minimizing rater effects on writing scores, little attention has been given to the type of model used to prepare raters to score direct writing assessments. This study reports on an investigation conducted in a state‐mandated writing program after a scoring anomaly became apparent once the assessments were put into operation. The study indicates that using a spiral model for training raters and scoring papers results in higher mean ratings than using a sequential model for training and scoring. The findings suggest that basing cut‐score decisions on pilot data has important implications for program implementation.
Distributional functionals are integrals of functionals of probability densities and include functionals such as information divergence, mutual information, and entropy. Distributional functionals have many applications in information theory, statistics, signal processing, and machine learning. Many existing nonparametric distributional functional estimators either have unknown convergence rates or are difficult to implement. In this thesis, we consider the problem of nonparametrically estimating functionals of distributions when only a finite number of independent and identically distributed samples is available from each of the unknown, smooth, d-dimensional distributions. We derive mean squared error (MSE) convergence rates for leave-one-out kernel density plug-in estimators and k-nearest neighbor estimators of these functionals. We then extend the theory of optimally weighted ensemble estimation to obtain estimators that achieve the parametric MSE convergence rate when the densities are sufficiently smooth. These estimators are simple to implement and, in contrast with many competing estimators, do not require knowledge of the densities’ support set. The asymptotic distribution of these estimators is also derived. The utility of these estimators is demonstrated through their application to sunspot image data and to neural data measured from epilepsy patients. Sunspot images are clustered by estimating the divergence between the underlying probability distributions of image pixel patches. In both applications, overfitting is also addressed by performing dimensionality reduction via intrinsic dimension estimation and by benchmarking classification via Bayes error estimation.
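As one concrete instance of a leave-one-out kernel density plug-in estimator, the sketch below estimates differential entropy, H = -E[log f(X)], as -(1/N) Σ_i log f̂_{-i}(X_i), where f̂_{-i} is a Gaussian-kernel density estimate built without X_i. It assumes one-dimensional data, a single fixed bandwidth chosen by a Silverman-style rule of thumb, and unit variance; the ensemble weighting from the thesis is not shown, and the function name is hypothetical.

```python
import math
import random


def loo_kde_entropy(samples, bandwidth):
    """Leave-one-out Gaussian-kernel plug-in estimate of differential entropy (1-D)."""
    n = len(samples)
    # Normalizing constant of a Gaussian KDE built from n - 1 points.
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(samples):
        density = 0.0
        for j, xj in enumerate(samples):
            if i != j:  # leave X_i out of its own density estimate
                u = (xi - xj) / bandwidth
                density += math.exp(-0.5 * u * u)
        total += math.log(density * norm)
    return -total / n


rng = random.Random(0)
n = 500
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
h = 1.06 * n ** (-1 / 5)  # rule-of-thumb bandwidth for unit variance (an assumption)
estimate = loo_kde_entropy(xs, h)
truth = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)
print(f"estimate={estimate:.3f}, true={truth:.3f}")
```

Leaving X_i out avoids the upward bias in f̂(X_i) that comes from a point contributing its own kernel peak; the cost is O(N^2) kernel evaluations.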
Ensemble estimation of mutual information
Moon, Kevin R.; Sricharan, Kumar; Hero, Alfred O.
2017 IEEE International Symposium on Information Theory (ISIT),
2017-June
Conference Proceeding
We derive the mean squared error convergence rates of kernel density-based plug-in estimators of mutual information measures between two multidimensional random variables X and Y for two cases: 1) X and Y are both continuous; 2) X is continuous and Y is discrete. Using the derived rates, we propose an ensemble estimator of these information measures for the second case by taking a weighted sum of the plug-in estimators with varied bandwidths. The resulting ensemble estimator achieves the 1/N parametric convergence rate when the conditional densities of the continuous variables are sufficiently smooth. To the best of our knowledge, this is the first nonparametric mutual information estimator known to achieve the parametric convergence rate for this case, which frequently arises in applications (e.g., variable selection in classification). The estimator is simple to implement, as it uses the solution to an offline convex optimization problem and simple plug-in estimators. Ensemble estimators that achieve the parametric rate are also derived for the first case (X and Y are both continuous) and for a third case: 3) X and Y may have any mixture of discrete and continuous components.
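The ensemble idea (a weighted sum of plug-in estimates at several bandwidths, with weights from an offline convex problem) can be sketched under a simplified problem: minimize ||w||_2 subject to Σ_l w(l) = 1 and Σ_l w(l) l^(i/d) = 0 for i = 1, ..., d - 1, where each parameter l scales a bandwidth h(l). With equality constraints only, this has the closed form w = A^T (A A^T)^(-1) b. The basis functions l^(i/d), the parameter grid, and d = 3 are illustrative assumptions rather than the paper's exact constraint set, which depends on the estimator's bias expansion; all function names are hypothetical.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def ensemble_weights(params, exponents):
    """Minimum-norm w with sum(w) = 1 and sum_l w(l) * l**e = 0 for each exponent e."""
    rows = [[1.0] * len(params)] + [[l ** e for l in params] for e in exponents]
    b = [1.0] + [0.0] * len(exponents)
    m = len(params)
    gram = [[sum(ri[k] * rj[k] for k in range(m)) for rj in rows] for ri in rows]
    y = solve(gram, b)  # (A A^T) y = b
    return [sum(rows[i][k] * y[i] for i in range(len(rows))) for k in range(m)]  # w = A^T y


def ensemble_estimate(weights, plug_in_values):
    # Weighted sum of plug-in estimates, one per bandwidth parameter l.
    return sum(w * v for w, v in zip(weights, plug_in_values))


d = 3                                          # hypothetical dimension
params = [1.0 + 0.25 * k for k in range(10)]   # hypothetical bandwidth multipliers l
exponents = [i / d for i in range(1, d)]       # assumed bias terms l**(i/d)
w = ensemble_weights(params, exponents)

# Synthetic plug-in values whose bias follows the assumed model exactly.
true_value = 1.5
plug_ins = [true_value + 0.3 * l ** (1 / d) - 0.2 * l ** (2 / d) for l in params]
combined = ensemble_estimate(w, plug_ins)
print(round(combined, 6))
```

Because the synthetic bias has exactly the assumed form, the weighted sum cancels it and recovers the constant term. Minimizing ||w||_2, rather than accepting any feasible w, keeps the weighted sum from amplifying the variance of the individual plug-in estimates.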
Deep learning identification models have shown promise for identifying gas plumes in longwave infrared (LWIR) hyperspectral images of urban scenes, particularly when a large library of gases is being considered. Because many gases have similar spectral signatures, it is important to properly estimate the signal from a detected plume. Typically, a scene's global mean spectrum and covariance matrix are estimated to whiten the plume's signal, which removes the background's signature from the gas signature. However, urban scenes can contain many different background materials that are spatially and spectrally heterogeneous. This can lead to poor identification performance when the global background estimate is not representative of a given local background material. We use image segmentation, along with an iterative background estimation algorithm, to create local estimates for the various background materials that lie beneath a gas plume. Our method outperforms global background estimation on a set of simulated and real gas plumes. This method shows promise for increasing deep learning identification confidence while remaining simple and easy to tune when considering diverse plumes.
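The whitening step, and the switch from one global background to per-segment backgrounds, can be sketched as follows. Assumptions: segment labels are already available (the segmentation itself is not shown), each segment is whitened with the Cholesky factor L of its own covariance, z = L^(-1)(x - mu), which is one of several valid whitening transforms, the spectra are synthetic 3-band vectors rather than real LWIR data, and the paper's iterative background estimation step is omitted. All names are hypothetical.

```python
import math
import random


def mean_vec(pixels):
    """Per-band sample mean of a list of equal-length spectra."""
    n = len(pixels)
    return [sum(p[k] for p in pixels) / n for k in range(len(pixels[0]))]


def covariance(pixels, mu):
    """Per-band sample covariance with an (n - 1) denominator."""
    n, b = len(pixels), len(mu)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in pixels) / (n - 1)
             for j in range(b)] for i in range(b)]


def cholesky(m):
    """Lower-triangular L with L L^T = m (m must be symmetric positive definite)."""
    b = len(m)
    chol = [[0.0] * b for _ in range(b)]
    for i in range(b):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(m[i][i] - s)
            else:
                chol[i][j] = (m[i][j] - s) / chol[j][j]
    return chol


def whiten(x, mu, chol):
    """Solve L z = (x - mu) by forward substitution."""
    r = [xi - mi for xi, mi in zip(x, mu)]
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(chol[i][k] * z[k] for k in range(i))) / chol[i][i])
    return z


def local_whiten(pixels, labels):
    """Whiten each pixel with the mean and covariance of its own segment."""
    stats = {}
    for lab in set(labels):
        seg = [p for p, l in zip(pixels, labels) if l == lab]
        mu = mean_vec(seg)
        stats[lab] = (mu, cholesky(covariance(seg, mu)))
    return [whiten(p, *stats[l]) for p, l in zip(pixels, labels)]


# Synthetic background spectra for two hypothetical materials with different statistics.
rng = random.Random(1)
pixels, labels = [], []
for _ in range(200):
    g1, g2, g3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    pixels.append([10 + g1, 5 + 0.5 * g1 + g2, 2 + g3])
    labels.append("material_a")
    pixels.append([30 + 3 * g1, 12 + g2, 8 + 0.2 * g2 + 2 * g3])
    labels.append("material_b")

whitened = local_whiten(pixels, labels)
```

Within each segment the whitened background has zero mean and identity covariance. A single global mean and covariance would only achieve this for the pooled scene, not for either material on its own.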