We demonstrate the ability of convolutional neural networks (CNNs) to mitigate systematics in the virial scaling relation and produce dynamical mass estimates of galaxy clusters with remarkably low bias and scatter. We present two models, CNN1D and CNN2D, which leverage this deep learning tool to infer cluster masses from distributions of member galaxy dynamics. Our first model, CNN1D, infers cluster mass directly from the distribution of member galaxy line-of-sight velocities. Our second model, CNN2D, extends the input space of CNN1D to learn on the joint distribution of galaxy line-of-sight velocities and projected radial distances. We train each model as a regression over cluster mass using a labeled catalog of realistic mock cluster observations generated from the MultiDark simulation and UniverseMachine catalog. We then evaluate the performance of each model on an independent set of mock observations selected from the same simulated catalog. The CNN models produce cluster mass predictions with lognormal residuals of scatter as low as 0.132 dex, greater than a factor of 2 improvement over the classical M-σ power-law estimator. Furthermore, the CNN model reduces prediction scatter relative to similar machine-learning approaches by up to 17% while executing in drastically shorter training and evaluation times (by a factor of 30) and producing considerably more robust mass predictions (improving prediction stability under variations in galaxy sampling rate by 30%).
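The classical M-σ baseline named in the abstract above can be illustrated as a least-squares power-law fit in log space. This is only a minimal sketch: the function names are hypothetical, and the toy data are built with an exponent of 3 purely for illustration. It is not the calibration or the code used in the paper.

```python
import math

def velocity_dispersion(v_los):
    """Sample standard deviation of line-of-sight velocities (km/s)."""
    n = len(v_los)
    mean = sum(v_los) / n
    return math.sqrt(sum((v - mean) ** 2 for v in v_los) / (n - 1))

def fit_power_law(sigmas, masses):
    """Least-squares fit of log10(M) = slope * log10(sigma) + intercept."""
    xs = [math.log10(s) for s in sigmas]
    ys = [math.log10(m) for m in masses]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def predict_log_mass(sigma, slope, intercept):
    """Predicted log10 cluster mass for a given velocity dispersion."""
    return slope * math.log10(sigma) + intercept

# Toy training set (hypothetical values): masses follow M ∝ sigma^3 exactly.
toy_sigmas = [400.0, 600.0, 800.0, 1000.0]
toy_masses = [1e15 * (s / 1000.0) ** 3 for s in toy_sigmas]
slope, intercept = fit_power_law(toy_sigmas, toy_masses)
```

On real data the fit would carry scatter, and the residuals in log10(M) are what a scatter quoted in dex refers to.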
We present an analysis of importance feature selection applied to photometric redshift estimation using the machine learning architecture Decision Trees with the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85 easily measured (or derived) photometric quantities (or ‘features’) and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNNs) to show that the addition of these features, in combination with the standard magnitudes and colours, improves the machine learning redshift estimate by 18 per cent and decreases the catastrophic outlier rate by 32 per cent. We further compare the redshift estimate using RDF with those from two different aNNs, and with photometric redshifts available from the Sloan Digital Sky Survey (SDSS). We find that the RDF requires orders of magnitude less computation time than the aNNs to obtain a machine learning redshift while reducing both the catastrophic outlier rate by up to 43 per cent, and the redshift error by up to 25 per cent. When compared to the SDSS photometric redshifts, the RDF machine learning redshifts both decrease the standard deviation of residuals scaled by 1/(1+z) by 36 per cent from 0.066 to 0.041, and decrease the fraction of catastrophic outliers by 57 per cent from 2.32 to 0.99 per cent.
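The two summary statistics quoted in the abstract above, the standard deviation of the scaled residuals and the catastrophic outlier fraction, can be sketched as follows. Two things here are assumptions rather than the paper's definitions: the scaling uses the spectroscopic redshift in 1/(1+z), and the outlier threshold of 0.15 is an illustrative default. All function names are hypothetical.

```python
import math

def scaled_residuals(z_phot, z_spec):
    """Residuals (z_phot - z_spec) scaled by 1 / (1 + z_spec) (assumed convention)."""
    return [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]

def residual_std(residuals):
    """Population standard deviation of the scaled residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)

def outlier_fraction(residuals, threshold=0.15):
    """Fraction of objects with |scaled residual| above an assumed threshold."""
    return sum(1 for r in residuals if abs(r) > threshold) / len(residuals)

def percent_decrease(before, after):
    """Relative improvement of a statistic, in per cent."""
    return 100.0 * (before - after) / before

# Toy example (hypothetical values): one object is a catastrophic outlier.
z_spec = [0.1, 0.2, 0.5, 1.0]
z_phot = [0.11, 0.2, 0.5, 1.5]
res = scaled_residuals(z_phot, z_spec)
```

As a check on the quoted numbers, `percent_decrease(2.32, 0.99)` rounds to the 57 per cent reduction in the outlier fraction stated in the abstract.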
ABSTRACT
This work uses hierarchical logistic Gaussian processes to infer true redshift distributions of samples of galaxies, through their cross-correlations with spatially overlapping spectroscopic samples. We demonstrate that this method can accurately estimate these redshift distributions in a fully Bayesian manner jointly with galaxy-dark matter bias models. We forecast how systematic biases in the redshift-dependent galaxy-dark matter bias model affect redshift inference. Using published galaxy-dark matter bias measurements from the Illustris simulation, we compare these systematic biases with the statistical error budget from a forecasted weak gravitational lensing measurement. If the redshift-dependent galaxy-dark matter bias model is mis-specified, redshift inference can be biased. This can propagate into relative biases in the weak lensing convergence power spectrum on the 10–30 per cent level. We, therefore, showcase a methodology to detect these sources of error using Bayesian model selection techniques. Furthermore, we discuss the improvements that can be gained from incorporating prior information from Bayesian template fitting into the model, both in redshift prediction accuracy and in the detection of systematic modelling biases.
ABSTRACT
We present a new method to estimate redshift distributions and galaxy-dark matter bias parameters using correlation functions in a fully data-driven and self-consistent manner. Unlike other machine learning, template, or correlation redshift methods, this approach does not require a reference sample with known redshifts. By measuring the projected cross- and auto-correlations of different galaxy sub-samples, e.g. as chosen by simple cells in colour–magnitude space, we are able to estimate the galaxy-dark matter bias model parameters, and the shape of the redshift distributions of each sub-sample. This method fully marginalizes over a flexible parametrization of the redshift distribution and galaxy-dark matter bias parameters of sub-samples of galaxies, and thus provides a general Bayesian framework to incorporate redshift uncertainty into the cosmological analysis in a data-driven, consistent, and reproducible manner. This result is improved by an order of magnitude by including cross-correlations with the cosmic microwave background and with galaxy–galaxy lensing. We showcase how this method could be applied to real galaxies. By using idealized data vectors, in which all galaxy-dark matter model parameters and redshift distributions are known, this method is demonstrated to recover unbiased estimates on important quantities, such as the offset Δz between the mean of the true and estimated redshift distribution and the 68 per cent, 95 per cent, and 99.5 per cent widths of the redshift distribution to an accuracy required by current and future surveys.
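The offset Δz mentioned in the abstract above can be illustrated for binned redshift distributions. This is a minimal sketch with hypothetical names. It assumes both distributions are tabulated on the same bin centres and takes the sign convention Δz = mean(estimated) - mean(true), which the abstract does not specify.

```python
def normalise(weights):
    """Rescale non-negative bin heights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def mean_redshift(z_centres, weights):
    """Mean redshift of a binned distribution n(z)."""
    probs = normalise(weights)
    return sum(z * p for z, p in zip(z_centres, probs))

def delta_z(z_centres, n_true, n_est):
    """Offset between the means of the estimated and true n(z) (assumed sign)."""
    return mean_redshift(z_centres, n_est) - mean_redshift(z_centres, n_true)

# Toy histograms (hypothetical values) on three shared redshift bins.
z_centres = [0.2, 0.4, 0.6]
n_true = [1.0, 2.0, 1.0]
n_est = [1.0, 1.0, 2.0]
offset = delta_z(z_centres, n_true, n_est)
```

The width statistics quoted in the abstract would need a separate definition of the interval, so they are left out of this sketch.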
We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million ‘clean’ SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 ‘anomalous’ galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed ‘anomaly-removed’ sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly-removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
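The idea behind an elliptical envelope, flagging points that lie far from the bulk of the data in Mahalanobis distance, can be sketched in two dimensions. This simplified version uses the plain sample mean and covariance, not the robust covariance estimate that elliptical-envelope implementations typically rely on. The function names, the toy data, and the threshold of 9.21 (the 99 per cent quantile of a chi-squared distribution with two degrees of freedom) are illustrative assumptions, not the configuration used in the paper.

```python
def mean_and_covariance(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def flag_anomalies(points, threshold_sq=9.21):
    """True for points outside the ellipse defined by threshold_sq (assumed value)."""
    mean, cov = mean_and_covariance(points)
    return [mahalanobis_sq(p, mean, cov) > threshold_sq for p in points]

# Toy sample (hypothetical): a 5x5 grid of 'clean' points plus one contaminant.
points = [(float(i), float(j)) for i in range(-2, 3) for j in range(-2, 3)]
points.append((10.0, -10.0))
flags = flag_anomalies(points)
```

In this toy sample only the appended contaminant is flagged. Because the non-robust covariance is itself pulled by the contaminants, a heavily contaminated sample would call for a robust estimator instead.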
Probabilistic model for dynamic galaxy decomposition
Jagvaral, Yesukhei; Campbell, Duncan; Mandelbaum, Rachel ...
Monthly Notices of the Royal Astronomical Society, 01/2022, Volume 509, Issue 2
Journal Article, peer reviewed, open access
ABSTRACT
In the era of precision cosmology and ever-improving cosmological simulations, a better understanding of different galaxy components such as bulges and discs will give us new insight into galactic formation and evolution. Based on the fact that the stellar populations of the constituent components of galaxies differ by their dynamical properties, we develop two simple models for galaxy decomposition using the TNG100 cosmological hydrodynamical simulation from the IllustrisTNG project. The first model uses a single dynamical parameter and can distinguish four components: thin disc, thick disc, counter-rotating disc, and bulge. The second model uses one more dynamical parameter, is defined in a probabilistic manner, and distinguishes two components: bulge and disc. We demonstrate the improved robustness of these models compared to a widely used method in the literature involving cuts on the circularity parameter. The number fraction of disc-dominated galaxies at a given stellar mass obtained by our models agrees well with observations for masses exceeding log10(M*/M⊙) = 10. The galaxies classified as bulge-dominated by the second model are mostly red; however, the population classified as disc-dominated contains a significant number of red galaxies alongside the blue population. The contributions of the different galaxy components to the total stellar mass budget exhibit trends with stellar mass similar to those in the observational data, although there is a quantitative disagreement at high and low masses. The Sérsic indices and half-mass radii for the bulge and disc components agree well with those of real galaxies.
ABSTRACT
Recovering credible cosmological parameter constraints in a weak lensing shear analysis requires an accurate model that can be used to marginalize over nuisance parameters describing potential sources of systematic uncertainty, such as the uncertainties on the sample redshift distribution n(z). Due to the challenge of running Markov chain Monte Carlo (MCMC) in the high-dimensional parameter spaces in which the n(z) uncertainties may be parametrized, it is common practice to simplify the n(z) parametrization or combine MCMC chains that each have a fixed n(z) resampled from the n(z) uncertainties. In this work, we propose a statistically principled Bayesian resampling approach for marginalizing over the n(z) uncertainty using multiple MCMC chains. We self-consistently compare the new method to existing ones from the literature in the context of a forecasted cosmic shear analysis for the HSC three-year shape catalogue, and find that these methods recover statistically consistent error bars for the cosmological parameter constraints for the predicted HSC three-year analysis, implying that using the most computationally efficient of the approaches is appropriate. However, we find that for data sets with the constraining power of the full HSC survey data set (and, by implication, those upcoming surveys with even tighter constraints), the choice of method for marginalizing over n(z) uncertainty among the several methods from the literature may modify the 1σ uncertainties on Ωm–S8 constraints by ∼4 per cent, and a careful model selection is needed to ensure credible parameter intervals.