Context. Open clusters (OCs) are popular tracers of the structure and evolutionary history of the Galactic disc. The OC population is often considered to be complete within 1.8 kpc of the Sun. The recent Gaia Data Release 2 (DR2) allows this claim to be challenged. Aims. We perform a systematic search for new OCs in the direction of Perseus using the precise and accurate astrometry of Gaia DR2. Methods. We implemented a coarse-to-fine search method. First, we exploited spatial proximity using a fast, density-aware partitioning of the sky via a k-d tree in the spatial domain of Galactic coordinates, (l, b). Second, we employed a Gaussian mixture model in proper motion space to quickly tag fields around OC candidates. Third, we applied an unsupervised membership assignment method, UPMASK, to scrutinise the candidates. We visually inspected colour-magnitude diagrams to validate the detected objects. Finally, we performed a diagnostic to quantify the significance of each identified over-density in proper motion and parallax space. Results. We report the discovery of 41 new stellar clusters, an increase of at least 20% over the previously known OC population in this volume of the Milky Way. We also report the clear identification of NGC 886, an object previously considered an asterism. This study challenges the claim of a near-complete OC sample out to 1.8 kpc: our results show that this claim requires revision, and that a complete census of nearby OCs has yet to be assembled.
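The proper-motion tagging step can be illustrated with a two-component Gaussian mixture: one broad component for field stars and one narrow component for a co-moving candidate. The sketch below uses invented toy data and scikit-learn, and is not the paper's actual pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy proper-motion field (units of mas/yr, all values invented): a broad
# background of field stars plus a compact, co-moving clump mimicking an
# open-cluster candidate.
field = rng.normal(0.0, 3.0, size=(500, 2))
clump = rng.normal([8.0, 8.0], 0.3, size=(60, 2))
pm = np.vstack([field, clump])

# Two-component mixture: one broad (field) and one narrow (candidate).
gmm = GaussianMixture(n_components=2, random_state=0).fit(pm)

# The component with the smaller covariance determinant is the compact clump.
compact = int(np.argmin([np.linalg.det(c) for c in gmm.covariances_]))
members = gmm.predict(pm) == compact
print(members.sum(), "stars tagged as candidate members")
```

Fields whose mixture contains a sufficiently compact component would then be flagged as candidates for closer scrutiny.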
ABSTRACT
Many scientific investigations of photometric galaxy surveys require redshift estimates, whose uncertainty properties are best encapsulated by photometric redshift (photo-z) posterior probability density functions (PDFs). Photo-z PDF estimation methodologies abound, producing discrepant results with no consensus on a preferred approach. We present the results of a comprehensive experiment comparing 12 photo-z algorithms applied to mock data produced for The Rubin Observatory Legacy Survey of Space and Time Dark Energy Science Collaboration. By supplying perfect prior information, in the form of the complete template library and a representative training set, as inputs to each code, we demonstrate the impact of the assumptions underlying each technique on the output photo-z PDFs. In the absence of a notion of true, unbiased photo-z PDFs, we evaluate and interpret multiple metrics of the ensemble properties of the derived photo-z PDFs, as well as traditional reductions to photo-z point estimates. We report systematic biases and excessively broad or narrow widths in the photo-z PDFs of many popular codes, which may indicate avenues for improvement in the algorithms or implementations. Furthermore, we draw attention to the limitations of established metrics for assessing photo-z PDF accuracy; though we identify the conditional density estimate (CDE) loss as a promising metric of photo-z PDF performance in the case where true redshifts are available but true photo-z PDFs are not, we emphasize the need for science-specific performance metrics.
We present the v1.0 release of CLMM, an open-source Python library for estimating the weak-lensing masses of galaxy clusters. CLMM is designed as a stand-alone toolkit of building blocks to enable end-to-end analysis pipeline validation for upcoming cluster cosmology analyses, such as those that will be performed by the Vera C. Rubin Observatory Legacy Survey of Space and Time Dark Energy Science Collaboration (LSST-DESC). It serves as a flexible, easy-to-install, and easy-to-use interface for both weak-lensing simulators and observers, and it can be applied to real and mock data to study the systematics affecting weak-lensing mass reconstruction. At the core of CLMM are routines to model the weak-lensing shear signal given the underlying mass distribution of galaxy clusters, and a set of data operations to prepare the corresponding data vectors. The theoretical predictions rely on existing software, used as backends in the code, that has been thoroughly tested and cross-checked. Combined, the theoretical predictions and data can be used to constrain the mass distribution of galaxy clusters, as demonstrated in a suite of example Jupyter notebooks shipped with the software and also available in the extensive online documentation.
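The kind of prediction such routines provide can be sketched with the simplest possible mass model, a point-mass lens, for which the excess surface density is ΔΣ(R) = M/(πR²) and the tangential shear is γ_t = ΔΣ/Σ_crit. This is a minimal illustration, not CLMM's API or its halo profiles; the distances and mass below are invented, and physical distances stand in for proper angular-diameter distances:

```python
import numpy as np

G = 4.301e-9        # gravitational constant, Mpc (km/s)^2 / Msun
C_LIGHT = 2.998e5   # speed of light, km/s

def sigma_crit(d_l, d_s, d_ls):
    """Critical surface density (Msun/Mpc^2) for lens/source distances in Mpc."""
    return C_LIGHT ** 2 / (4.0 * np.pi * G) * d_s / (d_l * d_ls)

def tangential_shear_point_mass(mass, r, d_l, d_s, d_ls):
    """gamma_t = DeltaSigma / Sigma_crit, with DeltaSigma = M / (pi R^2)."""
    delta_sigma = mass / (np.pi * r ** 2)  # Msun / Mpc^2
    return delta_sigma / sigma_crit(d_l, d_s, d_ls)

r = np.logspace(-1, 1, 5)  # projected radii, Mpc
gamma = tangential_shear_point_mass(1e14, r, d_l=1000.0, d_s=2000.0, d_ls=1200.0)
```

Realistic cluster analyses replace the point mass with an extended halo profile (e.g. NFW), but the structure of the calculation, profile in, shear data vector out, is the same.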
Modern galaxy surveys produce redshift probability density functions (PDFs) in addition to traditional photometric redshift (photo-z) point estimates. However, the storage of photo-z PDFs may present a challenge with increasingly large catalogs, as we face a trade-off between the accuracy of subsequent science measurements and the limitation of finite storage resources. This paper presents qp, a Python package for manipulating parameterizations of one-dimensional PDFs, as suitable for photo-z PDF compression. We use qp to investigate the performance of three simple PDF storage formats (quantiles, samples, and step functions) as a function of the number of stored parameters on two realistic mock data sets, representative of upcoming surveys with different data qualities. We propose some best practices for choosing a photo-z PDF approximation scheme and demonstrate the approach on a science case using performance metrics on both ensembles of individual photo-z PDFs and an estimator of the overall redshift distribution function. We show that both the properties of the set of PDFs we wish to approximate and the fidelity metric(s) chosen affect the optimal parameterization. Additionally, we find that quantiles and samples outperform step functions, and we encourage further consideration of these formats for PDF approximation.
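The quantile format can be illustrated generically: store a handful of CDF quantiles and rebuild the PDF by differentiating the interpolated CDF. The sketch below is a plain NumPy illustration of the idea, not qp's actual API, and the Gaussian posterior is invented:

```python
import numpy as np

# Toy photo-z posterior sampled on a fine grid.
z = np.linspace(0.0, 3.0, 3001)
pdf = np.exp(-0.5 * ((z - 0.8) / 0.05) ** 2)
pdf /= np.trapz(pdf, z)

# Compress: keep the redshifts at N evenly spaced quantile levels of the CDF.
cdf = np.cumsum(pdf)
cdf /= cdf[-1]
n_params = 20
q_levels = np.linspace(0.5 / n_params, 1.0 - 0.5 / n_params, n_params)
z_quantiles = np.interp(q_levels, cdf, z)  # 20 numbers replace 3001 grid values

# Reconstruct: the derivative of quantile level with respect to redshift
# is the density at the stored quantile positions.
pdf_rec = np.gradient(q_levels, z_quantiles)
```

Quantiles adapt their resolution to where the probability mass is, which is one intuition for why they can outperform fixed-bin step functions at equal storage.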
Classification of transient and variable light curves is an essential step in using astronomical observations to develop an understanding of the underlying physical processes from which they arise. However, upcoming deep photometric surveys, including the Large Synoptic Survey Telescope (LSST), will produce a deluge of low signal-to-noise data for which traditional type estimation procedures are inappropriate. Probabilistic classification is more appropriate for such data but is incompatible with the traditional metrics used on deterministic classifications. Furthermore, large survey collaborations like LSST intend to use the resulting classification probabilities for diverse science objectives, indicating a need for a metric that balances a variety of goals. We describe the process used to develop an optimal performance metric for an open classification challenge that seeks to identify probabilistic classifiers that can serve many scientific interests. The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) aims to identify promising techniques for obtaining classification probabilities of transient and variable objects by engaging a broader community beyond astronomy. Using mock classification probability submissions emulating realistically complex archetypes of those anticipated of PLAsTiCC, we compare the sensitivity of two metrics of classification probabilities under various weighting schemes, finding that both yield results that are qualitatively consistent with intuitive notions of classification performance. We thus choose as a metric for PLAsTiCC a weighted modification of the cross-entropy because it can be meaningfully interpreted in terms of information content. Finally, we propose extensions of our methodology to ever more complex challenge goals and suggest some guiding principles for approaching the choice of a metric of probabilistic data products.
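A weighted cross-entropy of this kind can be sketched as a log-loss averaged within each class before weighting, so rare classes are not swamped by common ones. The class weights and toy probabilities below are invented for illustration and are not the challenge's adopted values:

```python
import numpy as np

def weighted_log_loss(y_true, probs, weights, eps=1e-15):
    """y_true: (N,) integer labels; probs: (N, M) class probabilities;
    weights: (M,) per-class weights. Each class contributes its mean
    negative log-probability, scaled by its weight."""
    probs = np.clip(probs, eps, 1.0 - eps)  # guard against log(0)
    loss = 0.0
    for m, w in enumerate(weights):
        in_class = y_true == m
        if in_class.any():
            loss += w * -np.log(probs[in_class, m]).mean()
    return loss / np.sum(weights)

y = np.array([0, 0, 1, 2])
p = np.array([[0.8, 0.1, 0.1],
              [0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
print(weighted_log_loss(y, p, weights=np.array([1.0, 1.0, 2.0])))
```

Raising a class's weight makes errors on that class costlier, which is how such a metric can balance competing science goals.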
Next-generation surveys like the Legacy Survey of Space and Time (LSST) on the Vera C. Rubin Observatory (Rubin) will generate orders of magnitude more discoveries of transients and variable stars than previous surveys. To prepare for this data deluge, we developed the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC), a competition that aimed to catalyze the development of robust classifiers under LSST-like conditions: a nonrepresentative training set and a large photometric test set of imbalanced classes. Over 1000 teams participated in PLAsTiCC, which was hosted on the Kaggle data-science competition platform between 2018 September 28 and 2018 December 17, ultimately identifying three winners in 2019 February. Participants produced classifiers employing a diverse set of machine-learning techniques, including hybrid combinations and ensemble averages of a range of approaches, among them boosted decision trees, neural networks, and multilayer perceptrons. The strong performance of the top three classifiers on Type Ia supernovae and kilonovae represents a major improvement over the current state of the art within astronomy. This paper summarizes the most promising methods and evaluates their results in detail, highlighting future directions both for classifier development and for the simulation needs of a next-generation PLAsTiCC data set.
We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyα-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW) greater than 20 Å. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and EW distributions for the galaxy populations, and returns the probability that an object in question is an LAE given the characteristics observed. This approach is directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ∼10⁶ emission-line galaxies into LAEs and low-redshift emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional EW > 20 Å cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method's ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information. Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as to upcoming large spectroscopic surveys, including Euclid and WFIRST.
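The core of such a classifier is Bayes' rule applied to population priors. The sketch below uses invented exponential EW distributions and an assumed prior LAE fraction, purely to illustrate how a tunable probability threshold replaces a hard EW cut; none of the numbers are HETDEX's measured distributions:

```python
import numpy as np

def p_lae(ew, prior_lae=0.3, scale_lae=60.0, scale_oii=8.0):
    """Posterior probability that an emitter with rest-frame EW `ew` (in Å)
    is an LAE rather than a low-z [O II] emitter, under toy exponential EW
    distributions (scales in Å) and an assumed prior LAE fraction."""
    like_lae = np.exp(-ew / scale_lae) / scale_lae
    like_oii = np.exp(-ew / scale_oii) / scale_oii
    num = like_lae * prior_lae
    return num / (num + like_oii * (1.0 - prior_lae))

# The probability rises smoothly with EW; choosing where to threshold p_lae
# trades contamination against incompleteness.
for ew in (5.0, 20.0, 40.0):
    print(ew, round(p_lae(ew), 3))
```

A full treatment would also condition on line luminosity, wavelength, and measurement noise via the populations' luminosity functions, but the structure of the calculation is the same.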
It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require a characterization of the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) or training-based methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter inference settings such as the applications mentioned above. As an alternative to methods that focus on predicting the response (or parameters) y from features x, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density function (PDF) p(y|x) of y given (i.e., conditional on) x. This density approach offers a more nuanced accounting of uncertainty in situations with, e.g., nonstandard error distributions and multimodal or heteroskedastic response variables that are often present in astronomical data sets. As there is no one-size-fits-all CDE method, and the ultimate choice of model depends on the application and the training sample size, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment which can accommodate different types of settings – involving, e.g., mixed-type input from multiple sources, functional data, and images – and which in addition can easily be fit to the problem at hand. Specifically, we introduce four CDE software packages in Python and R based on ML prediction methods adapted and optimized for CDE: NNKCDE, RFCDE, FlexCode, and DeepCDE. Furthermore, we present the cdetools package with evaluation metrics.
This package includes functions for computing a CDE loss function for tuning and assessing the quality of individual PDFs, together with diagnostic functions that probe the population-level performance of the PDFs. We provide sample code in Python and R as well as examples of applications to photometric redshift estimation and likelihood-free cosmological inference via CDE.
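A CDE loss of the kind used for tuning can be estimated from held-out (x, y) pairs without knowing the true conditional densities, since (up to a constant) L ≈ mean_i ∫ p̂(y|x_i)² dy − 2 · mean_i p̂(y_i|x_i). The sketch below is a generic NumPy illustration of that estimator, not the cdetools implementation; the grid, toy densities, and truths are invented:

```python
import numpy as np

def cde_loss(pdf_grid, y_grid, y_true):
    """pdf_grid: (n, m) estimated densities on m grid points for n objects;
    y_grid: (m,) evaluation grid; y_true: (n,) held-out true responses."""
    term1 = np.trapz(pdf_grid ** 2, y_grid, axis=1).mean()
    # Each estimated density evaluated at its object's true y, via interpolation.
    at_truth = np.array([np.interp(yt, y_grid, row)
                         for yt, row in zip(y_true, pdf_grid)])
    return term1 - 2.0 * at_truth.mean()

y_grid = np.linspace(-5.0, 5.0, 501)
truths = np.array([0.0, 0.5, -0.5])
# "good" centers a unit Gaussian on each truth; "bad" is offset by 2.
good = np.exp(-0.5 * (y_grid[None, :] - truths[:, None]) ** 2) / np.sqrt(2 * np.pi)
bad = np.exp(-0.5 * (y_grid[None, :] - truths[:, None] - 2.0) ** 2) / np.sqrt(2 * np.pi)
print(cde_loss(good, y_grid, truths) < cde_loss(bad, y_grid, truths))  # True
```

Lower is better, so the loss correctly prefers the estimator whose densities track the truths, which is what makes it usable for model selection and bandwidth tuning.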
The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) pilot survey identified 284 [O II] λ3727-emitting galaxies in a 169 arcmin² field of sky in the redshift range 0 < z < 0.57. This line-flux-limited sample provides a bridge between studies in the local universe and higher-redshift [O II] surveys. We present an analysis of the star formation rates (SFRs) of these galaxies as a function of stellar mass, as determined via spectral energy distribution fitting. The [O II] emitters fall on the "main sequence" of star-forming galaxies, with SFR decreasing at lower masses and redshifts. However, the slope of our relation is flatter than that found for most other samples, a result of the metallicity dependence of the [O II] star formation rate indicator. The mass-specific SFR is higher for lower-mass objects, supporting the idea that massive galaxies formed more quickly and efficiently than their lower-mass counterparts. This is confirmed by the fact that the equivalent widths of the [O II] emission lines decrease with increasing stellar mass. Examination of the morphologies of the [O II] emitters reveals that their star formation is not a result of mergers, and the galaxies' half-light radii do not indicate evolution of physical sizes.
We compare the Hβ line strengths of 1.90 < z < 2.35 star-forming galaxies observed with the near-IR grism of the Hubble Space Telescope with ground-based measurements of Lyα from the HETDEX Pilot Survey and narrow-band imaging. By examining the line ratios of 73 galaxies, we show that most star-forming systems at this epoch have a Lyα escape fraction below ∼6%. We confirm this result by using stellar reddening to estimate the effective logarithmic extinction of the Hβ emission line (c_Hβ = 0.5) and measuring both the Hβ and Lyα luminosity functions in a ∼100,000 Mpc³ volume of space. We show that in our redshift window, the volumetric Lyα escape fraction is at most 4.4^{+2.1}_{−1.2}%, with an additional systematic ∼25% uncertainty associated with our estimate of extinction. Finally, we demonstrate that the bulk of the epoch's star-forming galaxies have Lyα emission-line optical depths that are significantly greater than that of the underlying UV continuum. In our predominantly [O III] λ5007-selected sample of galaxies, resonant scattering must be important for the escape of Lyα photons.