ABSTRACT
We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of three stages: feature extraction, search for outliers with machine learning algorithms, and anomaly identification with follow-up by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of four automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinized by an expert. Of these, 188 (68 per cent) were found to be bogus light curves (including effects from the image subtraction pipeline as well as overlap between a star and a known asteroid), 66 (24 per cent) were previously reported sources, and 23 (8 per cent) correspond to non-catalogued objects, with the two latter cases of potential scientific interest (e.g. one spectroscopically confirmed RS Canum Venaticorum star, four supernova candidates, one red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation that can be used to aid filtering potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available.
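As a quick consistency check on the figures quoted in this abstract, the short sketch below recomputes the percentage breakdown of the 277 outliers. The variable names are illustrative only; the counts are the ones stated in the text.

```python
# Counts quoted in the abstract; the dictionary keys are illustrative labels.
total_outliers = 277
counts = {
    "bogus": 188,
    "previously reported": 66,
    "non-catalogued": 23,
}

# The three categories should account for every outlier.
categories_sum = sum(counts.values())

# Percentages rounded to the nearest integer, as quoted in the abstract.
percentages = {label: round(100 * n / total_outliers) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} objects, {pct} per cent")
```

The rounded values reproduce the 68, 24 and 8 per cent quoted above.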
Anomaly detection in the Open Supernova Catalog Pruzhinskaya, M V; Malanchev, K L; Kornilov, M V ...
Monthly notices of the Royal Astronomical Society, 08/2019, Volume: 489, Issue: 3
Journal Article
Peer reviewed
Open access
ABSTRACT
In the upcoming decade, large astronomical surveys will discover millions of transients, raising unprecedented data challenges in the process. Only machine learning algorithms can process such large data volumes. Most of the discovered transients will belong to the known classes of astronomical objects. However, it is expected that some transients will be rare or completely new events of unknown physical nature. The task of finding them can be framed as an anomaly detection problem. In this work, we perform for the first time an automated anomaly detection analysis in the photometric data of the Open Supernova Catalog (OSC), which serves as a proof of concept for the applicability of these methods to future large-scale surveys. The analysis consists of the following steps: (1) data selection from the OSC and approximation of the pre-processed data with Gaussian processes, (2) dimensionality reduction, (3) searching for outliers with the use of the isolation forest algorithm, and (4) expert analysis of the identified outliers. The pipeline returned 81 candidate anomalies, 27 (33 per cent) of which were confirmed to be from astrophysically peculiar objects. The anomalies found correspond to 1.4 per cent of the initial automatically identified data sample of approximately 2000 objects. Among the identified outliers we recognized superluminous supernovae, non-classical Type Ia supernovae, unusual Type II supernovae, one active galactic nucleus and one binary microlensing event. We also found that 16 anomalies classified as supernovae in the literature are likely to be quasars or stars. Our proposed pipeline represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire. All code and products of this investigation are made publicly available.
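Step (3) of the analysis above relies on the isolation forest algorithm. The following is a minimal, self-contained sketch of that idea in pure Python, not the authors' code: it skips the Gaussian-process and dimensionality-reduction steps and runs on a made-up two-dimensional feature set. All function names, the toy data, and the parameter values (`n_trees`, `sample_size`, the seeds) are assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit is hit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    axis = rng.randrange(len(points[0]))
    low = min(p[axis] for p in points)
    high = max(p[axis] for p in points)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    return {
        "axis": axis,
        "split": split,
        "left": build_tree([p for p in points if p[axis] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[axis] >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "axis" not in tree:
        return depth + average_path_length(tree["size"])
    child = "left" if point[tree["axis"]] < tree["split"] else "right"
    return path_length(tree[child], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Return one score per point in (0, 1]; short average paths give higher scores."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))  # assumes at least two points
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    scores = []
    for p in points:
        mean_path = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores


# Toy feature set: 200 ordinary objects plus one far-away object at index 200.
data_rng = random.Random(1)
features = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
features.append((8.0, 8.0))

scores = anomaly_scores(features)
ranked = sorted(range(len(features)), key=lambda i: -scores[i])
print("highest-scoring indices:", ranked[:5])
```

In a workflow like the one described, the top of such a ranking is what would be handed to the expert in step (4); the algorithm itself only flags statistical outliers and says nothing about their astrophysical nature.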
Aims.
We present the first piece of evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets.
Methods.
Our method follows an active learning strategy where the learning algorithm chooses objects that can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new piece of information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional isolation forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery algorithm to two data sets: simulated light curves from the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) and real light curves from the Open Supernova Catalog. We compare the Active Anomaly Discovery results to those of a static IF. For both methods, we performed a detailed analysis for all objects with the ∼2% highest anomaly scores.
Results.
We show that, in the real data scenario, Active Anomaly Discovery was able to identify ∼80% more true anomalies than the IF. This result is the first piece of evidence that active anomaly detection algorithms can play a central role in the search for new physics in the era of large-scale sky surveys.
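The feedback loop described under Methods can be illustrated with a deliberately simplified toy. The actual Active Anomaly Discovery algorithm adjusts the weights of an isolation forest; the sketch below replaces that with a two-component linear score and a perceptron-style update, which is an assumption made purely for illustration. The data, the `expert` function, and the learning rate are all hypothetical.

```python
def score(weights, features):
    """Weighted sum of per-component anomaly indicators."""
    return sum(w * f for w, f in zip(weights, features))


def static_ranking(objects, budget):
    """Rank once with fixed, uniform weights and show the top `budget` objects."""
    weights = [1.0 / len(objects[0])] * len(objects[0])
    ranked = sorted(range(len(objects)), key=lambda i: -score(weights, objects[i]))
    return ranked[:budget]


def active_loop(objects, expert, budget, learning_rate=0.2):
    """Show one object at a time and nudge the weights after each expert answer."""
    weights = [1.0 / len(objects[0])] * len(objects[0])
    shown = []
    for _ in range(budget):
        candidates = [i for i in range(len(objects)) if i not in shown]
        best = max(candidates, key=lambda i: score(weights, objects[i]))
        shown.append(best)
        # Reward components that flagged an interesting object, penalise the others.
        sign = 1.0 if expert(best) else -1.0
        weights = [w + sign * learning_rate * f for w, f in zip(weights, objects[best])]
    return shown


# Toy catalogue: component 0 fires on artefacts, component 1 on interesting objects.
objects = (
    [(1.0, 0.0)] * 5      # indices 0-4: bogus detections
    + [(0.0, 0.9)] * 5    # indices 5-9: scientifically interesting anomalies
    + [(0.1, 0.1)] * 10   # indices 10-19: ordinary objects
)


def expert(index):
    """Stand-in for the human expert: only indices 5-9 are interesting."""
    return 5 <= index < 10


budget = 5
static_shown = static_ranking(objects, budget)
active_shown = active_loop(objects, expert, budget)
static_found = sum(expert(i) for i in static_shown)
active_found = sum(expert(i) for i in active_shown)
print("static:", static_found, "active:", active_found)
```

In this contrived setup the static ranking spends its whole budget on artefacts, while a single negative answer is enough for the adaptive loop to move on to the interesting objects. Real data will rarely separate this cleanly.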
Context. Open clusters (OCs) are popular tracers of the structure and evolutionary history of the Galactic disc. The OC population is often considered to be complete within 1.8 kpc of the Sun. The recent Gaia Data Release 2 (DR2) allows the latter claim to be challenged. Aims. We perform a systematic search for new OCs in the direction of Perseus using precise and accurate astrometry from Gaia DR2. Methods. We implemented a coarse-to-fine search method. First, we exploited spatial proximity using a fast density-aware partitioning of the sky via a k-d tree in the spatial domain of Galactic coordinates, (l, b). Secondly, we employed a Gaussian mixture model in the proper motion space to tag fields quickly around OC candidates. Thirdly, we applied an unsupervised membership assignment method, UPMASK, to scrutinise the candidates. We visually inspected colour-magnitude diagrams to validate the detected objects. Finally, we performed a diagnostic to quantify the significance of each identified over-density in proper motion and in parallax space. Results. We report the discovery of 41 new stellar clusters. This represents an increment of at least 20% of the previously known OC population in this volume of the Milky Way. We also report on the clear identification of NGC 886, an object previously considered an asterism. This study challenges the previous claim of a near-complete sample of OCs up to 1.8 kpc. Our results reveal that this claim requires revision, and a complete census of nearby OCs is yet to be found.
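The first step of the coarse-to-fine search, density-aware partitioning of (l, b) with a k-d tree, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it splits at the median while alternating between the two coordinates, treats (l, b) as a flat plane (ignoring the cos b factor and longitude wrap-around), and runs on an invented field. The function names, field limits, clump position, and `leaf_size` are assumptions.

```python
import random


def kd_partition(points, bounds, leaf_size=16, depth=0):
    """Split at the median, alternating l and b, until cells hold <= leaf_size points.

    Returns a list of (cell_bounds, cell_points) pairs.
    """
    if len(points) <= leaf_size:
        return [(bounds, points)]
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    split = ordered[mid][axis]
    (l_min, l_max), (b_min, b_max) = bounds
    if axis == 0:
        low_bounds = ((l_min, split), (b_min, b_max))
        high_bounds = ((split, l_max), (b_min, b_max))
    else:
        low_bounds = ((l_min, l_max), (b_min, split))
        high_bounds = ((l_min, l_max), (split, b_max))
    return (kd_partition(ordered[:mid], low_bounds, leaf_size, depth + 1)
            + kd_partition(ordered[mid:], high_bounds, leaf_size, depth + 1))


def cell_area(cell):
    """Area of a cell in the flat (l, b) plane."""
    ((l_min, l_max), (b_min, b_max)), _ = cell
    return (l_max - l_min) * (b_max - b_min)


def cell_density(cell):
    """Stars per unit area; small, crowded cells mark candidate over-densities."""
    area = cell_area(cell)
    return len(cell[1]) / area if area > 0 else float("inf")


# Toy field (arbitrary limits): uniform background plus one compact clump.
rng = random.Random(42)
field = ((120.0, 130.0), (-5.0, 5.0))
stars = [(rng.uniform(120.0, 130.0), rng.uniform(-5.0, 5.0)) for _ in range(300)]
stars += [(rng.gauss(125.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(30)]

cells = kd_partition(stars, field)
densest = max(cells, key=cell_density)
print("number of cells:", len(cells))
print("densest cell bounds:", densest[0])
```

Because every cell holds a similar number of stars, cells shrink where the sky is crowded, so cell area acts as a cheap density estimate. Cells flagged this way would still need the later proper-motion and membership steps before anything could be called a cluster.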
Abstract
We present ∼120,000 Spitzer/IRAC candidate young stellar objects (YSOs) based on surveys of the Galactic midplane between ℓ ∼ 255° and 110°, including the GLIMPSE I, II, and 3D, Vela-Carina, Cygnus X, and SMOG surveys (613 square degrees), augmented by near-infrared catalogs. We employed a classification scheme that uses the flexibility of a tailored statistical learning method and curated YSO data sets to take full advantage of Spitzer’s spatial resolution and sensitivity in the mid-infrared ∼3–9 μm range. Multiwavelength color/magnitude distributions provide intuition about how the classifier separates YSOs from other red IRAC sources and validate that the sample is consistent with expectations for disk/envelope-bearing pre–main-sequence stars. We also identify areas of IRAC color space associated with objects with strong silicate absorption or polycyclic aromatic hydrocarbon emission. Spatial distributions and variability properties help corroborate the youthful nature of our sample. Most of the candidates are in regions with mid-IR nebulosity, associated with star-forming clouds, but others appear distributed in the field. Using Gaia DR2 distance estimates, we find groups of YSO candidates associated with the Local Arm, the Sagittarius–Carina Arm, and the Scutum–Centaurus Arm. Candidate YSOs visible to the Zwicky Transient Facility tend to exhibit higher variability amplitudes than randomly selected field stars of the same magnitude, with many high-amplitude variables having light-curve morphologies characteristic of YSOs. Given that no current or planned instruments will significantly exceed IRAC’s spatial resolution while possessing its wide-area mapping capabilities, Spitzer-based catalogs such as ours will remain the main resources for mid-infrared YSOs in the Galactic midplane for the near future.
We describe the simulated data sample for the Photometric Large Synoptic Survey Telescope (LSST) Astronomical Time Series Classification Challenge (PLAsTiCC), a publicly available challenge to classify transient and variable events that will be observed by the LSST, a new facility expected to start in the early 2020s. The challenge was hosted by Kaggle, ran from 2018 September 28 to December 17, and included 1094 teams competing for prizes. Here we provide details of the 18 transient and variable source models, which were not revealed until after the challenge, and release the model libraries at https://doi.org/10.5281/zenodo.2612896. We describe the LSST Operations Simulator used to predict realistic observing conditions, and we describe the publicly available SNANA simulation code used to transform the models into observed fluxes and uncertainties in the LSST passbands (ugrizy). Although PLAsTiCC has finished, the publicly available models and simulation tools are being used within the astronomy community to further improve classification, and to study contamination in photometrically identified samples of SN Ia used to measure properties of dark energy. Our simulation framework will continue serving as a platform to improve the PLAsTiCC models, and to develop new models.
We report a framework for spectroscopic follow-up design for optimizing supernova photometric classification. The strategy accounts for the unavoidable mismatch between spectroscopic and photometric samples, and can be used even in the beginning of a new survey – without any initial training set. The framework falls under the umbrella of active learning (AL), a class of algorithms that aims to minimize labelling costs by identifying a few, carefully chosen, objects that have high potential in improving the classifier predictions. As a proof of concept, we use the simulated data released after the SuperNova Photometric Classification Challenge (SNPCC) and a random forest classifier. Our results show that, using only 12 per cent of the number of training objects in the SNPCC spectroscopic sample, this approach is able to double purity results. Moreover, in order to take into account multiple spectroscopic observations in the same night, we propose a semisupervised batch-mode AL algorithm that selects a set of N = 5 most informative objects at each night. In comparison with the initial state using the traditional approach, our method achieves 2.3 times higher purity and comparable figure of merit results after only 180 d of observation, or 800 queries (73 per cent of the SNPCC spectroscopic sample size). Such results were obtained using the same amount of spectroscopic time necessary to observe the original SNPCC spectroscopic sample, showing that this type of strategy is feasible with currently available spectroscopic resources. The code used in this work is available in the COINtoolbox.
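To make the batch-mode idea concrete, the sketch below picks the N = 5 objects a classifier is least sure about on a given night. It uses plain uncertainty sampling (probability closest to 0.5), a common AL query strategy chosen here as an assumption; the abstract does not specify the selection criterion, and this is not the semisupervised algorithm the authors propose. The object names and probabilities are invented.

```python
def select_batch(probabilities, batch_size=5):
    """Return the objects whose predicted class probability is closest to 0.5.

    `probabilities` maps an object identifier to the classifier's predicted
    probability of the positive class (for example, "is a Type Ia supernova").
    """
    ranked = sorted(probabilities, key=lambda name: abs(probabilities[name] - 0.5))
    return ranked[:batch_size]


# Hypothetical classifier output for the objects observable tonight.
tonight = {
    "obj_a": 0.98,
    "obj_b": 0.52,
    "obj_c": 0.10,
    "obj_d": 0.45,
    "obj_e": 0.61,
    "obj_f": 0.33,
    "obj_g": 0.71,
    "obj_h": 0.02,
}

queries = select_batch(tonight, batch_size=5)
print("send for spectroscopy:", queries)
```

Objects such as `obj_a` and `obj_h`, which the classifier already labels with high confidence, are left out because a spectrum of them would teach the model little. After the new labels arrive, the classifier would be retrained and the selection repeated on the following night.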
Detectability of the first cosmic explosions de Souza, R. S; Ishida, E. E. O; Johnson, J. L ...
Monthly notices of the Royal Astronomical Society, 12/2013, Volume: 436, Issue: 2
Journal Article
Peer reviewed
Open access
We present a fully self-consistent simulation of a synthetic survey of the furthermost cosmic explosions. The appearance of the first generation of stars (Population III) in the Universe represents a critical point during cosmic evolution, signalling the end of the dark ages, a period of absence of light sources. Despite their importance, there is no confirmed detection of Population III stars so far. A fraction of these primordial stars are expected to die as pair-instability supernovae (PISNe), and should be bright enough to be observed up to a few hundred million years after the big bang. While the quest for Population III stars continues, detailed theoretical models and computer simulations serve as a testbed for their observability. With the upcoming near-infrared missions, estimates of the feasibility of detecting PISNe are not only timely but imperative. To address this problem, we combine state-of-the-art cosmological and radiative simulations into a complete and self-consistent framework, which includes detailed features of the observational process. We show that a dedicated observational strategy using 8 per cent of the total allocation time of the James Webb Space Telescope mission can provide us with up to ∼9–15 detectable PISNe per year.
Abstract
Next-generation surveys like the Legacy Survey of Space and Time (LSST) on the Vera C. Rubin Observatory (Rubin) will generate orders of magnitude more discoveries of transients and variable stars than previous surveys. To prepare for this data deluge, we developed the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC), a competition that aimed to catalyze the development of robust classifiers under LSST-like conditions of a nonrepresentative training set for a large photometric test set of imbalanced classes. Over 1000 teams participated in PLAsTiCC, which was hosted on the Kaggle data science competition platform between 2018 September 28 and 2018 December 17, ultimately identifying three winners in 2019 February. Participants produced classifiers employing a diverse set of machine-learning techniques including hybrid combinations and ensemble averages of a range of approaches, among them boosted decision trees, neural networks, and multilayer perceptrons. The strong performance of the top three classifiers on Type Ia supernovae and kilonovae represents a major improvement over the current state of the art within astronomy. This paper summarizes the most promising methods and evaluates their results in detail, highlighting future directions both for classifier development and simulation needs for a next-generation PLAsTiCC data set.