Abstract
We introduce the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker, an astronomical alert broker designed to provide a rapid and self-consistent classification of large etendue telescope alert streams, such as that provided by the Zwicky Transient Facility (ZTF) and, in the future, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). ALeRCE is a Chilean-led broker run by an interdisciplinary team of astronomers and engineers working to become intermediaries between survey and follow-up facilities. ALeRCE uses a pipeline that includes the real-time ingestion, aggregation, cross-matching, machine-learning (ML) classification, and visualization of the ZTF alert stream. We use two classifiers: a stamp-based classifier, designed for rapid classification, and a light curve–based classifier, which uses the multiband flux evolution to achieve a more refined classification. We describe in detail our pipeline, data products, tools, and services, which are made public for the community (see https://alerce.science). Since we began operating our real-time ML classification of the ZTF alert stream in early 2019, we have grown a large community of active users around the globe. We describe our results to date, including the real-time processing of 1.5 × 10^8 alerts, the stamp classification of 3.4 × 10^7 objects, the light-curve classification of 1.1 × 10^6 objects, the report of 6162 supernova candidates, and different experiments using LSST-like alert streams. Finally, we discuss the challenges ahead in going from a single stream of alerts such as ZTF to a multistream ecosystem dominated by LSST.
ABSTRACT We present the first results of the High Cadence Transient Survey (HiTS), a survey whose objective is to detect and follow up optical transients with characteristic timescales from hours to days, especially the earliest hours of supernova (SN) explosions. HiTS uses the Dark Energy Camera and a custom pipeline for image subtraction, candidate filtering, and candidate visualization, which runs in real time so that we can react rapidly to new transients. We discuss the survey design, the technical challenges associated with the real-time analysis of these large volumes of data, and our first results. In our 2013, 2014, and 2015 campaigns, we detected more than 120 young SN candidates, but we did not find a clear signature from the short-lived SN shock breakouts (SBOs) originating after the core collapse of red supergiant stars, which was the initial science aim of this survey. Using the empirical distribution of limiting magnitudes from our observational campaigns, we measured the expected recovery fraction of randomly injected SN light curves, which included SBO optical peaks produced with models from Tominaga et al. (2011) and Nakar & Sari (2010). From this analysis, we cannot rule out the models from Tominaga et al. (2011) under any reasonable distributions of progenitor masses, but we can marginally rule out the brighter and longer-lived SBO models from Nakar & Sari (2010) under our best-guess distribution of progenitor masses. Finally, we highlight the implications of this work for future massive data sets produced by astronomical observatories, such as LSST.
ABSTRACT
Machine learning has achieved an important role in the automatic classification of variable stars, and several classifiers have been proposed over the last decade. These classifiers have achieved impressive performance in several astronomical catalogues. However, some scientific articles have also shown that the training data therein contain multiple sources of bias. Hence, the performance of those classifiers on objects not belonging to the training data is uncertain, potentially resulting in the selection of incorrect models. It also leads to the deployment of misleading classifiers, for example through the creation of open-source labelled catalogues with biased predictions. In this paper, we develop a method based on an informative marginal likelihood to evaluate variable star classifiers. We collect deterministic rules that are based on physical descriptors of RR Lyrae stars, and then, to mitigate the biases, we introduce those rules into the marginal likelihood estimation. We perform experiments with a set of Bayesian logistic regressions, which are trained to classify RR Lyrae stars, and we find that our method outperforms traditional non-informative cross-validation strategies, even when penalized models are assessed. Our methodology provides a more rigorous alternative to assess machine learning models using astronomical knowledge. From this approach, applications to other classes of variable stars and algorithmic improvements can be developed.
Context. In the last six years, the VISTA Variables in the Vía Láctea (VVV) survey mapped 562 sq. deg. across the bulge and southern disk of the Galaxy. However, a detailed study of these regions, which include ~36 globular clusters (GCs) and thousands of open clusters, is by no means an easy task. High differential reddening and severe crowding along the line of sight make it very difficult to reliably distinguish stars belonging to different populations and/or systems. Aims. The aim of this study is to separate stars that likely belong to the Galactic GC NGC 6544 from its surrounding field by means of proper motion (PM) techniques. Methods. This work was based upon a new astrometric reduction method optimized for images of the VVV survey. Results. PSF-fitting photometry over the six-year baseline of the survey allowed us to obtain a mean precision of ~0.51 mas yr^-1 in each PM coordinate for stars with Ks < 15 mag. In the area studied here, cluster stars separate very well from field stars, down to the main sequence turnoff and below, allowing us to derive for the first time the absolute PM of NGC 6544. Isochrone fitting on the clean and differential-reddening-corrected cluster color-magnitude diagram yields an age of ~11−13 Gyr and a metallicity [Fe/H] = −1.5 dex, in agreement with previous studies restricted to the cluster core. We were able to derive the cluster orbit assuming an axisymmetric model of the Galaxy and conclude that NGC 6544 is likely a halo GC. We have not detected tidal tail signatures associated with the cluster, but we found a remarkable elongation in the direction of the Galactic center. The precision achieved in the PM determination also allows us to separate bulge stars from foreground disk stars, enabling the kinematical selection of bona fide bulge stars across the whole survey area. Conclusions. Kinematical techniques are a fundamental step toward disentangling different stellar populations that overlap in a studied field. Our results show that VVV data are well suited to this kind of analysis.
Abstract
We report the observations of solar system objects during the 2015 campaign of the High Cadence Transient Survey (HiTS). We found 5740 bodies (mostly Main Belt asteroids), 1203 of which were detected on different nights and in both g′ and r′. Objects were linked in the barycenter system and their orbital parameters were computed assuming Keplerian motion. We identified 6 near-Earth objects, 1738 Main Belt asteroids, and 4 Trans-Neptunian objects. We did not find a g′−r′ color–size correlation for 14 < H_g′ < 18 (1 < D < 10 km) asteroids. We show asteroids’ colors are disturbed by HiTS’ 1.6 hr cadence and estimate that observations should be separated by at most 14 minutes to avoid confusion in future wide-field surveys like LSST. The size distribution for the Main Belt objects can be characterized as a simple power law with slope ∼0.9, steeper than in any other survey, while data from the 2014 HiTS campaign has a distribution consistent with previous ones (slopes ∼0.68 at the bright end and ∼0.34 at the faint end). This difference is likely due to the ecliptic distribution of the Main Belt, since the 2015 campaign surveyed farther from the ecliptic than did the 2014 campaign and most previous surveys.
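The correspondence quoted above between absolute magnitude (14 < H < 18) and diameter (1 < D < 10 km) can be illustrated with the standard relation D = 1329 km × 10^(−H/5) / √p, where p is the geometric albedo. The sketch below is only indicative: the relation is normally stated for V-band absolute magnitudes, the albedo value is an illustrative assumption rather than a number from the abstract, and the function name is hypothetical.

```python
import math


def diameter_km(abs_mag: float, albedo: float) -> float:
    """Diameter in km from absolute magnitude H and geometric albedo p.

    Uses the standard relation D = 1329 / sqrt(p) * 10**(-H / 5),
    which is normally defined for V-band absolute magnitudes.
    """
    return 1329.0 / math.sqrt(albedo) * 10.0 ** (-abs_mag / 5.0)


# Illustrative albedo only; the abstract does not state the value it assumes.
ASSUMED_ALBEDO = 0.11

for h in (14.0, 16.0, 18.0):
    print(f"H = {h:4.1f}  ->  D ~ {diameter_km(h, ASSUMED_ALBEDO):.2f} km")
```

Because D scales as 10^(−H/5), a body five magnitudes fainter is ten times smaller at fixed albedo, so a range of four magnitudes in H spans a factor of about six in diameter.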
Abstract
We present the first version of the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker light curve classifier. ALeRCE is currently processing the Zwicky Transient Facility (ZTF) alert stream, in preparation for the Vera C. Rubin Observatory. The ALeRCE light curve classifier uses variability features computed from the ZTF alert stream and colors obtained from AllWISE and ZTF photometry. We apply a balanced random forest algorithm with a two-level scheme where the top level classifies each source as periodic, stochastic, or transient, and the bottom level further resolves each of these hierarchical classes among 15 total classes. This classifier corresponds to the first attempt to classify multiple classes of stochastic variables (including core- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables) in addition to different classes of periodic and transient sources, using real data. We created a labeled set using various public catalogs (such as the Catalina Surveys and Gaia DR2 variable stars catalogs, and the Million Quasars catalog), and we classify all objects with ≥6 g-band or ≥6 r-band detections in ZTF (868,371 sources as of 2020 June 9), providing updated classifications for sources with new alerts every day. For the top level we obtain macro-averaged precision and recall scores of 0.96 and 0.99, respectively, and for the bottom level we obtain macro-averaged precision and recall scores of 0.57 and 0.76, respectively. Updated classifications from the light curve classifier can be found at the ALeRCE Explorer website (http://alerce.online).
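One common way to turn a two-level scheme like the one described above into a single set of class probabilities is to multiply each top-level probability by the conditional probabilities of the matching bottom-level classifier. The sketch below shows that combination only. The abstract does not state how the two levels are combined, so the multiplication rule, the function name, the class labels, and all numbers here are illustrative assumptions.

```python
def combine_two_level(top_probs, bottom_probs):
    """Combine top-level and bottom-level probabilities.

    top_probs:    {group: P(group)}
    bottom_probs: {group: {cls: P(cls | group)}}
    Returns       {cls: P(group) * P(cls | group)}
    """
    final = {}
    for group, p_group in top_probs.items():
        for cls, p_cls in bottom_probs[group].items():
            final[cls] = p_group * p_cls
    return final


# Hypothetical outputs for one source (labels and values are made up).
top = {"periodic": 0.1, "stochastic": 0.7, "transient": 0.2}
bottom = {
    "periodic": {"RRL": 0.5, "LPV": 0.5},
    "stochastic": {"AGN": 0.6, "Blazar": 0.3, "YSO": 0.1},
    "transient": {"SNIa": 0.8, "SN-other": 0.2},
}

final_probs = combine_two_level(top, bottom)
best_class = max(final_probs, key=final_probs.get)
print(best_class, round(final_probs[best_class], 2))
```

With this rule the final probabilities sum to one whenever each level is itself normalized, so they can be ranked or thresholded directly.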
Aims.
We present a variability-, color-, and morphology-based classifier designed to identify multiple classes of transients and persistently variable and non-variable sources from the Zwicky Transient Facility (ZTF) Data Release 11 (DR11) light curves of extended and point sources. The main motivation to develop this model was to identify active galactic nuclei (AGN) at different redshift ranges to be observed by the 4MOST Chilean AGN/Galaxy Evolution Survey (ChANGES). The model also serves as a more general time-domain astronomy study.
Methods.
The model uses nine colors computed from CatWISE and Pan-STARRS1 (PS1), a morphology score from PS1, and 61 single-band variability features computed from the ZTF DR11 g and r light curves. We trained two versions of the model, one for each ZTF band, since ZTF DR11 treats the light curves observed in a particular combination of field, filter, and charge-coupled device (CCD) quadrant independently. We used a hierarchical local classifier per parent node approach, where each node is composed of a balanced random forest model. We adopted a taxonomy with 17 classes: non-variable stars, non-variable galaxies, three transients (SNIa, SN-other, and CV/Nova), five classes of stochastic variables (lowz-AGN, midz-AGN, highz-AGN, Blazar, and YSO), and seven classes of periodic variables (LPV, EA, EB/EW, DSCT, RRL, CEP, and Periodic-other).
Results.
The macro-averaged precision, recall, and F1-score are 0.61, 0.75, and 0.62 for the g-band model, and 0.60, 0.74, and 0.61 for the r-band model. When grouping the four AGN classes (lowz-AGN, midz-AGN, highz-AGN, and Blazar) into one single class, its precision, recall, and F1-score are 1.00, 0.95, and 0.97, respectively, for both the g and r bands. This demonstrates the good performance of the model in classifying AGN candidates. We applied the model to all the sources in the ZTF/4MOST overlapping sky (−28 ≤ Dec ≤ 8.5), avoiding ZTF fields that cover the Galactic bulge (|gal_b| ≤ 9 and gal_l ≤ 50). This area includes 86 576 577 light curves in the g band and 140 409 824 in the r band with 20 or more observations and with an average magnitude in the corresponding band lower than 20.5. Only 0.73% of the g-band light curves and 2.62% of the r-band light curves were classified as stochastic, periodic, or transient with high probability (P_init ≥ 0.9). Even though the metrics obtained for the two models are similar, we find that, in general, more reliable results are obtained when using the g-band model. With it, we identified 384 242 AGN candidates (including low-, mid-, and high-redshift AGN and Blazars), 287 156 of which have P_init ≥ 0.9.
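The macro-averaged scores quoted above give every class equal weight, and merging the four AGN classes before scoring removes the confusion among them. The sketch below illustrates both steps on a handful of made-up labels. The label lists and helper names are hypothetical, and treating a class with no predictions as having zero precision is one convention among several.

```python
from collections import Counter

AGN_CLASSES = {"lowz-AGN", "midz-AGN", "highz-AGN", "Blazar"}


def group_agn(label):
    """Map the four AGN sub-classes onto a single 'AGN' label."""
    return "AGN" if label in AGN_CLASSES else label


def per_class_scores(y_true, y_pred):
    """Return {cls: (precision, recall, f1)} for every class seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was something else
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over classes."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


# Toy labels, invented for illustration only.
y_true = ["lowz-AGN", "midz-AGN", "Blazar", "YSO", "RRL", "RRL"]
y_pred = ["midz-AGN", "midz-AGN", "Blazar", "YSO", "RRL", "YSO"]

macro_p, macro_r, macro_f1 = macro_average(per_class_scores(y_true, y_pred))
grouped = per_class_scores([group_agn(t) for t in y_true],
                           [group_agn(p) for p in y_pred])
print(round(macro_p, 2), round(macro_r, 2), grouped["AGN"])
```

In this toy case the only AGN error is a lowz-AGN labeled as midz-AGN, so the grouped AGN class scores perfectly even though the ungrouped macro averages do not.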
In recent decades, machine learning has provided valuable models and algorithms for processing and extracting knowledge from time-series surveys. Different classifiers have been proposed and have performed to an excellent standard. Nevertheless, few papers have tackled the data shift problem in labeled training sets, which occurs when there is a mismatch between the data distribution in the training set and the testing set. This drawback can damage the prediction performance on unseen data. Consequently, we propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem during the training of a multi-layer perceptron for RR Lyrae classification. We collect ranges for characteristic features to construct a symbolic representation of prior knowledge, which is used to model the informative regularizer component. Simultaneously, we design a two-step back-propagation algorithm to integrate this knowledge into the neural network, whereby one step is applied in each epoch to minimize classification error, while another is applied to ensure regularization. Our algorithm defines a subset of parameters (a mask) for each loss function. This approach handles the forgetting effect, which stems from a trade-off between these loss functions (learning from data versus learning expert knowledge) during training. Experiments were conducted using recently proposed shifted benchmark sets for RR Lyrae stars, outperforming baseline models by up to 3% through a more reliable classifier. Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
• A scalable and adaptable regularization to include rule-based knowledge into ANNs.
• More reliable RR Lyrae classifiers to mitigate the data shift problem.
• A double back-propagation based on masks to ensure the knowledge injection.
• An improvement with respect to baseline models in three complementary metrics.
We propose a new information theoretic metric for finding periodicities in stellar light curves. Light curves are astronomical time series of brightness over time, and are characterized as being noisy and unevenly sampled. The proposed metric combines correntropy (generalized correlation) with a periodic kernel to measure similarity among samples separated by a given period. The new metric provides a periodogram, called the Correntropy Kernelized Periodogram (CKP), whose peaks are associated with the fundamental frequencies present in the data. The CKP does not require any resampling, slotting, or folding scheme, as it is computed directly from the available samples. The CKP is the main part of a fully automated pipeline for periodic light curve discrimination to be used in astronomical survey databases. We show that the CKP method outperformed the slotted correntropy and conventional methods used in astronomy for periodicity discrimination and period estimation tasks, using a set of light curves drawn from the MACHO survey. The proposed metric achieved 97.2% of true positives with 0% of false positives at the confidence level of 99% for the periodicity discrimination task, and 88% of hits with 11.6% of multiples and 0.4% of misses in the period estimation task.
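The idea of pairing a similarity kernel on magnitudes with a periodic kernel on time differences can be sketched in a few lines. The code below is a simplified illustration and not the authors' estimator: the Gaussian and periodic kernel forms, the subtraction of the mean kernel value (the information potential), the kernel widths, and the synthetic light curve are all assumptions made for the example, and it omits any windowing.

```python
import math
import random


def gaussian_kernel(z, sigma):
    """Gaussian kernel on a magnitude difference z."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def periodic_kernel(dt, period, sigma):
    """Kernel that peaks when dt is a whole number of trial periods."""
    arg = -2.0 * math.sin(math.pi * dt / period) ** 2 / sigma ** 2
    return math.exp(arg) / (math.sqrt(2 * math.pi) * sigma)


def ckp(times, mags, period, sigma_m=0.3, sigma_t=0.1):
    """Simplified correntropy-style periodogram value at one trial period."""
    n = len(times)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    g = [gaussian_kernel(mags[i] - mags[j], sigma_m) for i, j in pairs]
    info_potential = sum(g) / len(g)  # mean similarity over all pairs
    total = 0.0
    for (i, j), g_ij in zip(pairs, g):
        weight = periodic_kernel(times[i] - times[j], period, sigma_t)
        total += (g_ij - info_potential) * weight
    return total / len(pairs)


# Synthetic, unevenly sampled sinusoid with a true period of 2.0 (made up).
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 20.0) for _ in range(80))
mags = [math.sin(2 * math.pi * t / 2.0) + rng.gauss(0.0, 0.05) for t in times]

trial_periods = [1.5 + 0.02 * k for k in range(71)]  # 1.5 to 2.9
best_period = max(trial_periods, key=lambda p: ckp(times, mags, p))
print(f"best trial period: {best_period:.2f}")
```

The statistic works on the raw, unevenly spaced samples: pairs whose time separation is close to a multiple of the trial period dominate the sum, and they contribute positively only if their magnitudes are more alike than an average pair.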