ABSTRACT
During the last decade, considerable effort has been made to perform automatic classification of variable stars using machine-learning techniques. Traditionally, light curves are represented as a vector of descriptors or features used as input for many algorithms. Some features are computationally expensive, cannot be updated quickly, and hence cannot be applied to large data sets such as the LSST. Previous work has developed alternative unsupervised feature extraction algorithms for light curves, but the cost of doing so still remains high. In this work, we propose an end-to-end algorithm that automatically learns a representation of light curves that allows accurate automatic classification. We study a series of deep learning architectures based on recurrent neural networks and test them in automated classification scenarios. Our method uses minimal data pre-processing, can be updated with a low computational cost for new observations and light curves, and can scale up to massive data sets. We transform each light curve into an input matrix representation whose elements are the differences in time and magnitude, and the outputs are classification probabilities. We test our method on three surveys: OGLE-III, Gaia, and WISE. We obtain accuracies of about 95 per cent in the main classes and 75 per cent in the majority of subclasses. We compare our results with the random forest classifier and obtain competitive accuracies while being faster and scalable. The analysis shows that the computational complexity of our approach grows linearly with the light-curve size, while the cost of the traditional approach grows as N log(N).
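The input representation the abstract describes, a matrix of consecutive differences in time and magnitude, can be sketched in a few lines. This is an illustrative reading of that description, not the authors' code; the function name is hypothetical.

```python
def light_curve_to_matrix(times, mags):
    """Return a list of [delta_t, delta_m] rows for consecutive observations,
    the matrix representation fed to the recurrent network."""
    assert len(times) == len(mags) and len(times) >= 2
    rows = []
    for i in range(1, len(times)):
        rows.append([times[i] - times[i - 1], mags[i] - mags[i - 1]])
    return rows

# Example: an irregularly sampled light curve with 4 observations
# yields 3 rows of [delta_t, delta_m].
matrix = light_curve_to_matrix([0.0, 1.5, 2.0, 5.0], [14.2, 14.5, 14.1, 14.3])
```

Because each new observation only appends one row, this representation is what lets the classifier be updated at low cost as new points arrive.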
ABSTRACT
The immense amount of time series data produced by astronomical surveys has called for the use of machine learning algorithms to discover and classify several million celestial sources. In the case of variable stars, supervised learning approaches have become commonplace. However, they need a considerable collection of expert-labelled light curves to achieve adequate performance, which is costly to construct. To solve this problem, we introduce two approaches: first, a semi-supervised hierarchical method, which requires substantially less training data than supervised methods; second, a clustering analysis procedure that finds groups that may correspond to classes or subclasses of variable stars. Both methods rely primarily on dimensionality reduction of the data, both for visualization and to avoid the curse of dimensionality. We tested our methods on catalogues collected from the Optical Gravitational Lensing Experiment (OGLE), the Catalina Sky Survey (CSS), and the Gaia survey. The semi-supervised method reaches a performance of around 90 per cent for all three selected catalogues of variable stars using only 5 per cent of the data for training. This method is suitable for classifying the main classes of variable stars when only a small amount of training data is available. Our clustering analysis confirms that most of the clusters found have a purity over 90 per cent with respect to classes and 80 per cent with respect to subclasses, suggesting that this type of analysis can be used in large-scale variability surveys as an initial step to identify which classes or subclasses of variable stars are present in the data and/or to build training sets, among many other possible applications.
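The purity figures quoted above have a simple definition: the fraction of a cluster's members that share its most common true label. A minimal stdlib sketch, with an illustrative function name and made-up class labels:

```python
from collections import Counter

def cluster_purity(labels_in_cluster):
    """Fraction of members sharing the most common true label in one cluster."""
    counts = Counter(labels_in_cluster)
    return counts.most_common(1)[0][1] / len(labels_in_cluster)

# A cluster of 10 stars, 9 of them RR Lyrae, has purity 0.9,
# i.e. the 90 per cent threshold discussed in the abstract.
cluster = ["RRLyr"] * 9 + ["Cepheid"]
print(cluster_purity(cluster))  # 0.9
```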
Streaming classification of variable stars. Zorich, L.; Pichara, K.; Protopapas, P.
Monthly Notices of the Royal Astronomical Society, 02/2020, Volume 492, Issue 2
Journal Article
Peer-reviewed
Open access
ABSTRACT
In recent years, automatic classification of variable stars has received substantial attention, and machine learning techniques have proven quite useful for this task. Typically, the machine learning classifiers used for this task require a fixed training set, and the training process is performed offline. Upcoming surveys such as the Large Synoptic Survey Telescope will generate new observations daily, and an automatic classification system able to create alerts online will be mandatory. A system with those characteristics must be able to update itself incrementally. Unfortunately, after training, most machine learning classifiers do not support the inclusion of new observations in light curves; they need to re-train from scratch. Naively re-training from scratch is not an option in streaming settings, mainly because of the expensive pre-processing routines required to obtain a vector representation of light curves (features) each time we include new observations. In this work, we propose a streaming probabilistic classification model; it uses a set of newly designed features that work incrementally. With this model, we can have a machine learning classifier that updates itself in real time with new observations. To test our approach, we simulate a streaming scenario with light curves from the Convection, Rotation and planetary Transits (CoRoT), Optical Gravitational Lensing Experiment (OGLE), and Massive Compact Halo Object (MACHO) catalogues. Results show that our model achieves high classification performance while staying an order of magnitude faster than traditional classification approaches.
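A generic illustration of what "features that work incrementally" means (not the authors' specific feature set): Welford's online algorithm updates the mean and variance of a light curve's magnitudes one observation at a time, without reprocessing the full history.

```python
class RunningMoments:
    """Welford's online algorithm: incremental mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new magnitude into the running statistics in O(1)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n > 1 else 0.0

# Stream four observations one at a time, as they would arrive from a survey.
stream = RunningMoments()
for mag in [14.2, 14.5, 14.1, 14.3]:
    stream.update(mag)
print(round(stream.mean, 3))  # 14.275
```

Each update costs constant time, which is what makes streaming classification an order of magnitude faster than recomputing features from scratch.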
Abstract
We introduce the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker, an astronomical alert broker designed to provide a rapid and self-consistent classification of large etendue telescope alert streams, such as that provided by the Zwicky Transient Facility (ZTF) and, in the future, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). ALeRCE is a Chilean-led broker run by an interdisciplinary team of astronomers and engineers working to become intermediaries between survey and follow-up facilities. ALeRCE uses a pipeline that includes the real-time ingestion, aggregation, cross-matching, machine-learning (ML) classification, and visualization of the ZTF alert stream. We use two classifiers: a stamp-based classifier, designed for rapid classification, and a light curve-based classifier, which uses the multiband flux evolution to achieve a more refined classification. We describe in detail our pipeline, data products, tools, and services, which are made public for the community (see https://alerce.science). Since we began operating our real-time ML classification of the ZTF alert stream in early 2019, we have grown a large community of active users around the globe. We describe our results to date, including the real-time processing of 1.5 × 10^8 alerts, the stamp classification of 3.4 × 10^7 objects, the light-curve classification of 1.1 × 10^6 objects, the report of 6162 supernova candidates, and different experiments using LSST-like alert streams. Finally, we discuss the challenges ahead in going from a single stream of alerts such as ZTF to a multistream ecosystem dominated by LSST.
ABSTRACT
In the new era of very large telescopes, where data are crucial to expand scientific knowledge, we have witnessed many deep learning applications for the automatic classification of light curves. Recurrent neural networks (RNNs) are one of the models used for these applications, and the Long Short-Term Memory (LSTM) unit stands out as an excellent choice for the representation of long time series. In general, RNNs assume observations at discrete times, which may not suit the irregular sampling of light curves. A traditional technique to address irregular sequences consists of adding the sampling time to the network's input, but this is not guaranteed to capture sampling irregularities during training. Alternatively, the Phased LSTM (PLSTM) unit has been created to address this problem by updating its state using the sampling times explicitly. In this work, we study the effectiveness of LSTM- and PLSTM-based architectures for the classification of astronomical light curves. We use seven catalogues containing periodic and non-periodic astronomical objects. Our findings show that the LSTM outperformed the PLSTM on six of seven data sets. However, the combination of both units enhances the results in all data sets.
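The PLSTM's key ingredient is a periodic "time gate" that opens only during a small fraction of each oscillation period, so the cell state updates at the actual, irregular sampling times. A minimal sketch of that gate following the Phased LSTM formulation of Neil et al. (2016); parameter values here are illustrative, and in a real network the period, shift, and open ratio are learned per unit.

```python
def time_gate(t, period, shift, r_on, alpha=1e-3):
    """Openness k(t) in [0, 1] of the Phased LSTM time gate at time t."""
    phi = ((t - shift) % period) / period  # phase within the current cycle
    if phi < r_on / 2:
        return 2.0 * phi / r_on            # rising half: gate opening
    if phi < r_on:
        return 2.0 - 2.0 * phi / r_on      # falling half: gate closing
    return alpha * phi                     # closed phase: small leak

# With period 4 and open ratio 0.5, the gate is fully open at t = 1
# and almost completely closed (only the leak) at t = 3.
print(time_gate(1.0, period=4.0, shift=0.0, r_on=0.5))  # 1.0
```

Because the gate depends on the observation time t itself rather than on a step index, irregular gaps between observations are handled explicitly instead of being inferred from an extra input feature.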
We investigate the efficacy of recursive Bayesian estimation of regularized and irregular astrophysical time series using particle filters to understand latent dynamics. We begin by regularizing a MACHO (massive compact halo object) quasar light curve using linear interpolation techniques. This is subsequently modelled using a variety of autoregressive and autoregressive integrated moving average models. We find that we can learn regularized astrophysical time series using particle filters. Motivated by this result, we proceed to work on raw, irregular light curves. Accurately modelling the underlying dynamics as a continuous autoregressive stochastic process, calibrated using MCMC, we find that the scale variable, τ, is in fact first-order stable across 55 MACHO quasar light curves and thus not correlated with the black hole mass. We show that particle filters can be used to learn regularized and irregular astrophysical light curves. These results can be used to inform classification systems of stellar type and to further study the variability characteristics of quasars.
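The continuous autoregressive process used above is the CAR(1), or Ornstein-Uhlenbeck, process: between observations separated by an arbitrary gap dt, the state decays towards the mean by exp(-dt/τ) and receives Gaussian noise whose variance depends on dt, which is exactly why it suits irregular sampling. A hedged stdlib sketch of sampling such a process at irregular times; the parameter values are made up for illustration.

```python
import math
import random

def simulate_car1(times, tau, sigma, mean, seed=0):
    """Sample a CAR(1) / Ornstein-Uhlenbeck process at irregular times."""
    rng = random.Random(seed)
    x = mean
    values = []
    for i, t in enumerate(times):
        if i > 0:
            dt = t - times[i - 1]
            decay = math.exp(-dt / tau)  # correlation lost over the gap dt
            # Stationary CAR(1) innovation variance over a gap dt:
            sd = sigma * math.sqrt(tau / 2.0 * (1.0 - decay ** 2))
            x = mean + (x - mean) * decay + rng.gauss(0.0, sd)
        values.append(x)
    return values

# A toy quasar light curve at four irregular epochs, long timescale tau.
lc = simulate_car1([0.0, 1.3, 2.1, 5.8], tau=100.0, sigma=0.1, mean=18.0)
```

A particle filter for this model would propagate each particle with the same decay-plus-noise step and reweight by the observation likelihood at each epoch.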
We present visual-like morphologies over 16 photometric bands, from ultraviolet to near-infrared, for 8412 galaxies in the Cluster Lensing And Supernova survey with Hubble (CLASH), obtained using a convolutional neural network (ConvNet) model. Our model follows the Cosmic Assembly Near-IR Deep Extragalactic Legacy Survey (CANDELS) main morphological classification scheme, obtaining the probability for each galaxy, at each CLASH band, of being spheroid, disk, irregular, point source, or unclassifiable. Our catalog contains morphologies for each galaxy with Hmag < 24.5 in every filter where the galaxy is observed. We trained an initial ConvNet model using approximately 7500 expert eyeball labels from CANDELS. We created eyeball labels for 100 randomly selected galaxies in each of the 16 CLASH filters (1600 galaxy images in total), where each image was classified by at least five of us. We use these labels to fine-tune the network to accurately predict labels for the CLASH data and to evaluate the performance of our model. We achieve a root-mean-square error of 0.0991 on the test set. We show that our proposed fine-tuning technique reduces the number of labeled images needed for training, compared to training directly on the CLASH data, and achieves better performance. This approach is very useful to minimize eyeball-labeling efforts when classifying unlabeled data from new surveys. It will become particularly useful for massive data sets such as those coming from near-future surveys such as EUCLID or the LSST. Our catalog consists of predicted probabilities for each galaxy by morphology in the different bands and is made publicly available at http://www.inf.udec.cl/~guille/data/Deep-CLASH.csv.
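The quoted root-mean-square error of 0.0991 is computed between predicted and expert-assigned morphology probabilities. A stdlib reminder of the metric; the probability values below are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    assert len(predicted) == len(actual)
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    )

# Toy example: predicted vs expert probabilities for three morphology classes.
error = rmse([0.9, 0.1, 0.0], [1.0, 0.0, 0.0])
```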
We describe a method to estimate the mass distribution of a gravitational lens and the position of the sources from combined strong and weak lensing data. The algorithm combines weak and strong lensing data in a unified way, producing a solution which is valid in both the weak and the strong lensing regimes. The method is non-parametric, allowing the mass to be located anywhere in the field of view. We study how the solution depends on the choice of basis used to represent the mass distribution. We find that combining weak and strong lensing information has two major advantages: it alleviates the need for priors and/or regularization schemes for the intrinsic size of the background galaxies (this assumption was needed in previous strong lensing algorithms), and it reduces (although does not remove) biases in the recovered mass in the outer regions, where the strong lensing data are less sensitive. The code is implemented in a software package called the Weak & Strong Lensing Analysis Package (wslap), which is publicly available at http://darwin.cfa.harvard.edu/SLAP/.
We present statistical characteristics of 1578 delta Scuti stars, including nearby field stars and cluster member stars within the Milky Way. We obtained 46% of these stars (718 stars) from work by Rodriguez and collected the remaining 54% (860 stars) from other literature. We updated the entries with the latest information on sky coordinates, color, rotational velocity, spectral type, period, amplitude, and binarity. The majority of our sample is well characterized in terms of typical period range (0.02-0.25 days), pulsation amplitudes (<0.5 mag), and spectral types (A-F type). Given this list of delta Scuti stars, we examined relations between their physical properties (i.e., periods, amplitudes, spectral types, and rotational velocities) for field stars and cluster members, and confirmed that the correlations of properties are not significantly different from those reported in Rodriguez's work. All the delta Scuti stars are cross-matched with several X-ray and UV catalogs, resulting in 27 X-ray and 41 UV-only counterparts. These counterparts are interesting targets for further study because of their uniqueness in showing delta Scuti-type variability and X-ray/UV emission at the same time. The compiled catalog can be accessed through the Web interface at http://stardb.yonsei.ac.kr/DeltaScuti.
We present a new classification method for quasar identification in the EROS-2 and MACHO data sets based on a boosted version of a random forest classifier. We use a set of variability features including parameters of a continuous autoregressive model. We show that the continuous autoregressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS-2 and one for MACHO) using known quasars found in the Large Magellanic Cloud (LMC). Our model's accuracy in both the EROS-2 and MACHO training sets is about 90 per cent precision and 86 per cent recall, improving on the accuracy of state-of-the-art models in quasar detection. We apply the model to the complete EROS-2 and MACHO LMC data sets, comprising 28 million objects, finding 1160 and 2551 candidates, respectively. To further validate our list of candidates, we cross-matched it with 663 previously known strong candidates, obtaining 74 per cent of matches for MACHO and 40 per cent for EROS-2. The difference in match rates arises mainly because EROS-2 is a slightly shallower survey, which translates into significantly lower signal-to-noise ratio light curves.
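The precision and recall figures quoted for the quasar classifier have the standard definitions: precision is the fraction of flagged candidates that are real quasars, recall the fraction of real quasars that were flagged. A stdlib illustration; the counts below are made up, chosen to reproduce numbers close to those in the abstract.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard precision and recall from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# e.g. 90 true quasars recovered, 10 false alarms, 15 quasars missed:
p, r = precision_recall(90, 10, 15)
print(round(p, 2), round(r, 2))  # 0.9 0.86
```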