Anomaly detection in the Open Supernova Catalog Pruzhinskaya, M V; Malanchev, K L; Kornilov, M V ...
Monthly Notices of the Royal Astronomical Society, 08/2019, Volume 489, Issue 3
Journal Article
Peer reviewed
Open access
ABSTRACT
In the upcoming decade, large astronomical surveys will discover millions of transients, raising unprecedented data challenges in the process. Only machine learning algorithms can process such large data volumes. Most of the discovered transients will belong to the known classes of astronomical objects. However, it is expected that some transients will be rare or completely new events of unknown physical nature. The task of finding them can be framed as an anomaly detection problem. In this work, we perform for the first time an automated anomaly detection analysis in the photometric data of the Open Supernova Catalog (OSC), which serves as a proof of concept for the applicability of these methods to future large-scale surveys. The analysis consists of the following steps: (1) data selection from the OSC and approximation of the pre-processed data with Gaussian processes, (2) dimensionality reduction, (3) searching for outliers with the use of the isolation forest algorithm, and (4) expert analysis of the identified outliers. The pipeline returned 81 candidate anomalies, 27 (33 per cent) of which were confirmed to be from astrophysically peculiar objects. The anomalies found correspond to 1.4 per cent of the initial automatically identified data sample of approximately 2000 objects. Among the identified outliers we recognized superluminous supernovae, non-classical Type Ia supernovae, unusual Type II supernovae, one active galactic nucleus and one binary microlensing event. We also found that 16 anomalies classified as supernovae in the literature are likely to be quasars or stars. Our proposed pipeline represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire. All code and products of this investigation are made publicly available.
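Step (3) of the pipeline above relies on the isolation forest algorithm. The following is a minimal, standard-library-only sketch of that technique, not the authors' released code. The two-dimensional toy points stand in for the reduced light-curve features, and all function names, parameter defaults, and data values are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split along this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Depth at which the point reaches a leaf, plus a correction for unsplit leaves."""
    depth = 0
    while "size" not in node:
        node = node["left"] if point[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Return one score per point; values closer to 1 mean easier to isolate.

    Requires at least two points.
    """
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = average_path_length(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in points
    ]


# Toy data (invented): a compact cluster plus one far-away point appended last.
demo_rng = random.Random(42)
points = [(demo_rng.gauss(0.0, 1.0), demo_rng.gauss(0.0, 1.0)) for _ in range(200)]
points.append((10.0, 10.0))
scores = anomaly_scores(points)
```

Ranking objects by `scores` and passing the highest-scoring ones to an expert mirrors steps (3) and (4) in spirit. A real analysis would more likely use an established library implementation than this hand-rolled version.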
Aims.
We present the first piece of evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets.
Methods.
Our method follows an active learning strategy where the learning algorithm chooses objects that can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new piece of information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional isolation forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery algorithm to two data sets: simulated light curves from the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) and real light curves from the Open Supernova Catalog. We compare the Active Anomaly Discovery results to those of a static IF. For both methods, we performed a detailed analysis for all objects with the ∼2% highest anomaly scores.
Results.
We show that, in the real data scenario, Active Anomaly Discovery was able to identify ∼80% more true anomalies than the IF. This result is the first piece of evidence that active anomaly detection algorithms can play a central role in the search for new physics in the era of large-scale sky surveys.
ABSTRACT
We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of three stages: feature extraction, search for outliers with machine learning algorithms, and anomaly identification with follow-up by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of four automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinized by an expert. Of these, 188 (68 per cent) were found to be bogus light curves, including effects from the image subtraction pipeline and overlaps between a star and a known asteroid; 66 (24 per cent) were previously reported sources; and 23 (8 per cent) correspond to non-catalogued objects. The latter two groups are of potential scientific interest (e.g. one spectroscopically confirmed RS Canum Venaticorum star, four supernova candidates, one red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation that can be used to aid filtering potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available.
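The outlier breakdown quoted above can be checked with a few lines of arithmetic. This is only a consistency check on the numbers stated in the abstract; the dictionary labels are shorthand chosen here, not terms from the pipeline.

```python
# Counts as stated in the abstract; the labels are illustrative shorthand.
counts = {"bogus": 188, "previously reported": 66, "non-catalogued": 23}

total = sum(counts.values())

# Percentage of all outliers in each group, rounded to the nearest integer.
percentages = {label: round(100 * n / total) for label, n in counts.items()}

# Objects in the two groups described as being of potential scientific interest.
of_interest = counts["previously reported"] + counts["non-catalogued"]
```

The three groups sum to the 277 outliers reported, and the rounded shares reproduce the quoted 68, 24 and 8 per cent.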
Context.
Type Ia supernovae (SNe Ia) are widely used to measure the expansion of the Universe. Improving distance measurements of SNe Ia is one technique to better constrain the acceleration of expansion and determine its physical nature.
Aims.
This document develops a new SNe Ia spectral energy distribution (SED) model, called the SUpernova Generator And Reconstructor (SUGAR), which improves the spectral description of SNe Ia, and consequently could improve the distance measurements.
Methods.
This model was constructed from SNe Ia spectral properties and spectrophotometric data from the Nearby Supernova Factory collaboration. In a first step, a principal component analysis-like method was used on spectral features measured at maximum light, which allowed us to extract the intrinsic properties of SNe Ia. Next, the intrinsic properties were used to extract the average extinction curve. Third, an interpolation using Gaussian processes facilitated using data taken at different epochs during the lifetime of an SN Ia and then projecting the data on a fixed time grid. Finally, the three steps were combined to build the SED model as a function of time and wavelength. This is the SUGAR model.
Results.
The main advancement in SUGAR is the addition of two additional parameters to characterize SNe Ia variability. The first is tied to the properties of SNe Ia ejecta velocity and the second correlates with their calcium lines. The addition of these parameters, as well as the high quality of the Nearby Supernova Factory data, makes SUGAR an accurate and efficient model for describing the spectra of normal SNe Ia as they brighten and fade.
Conclusions.
The performance of this model makes it an excellent SED model for experiments like the Zwicky Transient Facility, the Large Synoptic Survey Telescope, or the Wide Field Infrared Survey Telescope.
Abstract
We show how spectra of Type Ia supernovae (SNe Ia) at maximum light can be used to improve cosmological distance estimates. In a companion article, we used manifold learning to build a three-dimensional parameterization of the intrinsic diversity of SNe Ia at maximum light that we call the “Twins Embedding.” In this article, we discuss how the Twins Embedding can be used to improve the standardization of SNe Ia. With a single spectrophotometrically calibrated spectrum near maximum light, we can standardize our sample of SNe Ia with an rms of 0.101 ± 0.007 mag, which corresponds to 0.084 ± 0.009 mag if peculiar velocity contributions are removed and to 0.073 ± 0.008 mag if a larger reference sample were obtained. Our techniques can standardize the full range of SNe Ia, including those typically labeled as peculiar and often rejected from other analyses. We find that traditional light-curve width + color standardization such as SALT2 is not sufficient. The Twins Embedding identifies a subset of SNe Ia, including, but not limited to, 91T-like SNe Ia whose SALT2 distance estimates are biased by 0.229 ± 0.045 mag. Standardization using the Twins Embedding also significantly decreases host-galaxy correlations. We recover a host mass step of 0.040 ± 0.020 mag compared to 0.092 ± 0.026 mag for SALT2 standardization on the same sample of SNe Ia. These biases in traditional standardization methods could significantly impact future cosmology analyses if not properly taken into account.
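As a back-of-envelope reading of the two rms values quoted above, one can ask what single peculiar-velocity scatter would link them if independent contributions added in quadrature. That quadrature model is an assumption made here for illustration. The abstract does not state how the peculiar velocity contribution was removed, and the function name is hypothetical.

```python
import math


def quadrature_difference(total_rms, reduced_rms):
    """Scatter that, added in quadrature to reduced_rms, would give total_rms."""
    if reduced_rms > total_rms:
        raise ValueError("reduced_rms must not exceed total_rms")
    return math.sqrt(total_rms ** 2 - reduced_rms ** 2)


# Central values from the abstract, in magnitudes; uncertainties are ignored here.
implied_pv_scatter = quadrature_difference(0.101, 0.084)
```

Under this assumption the implied contribution is roughly 0.056 mag. This is indicative only, since the quoted uncertainties on both rms values are not propagated.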
Abstract
We study the spectral diversity of Type Ia supernovae (SNe Ia) at maximum light using high signal-to-noise spectrophotometry of 173 SNe Ia from the Nearby Supernova Factory. We decompose the diversity of these spectra into different extrinsic and intrinsic components, and we construct a nonlinear parameterization of the intrinsic diversity of SNe Ia that preserves pairings of “twin” SNe Ia. We call this parameterization the “Twins Embedding.” Our methodology naturally handles highly nonlinear variability in spectra, such as changes in the photosphere expansion velocity, and uses the full spectrum rather than being limited to specific spectral line strengths, ratios, or velocities. We find that the time evolution of SNe Ia near maximum light is remarkably similar, with 84.6% of the variance in common to all SNe Ia. After correcting for brightness and color, the intrinsic variability of SNe Ia is mostly restricted to specific spectral lines, and we find intrinsic dispersions as low as ∼0.02 mag between 6600 and 7200 Å. With a nonlinear three-dimensional model plus one dimension for color, we can explain 89.2% of the intrinsic diversity in our sample of SNe Ia, which includes several different kinds of “peculiar” SNe Ia. A linear model requires seven dimensions to explain a comparable fraction of the intrinsic diversity. We show how a wide range of previously established indicators of diversity in SNe Ia can be recovered from the Twins Embedding. In a companion article, we discuss how these results can be applied to the standardization of SNe Ia for cosmology.
Abstract
We calibrate spectrophotometric optical spectra of 32 stars commonly used as standard stars, referenced to 14 stars already on the Hubble Space Telescope–based CALSPEC flux system. Observations of CALSPEC and non-CALSPEC stars were obtained with the SuperNova Integral Field Spectrograph over the wavelength range 3300–9400 Å as calibration for the Nearby Supernova Factory cosmology experiment. In total, this analysis used 4289 standard-star spectra taken on photometric nights. As a modern cosmology analysis, all presubmission methodological decisions were made with the flux scale and external comparison results blinded. The large number of spectra per star allows us to treat the wavelength-by-wavelength calibration for all nights simultaneously with a Bayesian hierarchical model, thereby enabling a consistent treatment of the Type Ia supernova cosmology analysis and the calibration on which it critically relies. We determine the typical per-observation repeatability (median 14 mmag for exposures ≳5 s), the Maunakea atmospheric transmission distribution (median dispersion of 7 mmag with uncertainty 1 mmag), and the scatter internal to our CALSPEC reference stars (median of 8 mmag). We also check our standards against literature filter photometry, finding generally good agreement over the full 12 mag range. Overall, the mean of our system is calibrated to the mean of CALSPEC at the level of ∼3 mmag. With our large number of observations, careful cross-checks, and 14 reference stars, our results are the best calibration yet achieved with an integral-field spectrograph, and among the best calibrated surveys.
Bump Morphology of the CMAGIC Diagram Aldoroty, L.; Wang, L.; Hoeflich, P. ...
The Astrophysical Journal, 05/2023, Volume 948, Issue 1
Journal Article
Peer reviewed
Open access
Abstract
We apply the color–magnitude intercept calibration method (CMAGIC) to the Nearby Supernova Factory SNe Ia spectrophotometric data set. The currently existing CMAGIC parameters are the slope and intercept of a straight line fit to the linear region in the color–magnitude diagram, which occurs over a span of approximately 30 days after maximum brightness. We define a new parameter, $\omega_{XY}$, the size of the “bump” feature near maximum brightness for arbitrary filters $X$ and $Y$. We find a significant correlation between the slope of the linear region, $\beta_{XY}$, in the CMAGIC diagram and $\omega_{XY}$. These results may be used to our advantage, as they are less affected by extinction than parameters defined as a function of time. Additionally, $\omega_{XY}$ is computed independently of templates. We find that current empirical templates are successful at reproducing the features described in this work, particularly SALT3, which correctly exhibits the negative correlation between slope and “bump” size seen in our data. In 1D simulations, we show that the correlation between the size of the “bump” feature and $\beta_{XY}$ can be understood as a result of chemical mixing due to large-scale Rayleigh–Taylor instabilities.
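The slope and intercept described above come from a straight-line fit in the color–magnitude diagram. Below is a minimal sketch of such a fit using ordinary least squares, assuming the magnitude in filter X is plotted against the color X − Y. The choice of B and V as the filters, the function name, and every numerical value are invented for illustration. The bump size parameter is not computed because the abstract does not define it.

```python
def fit_line(colors, magnitudes):
    """Ordinary least-squares fit of magnitude = intercept + slope * color."""
    n = len(colors)
    mean_c = sum(colors) / n
    mean_m = sum(magnitudes) / n
    sxx = sum((c - mean_c) ** 2 for c in colors)
    sxy = sum((c - mean_c) * (m - mean_m) for c, m in zip(colors, magnitudes))
    slope = sxy / sxx
    intercept = mean_m - slope * mean_c
    return slope, intercept


# Synthetic, noise-free points on a known line (invented values, not survey data).
true_slope, true_intercept = 2.0, 17.5
bv_colors = [0.2, 0.4, 0.6, 0.8, 1.0]
b_mags = [true_intercept + true_slope * c for c in bv_colors]

# Recovered slope and intercept for the hypothetical B versus B - V diagram.
beta_bv, b_intercept = fit_line(bv_colors, b_mags)
```

With real photometry, only epochs inside the linear region would be passed to the fit, and measurement uncertainties would normally be used as weights.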
Abstract
We construct a physically parameterized probabilistic autoencoder (PAE) to learn the intrinsic diversity of Type Ia supernovae (SNe Ia) from a sparse set of spectral time series. The PAE is a two-stage generative model, composed of an autoencoder that is interpreted probabilistically after training using a normalizing flow. We demonstrate that the PAE learns a low-dimensional latent space that captures the nonlinear range of features that exists within the population and can accurately model the spectral evolution of SNe Ia across the full range of wavelength and observation times directly from the data. By introducing a correlation penalty term and multistage training setup alongside our physically parameterized network, we show that intrinsic and extrinsic modes of variability can be separated during training, removing the need for additional models to perform magnitude standardization. We then use our PAE in a number of downstream tasks on SNe Ia for increasingly precise cosmological analyses, including the automatic detection of SN outliers, the generation of samples consistent with the data distribution, and solving the inverse problem in the presence of noisy and incomplete data to constrain cosmological distance measurements. We find that the optimal number of intrinsic model parameters appears to be three, in line with previous studies, and show that we can standardize our test sample of SNe Ia with an rms of 0.091 ± 0.010 mag, which corresponds to 0.074 ± 0.010 mag if peculiar velocity contributions are removed. Trained models and codes are released at https://github.com/georgestein/suPAErnova.
Abstract
Intra-uterine growth restriction (IUGR) is defined by a restriction of fetal growth during gestation. It is a prevalent and significant public health problem that jeopardizes neonatal health and can also have deleterious consequences later in adult life. Cullins constitute a family of seven proteins involved in cell scaffold and in selective proteolysis via the ubiquitin-proteasome system. Most Cullins are critical for early embryonic development and mutations in some Cullin genes have been identified in human syndromes including growth retardation. Our working hypothesis is that Cullins, particularly CUL4B and CUL7, are involved in placental diseases and especially in IUGR. Thus, expression of Cullins and their cofactors was analyzed in normal and pathological placentas. We show that they present a constant significant over-expression in IUGR placentas, whose extent is dependent on the position of the interrogated fragment along the cDNAs, suggesting the existence of different isoforms of the genes. In particular, the CUL7 gene is up-regulated up to 10 times in IUGR and 15 times in preeclampsia associated with IUGR. The expression of cofactors of Cullins participating in functional complexes has also been evaluated and showed a similar significant increase in IUGR. Promoters of Cullin genes appeared to be under the control of the SP1 transcription factor. Finally, methylation levels of the CUL7 promoter in placental tissues are modulated according to the pathological conditions, with a significant hypomethylation in IUGR. Together, these results point to the Cullin family as a new set of markers of IUGR.