How can we discover objects we did not know existed within the large data sets that now abound in astronomy? We present an outlier detection algorithm that we developed, based on an unsupervised Random Forest. We test the algorithm on more than two million galaxy spectra from the Sloan Digital Sky Survey and examine the 400 galaxies with the highest outlier score. We find objects which have extreme emission line ratios and abnormally strong absorption lines, objects with unusual continua, including extremely reddened galaxies. We find galaxy-galaxy gravitational lenses, double-peaked emission line galaxies and close galaxy pairs. We find galaxies with high ionization lines, galaxies that host supernovae and galaxies with unusual gas kinematics. Only a fraction of the outliers we find were reported by previous studies that used specific and tailored algorithms to find a single class of unusual objects. Our algorithm is general and detects all of these classes, and many more, regardless of what makes them peculiar. It can be executed on imaging, time series and other spectroscopic data, operates well with thousands of features, is not sensitive to missing values and is easily parallelizable.
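The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to build an outlier score from an unsupervised Random Forest. It assumes (this is not stated above) that synthetic data are made by shuffling each feature independently, that the forest is trained to separate real from synthetic objects, and that the score is the mean leaf-based dissimilarity to all other objects. The function name `unsupervised_rf_outlier_scores` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def unsupervised_rf_outlier_scores(X, n_trees=100, seed=0):
    """Return one outlier score per row of X (higher means more unusual).

    Assumed recipe, not taken from the abstract: real-vs-synthetic forest
    followed by a leaf co-occurrence similarity.
    """
    rng = np.random.default_rng(seed)
    # Synthetic sample: shuffle every column independently. This keeps the
    # marginal distribution of each feature but destroys their covariance.
    X_syn = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_syn))])

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)

    # Leaf index of every real object in every tree: (n_samples, n_trees).
    leaves = forest.apply(X)
    # Similarity = fraction of trees in which two objects share a leaf.
    # This builds an n x n x n_trees array, so it only suits small samples.
    similarity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    # Score = average dissimilarity to all objects in the sample.
    return (1.0 - similarity).mean(axis=1)
```

Sorting the returned scores in descending order gives a ranked list of candidates to inspect, in the spirit of examining the highest-scoring galaxies described above.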
Abstract
The scaling relations between supermassive black holes and their host galaxy properties are of fundamental importance in the context of black hole-host galaxy co-evolution throughout cosmic time. In this work, we use a novel algorithm that identifies smooth trends in complex data sets and apply it to a sample of 2000 type 1 active galactic nuclei (AGNs) spectra. We detect a sequence in emission line shapes and strengths which reveals a correlation between the narrow L([O III])/L(Hβ) line ratio and the width of the broad Hα. This scaling relation ties the kinematics of the gas clouds in the broad line region to the ionization state of the narrow line region, connecting the properties of gas clouds kiloparsecs away from the black hole to material gravitationally bound to it on sub-parsec scales. This relation can be used to estimate black hole masses from narrow emission lines only. It therefore enables black hole mass estimation for obscured type 2 AGNs and allows us to explore the connection between black holes and host galaxy properties for thousands of objects, well beyond the local Universe. Using this technique, we present the MBH–σ and MBH–M* scaling relations for a sample of about 10 000 type 2 AGNs from the Sloan Digital Sky Survey. These relations are remarkably consistent with those observed for type 1 AGNs, suggesting that this new method may perform as reliably as the classical estimate used in non-obscured type 1 AGNs. These findings open a new window for studies of black hole-host galaxy co-evolution throughout cosmic time.
Poststarburst galaxies are believed to be in a rapid transition between major merger starbursts and quiescent ellipticals, where active galactic nucleus (AGN) feedback is suggested as one of the processes responsible for the quenching. To study the role of AGN feedback, we constructed a sample of poststarburst candidates with AGN and indications of ionized outflows in the optical. We use MUSE/VLT observations to spatially resolve the properties of the stars and multiphase gas in five of them. All galaxies show signatures of interaction/merger in their stellar or gas properties, with some at an early stage of interaction with companions ∼50 kpc away, suggesting that optical poststarburst signatures may be present well before the final starburst and coalescence. We detect narrow and broad kinematic components in multiple transitions in all the galaxies. Our detailed analysis of their kinematics and morphology suggests that, contrary to our expectation, the properties of the broad kinematic components are inconsistent with AGN-driven winds in three out of five galaxies. The two exceptions are also the only galaxies in which spatially resolved NaID P-Cygni profiles are detected. In some cases, the observations are more consistent with interaction-induced galactic-scale flows, an often overlooked process. These observations raise the question of how to interpret broad kinematic components in interacting and perhaps also in active galaxies, in particular when spatially resolved observations are not available or cannot rule out merger-induced galactic-scale motions. We suggest that NaID P-Cygni profiles are more effective outflow tracers, and use them to estimate the energy that is carried by the outflow.
Machine learning (ML) algorithms have become increasingly important in the analysis of astronomical data. However, because most ML algorithms are not designed to take data uncertainties into account, ML-based studies are mostly restricted to data with high signal-to-noise ratios. Astronomical data sets of such high quality are uncommon. In this work, we modify the long-established Random Forest (RF) algorithm to take into account uncertainties in measurements (i.e., features) as well as in assigned classes (i.e., labels). To do so, the Probabilistic Random Forest (PRF) algorithm treats the features and labels as probability distribution functions, rather than deterministic quantities. We perform a variety of experiments where we inject different types of noise into a data set and compare the accuracy of the PRF to that of RF. The PRF outperforms RF in all cases, with a moderate increase in running time. We find an improvement in classification accuracy of up to 10% in the case of noisy features, and up to 30% in the case of noisy labels. The PRF accuracy decreased by less than 5% for a data set with as many as 45% misclassified objects, compared to a clean data set. Apart from improving the prediction accuracy in noisy data sets, the PRF naturally copes with missing values in the data, and outperforms RF when applied to a data set with different noise characteristics in the training and test sets, suggesting that it can be used for transfer learning.
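To illustrate what treating a feature as a probability distribution can mean at a single tree node, here is a small sketch. It assumes (the abstract does not say so) that each measurement is Gaussian with a known standard deviation, so an object is sent to both children of a split with complementary probabilities instead of to one child only. The function names `prob_right` and `stump_class_probs` are hypothetical and are not taken from the PRF code.

```python
import math


def prob_right(x, sigma, threshold):
    """Probability that the true value exceeds `threshold`, assuming the
    measurement is Gaussian with mean `x` and standard deviation `sigma`."""
    if sigma <= 0:
        # No uncertainty: fall back to the usual deterministic split.
        return 1.0 if x > threshold else 0.0
    z = (threshold - x) / (sigma * math.sqrt(2.0))
    # 1 - Phi((threshold - x) / sigma), written with the error function.
    return 0.5 * (1.0 - math.erf(z))


def stump_class_probs(x, sigma, threshold, left_probs, right_probs):
    """Class probabilities from a one-split tree, weighting the class
    distributions of the two leaves by the probability of reaching each."""
    p_r = prob_right(x, sigma, threshold)
    return [(1.0 - p_r) * l + p_r * r for l, r in zip(left_probs, right_probs)]
```

A measurement sitting exactly on the threshold is split evenly between the two leaves, while a precise measurement far from the threshold behaves like an ordinary Random Forest split.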
Abstract
Scientists aim to extract simplicity from observations of the complex world. An important component of this process is the exploration of data in search of trends. In practice, however, this tends to be more of an art than a science. Among all trends existing in the natural world, one-dimensional trends, often called sequences, are of particular interest, as they provide insights into simple phenomena. However, some are challenging to detect, as they may be expressed in complex manners. We present the Sequencer, an algorithm designed to generically identify the main trend in a data set. It does so by constructing graphs describing the similarities between pairs of observations, computed with a set of metrics and scales. Using the fact that continuous trends lead to more elongated graphs, the algorithm can identify which aspects of the data are relevant in establishing a global sequence. Such an approach can be used beyond the proposed algorithm and can optimize the parameters of any dimensionality reduction technique. We demonstrate the power of the Sequencer using real-world data from astronomy, geology, and images from the natural world. We show that, in a number of cases, it outperforms the popular t-Distributed Stochastic Neighbor Embedding and Uniform Manifold Approximation and Projection dimensionality reduction techniques. This approach to exploratory data analysis, which does not rely on training or tuning any parameter, has the potential to enable discoveries in a wide range of scientific domains. The source code is available on GitHub, and we provide an online interface at http://sequencer.org.
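The idea that a continuous trend produces an elongated similarity graph can be illustrated with a minimum spanning tree. The sketch below is not the Sequencer implementation: it assumes a single distance metric and uses the number of edges on the longest path through the tree as a crude stand-in for elongation. The function name `mst_hop_diameter` is hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def mst_hop_diameter(X, metric="euclidean"):
    """Longest path, counted in edges, through the minimum spanning tree
    of the pairwise-distance graph of the rows of X.

    Assumes all rows are distinct, since zero distances are treated as
    missing edges by the SciPy graph routines.
    """
    dist = squareform(pdist(X, metric=metric))
    mst = minimum_spanning_tree(dist)
    # Hop counts between all pairs of nodes along the (undirected) tree.
    hops = shortest_path(mst, directed=False, unweighted=True)
    return int(hops.max())
```

For n objects that lie along a perfect one-dimensional sequence the tree is a chain and the value reaches n - 1, while data without a trend give a bushier tree and a smaller value. Comparing such a number across metrics and scales is one way to think about how the relevant aspects of the data might be selected.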
In this talk I will show that multi-wavelength observations can provide novel constraints on the properties of ionized gas outflows in AGN. I will present evidence that the infrared emission in active galaxies includes a contribution from dust which is mixed with the outflow and is heated by the AGN. We detect this infrared component in thousands of AGN for the first time, and use it to constrain the outflow location. By combining this with optical emission lines, we constrain the mass outflow rates and energetics in a sample of 234 type II AGN, the largest such sample to date. The key ingredient of our new outflow measurements is a novel method to estimate the electron density using the ionization parameter and location of the flow. The inferred electron densities, ∼10^4.5 cm^−3, are two orders of magnitude larger than found in most other cases of ionized outflows. We argue that the discrepancy is due to the fact that the commonly used SII-based method underestimates the true density by a large factor. As a result, the inferred mass outflow rates and kinetic coupling efficiencies are 1–2 orders of magnitude lower than previous estimates, and 3–4 orders of magnitude lower than the typical requirement in hydrodynamic cosmological simulations. These results have significant implications for the relative importance of ionized outflow feedback in this population.
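The link between density, ionization parameter and distance can be sketched with the standard definition of the ionization parameter, U = Q / (4π r² c n), where Q is the rate of ionizing photons and r the distance from the source. The talk's exact calibration is not given here, so the function below is only an illustration under that textbook definition, with the further assumption that the electron density is close to the hydrogen density in ionized gas. The function name `electron_density` and the constant names are hypothetical.

```python
import math

C_CM_S = 2.998e10  # speed of light in cm/s
KPC_CM = 3.086e21  # one kiloparsec in cm


def electron_density(q_ion, r_kpc, ionization_parameter):
    """Density in cm^-3 implied by U = Q / (4 pi r^2 c n).

    q_ion: ionizing photon rate in photons per second.
    r_kpc: distance of the gas from the ionizing source in kpc.
    ionization_parameter: dimensionless U, e.g. from emission line ratios.
    """
    r_cm = r_kpc * KPC_CM
    return q_ion / (4.0 * math.pi * r_cm**2 * C_CM_S * ionization_parameter)
```

Under this definition the density scales as the inverse square of the distance at fixed U, which is why constraining the location of the outflow carries so much weight in the estimate.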
The scaling relations between supermassive black holes and their host galaxy properties are of fundamental importance in the context of black hole-host galaxy co-evolution throughout cosmic time. Beyond the local universe, such relations are based on black hole mass estimates in type I AGN. Unfortunately, for this type of object the host galaxy properties are more difficult to obtain since the AGN dominates the observed flux in most wavelength ranges. In this poster I will present a new correlation we discovered between the narrow L(OIII)/L(Hβ) line ratio and the FWHM(broad Hα). This scaling relation ties the kinematics of the gas clouds in the broad line region to the ionization state of gas in the narrow line region, connecting the properties of gas clouds kiloparsecs away from the black hole to material gravitationally bound to it on sub-parsec scales. This relation can be used to estimate black hole masses from narrow emission lines only, and thus provides the missing piece required to estimate black hole masses in obscured type II AGN. Using this technique, we estimate the black hole mass of about 10,000 type II AGN, and present, for the first time, M(BH)-sigma and M(BH)-M(stars) scaling relations for this population. These relations are remarkably consistent with those observed for type I AGN, suggesting that this new method may perform as reliably as the classical estimate used in non-obscured type I AGN. These findings open a new window for studies of black hole-host galaxy co-evolution throughout cosmic time.
Abstract
In this work, we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the data set, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the data set for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the data set, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.