Abstract
Two of the main problems encountered in the development and accurate validation of photometric redshift (photo-z) techniques are the lack of spectroscopic coverage in the feature space (e.g. colours and magnitudes) and the mismatch between the photometric error distributions associated with the spectroscopic and photometric samples. Although these issues are well known, there is currently no standard benchmark allowing a quantitative analysis of their impact on the final photo-z estimation. In this work, we present two galaxy catalogues, Teddy and Happy, built to enable a more demanding and realistic test of photo-z methods. Using photometry from the Sloan Digital Sky Survey and spectroscopy from a collection of sources, we constructed data sets that mimic the biases between the underlying probability distributions of the real spectroscopic and photometric samples. We demonstrate the potential of these catalogues by submitting them to the scrutiny of different photo-z methods, including machine learning (ML) and template fitting approaches. Beyond the expected poor results from most ML algorithms for cases with missing coverage in the feature space, we were able to recognize the superiority of global models in the same situation and the general failure across all types of methods when incomplete coverage is combined with the presence of photometric errors – a data situation which photo-z methods have not been trained to deal with up to now and which must be addressed by future large-scale surveys. Our catalogues represent the first controlled environment allowing a straightforward implementation of such tests. The data are publicly available within the COINtoolbox (https://github.com/COINtoolbox/photoz_catalogues).
1 University Clinic of Tübingen, Department of Internal Medicine II, Division of Hematology, Oncology, and Immunology, Tübingen
2 University of Tübingen, Institute of Anatomy, Department of Experimental Embryology, Division of Tissue Engineering, Tübingen
3 University Clinic of Tübingen, Children's Hospital, Department of General Pediatrics, Division of Hematology and Oncology, Tübingen
4 Leiden University Medical Center, Department of Immunohematology and Blood Transfusion, Center for Stem Cell Therapy, Leiden, The Netherlands and
5 Hospital for Workers' Compensation Tübingen, Department of Orthopedic Surgery, Tübingen, Germany
Correspondence: Hans-Jörg Bühring, Ph.D., University of Tübingen, Department of Internal Medicine II, Medical, Otfried-Müller-Str. 10, 72076 Tübingen, Germany. E-mail: hans-joerg.buehring{at}uni-tuebingen.de
Background: Conventionally, mesenchymal stem cells are functionally isolated from primary tissue based on their capacity to adhere to a plastic surface. This isolation procedure is hampered by the unpredictable influence of co-cultured hematopoietic and/or other unrelated cells and/or by the elimination of a late-adhering mesenchymal stem cell subset during removal of undesired cells. To circumvent these limitations, several antibodies have been developed to facilitate the prospective isolation of mesenchymal stem cells. Recently, we described a panel of monoclonal antibodies with superior selectivity for mesenchymal stem cells, including the monoclonal antibodies W8B2 against human mesenchymal stem cell antigen-1 (MSCA-1) and 39D5 against a CD56 epitope, which is not expressed on natural killer cells.
Design and Methods: Bone marrow-derived mesenchymal stem cells from healthy donors were analyzed and isolated by flow cytometry using a large panel of antibodies against surface antigens including CD271, MSCA-1, and CD56. The growth of mesenchymal stem cells was monitored by colony-forming unit fibroblast (CFU-F) assays. The differentiation of mesenchymal stem cells into defined lineages was induced by culture in appropriate media and verified by immunostaining.
Results: Multicolor cell sorting and CFU-F assays showed that mesenchymal stem cells were ~90-fold enriched in the MSCA-1+CD56− fraction and ~180-fold in the MSCA-1+CD56+ fraction. Phenotype analysis revealed that the expression of CD10, CD26, CD106, and CD146 was restricted to the MSCA-1+CD56− mesenchymal stem cell subset and CD166 to MSCA-1+CD56± mesenchymal stem cells. Further differentiation of these subsets showed that chondrocytes and pancreatic-like islets were predominantly derived from MSCA-1+CD56± cells whereas adipocytes emerged exclusively from MSCA-1+CD56− cells. The culture of single sorted MSCA-1+CD56+ cells resulted in the appearance of phenotypically heterogeneous clones with distinct proliferation and differentiation capacities.
Conclusions: Novel mesenchymal stem cell subsets with distinct phenotypic and functional properties were identified. Our data suggest that the MSCA-1+CD56+ subset is an attractive starting population for autologous chondrocyte transplantation.
Key words: mesenchymal stem cells, CD56, MSCA-1.
Abstract
We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin–Phillips–Terlevich (BPT) and W_Hα versus [N ii]/Hα (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT data sets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the log [O iii]/Hβ, log [N ii]/Hα and log EW(Hα) optical parameters. The best-fitting GMM based on several statistical criteria suggests a solution around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's active galactic nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence – based on four GCs – for the existence of a Seyfert/low-ionization nuclear emission-line region (LINER) dichotomy in our sample. Notwithstanding, the inclusion of an additional GC5 unravels it. The GC5 appears associated with the LINER and passive galaxies on the BPT and WHAN diagrams, respectively. This indicates that if the Seyfert/LINER dichotomy is there, it does not contribute significantly to the global data variance and may be overlooked by standard metrics of goodness of fit. Subtleties aside, we demonstrate the potential of our methodology to recover/unravel different objects inside the wilderness of astronomical data sets, without lacking the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox at https://cointoolbox.github.io/GMM_Catalogue/.
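The idea of probabilistic classification with a GMM can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a one-dimensional feature, two components, synthetic data, and fitting by expectation-maximisation (the abstract does not state the fitting algorithm), whereas the work above uses a three-dimensional space and four to five components. All names (`fit_gmm_1d`, `normal_pdf`) are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, init_means, n_iter=100):
    """Fit a 1-D Gaussian mixture by expectation-maximisation (EM)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances

# Synthetic stand-in data (not SDSS): two well-separated populations.
rng = random.Random(1)
data = [rng.gauss(-2.0, 0.5) for _ in range(150)]
data += [rng.gauss(3.0, 1.0) for _ in range(150)]

weights, means, variances = fit_gmm_1d(data, [min(data), max(data)])

# Soft (probabilistic) class membership of a new point.
x_new = 0.5
p = [w * normal_pdf(x_new, m, v) for w, m, v in zip(weights, means, variances)]
membership = [pj / sum(p) for pj in p]
print(means, membership)
```

The `membership` list is the analogue of the probabilistic classifications mentioned in the abstract: each object receives a probability per component rather than a hard label.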
Return of the features
D’Isanto, A.; Cavuoti, S.; Gieseke, F. ...
Astronomy and astrophysics (Berlin), 08/2018, Volume: 616
Journal Article, Peer reviewed, Open access
Context. The explosion of data in recent years has generated an increasing need for new analysis techniques in order to extract knowledge from massive data sets. Machine learning has proved particularly useful to perform this task. Fully automated methods (e.g. deep neural networks) have recently gathered great popularity, even though those methods often lack physical interpretability. In contrast, feature-based approaches can provide both well-performing models and understandable causalities with respect to the correlations found between features and physical processes. Aims. Efficient feature selection is an essential tool to boost the performance of machine learning models. In this work, we propose a forward selection method in order to compute, evaluate, and characterize better performing features for regression and classification problems. Given the importance of photometric redshift estimation, we adopt it as our case study. Methods. We synthetically created 4520 features by combining magnitudes, errors, radii, and ellipticities of quasars, taken from the Sloan Digital Sky Survey (SDSS). We apply a forward selection process, a recursive method in which a huge number of feature sets is tested through a k-Nearest-Neighbours algorithm, leading to a tree of feature sets. The branches of the feature tree are then used to perform experiments with the random forest, in order to validate the best set with an alternative model. Results. We demonstrate that the sets of features determined with our approach improve the performance of the regression models significantly when compared to the performance of the classic features from the literature. The features found are unexpected and surprising, being very different from the classic features. Therefore, a method to interpret some of the features found in a physical context is presented. Conclusions. The feature selection methodology described here is very general and can be used to improve the performance of machine learning models for any regression or classification task.
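The forward selection step described in the Methods can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it follows only the single best branch at each step (the paper builds a tree of feature sets), scores candidates with a plain k-nearest-neighbours regressor on one hold-out split, and uses synthetic data in place of SDSS quasar features. The function names are hypothetical.

```python
import random

def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training rows."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )[:k]
    return sum(target for _, target in nearest) / k

def holdout_mse(X, y, cols, k=3, n_train=150):
    """Mean squared error on held-out rows, using only the columns in `cols`."""
    proj = [[row[c] for c in cols] for row in X]
    train_X, test_X = proj[:n_train], proj[n_train:]
    train_y, test_y = y[:n_train], y[n_train:]
    errors = [(knn_predict(train_X, train_y, x, k) - t) ** 2
              for x, t in zip(test_X, test_y)]
    return sum(errors) / len(errors)

def forward_select(X, y, n_features, k=3):
    """Greedy forward selection: add the column that lowers the error most."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < n_features:
        best = min(remaining,
                   key=lambda c: holdout_mse(X, y, selected + [c], k))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in data (not SDSS): only columns 0 and 3 carry signal.
rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(200)]
y = [2.0 * row[0] - row[3] + rng.gauss(0.0, 0.01) for row in X]

selected = forward_select(X, y, n_features=2)
print(selected)
```

Keeping several of the best candidates at each step, instead of only one, would turn this greedy path into the tree of feature sets mentioned in the abstract, at a correspondingly higher computational cost.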
The existence of multiple subclasses of Type Ia supernovae (SNe Ia) has been the subject of great debate in the last decade. One major challenge inevitably met when trying to infer the existence of one or more subclasses is the time-consuming, and subjective, process of subclass definition. In this work, we show how machine learning tools facilitate identification of subtypes of SNe Ia through the establishment of a hierarchical group structure in the continuous space of spectral diversity formed by these objects. Using deep learning, we were able to perform such identification in a four-dimensional feature space (+1 for time evolution), while the standard principal component analysis barely achieves similar results using 15 principal components. This is evidence that the progenitor system and the explosion mechanism can be described by a small number of initial physical parameters. As a proof of concept, we show that our results are in close agreement with a previously suggested classification scheme and that our proposed method can grasp the main spectral features behind the definition of such subtypes. This allows the confirmation of the velocity of lines as a first-order effect in the determination of SN Ia subtypes, followed by 91bg-like events. Given the expected data deluge in the forthcoming years, our proposed approach is essential to allow a quick and statistically coherent identification of SNe Ia subtypes (and outliers). All tools used in this work were made publicly available in the Python package Dimensionality Reduction And Clustering for Unsupervised Learning in Astronomy (dracula) and can be found within COINtoolbox (https://github.com/COINtoolbox/DRACULA).
Multipotent mesenchymal stromal cells (MSC) have become important tools in regenerative and transplantation medicine. Rapidly increasing numbers of patients are receiving in vitro-expanded MSC. Culture conditions typically include FCS because human serum does not fully support growth of human MSC in vitro (MSCFCS). Concerns regarding BSE, other infectious complications and host immune reactions have fueled investigation of alternative culture supplements.
As PDGF has long been identified as a growth factor for MSC, we tested media supplementation with platelet lysate for support of MSC proliferation.
We found that primary cultures of BM-derived MSC can be established with animal serum-free media containing fresh frozen plasma and platelets (MSCFFPP). Moreover, MSCFFPP showed vigorous proliferation that was superior to classical culture conditions containing FCS. MSCFFPP morphology was equivalent to MSCFCS, and MSCFFPP expressed CD73, CD90, CD105, CD106, CD146 and HLA-ABC while being negative for CD34, CD45 and surface HLA-DR, as expected. In addition to being phenotypically identical, MSCFFPP could efficiently differentiate into adipocytes and osteoblasts. In terms of immune regulatory properties, MSCFFPP were indistinguishable from MSCFCS. Proliferation of PBMC induced by IL-2 in combination with OKT-3 or by PHA was inhibited in the presence of MSCFFPP.
Taken together, FCS can be replaced safely by FFPP in cultures of MSC for clinical purposes.
Deep-Learning the Time Domain
Mahabal, A.; Sheth, K.; Gieseke, F. ...
Proceedings of the International Astronomical Union, 11/2017, Volume: 14, Issue: S339
Journal Article, Peer reviewed
“Deep learning” is finding more and more applications everywhere, and astronomy is not an exception. This talk described the application of convolutional neural networks to time-domain astronomy, specifically to light-curves of sources. The work that is discussed is based on a published paper to which reference can be made for more detail. The talk finished with a note cautioning new practitioners about the pitfalls lurking in out-of-the-box use of deep-learning techniques.