ABSTRACT
We present SoFiA 2, the fully automated 3D source finding pipeline for the WALLABY extragalactic H I survey with the Australian SKA Pathfinder (ASKAP). SoFiA 2 is a reimplementation of parts of the original SoFiA pipeline in the C programming language and makes use of OpenMP for multithreading of the most time-critical algorithms. In addition, we have developed a parallel framework called SoFiA-X that allows the processing of large data cubes to be split across multiple computing nodes. As a result of these efforts, SoFiA 2 is substantially faster and comes with a much reduced memory footprint compared to its predecessor, thus allowing the large WALLABY data volumes of hundreds of gigabytes of imaging data per epoch to be processed in real time. The source code has been made publicly available to the entire community under an open-source licence. Performance tests using mock galaxies injected into genuine ASKAP data suggest that in the absence of significant imaging artefacts SoFiA 2 is capable of achieving near-100 per cent completeness and reliability above an integrated signal-to-noise ratio (SNR) of about 5–6. We also demonstrate that SoFiA 2 generally recovers the location, integrated flux, and w20 line width of galaxies with high accuracy. Other parameters, including the peak flux density and w50 line width, are more strongly biased due to the influence of the noise on the measurement. In addition, very faint galaxies below an integrated SNR of about 10 may get broken up into multiple components, thus requiring a strategy to identify fragmented sources and ensure that they do not affect the integrity of any scientific analysis based on the SoFiA 2 output.
Why data analytics is an art Charles, Vincent; Emrouznejad, Ali; Gherman, Tatiana ...
Significance, December 2022, Volume 19, Issue 6
Journal Article
Open access
Data analytics projects can be like throwing darts in the dark. Problem‐centric thinking is vital, argue Vincent Charles, Ali Emrouznejad, Tatiana Gherman, and James Cochran
The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large data sets. Gaussian processes (GPs) are a popular class of models used for this purpose, but since the computational cost scales, in general, as the cube of the number of data points, their application has been limited to small data sets. In this paper, we present a novel method for GP modeling in one dimension where the computational requirements scale linearly with the size of the data set. We demonstrate the method by applying it to simulated and real astronomical time series data sets. These demonstrations are examples of probabilistic inference of stellar rotation periods, asteroseismic oscillation spectra, and transiting planet parameters. The method exploits structure in the problem when the covariance function is expressed as a mixture of complex exponentials, without requiring evenly spaced observations or uniform noise. This form of covariance arises naturally when the process is a mixture of stochastically driven damped harmonic oscillators, which provides a physical motivation for and interpretation of this choice, but we also demonstrate that it can be a useful effective model in some other cases. We present a mathematical description of the method and compare it to existing scalable GP methods. The method is fast and interpretable, with a range of potential applications within astronomical data analysis and beyond. We provide well-tested and documented open-source implementations of this method in C++, Python, and Julia.
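To make the cubic-cost baseline mentioned in this abstract concrete, the following is a minimal sketch of a dense GP log-likelihood with an exponentially damped cosine covariance, which is one simple member of the "mixture of complex exponentials" family. It is not the paper's linear-scaling method: the function names, the parameters `a`, `c`, `d`, and the test data are all illustrative assumptions, and the Cholesky factorisation used here costs O(N^3).

```python
import numpy as np

def damped_cosine_kernel(tau, a, c, d):
    # Assumed illustrative kernel: k(tau) = a * exp(-c |tau|) * cos(d * tau)
    return a * np.exp(-c * np.abs(tau)) * np.cos(d * tau)

def dense_gp_log_likelihood(t, y, yerr, a, c, d):
    # Naive dense evaluation: builds the full N x N covariance matrix and
    # factorises it, so the cost grows as the cube of the number of points.
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    yerr = np.asarray(yerr, dtype=float)
    K = damped_cosine_kernel(t[:, None] - t[None, :], a, c, d)
    K[np.diag_indices_from(K)] += yerr ** 2  # per-point (non-uniform) noise
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # alpha = L^{-1} y, so alpha @ alpha = y^T K^{-1} y
    log_det_half = np.sum(np.log(np.diag(L)))  # equals 0.5 * log det K
    return -0.5 * alpha @ alpha - log_det_half - 0.5 * t.size * np.log(2.0 * np.pi)

# Unevenly sampled toy data with heteroscedastic noise (made up for illustration)
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 10.0, 50))
yerr = rng.uniform(0.1, 0.3, 50)
y = np.sin(t) + yerr * rng.normal(size=50)
ll = dense_gp_log_likelihood(t, y, yerr, a=1.0, c=0.1, d=1.0)
```

Neither evenly spaced times nor uniform error bars are needed here, which matches the abstract's statement; what the paper's method changes is the cost of this evaluation, not the model.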
Here, we present two galaxy shape catalogues from the Dark Energy Survey Year 1 data set, covering 1500 square degrees with a median redshift of 0.59. The catalogues cover two main fields: Stripe 82, and an area overlapping the South Pole Telescope survey region. We also describe our data analysis process and in particular our shape measurement using two independent shear measurement pipelines, METACALIBRATION and IM3SHAPE. The METACALIBRATION catalogue uses a Gaussian model with an innovative internal calibration scheme, and was applied to riz bands, yielding 34.8M objects. The IM3SHAPE catalogue uses a maximum-likelihood bulge/disc model calibrated using simulations, and was applied to r-band data, yielding 21.9M objects. Both catalogues pass a suite of null tests that demonstrate their fitness for use in weak lensing science. Finally, we estimated the 1σ uncertainties in multiplicative shear calibration to be 0.013 and 0.025 for the METACALIBRATION and IM3SHAPE catalogues, respectively.
Data Feminism D'Ignazio, Catherine; Klein, Lauren F
The MIT Press eBooks, 03/2020
eBook
Open access
A new way of thinking about data science and data ethics that is informed by the ideas of intersectional feminism.
The open access edition of this book was made possible by generous funding from the MIT Libraries.
Today, data science is a form of power. It has been used to expose injustice, improve health outcomes, and topple governments. But it has also been used to discriminate, police, and surveil. This potential for good, on the one hand, and harm, on the other, makes it essential to ask: Data science by whom? Data science for whom? Data science with whose interests in mind? The narratives around big data and data science are overwhelmingly white, male, and techno-heroic. In Data Feminism, Catherine D'Ignazio and Lauren Klein present a new way of thinking about data science and data ethics—one that is informed by intersectional feminist thought.
Illustrating data feminism in action, D'Ignazio and Klein show how challenges to the male/female binary can help challenge other hierarchical (and empirically wrong) classification systems. They explain how, for example, an understanding of emotion can expand our ideas about effective data visualization, and how the concept of invisible labor can expose the significant human efforts required by our automated systems. And they show why the data never, ever “speak for themselves.”
Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn't, and about how those differentials of power can be challenged and changed.
The Lomb-Scargle periodogram is a well-known algorithm for detecting and characterizing periodic signals in unevenly sampled data. This paper presents a conceptual introduction to the Lomb-Scargle periodogram and important practical considerations for its use. Rather than a rigorous mathematical treatment, the goal of this paper is to build intuition about what assumptions are implicit in the use of the Lomb-Scargle periodogram and related estimators of periodicity, so as to motivate important practical considerations required in its proper application and interpretation.
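As a rough illustration of the estimator this abstract refers to, here is a minimal NumPy sketch of the classical (unnormalised) Lomb-Scargle power evaluated on a frequency grid. It is not taken from the paper: the function name, the toy signal, and the frequency grid are assumptions chosen for the example, and measurement uncertainties and a floating mean are deliberately left out.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    # Classical periodogram for unevenly sampled data.
    # The data are mean-subtracted; no error bars are used in this sketch.
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        omega = 2.0 * np.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal
        tau = np.arctan2(np.sum(np.sin(2.0 * omega * t)),
                         np.sum(np.cos(2.0 * omega * t))) / (2.0 * omega)
        c = np.cos(omega * (t - tau))
        s = np.sin(omega * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Toy example: a noisy sinusoid at frequency 0.1, sampled at random times
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 200))
y = np.sin(2.0 * np.pi * 0.1 * t) + 0.3 * rng.normal(size=t.size)
freqs = np.linspace(0.01, 0.5, 2000)
power = lomb_scargle(t, y, freqs)
best_freq = freqs[np.argmax(power)]
```

The choice of frequency grid is itself one of the practical considerations the abstract alludes to: a grid that is too coarse relative to the inverse of the observing baseline can miss the peak entirely.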