Zero-inflated models generally aim to address the problem that arises when two different sources generate the zero values observed in a distribution. In practice, this occurs because the population under study actually consists of two subpopulations: one in which zero occurs by default (structural zeros) and another in which it occurs circumstantially (sample zeros).
This work proposes a new methodology to fit zero-inflated Bernoulli data from a Bayesian approach, able to distinguish between two potential sources of zeros (structural and non-structural).
The performance of the proposed methodology has been evaluated through a comprehensive simulation study, and the methodology has been compiled as an R package freely available to the community. Its usage is illustrated by means of a real example from the field of occupational health: the phenomenon of sickness presenteeism, in which it is reasonable to think that some individuals will never be at risk of suffering it because they have not been sick during the period of study (structural zeros). Without separating structural and non-structural zeros, one would be jointly studying general health status and presenteeism itself, and would therefore obtain potentially biased estimates, since the phenomenon is implicitly underestimated by being diluted into general health status.
The proposed methodology is able to distinguish two different sources of zeros (structural and non-structural) from dichotomous data, with or without covariates, in a Bayesian framework, and has been made available to any interested researcher in the form of the bayesZIB R package (https://cran.r-project.org/package=bayesZIB).
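As a concrete illustration of the mixture structure described above, the zero-inflated Bernoulli likelihood can be written with two parameters: p, the probability of belonging to the at-risk subpopulation, and theta, the Bernoulli success probability within it. A zero then arises either structurally, with probability 1 - p, or as a sample zero from the at-risk group, with probability p(1 - theta). The following is a minimal sketch with our own parameter names, not code from the bayesZIB package:

import numpy as np

def zib_loglik(y, p, theta):
    # p     : probability of belonging to the at-risk subpopulation
    # theta : Bernoulli success probability within that subpopulation
    y = np.asarray(y)
    p1 = p * theta                   # P(Y = 1): only the at-risk group can yield a one
    p0 = (1 - p) + p * (1 - theta)   # P(Y = 0): structural zeros plus sample zeros
    return np.sum(np.where(y == 1, np.log(p1), np.log(p0)))

# Example: log-likelihood of a small, mostly-zero sample
print(zib_loglik([0, 0, 1, 0, 1], p=0.4, theta=0.6))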
zfit: scalable pythonic fitting
Eschle, Jonas; Navarro Puig, Albert; Silva Coutinho, Rafael ...
EPJ Web of Conferences, 2020, Volume 245
Journal Article, Conference Proceeding
Peer reviewed
Open access
Statistical modeling and fitting is a key element in most HEP analyses. This task is usually performed in the C++-based ROOT/RooFit framework. Recently the HEP community started shifting more towards the Python language, into which the tools above are only loosely integrated, and the lack of a stable, native Python-based toolkit became clear. We presented zfit, a project that aims at building a fitting ecosystem by providing a carefully designed, stable API and a workflow for libraries to communicate, together with an implementation fully integrated into the Python ecosystem. It is built on top of one of the state-of-the-art industry tools, TensorFlow, which is used as the main computational backend. zfit provides data loading, extensive model-building capabilities, loss creation, minimization, and certain error-estimation methods. Each part also comes with convenient base classes built for customizability and extendability.
Statistical modeling is a key element in many scientific fields and especially in High-Energy Physics (HEP) analysis. The standard framework for this task in HEP is the C++ ROOT/RooFit toolkit, whose Python bindings are only loosely integrated into the scientific Python ecosystem. In this paper, zfit, a new alternative to RooFit written in pure Python, is presented. Most importantly, zfit provides a well-defined high-level API and workflow for advanced model building and fitting, together with an implementation on top of TensorFlow, allowing transparent usage of CPUs and GPUs. It is designed to be extendable in a very simple fashion, allowing the usage of cutting-edge developments from the scientific Python ecosystem in a transparent way. The main features of zfit are introduced, and its extension to data analysis, especially in the context of HEP experiments, is discussed.
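The workflow described in these two abstracts (data loading, model building, loss creation, minimization, error estimation) maps onto a short script. This sketch follows the public zfit API as documented around the time of these papers; exact signatures may differ between releases:

import numpy as np
import zfit

# Model building: an observable space and a Gaussian PDF with free parameters
obs = zfit.Space("x", limits=(-10, 10))
mu = zfit.Parameter("mu", 1.0, -5, 5)
sigma = zfit.Parameter("sigma", 1.0, 0.1, 10)
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)

# Data loading from a NumPy array
data = zfit.Data.from_numpy(obs=obs, array=np.random.normal(1.0, 1.5, size=10_000))

# Loss creation and minimization (TensorFlow runs underneath, on CPU or GPU)
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)
minimizer = zfit.minimize.Minuit()
result = minimizer.minimize(nll)

# Error estimation on the fitted parameters
result.hesse()
print(result.params)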
Good software training is essential in the HEP community. Unfortunately, current training is non-homogeneous and the definition of a common baseline is unclear, making it difficult for newcomers to proficiently join large collaborations such as ALICE or LHCb. In recent years, both collaborations have started separate efforts to tackle this issue through training workshops, via Analysis Tutorials (organized by the ALICE Juniors since 2014) and the Starterkit (organized by LHCb students since 2015). In 2017, ALICE and LHCb joined efforts for the first time to provide combined training by identifying common topics, such as version control systems (Git) and programming languages (e.g. Python). Given the positive experience and feedback, this collaboration will be repeated in the future. We will illustrate the teaching methods, experience, and feedback from our first common training workshop. We will also discuss our efforts to extend our format to other HEP experiments for future iterations.
ALP-mediated decays and other as-yet unobserved B decays to di-photon final states are a challenge to select in hadron collider environments due to the large backgrounds that come directly from the pp collision. We present the strategy implemented by the LHCb experiment in 2018 to efficiently select such photon pairs. A fast neural network topology, implemented in the LHCb real-time selection framework, achieves high efficiency across a mass range of 4–20 GeV/c². We discuss implications and future prospects for the LHCb experiment.
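The abstract does not spell out the network itself; as a purely illustrative sketch (the architecture, input features, and weights below are our assumptions, not LHCb's actual topology), a fast feed-forward selector over a handful of photon-pair features could look like:

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def diphoton_score(features, W1, b1, W2, b2):
    # One small hidden layer followed by a sigmoid output:
    # cheap enough to evaluate at real-time selection rates.
    h = relu(features @ W1 + b1)
    logit = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))

# Toy candidate: [pT(gamma1), pT(gamma2), m(gamma gamma), shower-shape var]
x = np.array([3.1, 2.7, 10.4, 0.8])
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # untrained toy weights
W2, b2 = rng.normal(size=8), 0.0
print(diphoton_score(x, W1, b1, W2, b2))  # accept the pair if above a tuned cut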
The LHCb Turbo stream
Puig, A.
Nuclear Instruments & Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 07/2016, Volume 824
Journal Article
Peer reviewed
Open access
The LHCb experiment will record an unprecedented dataset of beauty and charm hadron decays during Run II of the LHC, set to take place between 2015 and 2018. A key computing challenge is to store and process this data, which limits the maximum output rate of the LHCb trigger. So far, LHCb has written out a few kHz of events containing the full raw sub-detector data, which are passed through a full offline event reconstruction before being considered for physics analysis. Charm physics in particular is limited by trigger output rate constraints. A new streaming strategy includes the possibility to perform physics analysis with candidates reconstructed in the trigger, thus bypassing the offline reconstruction. In the Turbo stream, the trigger will write out a compact summary of physics objects containing all information necessary for analyses. This will allow an increased output rate and thus higher average efficiencies and smaller selection biases. This idea will be commissioned and developed during 2015 with a selection of physics analyses. It is anticipated that the Turbo stream will be adopted by an increasing number of analyses during the remainder of LHC Run II (2015–2018) and ultimately in Run III (starting in 2020) with the upgraded LHCb detector.
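To make the storage trade-off concrete, a "compact summary of physics objects" keeps only what analyses need, instead of the full raw sub-detector data. The record below is our own illustrative sketch, not the actual LHCb event model:

from dataclasses import dataclass

@dataclass
class TurboCandidate:
    # Illustrative trigger-level candidate summary (fields are our guesses):
    decay: str           # e.g. "D0 -> K- pi+"
    mass: float          # invariant mass of the candidate
    pt: float            # transverse momentum
    vertex_chi2: float   # quality of the decay-vertex fit
    # A few floats per candidate, versus the full raw event: this size
    # reduction is what allows the increased trigger output rate.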
The LHCb experiment performed very well during Run I of the LHC, producing a large number of relevant physics results on a wide range of topics. The preparation and commissioning of the LHCb experiment for Run II are discussed here, with special emphasis on the changes in the trigger strategy and the addition of a new sub-detector to improve the physics reach of the experiment. An overview of the commissioning with the first collisions delivered by the LHC is also included.
The LHCb experiment at the LHC accelerator at CERN will collide particle bunches at 40 MHz. After a first level of hardware trigger with output at 1 MHz, the physically interesting collisions will be selected by running dedicated trigger algorithms, the High Level Trigger (HLT), in the Online computing farm. This farm consists of 16000 CPU cores and 40 TB of storage space. Although limited by environmental constraints, its computing power is equivalent to that provided to LHCb by all Tier-1 sites. The HLT duty cycle follows the LHC collisions, so it has several months of winter shutdown as well as several shorter machine and experiment downtime periods. This work describes the strategy for using these idle resources for event reconstruction. Due to the specific features of the Online farm, typical Tier-1-style processing (one file per core) is not feasible. A radically different approach has been chosen, based on processing the data in parallel in farm slices of O(1000) cores. Single events are read from the input files, distributed across the cluster, and merged back into files once they have been processed. This architectural solution, the obtained performance, and how it will be connected to the LHCb production system are described in detail.
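The read-distribute-merge pattern described above can be sketched with Python's standard library. Everything here (file names, the toy event reader, the per-event computation) is a placeholder stand-in, not the LHCb implementation:

from multiprocessing import Pool

def process_event(event):
    # Stand-in for per-event reconstruction (the real work in the HLT farm)
    return sum(event)

def read_events(path):
    # Stand-in for reading single events out of one input file
    for i in range(100):
        yield [i, i + 1, i + 2]

def process_slice(input_files, n_workers=8):
    # Scatter-gather over a "farm slice": read single events, distribute
    # them to worker processes, merge results back into one output file.
    events = (ev for f in input_files for ev in read_events(f))
    with Pool(n_workers) as pool:
        results = pool.imap_unordered(process_event, events)
        with open("merged_output.txt", "w") as out:
            for r in results:
                out.write(f"{r}\n")

if __name__ == "__main__":
    process_slice(["raw_data_0.dat", "raw_data_1.dat"])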
The trigger of the LHCb experiment consists of two stages: an initial hardware trigger and a high-level trigger implemented in a farm of CPUs. It reduces the event rate from an input of 15 MHz to around 5 kHz. To maximize efficiencies and minimize biases, the trigger is designed around inclusive selection algorithms, culminating in a novel boosted decision tree which enables the efficient selection of beauty hadron decays based on a robust partial reconstruction of their decay products. The design and performance of these selection algorithms are discussed in the context of the 2012 data taking. In order to improve performance, the LHCb upgrade aims to significantly increase the rate at which the detector is read out, and hence shift more of the workload onto the high-level trigger. It is demonstrated that the current high-level trigger architecture will be able to meet this challenge, and the expected efficiencies in several key channels are discussed in the context of the LHCb upgrade.
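The idea of a boosted decision tree over partial-reconstruction features can be illustrated with scikit-learn as a stand-in (LHCb's trigger uses its own implementation inside its software framework, with its own input variables; the toy features and data below are ours):

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for partial-reconstruction features of displaced vertices,
# e.g. candidate pT, flight-distance significance, track impact parameters.
n = 5000
X_sig = rng.normal(loc=1.0, scale=1.0, size=(n, 4))  # "beauty-like" candidates
X_bkg = rng.normal(loc=0.0, scale=1.0, size=(n, 4))  # background candidates
X = np.vstack([X_sig, X_bkg])
y = np.concatenate([np.ones(n), np.zeros(n)])

# A boosted decision tree trained to separate signal from background;
# cutting on the score yields an inclusive selection.
bdt = GradientBoostingClassifier(n_estimators=100, max_depth=3)
bdt.fit(X, y)
scores = bdt.predict_proba(X)[:, 1]
print(scores[:5])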