Bayesian experimental design is a fast-growing area of research with many real-world applications. As computational power has increased over the years, so has the development of simulation-based design methods, which involve a number of algorithms, such as Markov chain Monte Carlo, sequential Monte Carlo and approximate Bayesian methods, enabling more complex design problems to be solved. The Bayesian framework provides a unified approach for incorporating prior information and/or uncertainties regarding the statistical model with a utility function which describes the experimental aims. In this paper, we provide a general overview of the concepts involved in Bayesian experimental design, and focus on describing some of the more commonly used Bayesian utility functions and methods for their estimation, as well as a number of algorithms that are used to search over the design space to find the Bayesian optimal design. We also discuss other computational strategies for further research in Bayesian optimal design.
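As a concrete illustration of one commonly used utility, the expected information gain (mutual information) can be estimated by nested Monte Carlo and maximised over a grid of candidate designs. The sketch below uses a hypothetical one-parameter linear-Gaussian model; the model, prior, sample sizes and design grid are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_information_gain(d, n_outer=500, n_inner=500, sigma=1.0):
    """Nested Monte Carlo estimate of the mutual-information utility for a
    toy model y = theta * d + eps, with theta ~ N(0, 1), eps ~ N(0, sigma^2)."""
    theta = rng.standard_normal(n_outer)
    y = theta * d + sigma * rng.standard_normal(n_outer)
    # log-likelihood of each simulated outcome under its generating theta
    log_lik = (-0.5 * np.log(2 * np.pi * sigma**2)
               - (y - theta * d) ** 2 / (2 * sigma**2))
    # inner Monte Carlo approximation of the marginal likelihood p(y | d)
    theta_in = rng.standard_normal(n_inner)
    lik = np.exp(-0.5 * np.log(2 * np.pi * sigma**2)
                 - (y[:, None] - theta_in[None, :] * d) ** 2 / (2 * sigma**2))
    log_marg = np.log(lik.mean(axis=1))
    return float(np.mean(log_lik - log_marg))

# brute-force search over a one-dimensional design space
designs = np.linspace(0.1, 3.0, 30)
utilities = [expected_information_gain(d) for d in designs]
best_design = designs[int(np.argmax(utilities))]
```

For this toy model the utility grows with the magnitude of the design point, so the search settles near the upper end of the grid; in realistic problems the design space is multidimensional and stochastic search algorithms, of the kind surveyed in the paper, replace the brute-force loop.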
Within the 21 APEC economies alone, there are an estimated 200 million individuals living with a rare disease. As such, health data on these individuals, and hence patient registries, are vital. However, registries come in many different forms and operating models in different jurisdictions. They possess varying degrees of functionality and are used for a variety of purposes. For instance, registries can facilitate service planning as well as underpin public health and clinical research by providing de-identified data to researchers. Furthermore, registries may be used to create and disseminate new knowledge to inform clinical best practice and care, to identify and enrol participants for clinical trials, and to enable seamless integration of patient data for diagnostic testing and cascade screening. Registries that add capabilities such as capturing patient-reported outcomes enable patients, and their carers, to become active partners in their care, rapidly furthering research and ensuring up-to-date practice-based evidence. Typically, a patient registry centres around the notion of health data 'capture', usually for only one or a small subset of the functions outlined above, thereby creating fragmented datasets that, despite the best efforts and intentions, make it difficult to exchange the right data for the right purpose with the right stakeholder under appropriate governance arrangements. Trying to incorporate maximum functionality into a registry is an obvious strategy, but monolithic software solutions are not desirable. As an alternative, we propose that analytics be made core to a patient registry, rather than registries serving merely as a 'data capture' solution. We contend that embracing an analytics-centric focus makes it reasonable to imagine a future where it will be possible to evaluate the individual outcomes of health interventions in real time.
The purposeful and, importantly, the repurposable application of health data will allow stakeholders to extract, create and reuse knowledge to improve health outcomes, assist clinical decision making, and improve health service design and delivery. To realise this vision, we introduce and describe the concept of a Rare Disease Registry and Analytics Platform (RD-RAP); one that we hope will make a meaningful difference to the lives of those living with a rare disease.
Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Strength auditing of European honey bee (Apis mellifera Linnaeus, 1758 Hymenoptera: Apidae) colonies is critical for apiarists to manage colony health and meet pollination contract conditions. Colony strength assessments used during pollination servicing in Australia typically involve a frame-top cluster-count (Number of Frames) inspection. Sensing technology has the potential to improve auditing processes, and commercial temperature sensors are widely available. We evaluate the use and placement of temperature sensing technology in colony strength assessment and identify key parameters linking temperature to colony strength. Custom-built temperature sensors measured hive temperature across the top of hive brood boxes. A linear mixed-effect model including harmonic sine and cosine curves, representing diurnal temperature fluctuations in hives, was used to compare Number of Frames with temperature sensor data. There was a significant effect of the presence of bees on hive temperature and range: hives without bees recorded a 5.5°C lower mean temperature and greater temperature ranges than hives containing live bees. Hives without bees reached peak temperature earlier than hives with bees, regardless of colony strength. Sensor placement across the width of the hive was identified as an important factor when linking sensor data with colony strength: data from sensors nearest the hive's geometric centre were most closely linked to colony strength. Furthermore, a one-unit increase in Number of Frames was significantly associated with a mean temperature increase of 0.36°C. This demonstrates that statistical models that account for diurnal temperature patterns could be used to predict colony strength from temperature sensor data.
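The harmonic fixed-effect structure described above can be sketched with ordinary least squares: a mean level plus sine and cosine terms at a 24-hour period. The data, diurnal amplitudes and colony strength below are simulated stand-ins (only the 0.36°C-per-frame coefficient is taken from the abstract), and the random-effect structure of the actual mixed model is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24 * 7)                        # one week of hourly readings
frames = 8                                       # hypothetical Number of Frames
true_mean = 30.0 + 0.36 * frames                 # per-frame effect from the abstract
temp = (true_mean
        + 2.0 * np.sin(2 * np.pi * hours / 24)   # diurnal cycle (assumed amplitude)
        + 1.0 * np.cos(2 * np.pi * hours / 24)
        + 0.3 * rng.standard_normal(hours.size)) # sensor noise

# Harmonic regression: intercept plus sine/cosine terms at a 24 h period,
# mirroring the fixed-effect part of a linear mixed-effect model.
X = np.column_stack([np.ones(hours.size),
                     np.sin(2 * np.pi * hours / 24),
                     np.cos(2 * np.pi * hours / 24)])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)
mean_temp, sin_amp, cos_amp = coef
```

The fitted intercept recovers the colony-strength-dependent mean level, and the two harmonic coefficients capture the timing and amplitude of the diurnal cycle, which is how such a model can separate colony strength from daily temperature fluctuation.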
In today's era of big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is subsampling, where a subset of the big data is analysed and used as the basis for inference rather than the whole data set. A key question when applying subsampling approaches is how to select an informative subset based on the questions being asked of the data. A recent approach determines subsampling probabilities for each data point, but a limitation is that the appropriate probabilities rely on an assumed model for the big data. In this article, to overcome this limitation, we propose a model robust approach in which a set of models is considered, and the subsampling probabilities are evaluated as the weighted average of the probabilities that would be obtained under each model individually. Theoretical results are derived to inform such an approach. Our model robust subsampling approach is applied in a simulation study and in two real-world applications, where performance is compared to current subsampling practices. The results show that our model robust approach outperforms alternative methods.
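The weighted-average construction can be sketched as follows. Here the subsampling probabilities under each candidate model are taken proportional to leverage scores, one common model-based choice; the model set, the equal weights and the data are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 10_000, 200                           # full-data and subsample sizes
x = rng.uniform(-1, 1, N)

# candidate model set: design matrices for a linear and a quadratic model
models = {"linear":    np.column_stack([np.ones(N), x]),
          "quadratic": np.column_stack([np.ones(N), x, x**2])}
weights = {"linear": 0.5, "quadratic": 0.5}  # assumed equal model weights

def leverage_probs(X):
    """Subsampling probabilities proportional to leverage under one model."""
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverage scores h_ii
    return h / h.sum()

# model-robust probabilities: weighted average across the model set
probs = sum(w * leverage_probs(models[m]) for m, w in weights.items())

# draw the informative subsample without replacement
idx = rng.choice(N, size=n, replace=False, p=probs)
```

Averaging the probabilities means no single working model dictates which points look informative, which is the robustness the abstract describes: points that matter under any model in the set retain a reasonable chance of selection.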
In this paper, production scheduling for rotomoulded plastics manufacturing in a multi-machine environment is considered. The objective is to minimise total tardiness. The problem has some commonality with hybrid flow shop scheduling with batching, where additional constraints are needed to control which machines may be used at each stage. The problem is shown to be NP-hard and is formulated as a mixed integer program. Given the consequently long solve times required to obtain optimal solutions, simulated annealing and tabu search algorithms were developed, alongside a constructive heuristic, to obtain near-optimal solutions within a practical time-frame. The solution algorithms were tuned and tested using randomly generated problem instances, constructed to be representative of a real production environment in Queensland, Australia. The best results in terms of solution quality were generally obtained by simulated annealing.
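A minimal simulated annealing loop for tardiness minimisation looks like the sketch below. It sequences a handful of hypothetical jobs on a single machine with a swap neighbourhood and geometric cooling; the paper's actual problem adds multiple machines, batching and machine-eligibility constraints, but the accept/reject core is the same.

```python
import math
import random

random.seed(3)
jobs = [(4, 10), (2, 4), (7, 30), (3, 6), (5, 12), (6, 18)]  # (time, due date)

def total_tardiness(seq):
    t = tard = 0
    for j in seq:
        p, d = jobs[j]
        t += p
        tard += max(0, t - d)
    return tard

seq = list(range(len(jobs)))
cur = best = total_tardiness(seq)
best_seq = seq[:]
T = 10.0                                       # initial temperature
for _ in range(5000):
    i, k = random.sample(range(len(jobs)), 2)
    seq[i], seq[k] = seq[k], seq[i]            # propose a swap
    cand = total_tardiness(seq)
    if cand <= cur or random.random() < math.exp((cur - cand) / T):
        cur = cand                             # accept (always if no worse)
        if cur < best:
            best, best_seq = cur, seq[:]
    else:
        seq[i], seq[k] = seq[k], seq[i]        # undo rejected swap
    T *= 0.999                                 # geometric cooling
```

Accepting occasional worsening moves at high temperature lets the search escape local optima, and the cooling schedule gradually turns it into a greedy local search; for this small instance the loop reliably finds an earliest-due-date-quality schedule.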
Streams and rivers are biodiverse and provide valuable ecosystem services. Maintaining these ecosystems is an important task, so organisations often monitor the status and trends in stream condition and biodiversity using field sampling and, more recently, autonomous in-situ sensors. However, data collection is often costly, so effective and efficient survey designs are crucial to maximise information while minimising costs. Geostatistics and optimal and adaptive design theory can be used to optimise the placement of sampling sites in freshwater studies and aquatic monitoring programs. Geostatistical modelling and experimental design on stream networks pose statistical challenges due to the branching structure of the network, flow connectivity and directionality, and differences in flow volume. Geostatistical models for stream network data and their unique features already exist. Some basic theory for experimental design in stream environments has also previously been described. However, open source software that makes these design methods available for aquatic scientists does not yet exist. To address this need, we present SSNdesign, an R package for solving optimal and adaptive design problems on stream networks that integrates with existing open-source software. We demonstrate the mathematical foundations of our approach, and illustrate the functionality of SSNdesign using two case studies involving real data from Queensland, Australia. In both case studies we demonstrate that the optimal or adaptive designs outperform random and spatially balanced survey designs implemented in existing open-source software packages. The SSNdesign package has the potential to boost the efficiency of freshwater monitoring efforts and provide much-needed information for freshwater conservation and management.
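As a simplified illustration of utility-driven site selection, the sketch below greedily picks monitoring sites that maximise the log-determinant of a spatial covariance matrix. It uses Euclidean distances and an exponential covariance purely for brevity; SSNdesign's models instead use stream (hydrological) distances, flow connectivity and tail-up/tail-down covariances, and its optimisers are more sophisticated than a greedy sweep. All coordinates and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
sites = rng.uniform(0, 10, size=(40, 2))   # hypothetical candidate site coordinates

def cov_matrix(pts, sill=1.0, range_par=3.0):
    """Exponential covariance on Euclidean distance (a stand-in for
    stream-network covariance models)."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return sill * np.exp(-d / range_par)

def greedy_design(k):
    """Forward selection: at each step add the site that most increases the
    log-determinant of the design's covariance matrix."""
    chosen, remaining = [], list(range(len(sites)))
    for _ in range(k):
        scores = [np.linalg.slogdet(cov_matrix(sites[chosen + [s]]))[1]
                  for s in remaining]
        pick = remaining[int(np.argmax(scores))]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen

design = greedy_design(5)
```

Maximising the log-determinant spreads sites apart in covariance space, which is why designs optimised this way tend to beat purely random placements, the comparison reported in the case studies.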
Background
Endocrinopathic laminitis is common in horses and ponies, but the recurrence rate of the disease is poorly defined.
Objectives
To determine the incidence of, and risk factors for, the recurrence of endocrinopathic laminitis.
Animals
Privately owned horses and ponies with acute laminitis (n = 317, of which 276 cases with endocrinopathic laminitis were followed up to study completion).
Methods
This prospective cohort study collected data on veterinary‐diagnosed cases of acute laminitis for 2 years. Each case was classified on acceptance to the study as endocrinopathic or non‐endocrinopathic using data collected in a questionnaire completed by the animal's veterinarian. Follow‐up data were collected at regular intervals to determine whether the laminitis recurred in the 2‐year period after diagnosis.
Results
The recurrence rate for endocrinopathic laminitis was 34.1%. The risk of recurrence during the 2-year study period increased with basal, fasted serum insulin concentration (P ≤ .05), with the probability of recurrence rising markedly as insulin concentration exceeded the normal range (0-20 μIU/mL), up to approximately 45 μIU/mL. A previous diagnosis of laminitis (before the study; P = .05) was also a risk factor for recurrent laminitis. Cases with a higher Obel grade of laminitis tended (P = .05) to recur sooner.
Conclusions and clinical importance
Knowing that hyperinsulinemia and being previously diagnosed with laminitis are significant risk factors for recurrence will enable clinicians to proactively address these factors, thereby potentially reducing the risk of recurrence of laminitis.
Monitoring the water quality of rivers is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values or trends. However, the data are confounded by anomalies caused by technical issues, for which the volume and velocity of data preclude manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data collected from rivers flowing into the Great Barrier Reef. After identifying end-user needs and defining anomalies, we ranked anomaly importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average models. However, incorporation of multiple water-quality variables as covariates reduced performance due to complex relationships among variables. Classifications of drift and periods of anomalously low or high variability were more often correct when we applied mitigation, which replaces anomalous measurements with forecasts for further forecasting, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies and were similarly less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, however, all feature-based methods produced low false positive rates and have the benefit of not requiring training or optimization. Rule-based methods successfully detected a subset of lower priority anomalies, specifically impossible values and missing observations. We therefore suggest that a combination of methods will provide optimal performance in terms of correct anomaly detection, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between end-users and anomaly detection developers for optimal outcomes with respect to both detection performance and end-user application. To this end, our framework has high transferability to other types of high frequency time-series data and anomaly detection applications.
The ten-step Anomaly Detection (AD) framework for high-frequency water-quality data includes ranking the importance of different anomaly types (e.g. sudden spikes, type A; sudden shifts, type D; anomalously high variation, type E), based on end-user needs and data characteristics, to inform algorithm choice, implementation and performance evaluation. Framework numbers indicate the order of steps taken; arrows indicate directions of influence between steps.
•High frequency water-quality data requires automated anomaly detection (AD).
•Rule-based methods detected all missing, out-of-range and impossible values.
•Regression and feature-based methods detected sudden spikes and level shifts well.
•High false negative rates were associated with other types of anomalies, e.g. drift.
•Our transferable framework selects and compares AD methods for end-user needs.
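A stripped-down version of the regression-based detectors compared in this framework can be sketched as follows: one-step-ahead forecasts from a mean-reverting AR(1) model, with observations flagged when the forecast residual is extreme. The series, spike sizes, AR coefficient and long-run level are all illustrative assumptions; a real application would fit an ARIMA model to the data. Note that the observation immediately after each spike is also flagged, which is exactly the behaviour the 'mitigation' step (forecasting from corrected values) is designed to address.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
level, phi = 10.0, 0.95                      # assumed long-run level and AR(1) coef
noise = 0.2 * rng.standard_normal(n)

# simulate a turbidity-like AR(1) series, then inject sudden isolated spikes
series = np.empty(n)
series[0] = level
for t in range(1, n):
    series[t] = level + phi * (series[t - 1] - level) + noise[t]
spikes = [100, 250, 400]
series[spikes] += 8.0

# one-step-ahead forecasts from the (assumed known) AR(1) model
forecast = np.empty(n)
forecast[0] = series[0]
for t in range(1, n):
    forecast[t] = level + phi * (series[t - 1] - level)

resid = series - forecast
sigma = 1.4826 * np.median(np.abs(resid))    # robust residual scale (MAD)
flags = np.abs(resid) > 5 * sigma            # flag extreme forecast errors
```

The robust (median-based) scale estimate keeps the anomalies themselves from inflating the detection threshold, a practical detail when the training data cannot be guaranteed clean.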
Introduction
Fetal alcohol spectrum disorder (FASD) is a neurodevelopmental disorder caused by alcohol exposure during pregnancy. FASD is associated with neurodevelopmental deviations, and 50%–94% of children with FASD meet the Diagnostic and Statistical Manual of Mental Disorders-fifth edition diagnostic criteria for attention deficit hyperactivity disorder (ADHD). There is a paucity of evidence around medication efficacy for ADHD symptoms in children with FASD. This series of N-of-1 trials aims to provide pilot data on the feasibility of conducting N-of-1 trials in children with FASD and ADHD.
Methods and analysis
A pilot N-of-1 randomised trial design with cycles of stimulant and placebo (four cycles of 2-week duration) for each child will be conducted (n=20) in Melbourne, Australia. Feasibility and tolerability will be assessed using recruitment and retention rates, protocol adherence, adverse events and parent ratings of side effects. Each child's treatment effect will be determined by analysing teacher ADHD ratings across stimulant and placebo conditions (Wilcoxon rank). N-of-1 data will be aggregated to provide an estimate of the cohort treatment effect as well as individual-level treatment effects. We will assess the sample size and number of cycles required for a future trial. Potential mediating factors will be explored to identify variables that might be associated with treatment response variability.
Ethics and dissemination
The study was approved by the Hospital and Health Service Human Research Ethics Committee (HREC/74678/MonH-2021-269029), Monash (protocol V6, 25 June 2023). Individual outcome data will be summarised and provided to participating carers and practitioners to enhance care. Group-level findings will be presented at a local workshop to engage stakeholders. Findings will be presented at national and international conferences and published in peer-reviewed journals.
All results will be reported so that they can be used to inform prior information for future trials.
Trial registration number
NCT04968522.