Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. However, the problems of NHST and p-values have been discussed widely, and various Bayesian alternatives have been proposed. Some proposals focus on equivalence testing, which tests an interval hypothesis instead of a precise hypothesis. An interval hypothesis covers a small range of parameter values rather than a single null value, an idea that goes back to Hodges and Lehmann. As researchers can always expect to observe some (often negligibly small) effect size, interval hypotheses are more realistic for biomedical research. However, the selection of an equivalence region (the interval boundaries) often seems arbitrary, and several Bayesian approaches to equivalence testing coexist.
A new proposal is made for determining the equivalence region of Bayesian equivalence tests based on objective criteria such as the type I error rate and power. Existing approaches to Bayesian equivalence testing in the two-sample setting are discussed, with a focus on the Bayes factor and the region of practical equivalence (ROPE). A simulation study derives the results necessary to apply the new method in the two-sample setting, which is among the most frequently used procedures in biomedical research.
Bayesian Hodges-Lehmann tests for statistical equivalence differ in their sensitivity to the prior modeling, their power, and the associated type I error rates. The relationship between type I error rates, power, and sample size for existing Bayesian equivalence tests is identified in the two-sample setting. The results allow researchers to determine the equivalence region with the new method by incorporating such objective criteria. Importantly, the results show not only that prior selection can influence the type I error rate and power, but also that the relationship is reversed for the Bayes-factor- and ROPE-based equivalence tests.
Based on these results, researchers can select among the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, thus improving the reproducibility of biomedical research.
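The ROPE idea discussed above can be sketched numerically. The snippet below is a minimal illustration only, not any of the tests compared in this work: it assumes a normal approximation to the posterior of the standardized mean difference, and the function name `rope_equivalence_test`, the ±0.1 equivalence region, and the sample data are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def rope_equivalence_test(x, y, rope=(-0.1, 0.1), n_draws=10_000):
    """Crude two-sample ROPE check: approximate the posterior of the
    standardized mean difference by a normal distribution and return the
    fraction of posterior mass falling inside the equivalence region."""
    nx, ny = len(x), len(y)
    # pooled standard deviation
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                 / (nx + ny - 2))
    d = (np.mean(x) - np.mean(y)) / sp   # observed standardized difference
    se = np.sqrt(1 / nx + 1 / ny)        # approximate posterior sd of d
    draws = rng.normal(d, se, n_draws)   # normal approximation to the posterior
    return np.mean((draws > rope[0]) & (draws < rope[1]))

# two samples that are practically equivalent under the assumed ROPE
x = rng.normal(0.00, 1.0, 200)
y = rng.normal(0.02, 1.0, 200)
print(rope_equivalence_test(x, y))  # fraction of posterior mass inside the ROPE
```

A decision rule then compares this fraction to a threshold (e.g., declare equivalence when 95% of the posterior mass lies inside the region); the contribution of the present work is choosing the region's boundaries from type I error rate and power rather than by convention.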
Geophysical processes underlying pre-earthquake activity are difficult to determine because few pre-seismic signals are observed directly. Crustal density changes derived from periodic terrestrial gravimetry may provide meaningful deep information as a pre-earthquake cue. In this study, the crustal density changes preceding the 2016 MS6.4 Menyuan earthquake are estimated using ground-based gravity-change data from 2011 to 2015 in the northeastern Tibetan Plateau. The results show that negative density changes dominated the region between the South Longshou Mountain fault and the Daban Mountain fault, except for its southeast (the seismic region), during 2011–2012. Positive density changes appeared in the middle crust near the epicenter during 2012–2013 and in the upper and middle crust east of the epicenter approximately 1.5 years before the earthquake (2013–2014); negative density changes then appeared under and northeast of the epicenter approximately four months before the earthquake (2014–2015). The state of the crustal material near the seismic region changed from convergence to expansion, indicating that the deep seismogenic process corresponded to Amos Nur's 1974 dilatancy-fluid diffusion model.
Aims: To compare the results of presumptive drug testing with confirmation of positives vs. direct-to-definitive drug testing, combined with an investigation of urine vs. oral fluid as test matrices. Methods: Paired oral fluid and urine specimens were collected voluntarily and anonymously from 1098 individuals applying for methadone treatment in 11 clinics across 7 U.S. states. All specimens were analyzed by immunoassay (IA) and liquid chromatography-tandem mass spectrometry (LC-MS-MS). Results: Confirmed IA prevalences for urine were significantly higher than for oral fluid for 7 out of 10 drug classes: benzodiazepines, cannabis, cocaine, methadone, opiates, oxycodone, and tramadol. Drug prevalences by direct-to-definitive LC-MS-MS were either the same as or higher than prevalences by confirmed IA. Drug prevalences by LC-MS-MS were higher in urine for two drug classes (cocaine, methadone) and higher in oral fluid for two drug classes (buprenorphine, tramadol), but were equivalent in urine and oral fluid when averaged over all 10 drug classes. Certain drugs of special concern, such as heroin and buprenorphine, were detected more frequently in oral fluid than in urine. Conclusions: Urine analysis showed some technical advantage over oral fluid in sensitivity to several drug classes within a confirmed IA testing protocol, but this may be outweighed if there is reason to believe that tampering with urine specimens is a significant problem. Overall drug detection by direct-to-definitive testing was similar for oral fluid and urine, but one matrix may be preferable if there is a particular drug of clinical or epidemiological interest.
• Urine analysis was more sensitive than oral fluid to seven out of 10 classes of drugs within a confirmed immunoassay (IA) testing protocol.
• Direct-to-definitive testing yielded higher prevalences of use for most drug classes than confirmed IA; this was most evident for oral fluid.
• Direct-to-definitive testing yielded slightly higher prevalences for urine for two drug classes and for oral fluid for two drug classes.
• The study will assist treatment administrators, clinicians, and researchers in choosing optimal drug testing for their specific environments.
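For paired specimens like these, a standard way to compare the sensitivity of two matrices on the same individuals is McNemar's test on the discordant pairs. The sketch below is a generic illustration, not the statistical analysis used in this study; `mcnemar_exact` and the discordant counts are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test for paired binary outcomes.
    b = positive in urine only, c = positive in oral fluid only.
    Under H0 the discordant pairs split 50/50 between the two matrices."""
    n = b + c
    k = min(b, c)
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)  # two-sided p-value

# hypothetical discordant counts for one drug class:
# 30 individuals positive only in urine, 12 only in oral fluid
print(mcnemar_exact(30, 12))  # small p-value: urine detects this class more often
```

Because each individual contributes a specimen to both matrices, a paired test of this kind is more powerful than comparing the two overall prevalences as if they came from independent samples.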
In this paper, a Conductivity Invariance Phenomenon under a controlled lift-off is discovered and studied. It is found that at a certain lift-off, the effect of conductivity on inductance is eliminated or greatly reduced. Based on this phenomenon, a novel permeability measurement approach is proposed and verified by both simulation and experimental data. The permeability can be estimated with reasonable accuracy (an error of 2.86%) by the proposed approach, free of the influence of the sample's conductivity.
An experimental investigation of hospital building equipment is presented. The dynamic properties and seismic performance of typical freestanding ambulatory cabinets are assessed by unidirectional and bidirectional shake-table tests, also considering the presence of internal partitions and cabinet contents. A vulnerability analysis is performed according to the most recent and reliable assessment methods, evaluating the influence of different parameters of the sample cabinets. The performance criteria adopted in this research are the limit states reached by the specimens (i.e., rocking and overturning) and by their contents (i.e., overturning and breaking). Fragility curves are evaluated for the components and the contents, considering both acceleration and velocity intensity measures, and also using dimensionless intensity measures developed in recent studies. The outcomes of the present study confirm the findings of previous laboratory tests and numerical simulations carried out by the same authors and provide further insight for the reliable seismic performance assessment of hospital cabinets and their contents.
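Fragility curves of the kind evaluated here are commonly modeled as a lognormal CDF in the intensity measure and fitted to binary test outcomes by maximum likelihood. The sketch below illustrates that general idea only; it is not the assessment method used in this study, and `fit_fragility`, the grid ranges, and the shake-table outcomes are all invented.

```python
import math

def lognormal_fragility(im, theta, beta):
    """P(limit state exceeded | intensity measure im) for a lognormal
    fragility curve with median theta and log-standard deviation beta."""
    return 0.5 * (1 + math.erf(math.log(im / theta) / (beta * math.sqrt(2))))

def fit_fragility(ims, failed):
    """Fit (theta, beta) by maximum likelihood over a coarse grid.
    ims: intensity measure of each run; failed: 1 if the limit state was reached."""
    best, best_ll = None, -math.inf
    for theta in (0.1 + 0.02 * i for i in range(100)):
        for beta in [0.1 + 0.05 * j for j in range(20)]:
            ll = 0.0
            for im, f in zip(ims, failed):
                p = min(max(lognormal_fragility(im, theta, beta), 1e-9), 1 - 1e-9)
                ll += math.log(p) if f else math.log(1 - p)
            if ll > best_ll:
                best, best_ll = (theta, beta), ll
    return best

# hypothetical shake-table outcomes: PGA in g, and whether the cabinet rocked
ims = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
failed = [0, 0, 1, 0, 1, 1]
theta, beta = fit_fragility(ims, failed)
```

With the fitted parameters, `lognormal_fragility(im, theta, beta)` gives the estimated exceedance probability at any intensity level, which is how a fragility curve is read in practice.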
Test Selection for Deep Learning Systems. Ma, Wei; Papadakis, Mike; Tsakmalis, Anestis, et al. ACM Transactions on Software Engineering and Methodology, vol. 30, no. 2, March 2021.
Testing of deep learning models is challenging due to the excessive number and complexity of the computations involved. As a result, test data selection is performed manually and in an ad hoc way. This raises the question of how we can automatically select candidate data to test deep learning models. Recent research has focused on defining metrics to measure the thoroughness of a test suite and on relying on such metrics to guide the generation of new tests. However, the problem of selecting/prioritising test inputs (e.g., to be labelled manually by humans) remains open. In this article, we perform an in-depth empirical comparison of a set of test selection metrics based on the notion of model uncertainty (model confidence on specific inputs). Intuitively, the more uncertain we are about a candidate sample, the more likely it is that this sample triggers a misclassification. Similarly, we hypothesise that the samples about which we are most uncertain are the most informative and should be used in priority to improve the model by retraining. We evaluate these metrics on five models and three widely used image classification problems involving real and artificial (adversarial) data produced by five generation algorithms. We show that uncertainty-based metrics have a strong ability to identify misclassified inputs, being three times stronger than surprise adequacy and outperforming coverage-related metrics. We also show that these metrics lead to faster improvement in classification accuracy during retraining: up to two times faster than random selection and other state-of-the-art metrics on all the models we considered.
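One common uncertainty metric of the kind compared in the article is the predictive entropy of the softmax output; the sketch below shows how such a metric can rank candidate inputs for manual labelling. This is a generic illustration rather than the specific metrics evaluated in the paper, and the function names and example probabilities are invented.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of each softmax output row; higher means more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def select_most_uncertain(probs, k):
    """Indices of the k inputs the model is least confident about,
    most uncertain first."""
    return np.argsort(-predictive_entropy(probs))[:k]

# three hypothetical softmax outputs over 3 classes
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # highly uncertain
    [0.70, 0.20, 0.10],   # moderately uncertain
])
print(select_most_uncertain(probs, 2))  # → [1 2]
```

The selected inputs would then be labelled by humans and used first for evaluation or retraining, which is the prioritisation setting the article studies empirically.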
This paper presents the results and interpretations of static and dynamic tests executed on a newly built cable-stayed steel-concrete composite bridge during the final proof testing. A brief description of the structure, the testing methodology, and the instrumentation used is presented. The test results are then discussed and interpreted in detail in order to evaluate the bridge's performance during the proof test and to understand the usefulness of each performed test within a proof-test framework. All the collected experimental data are also compared to the numerical results obtained through a refined finite element model, in order to check the behavior of the structure. The outcomes of the present work can serve as a reference for the proof testing and monitoring of cable-stayed bridges.
Damage is an inevitable occurrence in metallic structures and, when unchecked, can result in a catastrophic breakdown of structural assets. Non-destructive evaluation (NDE) is adopted in industry for the assessment and health inspection of structural assets. Prominent among the NDE techniques is guided wave ultrasonic testing (GWUT). This method is cost-effective and possesses an enormous capability for long-range inspection of corroded structures and for detecting various kinds of cracks and other damage in metallic structures at low frequency and with low energy attenuation. However, the parametric features of GWUT are affected by structural and environmental operating conditions, which can mask the damage signal. Most studies have focused on identifying individual damage types under varying conditions, whereas combined damage phenomena can coexist in a structure and hasten its deterioration. Hence, it is a pending task to study the effect of combined damage on a structure under varying conditions and to correlate it with the parametric features of GWUT. In this respect, this work reviews the literature on ultrasonic guided waves, damage inspection, damage severity, the influence of temperature on the guided wave, and the parametric characteristics of the inspecting wave. The review is limited to the piezoelectric transduction unit. It was observed that no significant work has been done to correlate the parametric features of GWUT with combined damage effects under varying conditions. It is therefore proposed to investigate this open task.
The ultimate strength of most structural materials is mainly limited by the presence of microscopic imperfections serving as nuclei of the fracture process. Since these nuclei are considerably shorter than the acoustic wavelength at the frequencies normally used in ultrasonic nondestructive evaluation (NDE), linear acoustic characteristics are not sufficiently sensitive to this kind of microscopic degradation of a material's integrity. On the other hand, even very small imperfections can produce very significant excess nonlinearity, which can be orders of magnitude higher than the intrinsic nonlinearity of the intact material. The excess nonlinearity is produced mainly by the strong local nonlinearity of microcracks whose opening is smaller than the particle displacement. Parametric modulation via crack closure significantly increases the stress dependence of fatigued materials. A special experimental technique was introduced to measure the second-order acousto-elastic coefficient in a great variety of materials, including plastics, metals, composites, and adhesives. Experimental results are presented to illustrate that the nonlinear acoustic parameters are earlier and more sensitive indicators of fatigue damage than their linear counterparts.
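A common way to quantify excess nonlinearity of the kind described above is the relative nonlinearity parameter A2/A1², computed from the fundamental and second-harmonic amplitudes of the received spectrum. The sketch below illustrates this on a synthetic signal; it is a generic illustration, not the special experimental technique of the paper, and the function name and signal parameters are invented.

```python
import numpy as np

def nonlinearity_parameter(signal, fs, f0):
    """Relative acoustic nonlinearity parameter A2 / A1**2, from the spectral
    amplitudes at the fundamental f0 and the second harmonic 2*f0."""
    n = len(signal)
    spec = np.abs(np.fft.rfft(signal)) / n
    freqs = np.fft.rfftfreq(n, 1 / fs)
    a1 = spec[np.argmin(np.abs(freqs - f0))]       # fundamental amplitude
    a2 = spec[np.argmin(np.abs(freqs - 2 * f0))]   # second-harmonic amplitude
    return a2 / a1**2

# synthetic received signals: an intact path vs. one with harmonic distortion
fs, f0 = 100_000.0, 1_000.0              # 100 kHz sampling, 1 kHz tone burst
t = np.arange(0, 0.01, 1 / fs)           # integer number of cycles, no leakage
clean = np.sin(2 * np.pi * f0 * t)
distorted = clean + 0.05 * np.sin(2 * np.pi * 2 * f0 * t)
print(nonlinearity_parameter(distorted, fs, f0)
      > nonlinearity_parameter(clean, fs, f0))  # → True
```

Tracking this ratio over fatigue cycles is what makes the nonlinear parameter an earlier damage indicator than linear quantities such as wave speed or attenuation.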
Test cases are crucial in helping developers prevent the introduction of software faults. Unfortunately, not all tests are properly designed or can effectively capture faults in production code. Some measures have been defined to assess test-case effectiveness; the most relevant is the mutation score, which highlights the quality of a test by generating so-called mutants, i.e., variations of the production code that make it faulty and that the test is supposed to identify. However, previous studies revealed that mutation analysis is extremely costly and hard to use in practice. The approaches proposed by researchers so far have not been able to provide practical gains in mutation testing efficiency, leaving the problem of efficiently assessing test-case effectiveness open. In this paper, we investigate a novel, orthogonal, and lightweight methodology to assess test-case effectiveness: in particular, we study the feasibility of exploiting production- and test-code-quality indicators to estimate the mutation score of a test case. We first select a set of 67 factors and study their relation with test-case effectiveness. Then, we devise a mutation score estimation model exploiting such factors and investigate its performance as well as its most relevant features. The key result of the study is that our estimation model, based only on static features, achieves 86 percent for both F-Measure and AUC-ROC. This means that we can estimate test-case effectiveness using source-code-quality indicators with high accuracy and without executing the tests. As a consequence, we can provide a practical approach that goes beyond the typical limitations of current mutation testing techniques.
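The mutation score itself is simple to state: the fraction of mutants "killed" by at least one failing test. The toy sketch below, with an invented production function and hand-written mutants, shows the computation whose cost motivates the paper's static estimation model; it is not the paper's approach.

```python
def mutation_score(mutants, test_suite):
    """Fraction of mutants killed, i.e. for which at least one test fails.
    Each test takes the function under test and returns True if it passes."""
    killed = sum(1 for m in mutants if any(not t(m) for t in test_suite))
    return killed / len(mutants)

# toy production function under test: absolute value, plus two mutants of it
mutants = [
    lambda x: x,    # mutant 1: drops the negation branch
    lambda x: -x,   # mutant 2: negates unconditionally
]

# a test passes (returns True) when the function behaves like abs
tests = [lambda f: f(-3) == 3, lambda f: f(4) == 4]

print(mutation_score(mutants, tests))  # → 1.0 (both mutants are caught)
```

Note that every mutant must be executed against the suite, which is exactly the expense the paper sidesteps by predicting the score from static code-quality indicators instead.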