Since its earliest days, the field of behavioral medicine has leveraged technology to increase the reach and effectiveness of its interventions. Here, we highlight key areas of opportunity and recommend next steps to further advance intervention development, evaluation, and commercialization, with a focus on three technologies: mobile applications (apps), social media, and wearable devices. Ultimately, we argue that the future of digital health behavioral science research lies in finding ways to build more robust academic-industry partnerships. These include academics consciously working to prepare and train the twenty-first-century workforce for digital health, actively advancing methods that can balance the need for efficiency in industry with the desire for rigor and reproducibility in academia, and establishing common practices and procedures that support more ethical approaches to promoting healthy behavior.
Autoantibodies are present in healthy individuals and altered in chronic diseases. We used repeated samples collected from participants in the NYU Women's Health Study to assess autoantibody reproducibility and repertoire stability over a one-year period using the HuProt array. We included two samples collected one year apart from each of 46 healthy women (92 samples). We also included eight blinded replicate samples to assess laboratory reproducibility. A total of 21,211 IgG and IgM autoantibodies were interrogated. Of those, 86% of IgG (n = 18,303) and 34% of IgM (n = 7,242) autoantibodies showed adequate lab reproducibility (coefficient of variation [CV] < 20%). Intraclass correlation coefficients (ICCs) were estimated to assess temporal reproducibility. A high proportion of both IgG and IgM autoantibodies with CV < 20% (76% and 98%, respectively) showed excellent temporal reproducibility (ICC > 0.8). Temporal reproducibility was lower after quantile normalization, suggesting that batch variability was not an important source of error and that normalization removed some informative biological signal. To our knowledge, this study is the largest in terms of sample size and autoantibody numbers to assess autoantibody reproducibility in healthy women. The results suggest that for many autoantibodies a single measurement may be used to rank individuals in studies of autoantibodies as etiologic markers of disease.
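The two reliability statistics used in this abstract, the coefficient of variation across blinded replicates and the intraclass correlation across visits, can be sketched as follows. A minimal sketch: the intensity values, subject count, and noise levels are hypothetical, and a one-way random-effects ICC(1,1) is assumed for illustration (the study may have used a different ICC form).

```python
import numpy as np

def cv_percent(replicates):
    """Coefficient of variation (%) across blinded replicate measurements."""
    r = np.asarray(replicates, dtype=float)
    return 100.0 * r.std(ddof=1) / r.mean()

def icc_oneway(visit1, visit2):
    """One-way random-effects ICC(1,1) for two repeated visits per subject."""
    x = np.column_stack([visit1, visit2]).astype(float)
    n, k = x.shape
    grand = x.mean()
    ms_between = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical autoantibody intensities: large between-woman variation,
# small within-woman noise, as expected for a temporally stable marker.
rng = np.random.default_rng(0)
subject_level = rng.normal(1000, 300, size=46)   # between-woman variation
v1 = subject_level + rng.normal(0, 30, size=46)  # visit 1
v2 = subject_level + rng.normal(0, 30, size=46)  # visit 2, one year later
icc = icc_oneway(v1, v2)
```

With between-subject variation much larger than within-subject noise, the ICC approaches 1, which is the regime in which a single measurement suffices to rank individuals.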
This study aimed to analyze the agreement between five bar velocity monitoring devices currently used in resistance training, to determine the most reliable device based on reproducibility (between-device agreement for a given trial) and repeatability (between-trial variation for each device). Seventeen resistance-trained men performed duplicate trials against seven increasing loads (20-30-40-50-60-70-80 kg) while mean, mean propulsive and peak velocity outcomes were obtained in the bench press, full squat and prone bench pull exercises. Measurements were simultaneously registered by two linear velocity transducers (LVT), two linear position transducers (LPT), two optoelectronic camera-based systems (OEC), two smartphone video-based systems (VBS) and one accelerometer (ACC). A comprehensive set of statistics for assessing reliability was used. Magnitude of errors was reported in both absolute (m s⁻¹) and relative (%1RM) terms, and included the smallest detectable change (SDC) and maximum errors (MaxError). LVT was the most reliable and sensitive device (SDC 0.02–0.06 m s⁻¹, MaxError 3.4–7.1% 1RM) and the preferred reference for comparison with other technologies. OEC and LPT were the second-best alternatives (SDC 0.06–0.11 m s⁻¹), always considering the particular margins of error for each exercise and velocity outcome. ACC and VBS are not recommended given their substantial errors and the uncertainty of their measurements (SDC > 0.13 m s⁻¹).
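The smallest detectable change reported in the velocity study is commonly derived from duplicate trials as 1.96·√2·SEM, where the SEM comes from the standard deviation of trial-to-trial differences. A minimal sketch of that computation, with hypothetical velocity readings and noise levels (this is one standard formulation, not necessarily the exact statistics pipeline the authors used):

```python
import numpy as np

def sem_from_duplicates(trial1, trial2):
    """Standard error of measurement from duplicate trials: SD(diff) / sqrt(2)."""
    diff = np.asarray(trial1, float) - np.asarray(trial2, float)
    return diff.std(ddof=1) / np.sqrt(2)

def sdc95(trial1, trial2):
    """Smallest detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * np.sqrt(2) * sem_from_duplicates(trial1, trial2)

# Hypothetical mean-velocity readings (m/s) from duplicate bench-press trials
rng = np.random.default_rng(1)
true_v = rng.uniform(0.3, 1.2, size=17)      # 17 lifters, one load each
t1 = true_v + rng.normal(0, 0.015, size=17)  # trial 1, small device noise
t2 = true_v + rng.normal(0, 0.015, size=17)  # trial 2
sdc = sdc95(t1, t2)
```

A device with per-trial noise around 0.015 m/s yields an SDC of a few hundredths of a m/s, the same order as the LVT values reported above.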
When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data.
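The kind of Monte Carlo check described in that abstract can be sketched as follows: draw both groups from the same skewed distribution, apply a Gaussian-assuming test, and count how often it falsely rejects. The simulation settings (exponential errors, group size, simulation count) are illustrative assumptions, not the authors' actual design.

```python
import numpy as np
from scipy import stats

# Type I error of the Gaussian t-test when errors are strongly skewed.
# Both groups come from the SAME exponential distribution, so every
# "significant" result at alpha = 0.05 is a false positive.
rng = np.random.default_rng(42)
n_sims, n_per_group, alpha = 2000, 30, 0.05
false_positives = 0
for _ in range(n_sims):
    a = rng.exponential(scale=1.0, size=n_per_group)
    b = rng.exponential(scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(a, b)   # assumes Gaussian errors
    false_positives += p < alpha
type_i_rate = false_positives / n_sims
```

If the Gaussian model is robust, the empirical false-positive rate should sit close to the nominal 5%, which is the pattern the abstract reports for a wide range of conditions.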
There is broad interest to improve the reproducibility of published research. We developed a survey tool to assess the availability of digital research artifacts published alongside peer-reviewed journal articles (e.g. data, models, code, directions for use) and reproducibility of article results. We used the tool to assess 360 of the 1,989 articles published by six hydrology and water resources journals in 2017. Like studies from other fields, we reproduced results for only a small fraction of articles (1.6% of tested articles) using their available artifacts. We estimated, with 95% confidence, that results might be reproduced for only 0.6% to 6.8% of all 1,989 articles. Unlike prior studies, the survey tool identified key bottlenecks to making work more reproducible. Bottlenecks include: only some digital artifacts available (44% of articles), no directions (89%), or all artifacts available but results not reproducible (5%). The tool (or extensions) can help authors, journals, funders, and institutions to self-assess manuscripts, provide feedback to improve reproducibility, and recognize and reward reproducible articles as examples for others.
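An interval estimate like the "0.6% to 6.8%" above can be illustrated with a standard binomial confidence interval on the tested subsample. The counts below are hypothetical, and the Wilson score method shown is one common choice for small proportions, not necessarily the method the authors used.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion.

    Preferred over the naive normal approximation when the proportion
    is near zero, as with rates of reproducible articles.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical counts: 2 reproducible articles out of 124 with testable artifacts
lo, hi = wilson_interval(2, 124)
```

Even with very few observed successes, the Wilson interval stays inside (0, 1) and yields an asymmetric range, matching the shape of the estimate reported in the abstract.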
Correction for 'The value of universally available raw NMR data for transparency, reproducibility, and integrity in natural product research' by James B. McAlpine et al., Nat. Prod. Rep., 2018, DOI: 10.1039/c7np00064b.
Abstract
Objectives
We assessed the interobserver and interantibody reproducibility of HER2 immunohistochemical scoring in an enriched HER2-low–expressing breast cancer cohort.
Methods
A total of 114 breast cancer specimens were stained by HercepTest (Agilent Dako) and PATHWAY anti-HER2 (4B5) (Ventana) antibody assays and scored by 6 breast pathologists independently using current HER2 guidelines. Level of agreement was evaluated by Cohen κ analysis.
Results
Although interobserver agreement for both antibodies reached the substantial range, the average agreement rate for HercepTest was significantly higher than that for the 4B5 clone (74.3% vs 65.1%; P = .002). The overall interantibody agreement rate between the 2 antibodies was 57.8%. Complete interobserver concordance was achieved in 44.7% of cases by HercepTest and 45.6% of cases by 4B5. Absolute agreement rates increased from HER2 0-1+ cases (78.1% by HercepTest and 72.2% by 4B5; moderate agreement) to 2-3+ cases (91.9% by HercepTest and 86.3% by 4B5; almost perfect agreement).
Conclusions
Our results demonstrated notable interobserver and interantibody variation in evaluating HER2 immunohistochemistry, especially in cases with scores of 0-1+, although performance improved considerably among breast-specialized pathologists who were aware of the HER2-low concept. More accurate and reproducible methods are needed for selecting patients who may benefit from the newly approved HER2-targeting agent in HER2-low breast cancers.
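Cohen κ, the agreement statistic named in the Methods above, corrects observed agreement for the agreement expected by chance given each rater's marginal score frequencies. A minimal sketch for two raters; the pathologist scores below are hypothetical, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical scores (e.g. HER2 0/1+/2+/3+)."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal category frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2.get(k, 0) for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical HER2 IHC scores from two pathologists on ten specimens
p1 = ["0", "1+", "1+", "2+", "3+", "0", "1+", "2+", "2+", "3+"]
p2 = ["0", "1+", "2+", "2+", "3+", "0", "0",  "2+", "2+", "3+"]
kappa = cohens_kappa(p1, p2)
```

Disagreements concentrated at the 0/1+ boundary, as the abstract reports, pull κ down even when raw percent agreement looks respectable.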
Radiomics is an active area of research in medical image analysis; however, poor reproducibility of radiomic features has hampered their application in clinical practice. This issue is especially prominent when radiomic features are calculated from noisy images, such as low dose computed tomography (CT) scans. In this article, we investigate the possibility of improving the reproducibility of radiomic features calculated on noisy CTs by using generative models for denoising. Our work concerns two types of generative models: the encoder-decoder network (EDN) and the conditional generative adversarial network (CGAN). We then compared their performance against a more traditional 'non-local means' denoising algorithm. We added noise to sinograms of full dose CTs to mimic low dose CTs at two noise levels: low-noise CT and high-noise CT. Models were trained on high-noise CTs and used to denoise low-noise CTs without re-training. We tested the performance of our models on real data, using a dataset of same-day repeated low dose CTs to assess the reproducibility of radiomic features in denoised images. The EDN and the CGAN achieved similar improvements in the concordance correlation coefficients (CCC) of radiomic features, for low-noise images from 0.87 (95% CI: 0.833, 0.901) to 0.92 (95% CI: 0.909, 0.935) and for high-noise images from 0.68 (95% CI: 0.617, 0.745) to 0.92 (95% CI: 0.909, 0.936), respectively. The EDN and the CGAN also improved the test-retest reliability of radiomic features (mean CCC increased from 0.89 (95% CI: 0.881, 0.914) to 0.94 (95% CI: 0.927, 0.951)) based on real low dose CTs. These results show that denoising using EDNs and CGANs can improve the reproducibility of radiomic features calculated from noisy CTs. Moreover, images at different noise levels can be denoised to improve reproducibility using these models without re-training, provided the noise intensity is not excessively greater than that of the high-noise CTs.
To the authors' knowledge, this is the first effort to improve the reproducibility of radiomic features calculated on low dose CT scans by applying generative models.
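The concordance correlation coefficient used throughout that abstract is Lin's CCC, which penalizes both poor correlation and systematic shifts between test and retest values. A minimal sketch; the feature values and noise level are hypothetical.

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances, per Lin's definition
    sxy = ((x - mx) * (y - my)).mean()        # population covariance
    return 2 * sxy / (vx + vy + (mx - my) ** 2)

# Hypothetical radiomic feature values from same-day repeat scans
rng = np.random.default_rng(7)
scan1 = rng.normal(5.0, 1.0, size=30)
scan2 = scan1 + rng.normal(0, 0.2, size=30)   # test-retest with mild noise
ccc = concordance_ccc(scan1, scan2)
```

Unlike Pearson's r, the CCC drops when one scan is systematically offset from the other, which is why it is the usual choice for test-retest radiomics studies.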
Even now in the 21st century, the persistent concern with achieving rigor in qualitative research continues to raise issues. There is also a continuing debate about the analogous terms reliability and validity in naturalistic inquiries as opposed to quantitative investigations. This article presents the concept of rigor in qualitative research, using a phenomenological study as an exemplar to illustrate the process. Elaborating on epistemological and theoretical conceptualizations by Lincoln and Guba, strategies congruent with a qualitative perspective for ensuring validity and establishing the credibility of a study are described. A synthesis of the historical development of validity criteria evident in the literature over the years is explored. Recommendations are made for use of the term rigor instead of trustworthiness and for the reconceptualization and renewed use of the concepts of reliability and validity in qualitative research; strategies for ensuring rigor must be built into the qualitative research process rather than evaluated only after the inquiry, and qualitative researchers and students alike must be proactive and take responsibility for ensuring the rigor of a research study. The insights garnered here will move novice researchers and doctoral students toward a better conceptual grasp of the complexity of reliability and validity and their ramifications for qualitative inquiry.
Here, the present research situation of all-inorganic thermometers based on fluorescence intensity ratio (FIR) technology is reviewed. The thermometers are classified in detail based on the type of luminescence center, and the governing equations of the thermometers are derived. The results show that the temperature sensing principles of single-emission-center and dual-emission-center thermometers are similar. Further, dual-emission-center thermometers are classified into four different types and their characteristics are analyzed. The key performance parameters of these thermometers (absolute sensitivity, relative sensitivity, resolution and repeatability) are each discussed. Analysis of a large number of studies shows that sensitivity is affected by the matrix phonon energy, crystal coordination environment, material size, doping concentration, and other factors. Inorganic optical thermometers show great potential in non-contact temperature sensing due to the excellent repeatability of inorganic materials. We summarize the current difficulties and offer an outlook on the future of these thermometers. The review is therefore intended to support the development of inorganic FIR thermometers toward excellent performance.
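For the common case of two thermally coupled emitting levels, the FIR follows a Boltzmann law, FIR = C·exp(−ΔE/(k_B·T)), and the relative sensitivity reduces to S_r = ΔE/(k_B·T²). A minimal sketch of these relations; the energy gap below is a hypothetical Er³⁺-like value chosen for illustration.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant, eV/K

def fir(temperature_k, delta_e_ev, c=1.0):
    """Boltzmann-governed fluorescence intensity ratio of two thermally coupled levels."""
    return c * math.exp(-delta_e_ev / (K_B * temperature_k))

def relative_sensitivity(temperature_k, delta_e_ev):
    """S_r = (1/FIR) * d(FIR)/dT = dE / (k_B * T^2), expressed in % per kelvin."""
    return 100.0 * delta_e_ev / (K_B * temperature_k ** 2)

# Hypothetical energy gap of ~0.09 eV (roughly 720 cm^-1) at room temperature
sr_300 = relative_sensitivity(300.0, 0.09)
```

This form makes the trade-off discussed in FIR reviews explicit: a larger ΔE raises the relative sensitivity but also weakens the thermal coupling between the two levels, so the gap cannot be increased indefinitely.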