Risk-of-bias assessments are now a standard component of systematic reviews. At present, reviewers need to manually identify relevant parts of research articles for a set of methodological elements that affect the risk of bias, in order to make a risk-of-bias judgement for each of these elements. We investigate the use of text mining methods to automate risk-of-bias assessments in systematic reviews. We aim to identify relevant sentences within the text of included articles, to rank articles by risk of bias and to reduce the number of risk-of-bias assessments that the reviewers need to perform by hand.
We use supervised machine learning to train two types of models, for each of the three risk-of-bias properties of sequence generation, allocation concealment and blinding. The first model predicts whether a sentence in a research article contains relevant information. The second model predicts a risk-of-bias value for each research article. We use logistic regression, where each independent variable is the frequency of a word in a sentence or article, respectively.
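As an illustration of the sentence-level model described above, the following is a minimal sketch of logistic regression on word frequencies, fitted by simple per-example gradient updates. The tokeniser, learning rate, number of epochs and the toy training sentences are all assumptions made for this example; they are not taken from the study.

```python
import math
from collections import Counter


def word_counts(text):
    """Lower-cased, whitespace-tokenised word frequencies (a simplifying assumption)."""
    return Counter(text.lower().split())


def linear_score(weights, bias, counts):
    """Linear predictor: bias plus weight times frequency for each word."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())


def train_logistic(texts, labels, epochs=200, lr=0.5):
    """Fit logistic regression with per-example gradient updates."""
    weights, bias = {}, 0.0
    features = [word_counts(t) for t in texts]
    for _ in range(epochs):
        for counts, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-linear_score(weights, bias, counts)))
            err = y - p  # gradient of the log-likelihood w.r.t. the linear predictor
            bias += lr * err
            for w, c in counts.items():
                weights[w] = weights.get(w, 0.0) + lr * err * c
    return weights, bias


def predict_proba(model, text):
    """Predicted probability that a sentence is relevant."""
    weights, bias = model
    return 1.0 / (1.0 + math.exp(-linear_score(weights, bias, word_counts(text))))


# Hypothetical toy data: 1 = relevant to allocation/sequence generation, 0 = not.
train_texts = [
    "allocation was concealed using sealed opaque envelopes",
    "participants were randomised by a computer generated sequence",
    "the mean age of participants was 54 years",
    "results are shown in table 2",
]
train_labels = [1, 1, 0, 0]
model = train_logistic(train_texts, train_labels)

# Rank unseen sentences by predicted relevance, most relevant first.
candidates = ["mean age shown in table", "sealed envelopes concealed allocation"]
ranked = sorted(candidates, key=lambda s: predict_proba(model, s), reverse=True)
```

The same construction applies at the article level by counting words over the whole article rather than a single sentence.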
We found that sentences can be successfully ranked by relevance with area under the receiver operating characteristic (ROC) curve (AUC) > 0.98. Articles can be ranked by risk of bias with AUC > 0.72. We estimate that more than 33% of articles can be assessed by just one reviewer, where two reviewers are normally required.
We show that text mining can be used to assist risk-of-bias assessments.
Abstract
Background
Measurement error in exposures and confounders can bias exposure–outcome associations but is rarely considered. We aimed to assess random measurement error of all continuous variables in UK Biobank and explore approaches to mitigate its impact on exposure–outcome associations.
Methods
Random measurement error was assessed using intraclass correlation coefficients (ICCs) for all continuous variables with repeat measures. Regression calibration was used to correct for random error in exposures and confounders, using the associations of red blood cell distribution width (RDW), C-reactive protein (CRP) and 25-hydroxyvitamin D [25(OH)D] with mortality as illustrative examples.
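The two steps can be sketched for the simplest case: a single exposure measured twice per participant, with classical (random, non-differential) error. Under that assumption the naive regression coefficient is attenuated by roughly the reliability ratio, which the ICC estimates, so dividing by the ICC gives a calibrated coefficient. The one-way ICC formula and the univariate correction below are assumptions made for illustration; they are not the study's published code, which also handles error in confounders.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC for two repeat measures per participant."""
    n, k = len(pairs), 2
    grand_mean = mean(v for pair in pairs for v in pair)
    subject_means = [mean(pair) for pair in pairs]
    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for pair, m in zip(pairs, subject_means) for v in pair
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def calibrate(beta_observed, reliability):
    """Univariate regression calibration: divide by the reliability ratio."""
    return beta_observed / reliability


# Hypothetical repeat measurements (baseline, repeat) for three participants.
repeat_measures = [(1.0, 2.0), (3.0, 3.0), (5.0, 4.0)]
reliability = icc_oneway(repeat_measures)

# Hypothetical naive log hazard ratio of 0.10, corrected with the estimated ICC.
beta_corrected = calibrate(0.10, reliability)
```

Because the ICC lies between 0 and 1 when measurement is imperfect, the corrected coefficient is further from the null than the naive one, which matches the direction of change reported in the Results.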
Results
The 2858 continuous variables with repeat measures varied in sample size from 109 to 49 121. They fell into three groups: (i) baseline visit [529 variables; median (interquartile range) ICC = 0.64 (0.57, 0.83)]; (ii) online diet by 24-h recall [22 variables; 0.35 (0.30, 0.40)] and (iii) imaging measures [2307 variables; 0.85 (0.73, 0.94)]. Highest ICCs were for anthropometric and medical history measures, and lowest for dietary and heart magnetic resonance imaging.
The ICCs (95% confidence interval) for RDW, CRP and 25(OH)D were 0.52 (0.51, 0.53), 0.29 (0.27, 0.30) and 0.55 (0.54, 0.56), respectively. Higher RDW and levels of CRP were associated with higher risk of all-cause mortality, and higher concentration of 25(OH)D with lower risk. After correction for random measurement error in the main exposure, the associations all strengthened. Confounder correction did not influence estimates.
Conclusions
Random measurement error varies widely and is often non-negligible. For UK Biobank we provide relevant statistics and adaptable code to help other researchers explore and correct for this.
Observational cohort studies can provide rich datasets with a diverse range of phenotypic variables. However, hypothesis-driven epidemiological analyses by definition only test particular hypotheses chosen by researchers. Furthermore, observational analyses may not provide robust evidence of causality, as they are susceptible to confounding, reverse causation and measurement error. Using body mass index (BMI) as an exemplar, we demonstrate a novel extension to the phenome-wide association study (pheWAS) approach, using automated screening with genotypic instruments to screen for causal associations amongst any number of phenotypic outcomes. We used a sample of 8,121 children from the ALSPAC dataset, and tested the linear association of a BMI-associated allele score with 172 phenotypic outcomes (with variable sample sizes). We also performed an instrumental variable analysis to estimate the causal effect of BMI on each phenotype. We found 21 of the 172 outcomes were associated with the allele score at an unadjusted p < 0.05 threshold, and use Bonferroni corrections, permutation testing and estimates of the false discovery rate to consider the strength of results given the number of tests performed. The most strongly associated outcomes included leptin, lipid profile, and blood pressure. We also found novel evidence of effects of BMI on a global self-worth score.
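To make the multiple-testing step concrete: with 172 tests at p < 0.05, about 172 × 0.05 = 8.6 outcomes would be expected to pass by chance if no true associations existed, so 21 hits need further scrutiny. The sketch below shows a Bonferroni threshold and one common false discovery rate procedure (Benjamini-Hochberg). The abstract does not state which FDR estimator was used, so Benjamini-Hochberg is an assumption here, and the p-values are made up for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared discoveries by the step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its own threshold.
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Threshold for the 172 outcomes tested in the study.
threshold = bonferroni_threshold(0.05, 172)

# Hypothetical p-values for four outcomes.
example_p = [0.001, 0.03, 0.035, 0.5]
discoveries = benjamini_hochberg(example_p, q=0.05)
```

Bonferroni is the more conservative of the two; the step-up procedure can retain a p-value that misses its own rank threshold when a larger p-value further down the ordering meets its threshold.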
Abstract
Objectives
To investigate whether the association between subjective wellbeing (subjective happiness and life satisfaction) and cardiometabolic health is causal.
Design
Two sample, bidirectional mendelian randomisation study.
Setting
Genetic data taken from various cohorts comprised of the general population (mostly individuals of European ancestry, plus a small proportion of other ancestries); follow-up analysis included individuals from the United Kingdom.
Participants
Summary data were used from previous genome wide association studies (number of participants ranging from 83 198 to 339 224), which investigated traits related to cardiovascular or metabolic health, had the largest sample sizes, and consisted of the most similar populations while minimising sample overlap. A follow-up analysis included 337 112 individuals from the UK Biobank (54% female (n=181 363), mean age 56.87 years (standard deviation 8.00) at recruitment).
Main outcome measures
Subjective wellbeing and 11 measures of cardiometabolic health (coronary artery disease; myocardial infarction; total, high density lipoprotein, and low density lipoprotein cholesterol; diastolic and systolic blood pressure; body fat; waist to hip ratio; waist circumference; and body mass index).
Results
Evidence of a causal effect of body mass index on subjective wellbeing was seen; each 1 kg/m² increase in body mass index caused a −0.045 (95% confidence interval −0.084 to −0.006, P=0.02) standard deviation reduction in subjective wellbeing. Follow-up analysis of this association in an independent sample from the UK Biobank provided strong evidence of an effect of body mass index on satisfaction with health (β=−0.035 unit decrease in health satisfaction (95% confidence interval −0.043 to −0.027) per standard deviation increase in body mass index, P<0.001). No clear evidence of a causal effect was seen between subjective wellbeing and the other cardiometabolic health measures, in either direction.
Conclusions
These results suggest that a higher body mass index is associated with a lower subjective wellbeing. A follow-up analysis confirmed this finding, suggesting that the effect in middle aged people could be driven by satisfaction with health. Body mass index is a modifiable determinant, and therefore, this study provides further motivation to tackle the obesity epidemic because of the knock-on effects of higher body mass index on subjective wellbeing.
Abstract
Summary
Existing ways of accessing data from the Reactome database are limited. Either a researcher is restricted to particular queries defined by a web application programming interface (API) or they have to download the whole database. Reactome Pengine is a web service providing a logic programming-based API to the human reactome. This gives researchers greater flexibility in data access than existing APIs, as users can send their own small programs (alongside queries) to Reactome Pengine.
Availability and implementation
The server and an example notebook can be found at https://apps.nms.kcl.ac.uk/reactome-pengine. Source code is available at https://github.com/samwalrus/reactome-pengine and a Docker image is available at https://hub.docker.com/r/samneaves/rp4/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Analysis of physical activity usually focuses on a small number of summary statistics derived from accelerometer recordings: average counts per minute and the proportion of time spent in moderate-vigorous physical activity or in sedentary behaviour. We show how bigrams, a concept from the field of text mining, can be used to describe how a person's activity levels change across (brief) time points. These variables can, for instance, differentiate between two people spending the same time in moderate activity, where one person often stays in moderate activity from one moment to the next and the other does not.
We use data on 4810 participants of the Avon Longitudinal Study of Parents and Children (ALSPAC). We generate a profile of bigram frequencies for each participant and test the association of each frequency with body mass index (BMI), as an exemplar.
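A bigram profile of this kind can be sketched as a count of adjacent pairs of activity categories. The single-letter labels (S, L, M, V for sedentary, low, moderate, vigorous) and the choice to treat a pair and its reverse as the same bigram are assumptions for this example; the minute-by-minute sequences are made up.

```python
from collections import Counter

LEVELS = ("S", "L", "M", "V")  # sedentary, low, moderate, vigorous (assumed labels)


def activity_bigrams(states, ordered=False):
    """Count adjacent pairs of activity categories in a sequence of time points.

    With ordered=False, (S, M) and (M, S) are counted as the same bigram.
    """
    counts = Counter()
    for a, b in zip(states, states[1:]):
        key = (a, b) if ordered else tuple(sorted((a, b), key=LEVELS.index))
        counts[key] += 1
    return counts


# Two hypothetical people, each with four minutes in moderate activity.
sustained = activity_bigrams("SSMMMMSS")   # stays in moderate activity
fragmented = activity_bigrams("SMSMSMSM")  # alternates every minute
```

Both sequences give the same total time in each category, but only the first has any moderate-then-moderate bigrams, which is the distinction that summary statistics such as time in moderate activity cannot capture.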
We found several associations between changes in bigram frequencies and BMI. For instance, a one standard deviation decrease in the number of adjacent minutes in sedentary then moderate activity (or vice versa), with a corresponding increase in the number of adjacent minutes in moderate then vigorous activity (or vice versa), was associated with a 2.36 kg/m² lower BMI [95% confidence interval (CI): -3.47, -1.26], after accounting for the time spent in sedentary, low, moderate and vigorous activity.
Activity bigrams are novel variables that capture how a person's activity changes from one moment to the next. These variables can be used to investigate how sequential activity patterns associate with other traits.
Non-random selection of analytic subsamples could introduce selection bias in observational studies. We explored the potential presence and impact of selection in studies of SARS-CoV-2 infection and COVID-19 prognosis.
We tested the association of a broad range of characteristics with selection into COVID-19 analytic subsamples in the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK Biobank (UKB). We then conducted empirical analyses and simulations to explore the potential presence, direction and magnitude of bias due to this selection (relative to our defined UK-based adult target populations) when estimating the association of body mass index (BMI) with SARS-CoV-2 infection and death-with-COVID-19.
In both cohorts, a broad range of characteristics was related to selection, sometimes in opposite directions (e.g. more-educated people were more likely to have data on SARS-CoV-2 infection in ALSPAC, but less likely in UKB). Higher BMI was associated with higher odds of SARS-CoV-2 infection and death-with-COVID-19. We found non-negligible bias in many simulated scenarios.
Analyses using COVID-19 self-reported or national registry data may be biased due to selection. The magnitude and direction of this bias depend on the outcome definition, the true effect of the risk factor and the assumed selection mechanism; these are likely to differ between studies with different target populations. Bias due to sample selection is a key concern in COVID-19 research based on national registry data, especially as countries end free mass testing. The framework we have used can be applied by other researchers assessing the extent to which their results may be biased for their research question of interest.
Sex hormone-binding globulin (SHBG) is a circulating glycoprotein and a regulator of sex hormone levels, which has been shown to influence various traits and diseases. The molecular nature of SHBG makes it a feasible target for preventative or therapeutic interventions. A systematic study of its effects across the human phenome may uncover novel associations.
We used a Mendelian randomization phenome-wide association study (MR-pheWAS) approach to systematically appraise the potential functions of SHBG while reducing potential biases such as confounding and reverse causation common to the literature. We searched for potential causal effects of SHBG in UK Biobank (N = 334 977) and followed-up our top findings using two-sample MR analyses to evaluate whether estimates may be biased due to horizontal pleiotropy.
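For readers unfamiliar with two-sample MR, the sketch below shows the per-variant Wald ratio and a fixed-effect inverse-variance weighted (IVW) combination of variant-level summary statistics. The abstract does not name the estimators used, so IVW is an assumption here, and the summary statistics are invented for illustration. Pleiotropy-robust estimators, which the follow-up analyses are intended to address, are not shown.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: outcome effect per unit exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its first-order standard error."""
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, denominator ** -0.5


# Hypothetical summary statistics for three variants.
bx = [0.1, 0.2, 0.3]     # variant-exposure associations
by = [0.2, 0.4, 0.6]     # variant-outcome associations
se = [0.05, 0.05, 0.05]  # standard errors of the variant-outcome associations
estimate, standard_error = ivw_estimate(bx, by, se)
```

When every variant implies the same Wald ratio, as in this toy input, the IVW estimate equals that common ratio; heterogeneity between the ratios is one signal of possible horizontal pleiotropy.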
Results of the MR-pheWAS across over 21 000 outcome phenotypes identified 12 phenotypes associated with genetically elevated SHBG after Bonferroni correction for multiple testing. Follow-up analysis using two-sample MR indicated the associations of increased natural log SHBG with higher impedance of the arms and whole body, lower pulse rate, lower bone density, higher odds of hip replacement, lower odds of high cholesterol or cholesterol medication use and higher odds of gallbladder removal.
Our systematic MR-pheWAS of SHBG, which covered the full range of phenotypes available in UK Biobank, suggested that higher circulating SHBG affects body impedance, bone density and cholesterol levels, among others. These phenotypes should be prioritized in future studies aiming to investigate the biological effects of SHBG or develop targets for therapeutic intervention.
Abstract
Continuous glucose monitors (CGM) record interstitial glucose levels ‘continuously’, producing a sequence of measurements for each participant (e.g. the average glucose level every 5 min over several days, both day and night). To analyse these data, researchers tend to derive summary variables such as the area under the curve (AUC), to then use in subsequent analyses. To date, a lack of consistency and transparency of precise definitions used for these summary variables has hindered interpretation, replication and comparison of results across studies. We present GLU, an open-source software package for deriving a consistent set of summary variables from CGM data. GLU performs quality control of each CGM sample (e.g. addressing missing data), derives a diverse set of summary variables (e.g. AUC and proportion of time spent in hypo-, normo- and hyper- glycaemic levels) covering six broad domains, and outputs these (with quality control information) to the user. GLU is implemented in R and is available on GitHub at https://github.com/MRCIEU/GLU. Git tag v0.2 corresponds to the version presented here.
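Two of the summary variables named above can be illustrated with a short sketch: a trapezoidal AUC and the proportion of readings in hypo-, normo- and hyperglycaemic ranges. This is written in Python for illustration only and is not GLU's R implementation; the function names and the threshold defaults (3.9 and 10.0 mmol/L) are assumptions, and GLU's own definitions should be taken from its documentation.

```python
def auc_trapezoid(times_min, glucose):
    """Area under the glucose curve by the trapezoid rule (glucose units x minutes)."""
    return sum(
        (t1 - t0) * (g0 + g1) / 2
        for t0, t1, g0, g1 in zip(times_min, times_min[1:], glucose, glucose[1:])
    )


def proportion_in_ranges(glucose, low=3.9, high=10.0):
    """Share of readings below `low`, above `high`, and in between.

    The thresholds are placeholder values chosen for this example.
    """
    n = len(glucose)
    hypo = sum(g < low for g in glucose) / n
    hyper = sum(g > high for g in glucose) / n
    return {"hypo": hypo, "normo": 1 - hypo - hyper, "hyper": hyper}


# Hypothetical readings at 5-minute intervals (mmol/L).
times = [0, 5, 10]
readings = [5.0, 6.0, 5.0]
area = auc_trapezoid(times, readings)
ranges = proportion_in_ranges([3.0, 5.0, 6.0, 11.0])
```

Even in this small sketch the result depends on choices such as the thresholds and whether the AUC is normalised by duration, which is the kind of definitional detail the abstract argues should be fixed and reported consistently.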