In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise the variation found in the mtDNA sequence in Danes and relate the variation to diabetes risk as well as to several blood phenotypes of the controls but find no significant associations. We report 2025 polymorphisms, of which 393 have not been reported previously. These 393 variants are both very rare and estimated to have arisen from very recent mutations, but individuals with type 2 diabetes do not possess more of them. Population genetic analysis using a Bayesian skyline plot shows a recent history of rapid population growth in the Danish population, in accordance with the fact that >40% of variable sites are observed as singletons.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
Cytokine response plays a vital role in various human lipopolysaccharide (LPS) infectious and inflammatory diseases. This study aimed to find genetic variants that might affect the levels of LPS-induced interleukin (IL)-6, IL-8, IL-10, IL-1ra and tumor necrosis factor (TNF)-α cytokine production.
We performed an initial genome-wide association study using the Affymetrix Human Mapping 500 K GeneChip® to screen 130 healthy individuals of Danish descent. The levels of IL-6, IL-8, IL-10, IL-1ra and TNF-α in 24-hour LPS-stimulated whole blood samples were compared across genotypes. The 152 most significant SNPs were tested for replication using the Illumina Golden Gate® GeneChip in an independent cohort of 186 Danish individuals. Next, 9 of the most statistically significant SNPs were tested for replication using PCR-based genotyping in an independent cohort of 400 Danish individuals. All results were analyzed in a combined study of the 716 Danish individuals.
Only one marker of the 500 K Gene Chip in the discovery study showed a significant association with LPS-induced IL-1ra cytokine levels after Bonferroni correction (P < 10^-7). However, this SNP was not associated with the IL-1ra cytokine levels in the replication dataset. No SNPs reached genome-wide significance for the five cytokine levels in the combined analysis of all three stages.
The associations between the genetic variants and the LPS-induced IL-6, IL-8, IL-10, IL-1ra and TNF-α cytokine levels were not significant in the meta-analysis. The present study does not support a strong genetic effect on LPS-stimulated cytokine production; however, the potential for type II errors should be considered.
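The Bonferroni threshold quoted for the discovery stage can be reproduced with simple arithmetic. The sketch below assumes a family-wise error rate of 0.05 and 500,000 tested markers (the nominal size of the 500 K array); neither value is stated explicitly in the abstract, and the function name is purely illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed values: alpha = 0.05 and 500,000 markers (nominal array size).
threshold = bonferroni_threshold(0.05, 500_000)
print(f"Per-SNP threshold: {threshold:.1e}")
```

Under these assumptions the per-SNP threshold is on the order of 10^-7, which matches the cut-off reported for the discovery study.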
Abstract
Motivation
Principal component analysis (PCA) is a commonly used tool in genetics to capture and visualize population structure. Due to technological advances in sequencing, such as the widely used non-invasive prenatal test, massive datasets of ultra-low coverage sequencing are being generated. These datasets are characterized by having a large amount of missing genotype information.
Results
We present EMU, a method for inferring population structure in the presence of rampant non-random missingness. We show through simulations that several commonly used PCA methods cannot handle missing data arising from various sources, which leads to biased results as individuals are projected into the PC space based on their amount of missingness. In terms of accuracy, EMU outperforms an existing method that also accommodates missingness while being competitively fast. We further tested EMU on around 100K individuals of the Phase 1 dataset of the Chinese Millionome Project, which were shallowly sequenced to around 0.08×. From these data we are able to capture the population structure of the Han Chinese and to reproduce a previous analysis in a matter of CPU hours instead of CPU years. EMU’s capability to accurately infer population structure in the presence of missingness will be of increasing importance with the rising number of large-scale genetic datasets.
Availability and implementation
EMU is written in Python and is freely available at https://github.com/rosemeis/emu.
Supplementary information
Supplementary data are available at Bioinformatics online.
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide the choice of an appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting, where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
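The combination of a strictly proper scoring rule with bootstrap cross-validation can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the article: it assumes a binary response, uses the Brier score as the scoring rule (the abstract does not name one), and evaluates each bootstrap-trained model on the subjects left out of that bootstrap sample. The `prevalence_model` baseline and the toy `rows` are hypothetical.

```python
import random


def brier_score(outcomes, predictions):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, predictions)) / len(outcomes)


def prevalence_model(train_rows):
    """Hypothetical baseline strategy: always predict the training prevalence."""
    rate = sum(y for _, y in train_rows) / len(train_rows)
    return lambda x: rate


def bootstrap_cv(rows, fit, n_rounds=200, seed=1):
    """Average out-of-bag Brier score over bootstrap training samples."""
    rng = random.Random(seed)
    n = len(rows)
    scores = []
    for _ in range(n_rounds):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in chosen]
        if not out_of_bag:
            continue  # no held-out subjects in this round
        predict = fit([rows[i] for i in in_bag])
        outcomes = [rows[i][1] for i in out_of_bag]
        predictions = [predict(rows[i][0]) for i in out_of_bag]
        scores.append(brier_score(outcomes, predictions))
    if not scores:
        raise ValueError("no round produced held-out subjects")
    return sum(scores) / len(scores)


# Toy data: (predictor, binary response) pairs, invented for illustration.
rows = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1), (0.9, 1), (0.2, 0), (0.6, 1), (0.5, 0)]
score = bootstrap_cv(rows, prevalence_model)
print(f"Bootstrap cross-validated Brier score: {score:.3f}")
```

Each player's strategy would be passed as a different `fit` function; using the same seed gives every strategy the same training and test splits, which is one way to keep the comparison fair.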
Population genetic studies usually consist of individuals of diverse ancestries, and inference of population structure therefore plays an important role in population genetics and association studies. Here we present PCAngsd, a framework for analyzing low depth next-generation sequencing (NGS) data in heterogeneous populations using principal component analysis (PCA). NGS methods provide large amounts of genetic data but are associated with statistical uncertainty for low depth sequencing data, which is used in large-scale population studies due to cost limitations. Probabilistic methods have therefore been developed to take this uncertainty into account when estimating population genetic parameters by using genotype likelihoods and external information. We have developed two new methods for inferring population structure. The first method uses an Empirical Bayes approach to estimate individual allele frequencies based on genotype dosages in an iterative procedure for inferring population structure. The estimated individual allele frequencies are then used as prior information to estimate a covariance matrix and perform PCA. The second method uses the estimated individual allele frequencies of the first method to estimate admixture proportions based on a fast non-negative matrix factorization (NMF) algorithm. The method for performing PCA outperforms existing methods in both simulated and real low depth NGS datasets, while the method for estimating admixture proportions produces comparable results to other methods with shorter run-times.
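The idea of using an individual allele frequency as prior information for genotype likelihoods can be illustrated for a single individual at a single site. The sketch below assumes a binomial (Hardy-Weinberg-type) prior on the genotype given the individual allele frequency; it is a simplified illustration, not the PCAngsd implementation, and all function names are hypothetical.

```python
def genotype_prior(pi):
    """Assumed binomial prior over genotypes 0, 1, 2 given allele frequency pi."""
    return [(1 - pi) ** 2, 2 * pi * (1 - pi), pi ** 2]


def genotype_posterior(likelihoods, pi):
    """Posterior genotype probabilities from genotype likelihoods and the prior."""
    weighted = [l * p for l, p in zip(likelihoods, genotype_prior(pi))]
    total = sum(weighted)
    if total == 0:
        raise ValueError("likelihoods and prior have no overlapping support")
    return [w / total for w in weighted]


def expected_dosage(likelihoods, pi):
    """Posterior expected number of copies of the allele (between 0 and 2)."""
    posterior = genotype_posterior(likelihoods, pi)
    return posterior[1] + 2 * posterior[2]


# Uninformative likelihoods (e.g. no reads): the dosage falls back to 2 * pi.
print(expected_dosage([1.0, 1.0, 1.0], 0.5))
# Likelihoods that only support the homozygous alternative genotype.
print(expected_dosage([0.0, 0.0, 1.0], 0.3))
```

With no sequencing information the dosage is determined by the prior alone, which is why accurate individual allele frequencies matter for low depth data.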
There has recently been considerable interest in detecting natural selection in the human genome. Selection will usually tend to increase identity-by-descent (IBD) among individuals in a population, and many methods for detecting recent and ongoing positive selection indirectly take advantage of this. In this article we show that excess IBD sharing is a general property of natural selection and we show that this fact makes it possible to detect several types of selection including a type that is otherwise difficult to detect: selection acting on standing genetic variation. Motivated by this, we use a recently developed method for identifying IBD sharing among individuals from genome-wide data to scan populations from the new HapMap phase 3 project for regions with excess IBD sharing in order to identify regions in the human genome that have been under strong, very recent selection. The HLA region is by far the region showing the most extreme signal, suggesting that much of the strong recent selection acting on the human genome has been immune related and acting on HLA loci. As equilibrium overdominance does not tend to increase IBD, we argue that this type of selection cannot explain our observations.
Chip-based high-throughput genotyping has facilitated genome-wide studies of genetic diversity. Many studies have utilized these large data sets to make inferences about the demographic history of human populations using measures of genetic differentiation such as F_ST or principal component analyses. However, the single nucleotide polymorphism (SNP) chip data suffer from ascertainment biases caused by the SNP discovery process in which a small number of individuals from selected populations are used as discovery panels. In this study, we investigate the effect of the ascertainment bias on inferences regarding genetic differentiation among populations in one of the common genome-wide genotyping platforms. We generate SNP genotyping data for individuals that previously have been subject to partial genome-wide Sanger sequencing and compare inferences based on genotyping data to inferences based on direct sequencing. In addition, we also analyze publicly available genome-wide data. We demonstrate that the ascertainment biases will distort measures of human diversity and possibly change conclusions drawn from these measures in sometimes unexpected ways. We also show that details of the genotype calling algorithms can have a surprisingly large effect on population genetic inferences. We not only present a correction of the spectrum for the widely used Affymetrix SNP chips but also show that such corrections are difficult to generalize among studies.
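As a reminder of what a differentiation measure such as F_ST computes, the sketch below evaluates a textbook heterozygosity-based F_ST for one biallelic SNP in two populations. It assumes equally weighted populations and known allele frequencies, and it is not the estimator or the ascertainment correction used in the study; the function name is illustrative.

```python
def fst_two_populations(p1, p2):
    """Heterozygosity-based F_ST = (H_T - H_S) / H_T for one biallelic SNP."""
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity in the pooled population (equal weights assumed).
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation can be measured
    return (h_t - h_s) / h_t


print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.2, 0.8))  # strongly differentiated frequencies
```

Because the value depends directly on the allele frequencies of the SNPs included, a discovery panel that favours SNPs at particular frequencies in particular populations changes which sites enter the calculation, which is the kind of distortion the abstract describes.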
Model based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, allow the user to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals are a result of K homogeneous ancestral populations that are all well represented in the data, while another assumption is that no drift happened after the admixture event. The histories of many real world populations do not conform to that model, and in that case taking the inferred admixture proportions at face value might be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. In the case of a poorly fitting admixture model, individuals with similar demographic histories have a positive correlation of their residuals. Using simulated and real data, we show how the method is able to detect a bad fit of inferred admixture proportions due to using an insufficient number of clusters K or to demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and nondiscrete ancestral populations. We have implemented the method as open source software that can be applied to both unphased genotypes and low depth sequencing data.
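The residual correlation described above can be sketched for a pair of individuals with known genotypes. The example assumes the standard admixture model prediction, in which the expected genotype is twice the ancestry-weighted allele frequency, and uses a plain Pearson correlation; it omits any bias correction the published method may apply, and the toy frequencies, proportions and genotypes are invented for illustration.

```python
from math import sqrt


def predicted_genotype(q_i, f_j):
    """Expected genotype: 2 * sum_k q_ik * f_jk (admixture proportions x frequencies)."""
    return 2.0 * sum(q * f for q, f in zip(q_i, f_j))


def residuals(genotypes, q_i, freqs):
    """Observed minus predicted genotype at each site for one individual."""
    return [g - predicted_genotype(q_i, f_j) for g, f_j in zip(genotypes, freqs)]


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy input: 4 sites, K = 2 ancestral populations (all values hypothetical).
freqs = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2], [0.3, 0.7]]
q_a, q_b = [1.0, 0.0], [1.0, 0.0]
g_a, g_b = [1, 2, 2, 0], [1, 2, 1, 0]

res_a = residuals(g_a, q_a, freqs)
res_b = residuals(g_b, q_b, freqs)
print(pearson(res_a, res_b))
```

A clearly positive value across many sites for a pair of individuals would, following the abstract, suggest that the fitted admixture model misses shared history between them.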
The growth hormone secretagogue receptor (GHSR) mediates hunger sensation when stimulated by its natural ligand ghrelin. In the present study, we tested the hypothesis that common and rare variation in the GHSR locus is related to increased prevalence of obesity and overweight among Whites.
In a population-based study sample of 15,854 unrelated, middle-aged Danes, seven variants were genotyped to capture common variation in an 11 kbp region including GHSR. These were investigated for their individual and haplotypic association with obesity. None of these analyses revealed consistent association with measures of obesity. A -151C/T promoter mutation in the GHSR was found in two unrelated obese patients. One family presented with complete co-segregation, but the other with incomplete co-segregation. The mutation resulted in an increased transcriptional activity (p < 0.02) and introduction of a specific binding for Sp-1-like nuclear extracts relative to the wild type. The -151C/T mutation was genotyped in the 15,854 Danes and had a minor allele frequency of 0.01%. No association with obesity in carriers (mean BMI: 27 ± 4 kg/m^2) versus non-carriers (mean BMI: 28 ± 5 kg/m^2) (p > 0.05) could be shown.
In a population-based study sample of 15,854 Danes, no association between GHSR genotypes and measures of obesity and overweight was found. Also, analyses of GHSR haplotypes showed no consistent associations with obesity-related traits. A rare functional GHSR promoter variant was identified, yet there was no consistent relationship with obesity in either family- or population-based studies.