Pairwise relatedness estimation is important in many contexts, such as disease mapping and population genetics. However, all existing estimation methods are based on called genotypes, which is not ideal for next-generation sequencing (NGS) data of low depth from which genotypes cannot be called with high certainty.
We present a software tool, NgsRelate, for estimating pairwise relatedness from NGS data. It provides maximum likelihood estimates that are based on genotype likelihoods instead of genotypes and thereby takes the inherent uncertainty of the genotypes into account. Using both simulated and real data, we show that NgsRelate provides markedly better estimates for low-depth NGS data than two state-of-the-art genotype-based methods.
NgsRelate is implemented in C++ and is available under the GNU license at www.popgen.dk/software.
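As a rough illustration of the idea described above (estimating relatedness from genotype likelihoods rather than called genotypes), the following is a minimal sketch and not NgsRelate's actual implementation. It assumes known population allele frequencies, no inbreeding, Hardy-Weinberg proportions, and independent sites, and it uses a simple EM algorithm to estimate the proportions k0, k1, k2 of the genome at which two individuals share 0, 1 or 2 alleles identical by descent. All function names are hypothetical.

```python
def hwe(g, p):
    """Hardy-Weinberg probability of genotype g (0, 1 or 2 copies of an allele with frequency p)."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]


def pair_prob(g1, g2, z, p):
    """P(g1, g2 | the pair shares z alleles identical by descent), assuming no inbreeding."""
    q = 1.0 - p
    if z == 0:  # unrelated at this site: genotypes are independent
        return hwe(g1, p) * hwe(g2, p)
    if z == 2:  # both alleles shared: genotypes must be identical
        return hwe(g1, p) if g1 == g2 else 0.0
    # z == 1: exactly one allele shared
    table = {
        (0, 0): q ** 3, (0, 1): p * q ** 2, (0, 2): 0.0,
        (1, 0): p * q ** 2, (1, 1): p * q, (1, 2): p ** 2 * q,
        (2, 0): 0.0, (2, 1): p ** 2 * q, (2, 2): p ** 3,
    }
    return table[(g1, g2)]


def site_likelihoods(gl1, gl2, p):
    """P(data at one site | z) for z = 0, 1, 2, summing over the unknown genotypes.

    gl1 and gl2 are genotype likelihoods P(reads | genotype) for the two individuals.
    """
    return [
        sum(gl1[a] * gl2[b] * pair_prob(a, b, z, p)
            for a in range(3) for b in range(3))
        for z in range(3)
    ]


def estimate_k(gls1, gls2, freqs, iters=200):
    """EM estimate of (k0, k1, k2) from per-site genotype likelihoods and allele frequencies."""
    per_site = [site_likelihoods(a, b, p) for a, b, p in zip(gls1, gls2, freqs)]
    k = [1.0 / 3.0] * 3
    for _ in range(iters):
        acc = [0.0, 0.0, 0.0]
        for lik in per_site:
            total = sum(k[z] * lik[z] for z in range(3))
            for z in range(3):
                acc[z] += k[z] * lik[z] / total  # posterior weight of IBD state z
        k = [a / len(per_site) for a in acc]
    return k
```

With low-depth data the genotype likelihoods are spread over several genotypes at each site, and the sum inside `site_likelihoods` is what carries that uncertainty into the estimate instead of committing to a single called genotype.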
A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima's D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology-dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions.
We have developed an approach that accommodates the uncertainty of the data when calculating site frequency based neutrality test statistics. A salient feature of this approach is that it implicitly handles varying sequencing depth and missing data, and it avoids the need to infer variable sites for the analysis, thereby avoiding ascertainment problems introduced by a SNP discovery process.
Using an empirical Bayes approach for fast computations, we show that this method produces results for low-coverage NGS data comparable to those achieved when the genotypes are known without uncertainty. We also validate the method in an analysis of data from the 1000 Genomes Project. The method is implemented in a fast framework which enables researchers to perform these neutrality tests on a genome-wide scale.
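To make the frequency-spectrum statistic mentioned above concrete, here is a minimal sketch of Tajima's D computed directly from an unfolded site frequency spectrum (SFS), using the standard constants from Tajima (1989). It is not the authors' implementation: the function name is hypothetical, and the empirical Bayes estimation of the SFS from genotype likelihoods is not shown. The input is allowed to contain non-integer (expected) counts, which is what an SFS estimated under uncertainty would provide.

```python
from math import sqrt


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS.

    sfs[i - 1] is the (possibly expected, non-integer) number of sites at which
    the derived allele is seen in i of the n sampled chromosomes, i = 1..n-1.
    """
    n = len(sfs) + 1
    seg_sites = sum(sfs)  # number of segregating sites, S
    n_pairs = n * (n - 1) / 2.0
    # Mean number of pairwise differences, pi
    pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / n_pairs

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimators of theta, scaled by its standard deviation
    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
```

Under the standard neutral model the expected SFS is proportional to 1/i, for which the two estimators of theta coincide and D is zero; an excess of rare variants pushes D below zero and an excess of intermediate-frequency variants pushes it above zero.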
The indigenous people of Greenland, the Inuit, have lived for a long time in the extreme conditions of the Arctic, including low annual temperatures, and with a specialized diet rich in protein and fatty acids, particularly omega-3 polyunsaturated fatty acids (PUFAs). A scan of Inuit genomes for signatures of adaptation revealed signals at several loci, with the strongest signal located in a cluster of fatty acid desaturases that determine PUFA levels. The selected alleles are associated with multiple metabolic and anthropometric phenotypes and have large effect sizes for weight and height, with the effect on height replicated in Europeans. By analyzing membrane lipids, we found that the selected alleles modulate fatty acid composition, which may affect the regulation of growth hormones. Thus, the Inuit have genetic and physiological adaptations to a diet rich in PUFAs.
The origin of contemporary Europeans remains contentious. We obtained a genome sequence from Kostenki 14 in European Russia dating from 38,700 to 36,200 years ago, one of the oldest fossils of anatomically modern humans from Europe. We find that Kostenki 14 shares a close ancestry with the 24,000-year-old Mal'ta boy from central Siberia, European Mesolithic hunter-gatherers, some contemporary western Siberians, and many Europeans, but not eastern Asians. Additionally, the Kostenki 14 genome shows evidence of shared ancestry with a population basal to all Eurasians that also relates to later European Neolithic farmers. We find that Kostenki 14 contains more Neandertal DNA, and in longer tracts, than present-day Europeans. Our findings reveal the timing of divergence of western Eurasians and East Asians to be more than 36,200 years ago and that European genomic structure today dates back to the Upper Paleolithic and derives from a metapopulation that at times stretched from Europe to central Asia.
We have identified a variant in ADCY3 (encoding adenylate cyclase 3) associated with markedly increased risk of obesity and type 2 diabetes in the Greenlandic population. The variant disrupts a splice acceptor site, and carriers have decreased ADCY3 RNA expression. Additionally, we observe an enrichment of rare ADCY3 loss-of-function variants among individuals with type 2 diabetes in trans-ancestry cohorts. These findings provide new information on disease etiology relevant for future treatment strategies.
A timely update of a highly popular handbook on statistical genomics. This new, two-volume edition of a classic text provides a thorough introduction to statistical genomics, a vital resource for advanced graduate students, early-career researchers and new entrants to the field. It introduces new and updated information on developments that have occurred since the 3rd edition. Widely regarded as the reference work in the field, it features new chapters focusing on statistical aspects of data generated by new sequencing technologies, including sequence-based functional assays. It expands on previous coverage of the many processes between genotype and phenotype, including gene expression and epigenetics, as well as metabolomics. It also examines population genetics and evolutionary models and inference, with new chapters on the multi-species coalescent, admixture and ancient DNA, as well as genetic association studies including causal analyses and variant interpretation. The Handbook of Statistical Genomics focuses on explaining the main ideas, analysis methods and algorithms, citing key recent and historic literature for further details and references. It also includes a glossary of terms, acronyms and abbreviations, and features extensive cross-referencing between chapters, tying the different areas together. With heavy use of up-to-date examples and references to web-based resources, this continues to be a must-have reference in a vital area of research.
•Provides much-needed, timely coverage of new developments in this expanding area of study
•Numerous, brand new chapters, for example covering bacterial genomics, microbiome and metagenomics
•Detailed coverage of application areas, with chapters on plant breeding, conservation and forensic genetics
•Extensive coverage of human genetic epidemiology, including ethical aspects
•Edited by one of the leading experts in the field along with rising stars as his co-editors
•Chapter authors are world-renowned experts in the field, and newly emerging leaders
The Handbook of Statistical Genomics is an excellent introductory text for advanced graduate students and early-career researchers involved in statistical genetics.
Understanding the physiology and genetics of human hypoxia tolerance has important medical implications, but this phenomenon has thus far only been investigated in high-altitude human populations. Another system, yet to be explored, is humans who engage in breath-hold diving. The indigenous Bajau people (“Sea Nomads”) of Southeast Asia live a subsistence lifestyle based on breath-hold diving and are renowned for their extraordinary breath-holding abilities. However, it is unknown whether this has a genetic basis. Using a comparative genomic study, we show that natural selection on genetic variants in the PDE10A gene has increased spleen size in the Bajau, providing them with a larger reservoir of oxygenated red blood cells. We also find evidence of strong selection specific to the Bajau on BDKRB2, a gene affecting the human diving reflex. Thus, the Bajau, and possibly other diving populations, provide a new opportunity to study human adaptation to hypoxia tolerance.
•The Bajau, or “Sea Nomads,” have engaged in breath-hold diving for thousands of years
•Selection has increased Bajau spleen size, providing an oxygen reservoir for diving
•We find evidence of additional diving-related phenotypes under selection
•These findings have implications for hypoxia research, a pertinent medical issue
Genetic and physiological adaptations enable the remarkable breath-holding ability of marine nomads.
During the last decade genome‐wide association studies have proven to be a powerful approach to identifying disease‐causing variants. However, for admixed populations, most current methods for association testing are based on the assumption that the effect of a genetic variant is the same regardless of its ancestry. This is a reasonable assumption for a causal variant but may not hold for the genetic variants that are tested in genome‐wide association studies, which are usually not causal. The effects of noncausal genetic variants depend on how strongly their presence correlates with the presence of the causal variant, which may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect size is allowed to depend on the ancestry of a given allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a substantial increase in statistical power to detect associations. In addition, the method allows for testing for difference in effect size between ancestral populations, which can be used to help determine if a given genetic variant is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.
A better understanding of the biological factors underlying antidepressant treatment in patients with major depressive disorder (MDD) is needed. We perform gene expression analyses and explore sources of variability in peripheral blood related to antidepressant treatment and treatment response in patients suffering from recurrent MDD at baseline and after 8 weeks of treatment. The study includes 281 patients, who were randomized to 8 weeks of treatment with vortioxetine (N = 184) or placebo (N = 97). To our knowledge, this is the largest dataset including both gene expression in blood and placebo-controlled treatment response measured by a clinical scale in a randomized clinical trial. We identified three novel genes whose RNA expression levels at baseline and week 8 are significantly (FDR < 0.05) associated with treatment response after 8 weeks of treatment. Among these genes were SOCS3 (FDR = 0.0039) and PROK2 (FDR = 0.0028), which have previously both been linked to depression. Downregulation of these genes was associated with poorer treatment response. We did not identify any genes that were differentially expressed between placebo and vortioxetine groups at week 8 or between baseline and week 8 of treatment. Nor did we replicate any genes identified in previous peripheral blood gene expression studies examining treatment response. Analysis of genome-wide expression variability showed that type of treatment and treatment response explain very little of the variance in gene expression, a median of <0.0001% and 0.05% across all genes, respectively. Given the relatively large size of the study, the limited findings suggest that peripheral blood gene expression might not be the best approach to explore the biological factors underlying antidepressant treatment.
Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.