The original version of this Article contained an error in Fig. 2. In panel a, the two legend items "rare" and "common" were inadvertently swapped. This has been corrected in both the PDF and HTML versions of the Article.
Pairwise relatedness plays an important role in a range of genetic research fields. However, only a few estimators currently exist for individuals that are admixed, i.e. have ancestry from more than one population, and these estimators fail in some situations.
We present a new software tool, RelateAdmix, for obtaining maximum likelihood estimates of pairwise relatedness from genetic data between admixed individuals. We show using simulated data that it gives rise to better estimates than three state-of-the-art software tools, REAP, KING and Plink, while still being fast enough to be applicable to large datasets.
The software tool, implemented in C and R, is freely available from www.popgen.dk/software.
Gorongosa National Park in Mozambique hosts a large population of baboons, numbering over 200 troops. Gorongosa baboons have been tentatively identified as part of Papio ursinus on the basis of previous limited morphological analysis and a handful of mitochondrial DNA sequences. However, a recent morphological and morphometric analysis of Gorongosa baboons pinpointed the occurrence of several traits intermediate between P. ursinus and P. cynocephalus, leaving open the possibility of past and/or ongoing gene flow in the baboon population of Gorongosa National Park. To investigate the evolutionary history of baboons in Gorongosa, we generated high- and low-coverage whole-genome sequence data from Gorongosa baboons and compared them to available Papio genomes.
We confirmed that P. ursinus is the species closest to Gorongosa baboons. However, the Gorongosa baboon genomes share more derived alleles with P. cynocephalus than P. ursinus does, but no recent gene flow between P. ursinus and P. cynocephalus was detected when available Papio genomes were analyzed. Our results, based on the analysis of autosomal, mitochondrial and Y chromosome data, suggest complex, possibly male-biased, gene flow between Gorongosa baboons and P. cynocephalus, hinting at direct or indirect contributions from baboons belonging to the "northern" Papio clade, and signal the presence of population structure within P. ursinus.
The analysis of genome data generated from baboon samples collected in central Mozambique highlighted a complex set of evolutionary relationships with other baboons. Our results provide new insights into the population dynamics that have shaped baboon diversity.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non-model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low-depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low-depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
In studies of gene regulation, the efficient computational detection of over-represented transcription factor binding sites is an increasingly important aspect. Several published methods can be used for testing whether a set of hypothesised co-regulated genes share a common regulatory regime based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available for guiding the end user's choice of method. Furthermore, it would be necessary to obtain several different software programs from various sources to make a well-founded choice.
We introduce a software package, Asap, for fast searching with position weight matrices that includes several standard methods for assessing over-representation. We have compared the ability of these methods to detect over-represented transcription factor binding sites in artificial promoter sequences. Controlling all aspects of our input data, we are able to identify the optimal statistics across multiple threshold values and for sequence sets containing different distributions of transcription factor binding sites.
We show that our implementation is significantly faster than more naïve scanning algorithms when searching with many weight matrices in large sequence sets. When comparing the various statistics, we show that those based on binomial over-representation and Fisher's exact test perform almost equally well, and better than the others. An online server is available at http://servers.binf.ku.dk/asap/.
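To illustrate what a binomial over-representation statistic can look like, the following is a minimal Python sketch under stated assumptions. It is not Asap's implementation: the function names, the per-position hit-rate model and the example counts are all hypothetical. It treats each scanned position as an independent trial, estimates the hit rate from a background set, and asks how surprising the number of hits in the foreground set is.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def overrepresentation_pvalue(hits_fg: int, positions_fg: int,
                              hits_bg: int, positions_bg: int) -> float:
    """One-sided p-value for an excess of hits in the foreground set.

    Assumption (hypothetical model): the per-position hit probability is
    estimated as hits_bg / positions_bg from the background sequences.
    """
    background_rate = hits_bg / positions_bg
    return binomial_tail(hits_fg, positions_fg, background_rate)


if __name__ == "__main__":
    # Made-up counts: 30 hits in 1000 foreground positions versus
    # 100 hits in 10000 background positions.
    print(overrepresentation_pvalue(30, 1000, 100, 10000))
```

A Fisher's exact test would instead condition on the total number of hits across both sets and use the hypergeometric distribution; the binomial version shown here treats the background rate as known.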
There has recently been considerable interest in detecting natural selection in the human genome. Selection will usually tend to increase identity-by-descent (IBD) among individuals in a population, and many methods for detecting recent and ongoing positive selection indirectly take advantage of this. In this article we show that excess IBD sharing is a general property of natural selection and we show that this fact makes it possible to detect several types of selection including a type that is otherwise difficult to detect: selection acting on standing genetic variation. Motivated by this, we use a recently developed method for identifying IBD sharing among individuals from genome-wide data to scan populations from the new HapMap phase 3 project for regions with excess IBD sharing in order to identify regions in the human genome that have been under strong, very recent selection. The HLA region is by far the region showing the most extreme signal, suggesting that much of the strong recent selection acting on the human genome has been immune related and acting on HLA loci. As equilibrium overdominance does not tend to increase IBD, we argue that this type of selection cannot explain our observations.
A recent study conducted the first genome-wide scan for selection in Inuit from Greenland using single nucleotide polymorphism chip data. Here, we report that selection in the region with the second most extreme signal of positive selection in Greenlandic Inuit favored a deeply divergent haplotype that is closely related to the sequence in the Denisovan genome, and was likely introgressed from an archaic population. The region contains two genes, WARS2 and TBX15, and has previously been associated with adipose tissue differentiation and body-fat distribution in humans. We show that the adaptively introgressed allele has been under selection in a much larger geographic region than just Greenland. Furthermore, it is associated with changes in expression of WARS2 and TBX15 in multiple tissues including the adrenal gland and subcutaneous adipose tissue, and with regional DNA methylation changes in TBX15.
Abstract
Motivation
Inference of identity-by-descent (IBD) sharing along the genome between pairs of individuals has important uses. However, all existing inference methods are based on genotypes, which is not ideal for low-depth Next Generation Sequencing (NGS) data from which genotypes can only be called with high uncertainty.
Results
We present a new probabilistic software tool, LocalNgsRelate, for inferring IBD sharing along the genome between pairs of individuals from low-depth NGS data. Its inference is based on genotype likelihoods instead of genotypes, and thereby it takes the uncertainty of the genotype calling into account. Using real data from the 1000 Genomes project, we show that LocalNgsRelate provides more accurate IBD inference for low-depth NGS data than two state-of-the-art genotype-based methods, Albrechtsen et al. (2009) and hap-IBD. We also show that the method works well for NGS data down to a depth of 2×.
Availability and implementation
LocalNgsRelate is freely available at https://github.com/idamoltke/LocalNgsRelate.
Supplementary information
Supplementary data are available at Bioinformatics online.
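The genotype-likelihood idea that the LocalNgsRelate abstract relies on can be illustrated with a small Python sketch. This is a simple textbook-style model written for illustration only, not the model or code used by LocalNgsRelate: the function name, the fixed per-base error rate and the assumption of a biallelic site with independent reads are all hypothetical. It computes P(reads | genotype) for the three genotypes at one site, which shows why a few reads leave real uncertainty about the genotype.

```python
from math import prod


def genotype_likelihoods(bases, ref, alt, error_rate=0.01):
    """Return [P(reads | 0 alt copies), P(reads | 1), P(reads | 2)].

    Assumptions (hypothetical, for illustration): reads are independent,
    each read samples one of the two alleles with probability 1/2, and a
    sequencing error turns the true base into each of the other three
    bases with probability error_rate / 3.
    """
    def p_base_given_allele(base, allele):
        return 1.0 - error_rate if base == allele else error_rate / 3.0

    likelihoods = []
    for alt_copies in (0, 1, 2):
        # The two alleles carried by an individual with this genotype.
        alleles = [ref] * (2 - alt_copies) + [alt] * alt_copies
        likelihoods.append(prod(
            0.5 * p_base_given_allele(b, alleles[0])
            + 0.5 * p_base_given_allele(b, alleles[1])
            for b in bases
        ))
    return likelihoods


if __name__ == "__main__":
    # Two reads at a site with reference allele A and alternative allele G.
    # With so few reads, the likelihoods leave the genotype call uncertain.
    print(genotype_likelihoods(["A", "A"], ref="A", alt="G"))
    print(genotype_likelihoods(["A", "G"], ref="A", alt="G"))
```

A method that works on these three likelihoods per site, rather than on a single called genotype, carries the uncertainty forward into the downstream inference.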
The estimation of relatedness between pairs of possibly inbred individuals from high-throughput sequencing (HTS) data has previously not been possible for samples where we cannot obtain reliable genotype calls, as in the case of low-coverage data.
We introduce ngsRelateV2, a major revision of ngsRelateV1, a program that originally allowed for estimation of relatedness from HTS data among non-inbred individuals only. The revised version takes into account the possibility of individuals being inbred by estimating the 9 condensed Jacquard coefficients along with various other relatedness statistics. The program is threaded and scales linearly with the number of cores allocated to the process.
The program is available as an open source C/C++ program under the GPL license and hosted at https://github.com/ANGSD/ngsRelate. To facilitate easy analysis, the program is able to work directly on the most commonly used container formats for raw sequence (BAM/CRAM) and summary data (VCF/BCF).
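As an illustration of how the 9 condensed Jacquard coefficients relate to more familiar summary statistics, the sketch below applies the standard textbook identities for the kinship coefficient and the two individual inbreeding coefficients. It is not ngsRelate code and does not reflect its output format; the function name and the dictionary keys are hypothetical.

```python
def summarize_jacquard(delta):
    """Derive kinship and inbreeding coefficients from Delta_1..Delta_9.

    `delta` is a sequence of the nine condensed Jacquard coefficients in
    the conventional order; they must sum to 1.
    """
    if len(delta) != 9:
        raise ValueError("expected exactly nine coefficients")
    if abs(sum(delta) - 1.0) > 1e-8:
        raise ValueError("coefficients must sum to 1")
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    return {
        # Probability that two alleles drawn at random, one from each
        # individual, are identical by descent.
        "kinship": d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8,
        # States in which the first individual's two alleles are IBD.
        "inbreeding_a": d1 + d2 + d3 + d4,
        # States in which the second individual's two alleles are IBD.
        "inbreeding_b": d1 + d2 + d5 + d6,
    }


if __name__ == "__main__":
    # Non-inbred full siblings: only Delta_7, Delta_8 and Delta_9 are
    # non-zero, and they correspond to sharing 2, 1 and 0 alleles IBD.
    full_siblings = [0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25]
    print(summarize_jacquard(full_siblings))
```

For non-inbred pairs the first six coefficients are zero, so the kinship expression reduces to Delta_7 / 2 + Delta_8 / 4.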