Pedigree inference, for example determining whether two persons are second cousins or unrelated, can be done by comparing their genotypes at a selection of genetic markers. When the data for one or more of the persons is from low-coverage next generation sequencing (lcNGS), currently available computational methods either ignore genetic linkage or do not take advantage of the probabilistic nature of lcNGS data, relying instead on first estimating the genotype. We provide a method and software (see familias.name/lcNGS) that bridge this gap. Simulations indicate that our results are considerably more accurate than those of some previously available alternatives. Our method, which utilizes a version of the Lander-Green algorithm, uses a group of symmetries to speed up calculations. This group may be of further interest in other calculations involving linked loci.
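To illustrate what "the probabilistic nature of lcNGS data" can mean in practice, the following minimal sketch computes genotype likelihoods at a biallelic SNP from read counts. It assumes a simple binomial read model with a symmetric per-read error rate; the function names and the default error rate are illustrative assumptions, not taken from the software described above.

```python
from math import comb

def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return P(reads | genotype) for 0, 1 and 2 copies of the alt allele.

    Assumed model (not the authors'): reads are independent draws, and each
    read is miscalled with probability `error`.
    """
    n = n_ref + n_alt
    likelihoods = []
    for alt_copies in (0, 1, 2):
        # Probability that a single read shows the alt allele.
        if alt_copies == 0:
            p_alt = error
        elif alt_copies == 1:
            p_alt = 0.5
        else:
            p_alt = 1.0 - error
        likelihoods.append(comb(n, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref)
    return likelihoods

def normalise(values):
    """Scale values to sum to one (a flat-prior posterior over genotypes)."""
    total = sum(values)
    return [v / total for v in values]

# A single read carrying the alt allele: a hard call would say "alt/alt",
# but the heterozygote retains about a third of the normalised weight.
single_read = genotype_likelihoods(n_ref=0, n_alt=1)
weights = normalise(single_read)
```

Carrying all three likelihoods forward into the pedigree calculation, rather than only the most likely genotype, is the kind of information a genotype-calling step discards at low coverage.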
Single nucleotide polymorphism (SNP) data generated with microarray technologies have been used to solve murder cases via investigative leads obtained from identifying relatives of the unknown perpetrator included in accessible genomic databases, an approach referred to as investigative genetic genealogy (IGG). However, SNP microarrays were developed for relatively high input DNA quantity and quality, while DNA typically obtainable from crime scene stains is of low DNA quantity and quality, and SNP microarray data obtained from compromised DNA are largely missing. By applying the Illumina Global Screening Array (GSA) to 264 DNA samples with systematically altered quantity and quality, we empirically tested the impact of SNP microarray analysis of compromised DNA on kinship classification success, as relevant in IGG. Reference data from manufacturer-recommended input DNA quality and quantity were used to estimate genotype accuracy in the compromised DNA samples and for simulating data of different degree relatives. Although stepwise decrease of input DNA amount from 200 ng to 6.25 pg led to decreased SNP call rates and increased genotyping errors, kinship classification success did not decrease down to 250 pg for siblings and 1st cousins and 1 ng for 2nd cousins, while at 25 pg and below kinship classification success was zero. Stepwise decrease of input DNA quality via increased DNA fragmentation resulted in a decrease of genotyping accuracy as well as kinship classification success, which went down to zero at an average DNA fragment size of 150 base pairs. Combining decreased DNA quantity and quality in mock casework and skeletal samples further highlighted possibilities and limitations.
Overall, GSA analysis achieved maximal kinship classification success from 800 to 200 times lower input DNA quantities than manufacturer-recommended, although DNA quality plays a key role too, while compromised DNA produced false negative kinship classifications rather than false positive ones.
• SNP microarrays used for kinship classification in investigative genetic genealogy.
• First systematic SNP microarray study of quality and quantity compromised DNA for kinship classification.
• Successful kinship classification from 250 pg input DNA, 800 times lower than array-manufacturer recommended.
• Input DNA quality plays key role too in SNP microarray-based kinship classification success.
• Compromised DNA leads to false negative kinship classifications rather than false positive ones.
• We illustrate the potential of dense sets of SNP markers to resolve distant relationships.
• We study the impact of linkage disequilibrium, genotyping errors and assuming an inappropriate population.
• We conclude that denser sets of genetic markers are well suited for inference of distant relationships.
• We explore four real cases and illustrate that these may be resolved using dense marker sets.
With the advent of high density SNP arrays and the progress of next generation sequencing, demands for new methods to handle the subsequent data analysis have exploded. Forensic laboratories are generally hesitant to implement new methods in casework unless they are thoroughly tested and validated. This is particularly true when a third party contractor is involved in the analysis. In this paper we explore data from dense sets of SNP markers and study how different errors could potentially affect the results. In particular, we study the effects of genotyping errors, linkage disequilibrium, and the use of inappropriate population frequencies. We demonstrate that ignoring these factors may lead to false conclusions in several types of relationship cases, and we outline solutions to mitigate these problems.
Several applications necessitate an unbiased determination of relatedness, be it in linkage or association studies or in a forensic setting. An appropriate model to compute the joint probability of some genetic data for a set of persons given some hypothesis about the pedigree structure is then required. The increasing number of markers available through high-density SNP microarray typing and NGS technologies intensifies the demand, where using a large number of markers may lead to biased results due to strong dependencies between closely located loci, both within pedigrees (linkage) and in the population (allelic association or linkage disequilibrium (LD)). We present a new general model, based on a Markov chain for inheritance patterns and another Markov chain for founder allele patterns, the latter allowing us to account for LD. We also demonstrate a specific implementation for X chromosomal markers that allows for computation of likelihoods based on hypotheses of alleged relationships and genetic marker data. The algorithm can simultaneously account for linkage, LD, and mutations. We demonstrate its feasibility using simulated examples. The algorithm is implemented in the software FamLinkX, providing a user-friendly GUI for Windows systems (FamLinkX, as well as further usage instructions, is freely available at www.famlink.se). Our software provides the necessary means to solve cases where no previous implementation exists. In addition, the software has the possibility to perform simulations in order to further study the impact of linkage and LD on computed likelihoods for an arbitrary set of markers.
Highlights
• A general implementation for likelihood calculations on X-chromosomal marker data is presented.
• We model linkage, linkage disequilibrium as well as mutations.
• The implementation is freely available at www.famlink.se.
• Concordance with other software, where applicable, is demonstrated.
• Validation and theoretical derivations for some calculations are provided.
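As a rough illustration of a Markov chain over inheritance patterns between linked loci, the sketch below builds the standard transition probabilities between inheritance vectors used in Lander-Green-type calculations. It is a simplified, hypothetical example rather than the FamLinkX implementation: it assumes every meiosis recombines with the same fraction `theta`, whereas for X-chromosomal markers only maternal meioses would contribute recombination, and it omits the founder allele chain that models LD.

```python
from itertools import product

def transition_prob(v, w, theta):
    """P(inheritance vector w at the next locus | vector v at this locus).

    Each position of a vector records which grandparental allele was passed
    on in one meiosis; a change between loci corresponds to a recombination.
    """
    recombinations = sum(a != b for a, b in zip(v, w))
    meioses = len(v)
    return theta**recombinations * (1 - theta)**(meioses - recombinations)

def transition_matrix(meioses, theta):
    """Return all inheritance vectors and the matrix of transition probabilities."""
    states = list(product((0, 1), repeat=meioses))
    matrix = [[transition_prob(v, w, theta) for w in states] for v in states]
    return states, matrix

# Two meioses and a recombination fraction of 0.1 give four states;
# every row of the matrix sums to one.
states, matrix = transition_matrix(meioses=2, theta=0.1)
```

With `theta = 0.5` every entry equals `0.5 ** meioses`, which recovers the unlinked case in which the loci can be treated independently within the pedigree.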
Identifying individuals from biological mixtures to which they contributed is highly relevant in crime scene investigation and various biomedical research fields, but despite previous attempts, remains nearly impossible. Here we investigated the potential of using single-cell transcriptome sequencing (scRNA-seq), coupled with a dedicated bioinformatics pipeline (De-goulash), to solve this long-standing problem. We developed a novel approach and tested it with scRNA-seq data that we de-novo generated from multi-person blood mixtures, and also in-silico mixtures we assembled from public single individual scRNA-seq datasets, involving different numbers, ratios, and bio-geographic ancestries of contributors. For all 2- to 9-person balanced and imbalanced blood mixtures with ratios up to 1:60, we achieved a clear single-cell separation according to the contributing individuals. For all separated mixture contributors, sex and bio-geographic ancestry (maternal, paternal, and bi-parental) were correctly determined. All separated contributors were correctly individually identified with court-acceptable statistical certainty using de-novo generated whole exome sequencing reference data. In this proof-of-concept study, we demonstrate the feasibility of single-cell approaches to deconvolute biological mixtures and subsequently genetically characterise and individually identify the separated mixture contributors. With further optimisation and implementation, this approach may eventually allow moving to challenging biological mixtures, including those found at crime scenes.
Recent progress in forensic genetics has introduced a number of closely located short tandem repeat (STR) markers on the X chromosome. Inevitably, dependencies arise that have to be accounted for. This paper will in detail explore the complex statistical interpretation of X-chromosomal STR markers, focusing on likelihood calculations. Specifically, we will investigate how the phase uncertainty of haplotypes comes into play in the statistical evaluations and what curious effects this phenomenon can have. The starting point is a set of real cases where the weight of evidence has provided unexpected results that require further investigation in order to be fully understood. We will touch upon subjects such as association between alleles, recombinations, and mutations. The aim of this study is to facilitate a better understanding of the interaction between the concepts, in addition to providing an understanding of why good estimates of haplotype frequencies are crucial. The individual subjects have been discussed in other fields, whereas this study will focus on forensic applications where few studies have been conducted relating to the understanding of how these concepts interact.
Despite the high density of brown bears (Ursus arctos piscator) on the Kamchatka peninsula, their genetic variation has not been studied by STR analysis. Our aim was, therefore, to provide population data from the Kamchatka brown bear population applying a validated DNA profiling system. Twelve dinucleotide STRs commonly used in Western-European (WE) populations and four additional ones (G10C, G10J, G10O, G10X) were included. Template input ≥ 0.2 ng was successfully amplified. Measurements of precision, stutter and heterozygous balance showed that markers could be reliably genotyped applying the thresholds used for genotyping WE brown bears. However, locus G10X revealed an ancient allele-specific polymorphism that led to suboptimal amplification of all 174 bp alleles (Kamchatka and WE). Allele frequency estimates and forensic genetic parameters were obtained from 115 individuals successfully identified by genotyping 434 hair samples. All markers met the Hardy-Weinberg and linkage equilibrium expectations, and the power of discrimination ranged from 0.667 to 0.962. The total average probability of identity from the 15 STRs was 1.4 × 10^-14 (FST = 0.05) while the total average probability of sibling identity was 6.0 × 10^-6. Relationship tests revealed several parent-cub and full sibling pairs, demonstrating that the marker set would be valuable for the study of family structures. The population data is the first of its kind from the Kamchatka brown bear population. Population pairwise FST values revealed moderate genetic differentiation that mirrored the geographic distances to WE populations. The DNA profiling system, providing individual-specific profiles from non-invasive samples, will be useful for future monitoring and conservation purposes.
• 15 out of 16 autosomal STRs passed forensic validations for genotyping of non-invasive samples from Kamchatka brown bears.
• The average probability of identity (PI) from 15 STRs was 1.4 × 10^-14 (Fst = 0.05), while the sibling PI was 6.0 × 10^-6.
• Individual-specific DNA profiles may be used in forensic genetics as well as in population and conservation genetics studies.
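For readers unfamiliar with the probability of identity (PI) and the sibling PI, the following sketch shows the commonly used single-locus formulas under Hardy-Weinberg equilibrium. It deliberately omits the FST correction applied in the study, and the allele frequencies used are hypothetical, so it will not reproduce the reported figures.

```python
def probability_of_identity(freqs):
    """Chance that two unrelated individuals share a genotype at one locus."""
    homozygotes = sum(p**4 for p in freqs)
    heterozygotes = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygotes + heterozygotes

def probability_of_sibling_identity(freqs):
    """Chance that two full siblings share a genotype at one locus."""
    s2 = sum(p**2 for p in freqs)
    s4 = sum(p**4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2**2 - 0.25 * s4

def combined(per_locus_values):
    """Multiply per-locus values, assuming the loci are independent."""
    result = 1.0
    for value in per_locus_values:
        result *= value
    return result

# Hypothetical locus with two equally frequent alleles.
pi_locus = probability_of_identity([0.5, 0.5])           # 0.375
pi_sib_locus = probability_of_sibling_identity([0.5, 0.5])  # 0.59375
```

Multiplying across loci with `combined` is only appropriate when the markers are in linkage equilibrium, which is the condition the abstract reports testing.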
In a number of applications there is a need to determine the most likely pedigree for a group of persons based on genetic markers. Adequate models are needed to reach this goal. The markers used to perform the statistical calculations can be linked and there may also be linkage disequilibrium (LD) in the population. The purpose of this paper is to present a graphical Bayesian Network framework to deal with such data. Potential LD is normally ignored and it is important to verify that the resulting calculations are not biased. Even if linkage does not influence results for regular paternity cases, it may have substantial impact on likelihood ratios involving other, more extended pedigrees. Models for LD influence likelihoods for all pedigrees to some degree and an initial estimate of the impact of ignoring LD and/or linkage is desirable, going beyond mere rules of thumb based on marker distance. Furthermore, we show how one can readily include a mutation model in the Bayesian Network; extending other programs or formulas to include such models may require considerable amounts of work and will in many cases not be practical. As an example, we consider the two STR markers vWa and D12S391. We estimate probabilities for population haplotypes to account for LD using a method based on data from trios, while an estimate for the degree of linkage is taken from the literature. The results show that accounting for haplotype frequencies is unnecessary in most cases for this specific pair of markers. When doing calculations on regular paternity cases, the markers can be considered statistically independent. In more complex cases of disputed relatedness, for instance cases involving siblings or so-called deficient cases, or when small differences in the LR matter, independence should not be assumed. (The networks are freely available at http://arken.umb.no/~dakl/BayesianNetworks.)
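One simple way to gauge how far a pair of markers departs from independence in the population is to compare each estimated haplotype frequency with the product of its allele frequencies. The sketch below computes this difference (the LD coefficient D) for every allele pair. It is an illustration only, not the Bayesian Network approach of the paper, and the allele labels and haplotype frequencies are invented for the example.

```python
from collections import defaultdict

def ld_coefficients(hap_freqs):
    """Return D = h_ab - p_a * q_b for every allele pair (a, b).

    `hap_freqs` maps (allele at locus 1, allele at locus 2) to a haplotype
    frequency; the frequencies are expected to sum to one.
    """
    p = defaultdict(float)  # allele frequencies at locus 1
    q = defaultdict(float)  # allele frequencies at locus 2
    for (a, b), freq in hap_freqs.items():
        p[a] += freq
        q[b] += freq
    return {
        (a, b): hap_freqs.get((a, b), 0.0) - p[a] * q[b]
        for a in p
        for b in q
    }

# Hypothetical haplotype frequencies for two alleles at each of two loci.
example = {
    ("14", "18"): 0.4,
    ("14", "19"): 0.1,
    ("15", "18"): 0.1,
    ("15", "19"): 0.4,
}
d_values = ld_coefficients(example)  # D is about +0.15 for ("14", "18")
```

Values of D close to zero for all allele pairs would be consistent with treating the two markers as independent in the population, while linkage within the pedigree still has to be assessed separately.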