Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans (brought largely by the trans-Atlantic slave trade), shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.
Although the causes of Parkinson's disease (PD) are thought to be primarily environmental, recent studies suggest that a number of genes influence susceptibility. Using targeted case recruitment and online survey instruments, we conducted the largest case-control genome-wide association study (GWAS) of PD based on a single collection of individuals to date (3,426 cases and 29,624 controls). We discovered two novel, genome-wide significant associations with PD: rs6812193 near SCARB2 (p = 7.6 × 10^-10, OR = 0.84) and rs11868035 near SREBF1/RAI1 (p = 5.6 × 10^-8, OR = 0.85), both of which replicated in an independent cohort. We also replicated 20 previously discovered genetic associations (including LRRK2, GBA, SNCA, MAPT, GAK, and the HLA region), providing support for our novel study design. Relying on a recently proposed method based on genome-wide sharing estimates between distantly related individuals, we estimated the heritability of PD to be at least 0.27. Finally, using sparse regression techniques, we constructed predictive models that account for 6%-7% of the total variance in liability and that suggest the presence of true associations just beyond genome-wide significance, as confirmed through both internal and external cross-validation. These results indicate a substantial, but by no means total, contribution of genetics underlying susceptibility to both early-onset and late-onset PD, suggesting that, despite the novel associations discovered here and elsewhere, the majority of the genetic component for Parkinson's disease remains to be discovered.
Myopia, or nearsightedness, is the most common eye disorder, resulting primarily from excess elongation of the eye. The etiology of myopia, although known to be complex, is poorly understood. Here we report the largest ever genome-wide association study (45,771 participants) on myopia in Europeans. We performed a survival analysis on age of myopia onset and identified 22 significant associations (Formula: see text), two of which are replications of earlier associations with refractive error. Ten of the 20 novel associations identified replicate in a separate cohort of 8,323 participants who reported if they had developed myopia before age 10. These 22 associations in total explain 2.9% of the variance in myopia age of onset and point toward a number of different mechanisms behind the development of myopia. One association is in the gene PRSS56, which has previously been linked to abnormally small eyes; one is in a gene that forms part of the extracellular matrix (LAMA2); two are in or near genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are near genes known to be involved in the growth and guidance of retinal ganglion cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal signaling or development. These novel findings point toward multiple genetic factors involved in the development of myopia and suggest that complex interactions between extracellular matrix remodeling, neuronal development, and visual signals from the retina may underlie the development of myopia in humans.
We conducted a genome-wide association study (GWAS) to identify novel predisposition alleles associated with Philadelphia chromosome-negative myeloproliferative neoplasms (MPNs) and JAK2 V617F clonal hematopoiesis in the general population. We recruited a web-based cohort of 726 individuals with polycythemia vera, essential thrombocythemia, and myelofibrosis and 252 637 population controls unselected for hematologic phenotypes. Using a single-nucleotide polymorphism (SNP) array platform with custom probes for the JAK2 V617F mutation (V617F), we identified 497 individuals (0.2%) among the population controls who were V617F carriers. We performed a combined GWAS of the MPN cases plus V617F carriers in the control population (n = 1223) vs the remaining controls who were noncarriers for V617F (n = 252 140). For these MPN cases plus V617F carriers, we replicated the germ line JAK2 46/1 haplotype (rs59384377: odds ratio [OR] = 2.4, P = 6.6 × 10^-89), previously associated with V617F-positive MPN. We also identified genome-wide significant associations in the TERT gene (rs7705526: OR = 1.8, P = 1.1 × 10^-32), in SH2B3 (rs7310615: OR = 1.4, P = 3.1 × 10^-14), and upstream of TET2 (rs1548483: OR = 2.0, P = 2.0 × 10^-9). These associations were confirmed in a separate replication cohort of 446 V617F carriers vs 169 021 noncarriers. In a joint analysis of the combined GWAS and replication results, we identified additional genome-wide significant predisposition alleles associated with CHEK2, ATM, PINT, and GFI1B. All SNP ORs were similar for MPN patients and controls who were V617F carriers. These data indicate that the same germ line variants endow individuals with a predisposition not only to MPN, but also to JAK2 V617F clonal hematopoiesis, a more common phenomenon that may foreshadow the development of an overt neoplasm.
• Germ line variants in TERT, SH2B3, TET2, ATM, CHEK2, PINT, and GFI1B are associated with JAK2 V617F clonal hematopoiesis and MPNs.
• Age-related JAK2 V617F clonal hematopoiesis is found in ∼2 out of 1000 individuals in the general population.
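The group sizes and carrier frequency quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (726 MPN cases, 252 637 population controls, 497 V617F carriers among those controls); the variable names are illustrative and are not taken from the study.

```python
# Counts as stated in the abstract; the variable names are illustrative only.
mpn_cases = 726
population_controls = 252_637
v617f_carriers_in_controls = 497

# Combined case group: MPN cases plus V617F carriers found among the controls.
combined_cases = mpn_cases + v617f_carriers_in_controls

# Comparison group: the remaining controls, who do not carry V617F.
noncarrier_controls = population_controls - v617f_carriers_in_controls

# Carrier frequency among the population controls.
carrier_frequency = v617f_carriers_in_controls / population_controls

print(combined_cases)
print(noncarrier_controls)
print(f"{carrier_frequency:.2%} of controls, "
      f"about {carrier_frequency * 1000:.1f} per 1000")
```

The results agree with the reported figures: n = 1223 for the combined group, n = 252 140 for the noncarrier controls, and a carrier frequency of roughly 0.2%, or about 2 in 1000.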
Hypothyroidism is the most common thyroid disorder, affecting about 5% of the general population. Here we present the current largest genome-wide association study of hypothyroidism, in 3,736 cases and 35,546 controls. Hypothyroidism was assessed via web-based questionnaires. We identify five genome-wide significant associations, three of which are well known to be involved in a large spectrum of autoimmune diseases: rs6679677 near PTPN22, rs3184504 in SH2B3, and rs2517532 in the HLA class I region (p-values 2.8 × 10^-13, 2.6 × 10^-12, and 1.3 × 10^-8, respectively). We also report associations with rs4915077 near VAV3 (p-value 7.5 × 10^-10) and rs925489 near FOXE1 (p-value 2.4 × 10^-19). VAV3 is involved in immune function, and FOXE1 and PTPN22 have previously been associated with hypothyroidism. Although the HLA class I region and SH2B3 have previously been linked with a number of autoimmune diseases, this is the first report of their association with thyroid disease. The VAV3 association is also novel. We also show suggestive evidence of association for hypothyroidism with a SNP in the HLA class II region (independent of the other HLA association) as well as SNPs in CAPZB, PDE8B, and CTLA4. CAPZB and PDE8B have been linked to TSH levels and CTLA4 to a variety of autoimmune diseases. These results suggest heterogeneity in the genetic etiology of hypothyroidism, implicating genes involved in both autoimmune disorders and thyroid function. Using a genetic risk profile score based on the top association from each of the five genome-wide significant regions in our study, the relative risk between the highest and lowest deciles of genetic risk is 2.0.
The invention of agriculture is widely assumed to have driven recent human population growth. However, direct genetic evidence for population growth after independent agricultural origins has been elusive. We estimated population sizes through time from a set of globally distributed whole mitochondrial genomes, after separating lineages associated with agricultural populations from those associated with hunter-gatherers. The coalescent-based analysis revealed strong evidence for distinct demographic expansions in Europe, southeastern Asia, and sub-Saharan Africa within the past 10,000 y. Estimates of the timing of population growth based on genetic data correspond neatly to dates for the initial origins of agriculture derived from archaeological evidence. Comparisons of rates of population growth through time reveal that the invention of agriculture facilitated a fivefold increase in population growth relative to more ancient expansions of hunter-gatherers.
Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100-300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and 'unrelated' population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large 'unrelated' population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.
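As a rough point of reference for the cousin degrees discussed above, the textbook expectation for the coefficient of relationship between full k-th cousins (the expected fraction of the autosomal genome shared through common descent) is 2^-(2k+1): the pair is separated by 2(k+1) meioses through each of two shared ancestors. The sketch below tabulates that expectation. It is a standard pedigree approximation for non-inbred full cousins, not the authors' pedigree-based simulation, and the function name is hypothetical.

```python
def expected_shared_fraction(degree: int) -> float:
    """Expected coefficient of relationship for full cousins of a given degree.

    Textbook approximation (an assumption here, not the study's method):
    each of the two shared ancestors contributes (1/2) ** (2 * (degree + 1)).
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    meioses = 2 * (degree + 1)  # meioses separating the pair via one ancestor
    return 2 * 0.5 ** meioses   # two shared ancestors (a couple)


# Tabulate the expectation for the 2nd to 9th cousins mentioned in the text.
for degree in range(2, 10):
    fraction = expected_shared_fraction(degree)
    print(f"cousin degree {degree}: expected shared fraction {fraction:.6%}")
```

These are averages only. Actual sharing varies widely around them, and many distant cousin pairs share no detectable IBD segment at all, which is why bounds derived from simulation are needed in practice.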
Allergic disease is very common and carries substantial public-health burdens. We conducted a meta-analysis of genome-wide associations with self-reported cat, dust-mite and pollen allergies in 53,862 individuals. We used generalized estimating equations to model shared and allergy-specific genetic effects. We identified 16 shared susceptibility loci with association P < 5 × 10^-8, including 8 loci previously associated with asthma, as well as 4p14 near TLR1, TLR6 and TLR10 (rs2101521, P = 5.3 × 10^-21); 6p21.33 near HLA-C and MICA (rs9266772, P = 3.2 × 10^-12); 5p13.1 near PTGER4 (rs7720838, P = 8.2 × 10^-11); 2q33.1 in PLCL1 (rs10497813, P = 6.1 × 10^-10); 3q28 in LPP (rs9860547, P = 1.2 × 10^-9); 20q13.2 in NFATC2 (rs6021270, P = 6.9 × 10^-9); 4q27 in ADAD1 (rs17388568, P = 3.9 × 10^-8); and 14q21.1 near FOXA1 and TTC6 (rs1998359, P = 4.8 × 10^-8). We identified one locus with substantial evidence of differences in effects across allergies at 6p21.32 in the class II human leukocyte antigen (HLA) region (rs17533090, P = 1.7 × 10^-12), which was strongly associated with cat allergy. Our study sheds new light on the shared etiology of immune and autoimmune disease.
Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number-variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.
Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ǂKhomani Bushmen of South Africa, including speakers of the nearly extinct N