Conventional human leukocyte antigen (HLA) imputation methods perform poorly for infrequent alleles, which is one of the factors that reduce the reliability of trans-ethnic major histocompatibility complex (MHC) fine-mapping, given the inter-ethnic heterogeneity in allele frequency spectra. We develop DEEP*HLA, a deep learning method for imputing HLA genotypes. In validation using the Japanese and European HLA reference panels (n = 1,118 and 5,122), DEEP*HLA achieves the highest accuracies, with significant superiority for low-frequency and rare alleles. DEEP*HLA is less dependent on distance-dependent linkage disequilibrium decay of the target alleles and might capture complicated region-wide information. We apply DEEP*HLA to type 1 diabetes GWAS data from BioBank Japan (n = 62,387) and UK Biobank (n = 354,459), and successfully disentangle independently associated class I and II HLA variants with shared risk among diverse populations (the top signal at amino acid position 71 of HLA-DRβ1; P = 7.5 × 10
). Our study illustrates the value of deep learning in genotype imputation and trans-ethnic MHC fine-mapping.
The BioBank Japan (BBJ) Project was launched in 2003 with the aim of providing evidence for the implementation of personalized medicine by constructing a large, patient-based biobank. This report describes the study design and profile of BBJ participants who were registered during the first 5-year period of the project.
The BBJ is a registry of patients diagnosed with any of 47 target common diseases. Patients were enrolled at 12 cooperative medical institutes all over Japan from June 2003 to March 2008. Clinical information was collected annually via interviews and medical record reviews until 2013. We collected DNA from all participants at baseline and collected annual serum samples until 2013. In addition, we followed patients who reported a history of 32 of the 47 target diseases to collect survival data, including cause of death.
During the 5-year period, 200,000 participants were registered in the study. The total number of cases was 291,274 at baseline. Baseline data for 199,982 participants (53.1% male) were available for analysis. The average age at entry was 62.7 years for men and 61.5 years for women. Follow-up surveys were performed for participants with any of 32 diseases, and survival time data for 141,612 participants were available for analysis.
The BBJ Project has constructed the infrastructure for genomic research for various common diseases. This clinical information, coupled with genomic data, will provide important clues for the implementation of personalized medicine.
• The BioBank Japan Project (BBJ) enrolled 200,000 patients with 47 target diseases.
• The BBJ is one of the largest patient-based biobanks in the world.
• The BBJ may allow for personalized medicine in the future.
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
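The abstract does not say which decomposition was applied to the summary-statistic matrices. As a minimal sketch, assuming a truncated singular value decomposition of a variant-by-trait z-score matrix, the leading latent component can be extracted by power iteration. The function name `top_component` and every number in the toy matrix are hypothetical and serve only as illustration.

```python
import math


def top_component(Z, iters=200):
    """Leading singular triplet of Z (rows = variants, columns = traits).

    Uses plain power iteration on Z^T Z; this is an illustrative stand-in
    for a truncated SVD, not the method used in the study.
    """
    n_rows, n_cols = len(Z), len(Z[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # start from a uniform unit vector
    for _ in range(iters):
        u = [sum(Z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = Z v
        w = [sum(Z[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = Z^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(Z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


# Hypothetical z-scores: 4 variants (rows) x 3 traits (columns).
Z = [
    [4.0, 3.5, 0.2],
    [-2.0, -1.8, 0.1],
    [0.3, 0.1, 5.0],
    [3.0, 2.9, -0.4],
]
sigma, variant_scores, trait_loadings = top_component(Z)
print("singular value:", round(sigma, 3))
print("trait loadings:", [round(x, 3) for x in trait_loadings])
print("variant scores:", [round(x, 3) for x in variant_scores])
```

In this toy setting, traits with large loadings of the same sign on one component share variants with concordant effects, which is the sense in which a component could group similar diseases.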
Human height is a representative phenotype to elucidate genetic architecture. However, the majority of large studies have been performed in European populations. To investigate the rare and low-frequency variants associated with height, we construct a reference panel (N = 3,541) for genotype imputation by integrating the whole-genome sequence data from 1,037 Japanese with that of the 1000 Genomes Project, and perform a genome-wide association study in 191,787 Japanese. We report 573 height-associated variants, including 22 rare and 42 low-frequency variants. These 64 variants explain 1.7% of the phenotypic variance. Furthermore, a gene-based analysis identifies two genes with multiple height-increasing rare and low-frequency nonsynonymous variants (SLC27A3 and CYP26B1; P < 2.5 × 10
). Our analysis shows a general tendency of the effect sizes of rare variants towards increasing height, which is contrary to findings among Europeans, suggesting that height-associated rare variants are under different selection pressure in Japanese and European populations.
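To put the reported 1.7% of phenotypic variance across 64 variants in perspective, the sketch below uses the standard single-variant approximation, variance explained = 2f(1 − f)β², which assumes an additive model, Hardy-Weinberg equilibrium, and an effect size β in phenotype standard deviations. The allele frequencies in the loop are hypothetical, and dividing 1.7% evenly across the 64 variants is only an averaging device, not a result from the study.

```python
import math


def variance_explained(maf, beta_sd):
    """Fraction of phenotypic variance explained by one biallelic variant.

    Assumes an additive model, Hardy-Weinberg equilibrium, and an effect
    size expressed in phenotype standard deviations per allele.
    """
    return 2.0 * maf * (1.0 - maf) * beta_sd ** 2


def effect_needed(maf, target_fraction):
    """Per-allele effect (SD units) needed to explain target_fraction."""
    return math.sqrt(target_fraction / (2.0 * maf * (1.0 - maf)))


# From the text: 64 rare and low-frequency variants jointly explain 1.7%.
mean_fraction = 0.017 / 64

# Hypothetical allele frequencies, chosen only for illustration.
for maf in (0.005, 0.03, 0.30):
    beta = effect_needed(maf, mean_fraction)
    print(f"MAF {maf:.3f}: effect needed = {beta:.3f} SD per allele")
```

The lower the allele frequency, the larger the per-allele effect needed to explain the same share of variance.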
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
Numerous genetic variants associated with hypertension and blood pressure are known, but there is a paucity of evidence from genetic studies of resistant hypertension, especially in Asian populations. To identify novel genetic loci associated with resistant hypertension in the Japanese population, we conducted a genome-wide association study with 2,705 resistant hypertension cases and 21,296 mild hypertension controls, all from BioBank Japan. We identified one novel susceptibility candidate locus, rs1442386 on chromosome 18p11.3 (DLGAP1), achieving genome-wide significance (odds ratio (95% CI) = 0.85 (0.81-0.90), P = 3.75 × 10
) and 18 loci showing suggestive association, including rs62525059 of 8q24.3 (CYP11B2) and rs3774427 of 3p21.1 (CACNA1D). We further detected biological processes associated with resistant hypertension, including chemical synaptic transmission, regulation of transmembrane transport, neuron development and neurological system processes, highlighting the importance of the nervous system. This study provides insights into the etiology of resistant hypertension in the Japanese population.
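The odds ratio and 95% confidence interval reported for rs1442386 are enough to approximate the underlying Wald statistics. The sketch below assumes a symmetric normal-theory interval on the log-odds scale; the function name `wald_from_or_ci` is hypothetical. Because the published values are rounded to two decimals, the recovered P value is only approximate and need not match the reported one.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds effect, standard error, z and two-sided P.

    Assumes the interval was computed as exp(beta +/- z_crit * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


# Values as reported in the text: OR = 0.85, 95% CI 0.81-0.90.
beta, se, z, p = wald_from_or_ci(0.85, 0.81, 0.90)
print(f"beta = {beta:.4f}, se = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

An odds ratio below 1 gives a negative log-odds effect, meaning the tested allele is less frequent among resistant hypertension cases than among mild hypertension controls.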
The overwhelming majority of participants in current genetic studies are of European ancestry. To elucidate disease biology in the East Asian population, we conducted a genome-wide association study (GWAS) with 212,453 Japanese individuals across 42 diseases. We detected 320 independent signals in 276 loci for 27 diseases, with 25 novel loci (P < 9.58 × 10
). East Asian-specific missense variants were identified as candidate causal variants for three novel loci, and we successfully replicated two of them by analyzing independent Japanese cohorts: p.R220W of ATG16L2 (associated with coronary artery disease) and p.V326A of POT1 (associated with lung cancer). We further investigated enrichment of heritability within 2,868 annotations of genome-wide transcription factor occupancy, and identified 378 significant enrichments across nine diseases (false discovery rate < 0.05) (for example, NKX3-1 for prostate cancer). This large-scale GWAS in a Japanese population provides insights into the etiology of complex diseases and highlights the importance of performing GWAS in non-European populations.
The extent to which the biology of oncogenesis and ageing are shaped by factors that distinguish human populations is unknown. Haematopoietic clones with acquired mutations become common with advancing age and can lead to blood cancers. Here we describe shared and population-specific patterns of genomic mutations and clonal selection in haematopoietic cells on the basis of 33,250 autosomal mosaic chromosomal alterations that we detected in 179,417 Japanese participants in the BioBank Japan cohort and compared with analogous data from the UK Biobank. In this long-lived Japanese population, mosaic chromosomal alterations were detected in more than 35.0% (s.e.m., 1.4%) of individuals older than 90 years, which suggests that such clones trend towards inevitability with advancing age. Japanese and European individuals exhibited key differences in the genomic locations of mutations in their respective haematopoietic clones; these differences predicted the relative rates of chronic lymphocytic leukaemia (which is more common among European individuals) and T cell leukaemia (which is more common among Japanese individuals) in these populations. Three different mutational precursors of chronic lymphocytic leukaemia (including trisomy 12, loss of chromosomes 13q and 13q, and copy-neutral loss of heterozygosity) were between two and six times less common among Japanese individuals, which suggests that the Japanese and European populations differ in selective pressures on clones long before the development of clinically apparent chronic lymphocytic leukaemia. Japanese and British populations also exhibited very different rates of clones that arose from B and T cell lineages, which predicted the relative rates of B and T cell cancers in these populations. We identified six previously undescribed loci at which inherited variants predispose to mosaic chromosomal alterations that duplicate or remove the inherited risk alleles, including large-effect rare variants at NBN, MRE11 and CTU2 (odds ratio, 28-91). We suggest that selective pressures on clones are modulated by factors that are specific to human populations. Further genomic characterization of clonal selection and cancer in populations from around the world is therefore warranted.
Obesity is a risk factor for a wide variety of health problems. In a genome-wide association study (GWAS) of body mass index (BMI) in Japanese people (n = 173,430), we found 85 loci significantly associated with obesity (P < 5.0 × 10⁻⁸), of which 51 were previously unknown. We conducted trans-ancestral meta-analyses by integrating these results with the results from a GWAS of Europeans and identified 61 additional new loci. In total, this study identifies 112 novel loci, doubling the number of previously known BMI-associated loci. By annotating associated variants with cell-type-specific regulatory marks, we found enrichment of variants in CD19⁺ cells. We also found significant genetic correlations between BMI and lymphocyte count (P = 6.46 × 10
, r_g = 0.18) and between BMI and multiple complex diseases. These findings provide genetic evidence that lymphocytes are relevant to body weight regulation and offer insights into the pathogenesis of obesity.