Polygenic risk scores (PRS) are poised to improve biomedical outcomes via precision medicine. However, the major ethical and scientific challenge surrounding clinical implementation of PRS is that those available today are several times more accurate in individuals of European ancestry than in individuals of other ancestries. This disparity is an inescapable consequence of Eurocentric biases in genome-wide association studies. It highlights that, unlike clinical biomarkers and prescription drugs (which may individually work better in some populations but do not ubiquitously perform far better in European populations), clinical uses of PRS today would systematically afford greater improvement for European-descent populations. Early diversifying efforts show promise in leveling this vast imbalance, even when non-European sample sizes are considerably smaller than the largest studies to date. To realize the full and equitable potential of PRS, greater diversity must be prioritized in genetic studies, and summary statistics must be publicly disseminated to ensure that health disparities are not increased for those individuals already most underserved.
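In its simplest additive form, a PRS is a weighted sum of an individual's effect-allele dosages, with the weights taken from GWAS summary statistics; this is why the ancestry composition of the discovery GWAS matters so much. The minimal NumPy sketch below assumes dosages and weights are already aligned to the same effect alleles. The function name and toy numbers are hypothetical, and real pipelines add steps (for example variant selection or shrinkage of effect sizes) that are not shown.

```python
import numpy as np

def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Additive PRS: for each individual, sum over variants of the
    effect-allele dosage (0, 1 or 2) times the per-allele effect size."""
    dosages = np.asarray(dosages, dtype=float)  # shape (n_individuals, n_variants)
    weights = np.asarray(weights, dtype=float)  # shape (n_variants,)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight per variant is required")
    return dosages @ weights

# Hypothetical toy data: 3 individuals x 4 variants with made-up effect sizes.
dosages = np.array([[0, 1, 2, 1],
                    [2, 0, 1, 0],
                    [1, 1, 1, 1]])
weights = np.array([0.10, -0.05, 0.20, 0.02])
scores = polygenic_score(dosages, weights)
print(scores)
```

Because the score is only as good as its weights, effect sizes estimated in one ancestry group can transfer poorly to another.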
Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with high precision and high recall.
We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potentially good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham perform better in the deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms.
These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
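The idea of "overlapping calls" between two algorithms can be illustrated with a small sketch: keep only the calls from caller A that are supported by a call from caller B, then score both call sets against a truth set. The matching rule used here (same chromosome and at least 50% reciprocal overlap) is an assumption chosen for illustration, not necessarily the criterion used in the study; the function names and coordinates are hypothetical, and a real benchmark would also match SV type.

```python
from typing import List, Tuple

SV = Tuple[str, int, int]  # (chromosome, start, end), half-open; simplified model

def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if a and b share a chromosome and the shared span covers at
    least min_frac of each call's length."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return shared >= min_frac * (a[2] - a[1]) and shared >= min_frac * (b[2] - b[1])

def overlapping_calls(calls_a: List[SV], calls_b: List[SV], min_frac: float = 0.5) -> List[SV]:
    """Calls from caller A supported by at least one call from caller B."""
    return [a for a in calls_a if any(reciprocal_overlap(a, b, min_frac) for b in calls_b)]

def precision_recall(calls: List[SV], truth: List[SV], min_frac: float = 0.5) -> Tuple[float, float]:
    """Precision: fraction of calls matching a true SV.
    Recall: fraction of true SVs matched by some call."""
    true_calls = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls)
    found = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical toy data.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
caller_a = [("chr1", 1050, 2010), ("chr1", 5000, 5550), ("chr1", 8000, 8400)]
caller_b = [("chr1", 990, 1980), ("chr2", 310, 880), ("chr1", 8100, 8200)]

combined = overlapping_calls(caller_a, caller_b)
for name, calls in [("A", caller_a), ("B", caller_b), ("A and B", combined)]:
    p, r = precision_recall(calls, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case, intersecting the two callers raises precision from 0.67 to 1.00 but lowers recall from 0.67 to 0.33, which is why the choice of pair matters.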
Spatial transcriptomics is an emerging technology requiring costly reagents and considerable skill, which limits the identification of transcriptional markers related to histology. Here, we show that predicted spatial gene expression in unmeasured regions and tissues can enhance biologists' histological interpretations. We developed the Deep learning model for Spatial gene Clusters and Expression, DeepSpaCE, and confirmed its performance using the spatial-transcriptome profiles and immunohistochemistry images of consecutive human breast cancer tissue sections. For example, the predicted expression patterns of SPARC, an invasion marker, highlighted a small tumor-invasion region that was difficult to identify using raw spatial transcriptome data alone because of a lack of measurements. We further developed semi-supervised DeepSpaCE using unlabeled histology images, which increased the imputation accuracy of consecutive sections and enhances applicability to small sample sizes. Our method enables users to derive hidden histological characteristics via spatial transcriptome and gene annotations, leading to accelerated biological discoveries without additional experiments.
Conventional human leukocyte antigen (HLA) imputation methods perform poorly for infrequent alleles, which is one of the factors that reduce the reliability of trans-ethnic major histocompatibility complex (MHC) fine-mapping, owing to inter-ethnic heterogeneity in allele frequency spectra. We develop DEEP*HLA, a deep learning method for imputing HLA genotypes. Through validation using the Japanese and European HLA reference panels (n = 1,118 and 5,122), DEEP*HLA achieves the highest accuracies, with significant superiority for low-frequency and rare alleles. DEEP*HLA is less dependent on distance-dependent linkage disequilibrium decay of the target alleles and might capture complex region-wide information. We apply DEEP*HLA to type 1 diabetes GWAS data from BioBank Japan (n = 62,387) and UK Biobank (n = 354,459), and successfully disentangle independently associated class I and II HLA variants with shared risk among diverse populations (the top signal at amino acid position 71 of HLA-DRβ1; P = 7.5 × 10^…). Our study illustrates the value of deep learning in genotype imputation and trans-ethnic MHC fine-mapping.
The BioBank Japan (BBJ) Project was launched in 2003 with the aim of providing evidence for the implementation of personalized medicine by constructing a large, patient-based biobank. This report describes the study design and profile of BBJ participants who were registered during the first 5-year period of the project.
The BBJ is a registry of patients diagnosed with any of 47 target common diseases. Patients were enrolled at 12 cooperative medical institutes throughout Japan from June 2003 to March 2008. Clinical information was collected annually via interviews and medical record reviews until 2013. We collected DNA from all participants at baseline and collected annual serum samples until 2013. In addition, we followed patients who reported a history of 32 of the 47 target diseases to collect survival data, including cause of death.
During the 5-year period, 200,000 participants were registered in the study. The total number of disease cases at baseline was 291,274, since a participant could have more than one target disease. Baseline data for 199,982 participants (53.1% male) were available for analysis. The average age at entry was 62.7 years for men and 61.5 years for women. Follow-up surveys were performed for participants with any of the 32 diseases, and survival time data for 141,612 participants were available for analysis.
The BBJ Project has constructed the infrastructure for genomic research for various common diseases. This clinical information, coupled with genomic data, will provide important clues for the implementation of personalized medicine.
• The BioBank Japan Project (BBJ) enrolled 200,000 patients with 47 target diseases.
• The BBJ is one of the largest patient-based biobanks in the world.
• The BBJ may allow for personalized medicine in the future.
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000) by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy, as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
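The abstract does not name the decomposition method, so the following is only an illustrative assumption: truncated singular value decomposition (SVD) of a variants × traits matrix of GWAS z-scores is one common way to extract latent genetic components. Each component pairs a set of variant loadings with a set of trait loadings, so traits that load on the same component share genetic signal. The matrix here is simulated, and the function name is hypothetical.

```python
import numpy as np

def latent_components(z: np.ndarray, k: int):
    """Truncated SVD of a (variants x traits) z-score matrix.
    Returns variant loadings U_k, singular values s_k and trait loadings V_k
    such that z is approximated by U_k @ diag(s_k) @ V_k.T."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    return u[:, :k], s[:k], vt[:k].T

rng = np.random.default_rng(0)
# Simulated toy matrix: 200 variants x 6 traits driven by 2 hidden components plus noise.
variant_effects = rng.normal(size=(200, 2))
trait_loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.0],
                           [0.0, 1.0], [0.1, 0.9], [0.0, 0.8]])
z = variant_effects @ trait_loadings.T + 0.1 * rng.normal(size=(200, 6))

u_k, s_k, v_k = latent_components(z, k=2)
approx = u_k @ np.diag(s_k) @ v_k.T
# Share of the matrix's total squared signal captured by the first k components.
explained = (s_k ** 2).sum() / (np.linalg.svd(z, compute_uv=False) ** 2).sum()
print(f"variance captured by 2 components: {explained:.3f}")
print(np.round(v_k, 2))  # trait loadings for each component
```

In practice the components are often rotated or inspected alongside their top-loading variants before interpreting them biologically.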
Human height is a representative phenotype for elucidating genetic architecture. However, the majority of large studies have been performed in European populations. To investigate the rare and low-frequency variants associated with height, we construct a reference panel (N = 3,541) for genotype imputation by integrating the whole-genome sequence data from 1,037 Japanese with that of the 1000 Genomes Project, and perform a genome-wide association study in 191,787 Japanese. We report 573 height-associated variants, including 22 rare and 42 low-frequency variants. These 64 variants explain 1.7% of the phenotypic variance. Furthermore, a gene-based analysis identifies two genes with multiple height-increasing rare and low-frequency nonsynonymous variants (SLC27A3 and CYP26B1; P < 2.5 × 10^…). Our analysis shows a general tendency of the effect sizes of rare variants towards increasing height, which is contrary to findings among Europeans, suggesting that height-associated rare variants are under different selection pressure in Japanese and European populations.
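A useful reasoning step behind "variance explained" figures like the 1.7% above is the standard additive-model formula: for a trait scaled to unit variance, a biallelic variant with allele frequency p and per-allele effect β (in standard-deviation units) explains about 2p(1 − p)β² of the variance. The sketch below uses hypothetical numbers, not values from the study, to show why rare variants contribute little variance even when their effects are large; summing across variants assumes they are independent (no linkage disequilibrium).

```python
def variance_explained(freq: float, beta: float) -> float:
    """Fraction of variance of a unit-variance trait explained by one
    biallelic variant under an additive model: 2 * p * (1 - p) * beta**2."""
    if not 0.0 < freq < 1.0:
        raise ValueError("allele frequency must be in (0, 1)")
    return 2.0 * freq * (1.0 - freq) * beta ** 2

# Hypothetical examples: a rare variant with a large effect
# and a common variant with a small effect.
rare = variance_explained(0.005, 0.30)
common = variance_explained(0.30, 0.05)
total = rare + common  # assumes the two variants are independent
print(f"rare: {rare:.5f}, common: {common:.5f}, total: {total:.5f}")
```

Here the rare variant, despite a sixfold larger effect, explains slightly less variance than the common one because 2p(1 − p) is so small at p = 0.005.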
Mosaic chromosomal alterations (mCAs) are frequently observed in cancer cells and are regarded as one of the common features of cancers. Strikingly, accumulating studies have demonstrated that mCAs are also prevalent in elderly individuals without cancer, implying that mCA could be a feature of aging and not necessarily of a cancerous state. However, the genetic basis of mCA has remained largely unknown. Recent studies of autosomal mCA based on biobank-scale datasets, including UK Biobank and BioBank Japan, provided a glimpse into the underlying genetic mechanism. In this concise review, we briefly introduced mCA, its link with cancer and aging, and the emerging genetic mechanisms of this phenomenon. We highlighted the following aspects: (1) the interplay between somatic and inherited germline mutations in generating mosaicism; (2) monogenic and polygenic architectures of mCA; and (3) population-specific profiles of mCA. We provided a future perspective emphasizing the need to understand the connection between mCA and other characteristics of aging, in particular, the epigenetic and immunologic features.
We report genome-wide association studies (GWAS) for hematological and biochemical traits from approximately 14,700 Japanese individuals. We identified 60 associations for 8 hematological traits and 29 associations for 12 biochemical traits at genome-wide significance levels (P < 5 × 10^-8). Of these, 46 associations were new to this study and 43 replicated previous reports. We compared these associated loci with those reported in similar GWAS in European populations. When the minor allele frequency was >10% in the Japanese population, 32 (94.1%) and 31 (91.2%) of the 34 hematological loci previously reported to be associated in a European population were replicated with P-values less than 0.05 and 0.01, respectively, and 31 (73.8%) and 27 (64.3%) of the 42 European biochemical loci were replicated.
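The counts in this abstract can be checked directly: the 60 + 29 associations should equal the 46 new plus 43 previously reported, and each replication percentage is the number of replicated European loci divided by the number tested. The sketch below assumes (as the sentence structure suggests) that the biochemical counts use the same P < 0.05 and P < 0.01 thresholds as the hematological ones; the helper name is hypothetical.

```python
def replication_rate(replicated: int, tested: int) -> float:
    """Percentage of previously reported loci that replicate at a given P threshold."""
    return 100.0 * replicated / tested

# Counts as reported in the abstract (loci with MAF > 10% in the Japanese population).
reported = {
    "hematological": {"tested": 34, "P<0.05": 32, "P<0.01": 31},
    "biochemical": {"tested": 42, "P<0.05": 31, "P<0.01": 27},
}

# Every association is either new or a replication of a previous report.
assert 60 + 29 == 46 + 43

for trait_class, counts in reported.items():
    for threshold in ("P<0.05", "P<0.01"):
        pct = replication_rate(counts[threshold], counts["tested"])
        print(f"{trait_class} {threshold}: {counts[threshold]}/{counts['tested']} = {pct:.1f}%")
```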