The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify ...through this high-coverage sequencing (32.4× on average), 21.2 million, including 12 million novel, single-nucleotide variants (SNVs) at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome-wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which cover 99.0% of SNVs with minor allele frequency ≥0.1%, and for identifying causal rare variants of complex human disease phenotypes in genetic association studies.
The Tohoku Medical Megabank biobank (TMM biobank) is the first major population-based biobank established in Japan. The TMM biobank was established based on two population cohorts and is a ...reconstruction program from the Great East Japan Earthquake and Tsunami of 2011. The biobank stores more than 3.4 million tubes of biospecimens and associated health and analytic data obtained from approximately 150,000 TMM cohort participants between May 2013 and December 2018, and the TMM biobank currently shares high-quality specimens and data. Various biospecimens, including peripheral and cord blood mononuclear cells, buffy coat, plasma, serum, urine, breast milk and saliva, have been collected in the TMM biobank. To minimize human error and maintain the quality of data and specimens, we have integrated a laboratory information management system into the biobank procedures from registration to storage, together with automation systems for liquid dispensing, DNA extraction and sample storage. The biobank procedures are certified under the International Organization for Standardization standards for quality management systems (ISO 9001:2015) and information security management systems (ISO 27001:2013). The quality of our biobank samples fulfills the pre-analytical requirements for researchers conducting next-generation whole genome sequencing, DNA array analyses, proteomics, metabolomics, etc. We established analytical centers to conduct standard genomic and multiomic analyses in-house and share the generated data. Additionally, we generate thousands of Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines and proliferating T cells for functional studies. The TMM biobank serves as an indispensable infrastructure for academic, clinical and industrial research to actualize next-generation medicine in Japan.
Certain large genome cohort studies attempt to return the individual genomic results to the participants; however, the implementation process and psychosocial impacts remain largely unknown. The ...Tohoku Medical Megabank Project has conducted large genome cohort studies of general residents. To implement the disclosure of individual genomic results, we extracted the potential challenges and obstacles. Major challenges include the determination of genes/disorders based on the current medical system in Japan, the storage of results, prevention of misunderstanding, and collaboration of medical professionals. To overcome these challenges, we plan to conduct multilayer pilot studies, which deal with different disorders/genes. We finally chose familial hypercholesterolemia (FH) as a target disease for the first pilot study. Of the 665 eligible candidates, 33.5% were interested in the pilot study and provided consent after an educational "genetics workshop" on the basic genetics and medical facts of FH. The genetics professionals disclosed the results to the participants. All positive participants were referred to medical care, and a serial questionnaire revealed no significant psychosocial distress after the disclosure. Return of genomic results to research participants was implemented using a well-prepared protocol. To further elucidate the impact of different disorders, we will perform multilayer pilot studies with different disorders, including actionable pharmacogenomics and hereditary tumor syndromes.
Genome and other data are already being used in areas including cancer and rare diseases. Data-sharing and secondary uses are likely to become much broader and far more extensive; thus, obtaining ...proper consent for these new uses of data is an important issue. Obtaining consent through online methods may be an option to overcome the problems associated with one-off, paper-based informed consent. When the process of obtaining consent takes place remotely, authentication must be assured. Patients may also choose to store some of their own information online, such as genetic information, and allow healthcare professionals to access this data. In this health information transfer and exchange process, it is vital that anyone accessing this information be correctly authenticated to protect patients' privacy. In this article, we first clarified that authentication has two roles: i.e., not only to prevent impersonation but also to prove intent, which is a vital step to ensure that medical research and health information exchange are conducted ethically. We then set out methods of authentication. As a result, we were able to make suggestions about the requirements for authentication and a possible method of authentication for these purposes. We considered problems of biometrics and recommended two-factor authentication without biometrics as a workable solution. However, three-factor authentication including biometrics seems likely to be used once biometrics become more common.
Recently, many phenotyping algorithms for high-throughput cohort identification have been developed. Prospective genome cohort studies are critical resources for precision medicine, but there are ...many hurdles in the precise cohort identification. Consequently, it is important to develop phenotyping algorithms for cohort data collection. Hypertensive disorders of pregnancy (HDP) are a leading cause of maternal morbidity and mortality. In this study, we developed, applied, and validated rule-based phenotyping algorithms for HDP. Two phenotyping algorithms, algorithms 1 and 2, were developed according to American and Japanese guidelines, respectively, and applied to 22,452 pregnant women in the Birth and Three-Generation Cohort Study of the Tohoku Medical Megabank project. For precise cohort identification, we analyzed both structured data (e.g., laboratory and physiological tests) and unstructured clinical notes. The identified subtypes of HDP were validated against reference standards. Algorithms 1 and 2 identified 7.93% and 8.08% of the subjects as having HDP, respectively, along with their HDP subtypes. Both algorithms performed well, with high positive predictive values (0.96 and 0.90 for algorithms 1 and 2, respectively). By developing and implementing these phenotyping algorithms, we overcame the hurdle of precise cohort identification and identified HDP patients and their subtypes in large-scale cohort data.
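To illustrate what a rule-based phenotyping algorithm and its validation by positive predictive value (PPV) can look like, the following is a minimal sketch. It is not the algorithms described in the abstract: the `Visit` record, the single-reading blood-pressure rule (140/90 mmHg at or after gestational week 20) and the function names are simplifying assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Visit:
    """One antenatal visit (hypothetical record layout)."""
    gestational_week: int
    systolic: int   # mmHg
    diastolic: int  # mmHg


def flag_hdp(visits: List[Visit]) -> bool:
    """Toy rule (assumption): flag a pregnancy if any visit at or after
    week 20 shows systolic >= 140 or diastolic >= 90 mmHg.
    Real guidelines add further criteria (repeat readings, proteinuria, ...)."""
    return any(
        v.gestational_week >= 20 and (v.systolic >= 140 or v.diastolic >= 90)
        for v in visits
    )


def positive_predictive_value(predicted: Sequence[bool],
                              reference: Sequence[bool]) -> float:
    """PPV = true positives / (true positives + false positives)."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    if tp + fp == 0:
        raise ValueError("no positive calls, PPV is undefined")
    return tp / (tp + fp)
```

In this sketch, the algorithm's calls are compared with a manually reviewed reference standard, so a PPV of 0.96 would mean that 96% of the pregnancies flagged by the rule were confirmed as HDP on review.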
Autism spectrum disorder (ASD) has phenotypically and genetically heterogeneous characteristics. A simulation study demonstrated that attempts to categorize patients with a complex disease into more ...homogeneous subgroups could have more power to elucidate hidden heritability. We conducted cluster analyses using the k-means algorithm with a cluster number of 15 based on phenotypic variables from the Simons Simplex Collection (SSC). As a preliminary study, we conducted a conventional genome-wide association study (GWAS) with a data set of 597 ASD cases and 370 controls. In the second step, we divided cases based on the clustering results and conducted GWAS in each of the subgroups vs controls (cluster-based GWAS). We also conducted cluster-based GWAS on another SSC data set of 712 probands and 354 controls in the replication stage. In the preliminary study, which was conducted in conventional GWAS design, we observed no significant associations. In the second step of cluster-based GWASs, we identified 65 chromosomal loci, which included 30 intragenic loci located in 21 genes and 35 intergenic loci that satisfied the threshold of P < 5.0 × 10⁻⁸. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A5, ITPR1, NTM, SDK1, SNCA, and SRRM4. Of these 65 significant chromosomal loci, rs11064685 located within the SRRM4 gene had a significantly different distribution in the cases vs controls in the replication cohort. These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease etiologies. Further cluster validation and replication studies are warranted in larger cohorts.
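The cluster-based GWAS design described above (cases split by cluster, each subgroup tested against the same controls) can be sketched for a single variant as follows. This is a hedged illustration, not the study's pipeline: the cluster labels are assumed to come from a prior k-means step, a basic 1-degree-of-freedom allelic chi-square test stands in for whatever association model the authors used, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def allelic_chi2(case_genotypes: Sequence[int],
                 control_genotypes: Sequence[int]) -> Tuple[float, float]:
    """Allelic chi-square test for one biallelic variant.
    Genotypes are alternate-allele counts (0, 1 or 2) per individual.
    Returns (chi2, p) with 1 degree of freedom."""
    a = sum(case_genotypes)                 # alt alleles in cases
    b = 2 * len(case_genotypes) - a         # ref alleles in cases
    c = sum(control_genotypes)              # alt alleles in controls
    d = 2 * len(control_genotypes) - c      # ref alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                          # monomorphic or empty group
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def cluster_based_gwas(case_genotypes: Sequence[int],
                       case_clusters: Sequence[Hashable],
                       control_genotypes: Sequence[int]
                       ) -> Dict[Hashable, Tuple[float, float]]:
    """Test each case cluster separately against the shared controls."""
    by_cluster = defaultdict(list)
    for genotype, cluster in zip(case_genotypes, case_clusters):
        by_cluster[cluster].append(genotype)
    return {k: allelic_chi2(g, control_genotypes) for k, g in by_cluster.items()}
```

A variant would then be called significant in a cluster if its p-value falls below the chosen genome-wide threshold; in a real analysis this loop runs over every variant, and testing many clusters adds to the multiple-testing burden.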
A family history of urolithiasis is associated with a more than doubling of urolithiasis risk, and a twin study estimating 56% heritability of the condition suggests a pivotal role for host genetic ...factors. However, previous genome-wide association studies (GWAS) have identified only six risk-related loci.
To identify novel urolithiasis-related loci in the Japanese population, we performed a large-scale GWAS of 11,130 cases and 187,639 controls, followed by a replication analysis of 2289 cases and 3817 controls. Diagnosis of urolithiasis was confirmed either by a clinician or using medical records or self-report. We also assessed the association of urolithiasis loci with 16 quantitative traits, including metabolic, kidney-related, and electrolyte traits (such as body mass index, lipid storage, eGFR, serum uric acid, and serum calcium), using up to 160,000 samples from BioBank Japan.
The analysis identified 14 significant loci, including nine novel loci. Ten regions showed a significant association with at least one quantitative trait, including metabolic, kidney-related, and electrolyte traits, suggesting a common genetic basis for urolithiasis and these quantitative traits. Four novel loci were related to metabolic traits, obesity, hypertriglyceridemia, or hyperuricemia. The remaining ten loci were associated with kidney- or electrolyte-related traits; these may affect crystallization. Weighted genetic risk score analysis indicated that the highest risk group (top 20%) showed an odds ratio of 1.71 (95% confidence interval, 1.42 to 2.06) - 2.13 (95% confidence interval, 2.00 to 2.27) compared with the reference group (bottom 20%).
Our findings provide evidence that host genetic factors related to regulation of metabolic and crystallization pathways contribute to the development of urolithiasis.
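The weighted genetic risk score analysis mentioned above compares the top and bottom 20% of scores. A minimal sketch of that kind of calculation is given below, assuming the score is the sum of per-locus effect sizes times risk-allele dosages and that the confidence interval uses the standard log odds ratio (Woolf) approximation; the abstract does not state the exact method, and all function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_grs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted score: sum over loci of effect size x risk-allele dosage."""
    return sum(w * d for w, d in zip(weights, dosages))


def top_vs_bottom_counts(scores: Sequence[float], is_case: Sequence[bool],
                         fraction: float = 0.2) -> Tuple[int, int, int, int]:
    """Return (cases_top, controls_top, cases_bottom, controls_bottom)
    for the highest and lowest `fraction` of scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * fraction))
    bottom, top = order[:k], order[-k:]
    cases_top = sum(1 for i in top if is_case[i])
    cases_bottom = sum(1 for i in bottom if is_case[i])
    return cases_top, len(top) - cases_top, cases_bottom, len(bottom) - cases_bottom


def odds_ratio_ci(cases_top: int, controls_top: int,
                  cases_bottom: int, controls_bottom: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with an approximate 95% CI (Woolf method).
    All four counts must be non-zero."""
    odds_ratio = (cases_top * controls_bottom) / (controls_top * cases_bottom)
    se = math.sqrt(1 / cases_top + 1 / controls_top
                   + 1 / cases_bottom + 1 / controls_bottom)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

An odds ratio above 1 with a confidence interval that excludes 1, as reported in the abstract, indicates that individuals in the highest score group have higher odds of urolithiasis than those in the reference group.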
Development of methods for population screening is necessary to improve the efficiency of secondary prevention of diseases. Until now, a common cutoff has been used for all people in the data set. ...However, if big data for health information can be used to modify individual cutoffs according to background factors, it may avoid wasting medical resources. Here we show that the estimated prevalence of the Center for Epidemiologic Studies Depression Scale positivity can be visualized by a heatmap using background factors from epidemiological big data and scores from the Athens Insomnia Scale. We also show that cutoffs based on the estimated prevalence can be used to decrease the number of people screened without decreasing the number of prevalent cases detected. Since this method can be applied to the screening of different outcomes, we believe our work can contribute to the development of efficient screening methods for various diseases.