Abstract
In the Tohoku Medical Megabank project, genome and omics analyses of participants in two cohort studies were performed. A part of the data is available at the Japanese Multi Omics Reference ...Panel (jMorp; https://jmorp.megabank.tohoku.ac.jp) as a web-based database, as reported in our previous manuscript published in Nucleic Acid Research in 2018. At that time, jMorp mainly consisted of metabolome data; however, now genome, methylome, and transcriptome data have been integrated in addition to the enhancement of the number of samples for the metabolome data. For genomic data, jMorp provides a Japanese reference sequence obtained using de novo assembly of sequences from three Japanese individuals and allele frequencies obtained using whole-genome sequencing of 8,380 Japanese individuals. In addition, the omics data include methylome and transcriptome data from ∼300 samples and distribution of concentrations of more than 755 metabolites obtained using high-throughput nuclear magnetic resonance and high-sensitivity mass spectrometry. In summary, jMorp now provides four different kinds of omics data (genome, methylome, transcriptome, and metabolome), with a user-friendly web interface. This will be a useful scientific data resource on the general population for the discovery of disease biomarkers and personalized disease prevention and early diagnosis.
The complete human genome sequence is used as a reference for next-generation sequencing analyses. However, some ethnic ancestries are under-represented in the reference genome (e.g., GRCh37) due to ...its bias toward European and African ancestries. Here, we perform de novo assembly of three Japanese male genomes using > 100× Pacific Biosciences long reads and Bionano Genomics optical maps per sample. We integrate the genomes using the major allele for consensus and anchor the scaffolds using genetic and radiation hybrid maps to reconstruct each chromosome. The resulting genome sequence, JG1, is contiguous, accurate, and carries the Japanese major allele at most loci. We adopt JG1 as the reference for confirmatory exome re-analyses of seven rare-disease Japanese families and find that re-analysis using JG1 reduces total candidate variant calls versus GRCh37 while retaining disease-causing variants. These results suggest that integrating multiple genomes from a single population can aid genome analyses of that population.
Skin pigmentation is associated with skin damages and skin cancers, and ultraviolet (UV) photography is used as a minimally invasive mean for the assessment of pigmentation. Since UV photography ...equipment is not usually available in general practice, technologies emphasizing pigmentation in color photo images are desired for daily care. We propose a new method using conditional generative adversarial networks, named UV-photo Net, to generate synthetic UV images from color photo images. Evaluations using color and UV photo image pairs taken by a UV photography system demonstrated that pigment spots were well reproduced in synthetic UV images by UV-photo Net, and some of the reproduced pigment spots were difficult to be recognized in color photo images. In the pigment spot detection analysis, the rate of pigment spot areas in cheek regions for synthetic UV images was highly correlated with the rate for UV photo images (Pearson's correlation coefficient 0.92). We also demonstrated that UV-photo Net was effective for floating up pigment spots for photo images taken by a smartphone camera. UV-photo Net enables an easy assessment of pigmentation from color photo images and will promote self-care of skin damages and early signs of skin cancers for preventive medicine.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK
Deep learning algorithms have been successfully used in medical image classification. In the next stage, the technology of acquiring explainable knowledge from medical images is highly desired. Here ...we show that deep learning algorithm enables automated acquisition of explainable features from diagnostic annotation-free histopathology images. We compare the prediction accuracy of prostate cancer recurrence using our algorithm-generated features with that of diagnosis by expert pathologists using established criteria on 13,188 whole-mount pathology images consisting of over 86 billion image patches. Our method not only reveals findings established by humans but also features that have not been recognized, showing higher accuracy than human in prognostic prediction. Combining both our algorithm-generated features and human-established criteria predicts the recurrence more accurately than using either method alone. We confirm robustness of our method using external validation datasets including 2276 pathology images. This study opens up fields of machine learning analysis for discovering uncharted knowledge.
Large-scale, sometimes nationwide, prospective genomic cohorts biobanking rich biological specimens such as blood, urine and tissues, have been established and released their vast amount of data in ...several countries. These genetic and epidemiological resources are expected to allow investigators to disentangle genetic and environmental components conferring common complex diseases. There are, however, two major challenges to statistical genetics for this goal: small sample size—high dimensionality and multilayered—heterogenous endophenotypes. Rather counterintuitively, biobank data generally have small sample size relative to their data dimensionality consisting of genomic variation, lifestyle questionnaire, and sometimes their interaction. This is a widely acknowledged difficulty in data analysis, so-called “p»n problem” in statistics or “curse of dimensionality” in machine-learning field. On the other hand, we have too many measurements of individual health status, which are endophenotypes, such as health check-up data, images, psychological test scores in addition to metabolomics and proteomics data. These endophenotypes are rich but not so tractable because of their worsen dimensionality, and substantial correlation, sometimes confusing causation among them. We have tried to overcome the problems inherent to biobank data, using statistical machine-learning and deep-learning technologies.
Full text
Available for:
EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
The purpose of the present study was to identify a plasma protein biomarker able to predict pre-eclampsia (PE). Comprehensive quantitative proteomics using mass spectrometry with sequential window ...acquisition of all theoretical fragment ion spectra (SWATH-MS) was applied to plasma samples of 7 PE and 14 healthy pregnant women (for PE subjects, plasma samples were taken before onset of PE), and 11 proteins were selected as candidates potentially able to differentiate the two groups. Plasmas collected at gestational weeks 14–24 from 36 PE and 120 healthy pregnant women (for PE subjects, plasma samples were taken before onset of PE) were used to conduct selected reaction monitoring quantification analysis, optimize protein combinations and conduct internal validation, which consisted of 30 iterations of 10-fold cross-validation using multivariate logistic regression and receiver operating characteristic (ROC) analysis. The combination of afamin, fibronectin, and sex-hormone-binding globulin was selected as the best candidate. The 3-protein combination predictive model (predictive equation and cut-off value) generated using the internal validation subjects was successfully validated in another group of validation subjects (36 PE and 54 healthy (for PE subjects, plasma samples were taken before onset of PE)) and showed good predictive performance, with the area under the curve (AUC) 0.835 and odds ratio 13.43. In conclusion, we newly identified a 3-protein combination biomarker and established a predictive equation and cut-off value that can predict the onset of PE based on analysis of plasma samples collected during gestational weeks 14–24.
Most plants do poorly when flooded. Certain rice varieties, known as deepwater rice, survive periodic flooding and consequent oxygen deficiency by activating internode growth of stems to keep above ...the water. Here, we identify the gibberellin biosynthesis gene,
(
), whose loss-of-function allele catapulted the rice Green Revolution, as being responsible for submergence-induced internode elongation. When submerged, plants carrying the deepwater rice-specific
haplotype amplify a signaling relay in which the
gene is transcriptionally activated by an ethylene-responsive transcription factor, OsEIL1a. The SD1 protein directs increased synthesis of gibberellins, largely GA
, which promote internode elongation. Evolutionary analysis shows that the deepwater rice-specific haplotype was derived from standing variation in wild rice and selected for deepwater rice cultivation in Bangladesh.
Full text
Available for:
BFBNIB, NMLJ, NUK, ODKLJ, PNG, SAZU, UL, UM, UPUK
Abstract
We propose a genetic prediction modeling approach for genome-wide association study (GWAS) data that can include not only marginal gene effects but also gene–environment (GxE) interaction ...effects—i.e., multiplicative effects of environmental factors with genes rather than merely additive effects of each. The proposed approach is a straightforward extension of our previous multiple regression-based method, STMGP (smooth-threshold multivariate genetic prediction), with the new feature being that genome-wide test statistics from a GxE interaction analysis are used to weight the corresponding variants. We develop a simple univariate regression approximation to the GxE interaction effect that allows a direct fit of the STMGP framework without modification. The sparse nature of our model automatically removes irrelevant predictors (including variants and GxE combinations), and the model is able to simultaneously incorporate multiple environmental variables. Simulation studies to evaluate the proposed method in comparison with other modeling approaches demonstrate its superior performance under the presence of GxE interaction effects. We illustrate the usefulness of our prediction model through application to real GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).