The 2-oxoglutarate-dependent dioxygenase (2OGD) superfamily is the second largest enzyme family in the plant genome, and its members are involved in various oxygenation/hydroxylation reactions. Despite their biochemical significance in metabolism, a systematic analysis of plant 2OGDs remains to be accomplished. We present a phylogenetic classification of 479 2OGDs in six plant models, ranging from green algae to angiosperms. These were classified into three classes – DOXA, DOXB and DOXC – based on amino acid sequence similarity. The DOXA class includes plant homologs of Escherichia coli AlkB, which is a prototype of 2OGD involved in the oxidative demethylation of alkylated nucleic acids and histones. The DOXB class is conserved across all plant taxa and is involved in proline 4-hydroxylation in cell wall protein synthesis. The DOXC class is involved in specialized metabolism of various phytochemicals, including phytohormones and flavonoids. The vast majority of 2OGDs from land plants were classified into the DOXC class, but only seven from Chlamydomonas, suggesting that this class has diversified during land plant evolution. Phylogenetic analysis assigned DOXC-class 2OGDs to 57 phylogenetic clades. 2OGD genes involved in gibberellin biosynthesis were conserved among vascular plants, and those involved in flavonoid and ethylene biosynthesis were shared among seed plants. Several angiosperm-specific clades were found to be involved in various lineage-specific specialized metabolisms, but 31 of the 57 DOXC-class clades were only found in a single species. Therefore, the evolution and diversification of DOXC-class 2OGDs is partly responsible for the diversity and complexity of specialized metabolites in land plants.
Studies in human genetics deal with a plethora of human genome sequencing data, both generated from specimens and available in the public domain. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and analyzing downstream data are essential. This review aims to guide struggling researchers in processing and analyzing these large-scale genomic data to extract relevant information for improved downstream analyses. Here, we discuss worldwide human genome projects that could be integrated into any data for improved analysis. Obtaining human whole-genome sequencing data from both data stores and processes is costly; therefore, we focus on the development of data formats and software that manipulate whole-genome sequencing data. Once the sequencing is complete and its format and data processing tools are selected, a computational platform is required. For the platform, we describe a multi-cloud strategy that balances cost, performance, and customizability. High-quality published research relies on data reproducibility to ensure quality results, reusability for application to other datasets, and scalability for the future growth of datasets. To address these needs, we describe several key technologies developed in computer science, including workflow engines. We also discuss the ethical guidelines that are indispensable for human genomic data analysis and that differ from those for model organisms. Finally, we summarize an ideal future perspective on data processing and analysis.
Land plants produce diverse flavonoids for growth, survival, and reproduction. Chalcone synthase (CHS) is the first committed enzyme of the flavonoid biosynthetic pathway and catalyzes the production of 2',4,4',6'-tetrahydroxychalcone (THC). However, it also produces other polyketides, including p-coumaroyltriacetic acid lactone (CTAL), because of the derailment of the chalcone-producing pathway. This promiscuity of CHS catalysis adversely affects the efficiency of flavonoid biosynthesis, although it is also believed to have led to the evolution of stilbene synthase and p-coumaroyltriacetic acid synthase. In this study, we establish that chalcone isomerase-like proteins (CHILs), which are encoded by genes that are ubiquitous in land plant genomes, bind to CHS to enhance THC production and decrease CTAL formation, thereby rectifying the promiscuous CHS catalysis. This CHIL function has been confirmed in diverse land plant species, and represents a conserved strategy facilitating the efficient influx of substrates from the phenylpropanoid pathway to the flavonoid pathway.
We report a family with progressive myoclonic epilepsy who underwent whole-exome sequencing but was negative for pathogenic variants. Similar clinical courses of a devastating neurodegenerative phenotype of two affected siblings were highly suggestive of a genetic etiology, which indicates that the survey of genetic variation by whole-exome sequencing was not comprehensive. To investigate the presence of a variant that remained unrecognized by standard genetic testing, PacBio long-read sequencing was performed. Structural variant (SV) detection using low-coverage (6×) whole-genome sequencing called 17,165 SVs (7,216 deletions and 9,949 insertions). Our SV selection narrowed down potential candidates to only five SVs (two deletions and three insertions) on the genes tagged with autosomal recessive phenotypes. Among them, a 12.4-kb deletion involving the CLN6 gene was the top candidate because its homozygous abnormalities cause neuronal ceroid lipofuscinosis. This deletion included the initiation codon and was found in a GC-rich region containing multiple repetitive elements. These results indicate the presence of a causal variant in a difficult-to-sequence region and suggest that such variants that remain enigmatic after the application of current whole-exome sequencing technology could be uncovered by unbiased application of long-read whole-genome sequencing.
Microsatellites (MS) are tandem repeats of short units, and have been used for population genetics, individual identification, and medical genetics. However, studies of MS on a whole-genome level are limited, and genotyping methods for MS have yet to be established. Here, we analyzed approximately 8.5 million MS regions using a previously developed MS caller for short reads (MIVcall method) for three large publicly available human genome sequencing data sets: the Korean Personal Genome Project, Simons Genome Diversity Project, and Human Genome Diversity Project. Our analysis identified 253,114 polymorphic MS. A comparison among different populations suggests that MS in the coding region evolved by random genetic drift and natural selection. In an analysis of genetic structures, MS clearly revealed population structures, as SNPs do, and detected clusters that were not found by SNPs in African and Oceanian populations. Based on the MS polymorphisms, we selected MS marker candidates for individual identification. Finally, we applied our method to a deeply sequenced ancient DNA sample. This study provides a comprehensive picture of MS polymorphisms and their application to human population studies.
Acute encephalopathy with biphasic seizures and late reduced diffusion (AESD) is a severe encephalopathy preceded by viral infections with high fever. AESD is a multifactorial disease; however, few disease susceptibility genes have previously been identified. Here, we conducted a genome-wide association study (GWAS) and assessed functional variants in non-coding regions to study genetic susceptibility in AESD using 254 Japanese children with AESD and 799 adult healthy controls. We also performed a microRNA enrichment analysis using GWAS statistics to search for candidate biomarkers in AESD. The variant with the lowest p-value, rs1850440, was located in the intron of serine/threonine kinase 39 gene (STK39) on chromosome 2q24.3 (p = 2.44 × 10
, odds ratio = 1.71). The minor allele T of rs1850440 correlated with the stronger expression of STK39 in peripheral blood. This variant possessed enhancer histone modification marks in STK39, the encoded protein of which activates the p38 mitogen-activated protein kinase (MAPK) pathway. In the replication study, the odds ratios of three SNPs, including rs1850440, showed the same direction of association as that in the discovery stage GWAS. One of the candidate microRNAs identified by the microRNA enrichment analysis was associated with inflammatory responses regulated by the MAPK pathway. This study identified STK39 as a novel susceptibility locus of AESD, found microRNAs as potential biomarkers, and implicated immune responses and the MAPK cascade in its pathogenesis.
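As a reading aid for the odds ratio reported above, the following is a minimal sketch of the simplest allelic form of that statistic, computed from a 2×2 table of allele counts. The abstract does not state how its odds ratio was estimated, so this should not be read as the study's method; the function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def allelic_odds_ratio(case_minor, case_major, control_minor, control_major):
    """Odds of carrying the minor allele in cases divided by the odds in controls."""
    return (case_minor * control_major) / (case_major * control_minor)


# Toy allele counts, invented for illustration only; NOT the study's data.
toy_or = allelic_odds_ratio(case_minor=40, case_major=60,
                            control_minor=25, control_major=75)
print(f"odds ratio = {toy_or:.2f}")
```

An odds ratio above 1 means the minor allele is more common among cases than controls, which is the direction of effect described for the T allele of rs1850440.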
The Funadomari Jomon people were hunter-gatherers living on Rebun Island, Hokkaido, Japan c. 3500–3800 years ago. In this study, we determined the high-depth and low-depth nuclear genome sequences from a Funadomari Jomon female (F23) and male (F5), respectively. We genotyped the nuclear DNA of F23 and determined the human leukocyte antigen (HLA) class-I genotypes and the phenotypic traits. Moreover, a pathogenic mutation in the CPT1A gene was identified in both F23 and F5. The mutation provides metabolic advantages for consumption of a high-fat diet, and its allele frequency is more than 70% in Arctic populations, but is absent elsewhere. This variant may be related to the lifestyle of the Funadomari Jomon people, who fished and hunted land and marine animals. We observed high homozygosity by descent (HBD) in F23, but HBD tracts longer than 10 cM were very limited, suggesting that the population size of Northern Jomon populations was small. Our analysis suggested that the population size of the Jomon people started to decrease c. 50000 years ago. The phylogenetic relationship among F23, modern/ancient Eurasians, and Native Americans showed a deep divergence of F23 in East Eurasia, probably before the split of the ancestor of Native Americans from East Eurasians, but after the split of 40000-year-old Tianyuan, indicating that the Northern Jomon people were genetically isolated from continental East Eurasians for a long period. Intriguingly, we found that modern Japanese as well as Ulchi, Korean, aboriginal Taiwanese, and Philippine populations were genetically closer to F23 than to Han Chinese. Moreover, the Y chromosome of F5 belonged to haplogroup D1b2b, which is rare in modern Japanese populations. These findings provided insights into the history and reconstructions of the ancient human population structures in East Eurasia, and the F23 genome data can be considered as the Jomon Reference Genome for future studies.
The origins of people in the Japanese archipelago are of long-standing interest among anthropologists, archeologists, linguists, and historians studying the history of Japan. While the ‘dual-structure’ model proposed by Hanihara in 1991 has been considered the primary working hypothesis for three decades, recent advances in DNA typing and sequencing technologies provide an unprecedented amount of present-day and ancient human nuclear genome data, which enable us to refine or extend the dual-structure model. In this review, we summarize recent genome sequencing efforts of present-day and ancient people in Asia, mostly focusing on East Asia, and we discuss the possible migration routes and admixture patterns of Japanese ancestors. We also report on a meta-analysis we performed by compiling publicly available datasets to clarify the genetic relationships of present-day and ancient Japanese populations with surrounding populations. Because the ancient genetic data from the Japanese archipelago have not yet been fully analyzed, we have to corroborate models of prehistoric human movement using not only new genetic data but also linguistic and archeological data to reconstruct a more comprehensive history of the Japanese people.
The Eurasian house mouse Mus musculus is useful for tracing prehistoric human movement related to the spread of farming. We determined whole mitochondrial DNA (mtDNA) sequences (ca. 16,000 bp) of 98 wild-derived individuals of two subspecies, M. m. musculus (MUS) and M. m. castaneus (CAS). We revealed directional dispersals reaching as far as the Japanese Archipelago from their homelands. Our phylogenetic analysis indicated that the eastward movement of MUS was characterised by five step-wise regional extension events: (1) broad spatial expansion into eastern Europe and the western part of western China, (2) dispersal to the eastern part of western China, (3) dispersal to northern China, (4) dispersal to the Korean Peninsula and (5) colonisation and expansion in the Japanese Archipelago. These events were estimated to have occurred during the last 2000-18,000 years. The dispersal of CAS was characterised by three events: initial divergences (ca. 7000-9000 years ago) of haplogroups in northernmost China and the eastern coast of India, followed by two population expansion events that likely originated from the Yangtze River basin to broad areas of South and Southeast Asia, including Sri Lanka, Bangladesh and Indonesia (ca. 4000-6000 years ago) and to Yunnan, southern China and the Japanese Archipelago (ca. 2000-3500 years ago). This study provides a solid framework for the spatiotemporal movement of the human-associated organisms in Holocene Eastern Eurasia using whole mtDNA sequences, reliable evolutionary rates and accurate branching patterns. The information obtained here contributes to the analysis of a variety of animals and plants associated with prehistoric human migration.
The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF)>5%), the genomic coverage of the Japonica array (r²>0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%<MAF≤5%) of the Japonica array reached 67.2%, which is higher than those of the existing arrays. In addition, we confirmed the high-quality genotyping performance of the Japonica array using the 288 samples in 1KJPN; the average call rate was 99.7% and the average concordance rate was 99.7% relative to the genotypes obtained from a high-throughput sequencer. As demonstrated in this study, the creation of custom-made SNP arrays based on a population-specific reference panel is a practical way to facilitate further association studies through genome-wide genotype imputations.
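The coverage figures above can be read as a simple proportion. The sketch below assumes that genomic coverage means the fraction of reference-panel SNPs in a given MAF bin whose r² exceeds 0.8; the abstract does not spell out the exact definition, so this is an interpretation rather than the authors' procedure. The function name and the (MAF, r²) pairs are hypothetical.

```python
def genomic_coverage(snps, maf_low, maf_high, r2_threshold=0.8):
    """Fraction of SNPs with maf_low < MAF <= maf_high whose r^2 exceeds the threshold.

    `snps` is an iterable of (maf, r2) pairs. Returns NaN if the bin is empty.
    """
    in_bin = [r2 for maf, r2 in snps if maf_low < maf <= maf_high]
    if not in_bin:
        return float("nan")
    return sum(r2 > r2_threshold for r2 in in_bin) / len(in_bin)


# Hypothetical (MAF, r^2) pairs, invented for illustration only.
toy_snps = [(0.30, 0.95), (0.12, 0.85), (0.08, 0.70),
            (0.03, 0.90), (0.01, 0.40), (0.006, 0.82)]

common = genomic_coverage(toy_snps, 0.05, 0.5)      # MAF > 5%
low_freq = genomic_coverage(toy_snps, 0.005, 0.05)  # 0.5% < MAF <= 5%
print(f"common: {common:.1%}, low-frequency: {low_freq:.1%}")
```

Binning by MAF before applying the threshold mirrors the way the abstract reports coverage separately for common and low-frequency SNPs.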