In genomic medicine, which is now being implemented in clinical care, variants detected by genome analysis such as next-generation sequencing are clinically interpreted to determine the diagnosis and treatment plan. The clinical interpretation draws on the detailed clinical background and on information from journal papers and public databases, such as population frequencies and relationships to disease. A large amount of genomic data has been accumulated so far, and many disease-related genomic variant databases have been developed, including ClinVar. However, the genes and variants involved in diseases differ between populations with different genetic backgrounds. Furthermore, it has been reported that there is a racial bias in the information shared in current public databases, which affects clinical interpretation. Therefore, increasing the diversity of genomic variant data has become an important issue worldwide. In Japan, the Japan Agency for Medical Research and Development (AMED) launched a project to develop an integrated clinical genome information database in 2016. This project targeted “Cancer,” “Rare/Intractable diseases,” “Infectious diseases,” “Dementia,” and “Hearing loss,” and in collaboration with research institutes that provide genomic medicine in Japan, we developed an integrated database named MGeND (Medical Genomics Japan Database). MGeND is a freely accessible database that provides disease-related genomic information detected in the Japanese population. MGeND widely collects variant data for monogenic diseases, represented by rare diseases, and for polygenic diseases such as dementia and infectious disease. The genome variant data are integrated by genomic position for these diseases and can be searched across diseases. The useful genome analysis methods differ depending on the disease area. 
Therefore, in addition to the “SNV, short indel, SV, and CNV” data handled by ClinVar, MGeND includes GWAS (Genome-Wide Association Study) data, which are widely used in studies of polygenic diseases, and HLA (Human Leukocyte Antigen) allele frequency data, which are used in immune-related diseases such as infectious diseases. As of September 2021, more than 150,000 variants have been registered in MGeND, and 60,000 unique variants have been made public. Of these variants, about 70% were registered only in MGeND and not in ClinVar. This fact shows the importance of efforts to collect genomic information for each ethnic group. On the other hand, many variants have not been annotated with any clinical interpretation because their effects on molecular function and the mechanisms of disease are not clear at this time. These variants of uncertain significance (VUS) are a bottleneck for genomic medicine because they cannot be used for diagnosis or treatment selection. The evaluation of VUS requires detailed experimental validation and a vast amount of knowledge integration, which is costly. In order to understand the molecular function and disease relevance of VUS and to enable optimal drug selection, we have been developing a machine learning-based method for predicting the pathogenicity of variants and a computational platform for estimating the effect of variants on drug sensitivity. Many methods for predicting the pathogenicity of genomic variants using machine learning have been developed. Most of them use the conservation of amino acid or nucleotide sequences among closely related species and the physicochemical properties of proteins as features for prediction. There are also many prediction methods based on ensemble learning that aggregate the scores predicted by existing tools. These approaches focus on individual genes and variants and evaluate their effects. 
However, in many diseases, multiple molecules play a complex role in pathogenesis. In other words, to assess the pathological significance of variants more accurately, it is necessary to consider molecular associations. Therefore, we constructed a knowledge graph based on molecular networks, genomic variants, and scores predicted by existing methods, and proposed a prediction model using a Graph Convolutional Network (GCN). A performance evaluation on a benchmark set showed that the GCN-based method outperformed existing methods. It is known that variants can affect the interaction between a molecule and a drug. For optimal drug selection, it is necessary to clarify the effect of the variant on drug affinity. It is time-consuming and costly to perform experiments on a large number of VUS. Our previous studies show that molecular dynamics calculations can evaluate the affinity between mutant proteins and drugs in terms of energy and estimate it with high accuracy. We are currently working on a project to estimate the effects of a large number of VUS using the supercomputer Fugaku. To realize calculations for many VUS in this project, we are developing a data platform for seamlessly performing molecular dynamics simulations from genome information. Moreover, we are constructing a database to publish the calculation results and their outcomes in order to contribute to the selection of optimal drugs. In the presentation, I will introduce the development of the databases and prediction methods to improve the efficiency of genomic medicine.
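The idea of integrating variants by genomic position and comparing them against another database can be sketched as follows. This is a minimal illustration, not MGeND's actual schema or API: the record layout, the `build_index` and `fraction_unique` helpers, and all coordinates are hypothetical toy values.

```python
from collections import defaultdict

# Hypothetical toy records: (chrom, pos, ref, alt, disease area).
# These are NOT real MGeND entries; coordinates are made up.
mgend_records = [
    ("chr1", 1000, "A", "G", "Cancer"),
    ("chr1", 1000, "A", "G", "Dementia"),
    ("chr2", 2000, "C", "T", "Hearing loss"),
    ("chr3", 3000, "G", "A", "Infectious diseases"),
]

# Hypothetical variant keys present in a second database (e.g. a ClinVar export).
other_db_keys = {("chr1", 1000, "A", "G"), ("chr9", 9000, "T", "C")}


def build_index(records):
    """Group disease areas under a position-based variant key."""
    index = defaultdict(set)
    for chrom, pos, ref, alt, disease in records:
        index[(chrom, pos, ref, alt)].add(disease)
    return index


def fraction_unique(index, other_keys):
    """Fraction of unique variants that do not appear in the other database."""
    only_here = [key for key in index if key not in other_keys]
    return len(only_here) / len(index)


index = build_index(mgend_records)
# Cross-disease lookup: which disease areas report this variant?
shared_diseases = sorted(index[("chr1", 1000, "A", "G")])
unique_fraction = fraction_unique(index, other_db_keys)
print(shared_diseases, round(unique_fraction, 2))
```

Keying on position plus alleles is one common choice; a production system would also need to normalize variant representation and reference assembly before comparing keys.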
Osimertinib has been demonstrated to overcome the epidermal growth factor receptor (EGFR)-T790M, the most relevant acquired resistance to first-generation EGFR-tyrosine kinase inhibitors (EGFR-TKIs). However, the C797S mutation, which impairs the covalent binding between the cysteine residue at position 797 of EGFR and osimertinib, induces resistance to osimertinib. Currently, there are no effective therapeutic strategies to overcome the C797S/T790M/activating-mutation (triple-mutation)-mediated EGFR-TKI resistance. In the present study, we identify brigatinib to be effective against triple-mutation-harbouring cells in vitro and in vivo. Our original computational simulation demonstrates that brigatinib fits into the ATP-binding pocket of triple-mutant EGFR. The structure-activity relationship analysis reveals the key component in brigatinib to inhibit the triple-mutant EGFR. The efficacy of brigatinib is enhanced markedly by combination with anti-EGFR antibody because of the decrease of surface and total EGFR expression. Thus, the combination therapy of brigatinib with anti-EGFR antibody is a powerful candidate to overcome triple-mutant EGFR.
In cancer genomic medicine, finding driver mutations involved in cancer development and tumor growth is crucial. Machine-learning methods to predict driver missense mutations have been developed because variants are frequently detected by genomic sequencing. However, even though abnormalities in molecular networks are associated with cancer, many of these methods focus on individual variants and do not consider molecular networks. Here we propose a new network-based method, Net-DMPred, to predict driver missense mutations considering molecular networks. Net-DMPred consists of a graph part and a prediction part. In the graph part, molecular networks are learned by a graph neural network (GNN). The prediction part learns whether variants are driver variants using features of individual variants combined with the graph features learned in the graph part. Net-DMPred, which considers molecular networks, performed better than conventional methods. Furthermore, the prediction performance differed by the molecular network structure used in learning, suggesting that it is important to consider not only the local network related to cancer but also the large-scale network in living organisms. We propose a network-based machine learning method, Net-DMPred, for predicting cancer driver missense mutations. Our method enables us to consider the entire graph architecture representing the molecular network because it uses a GNN. Net-DMPred is expected to detect driver mutations among the many missense mutations that are not known to be associated with cancer.
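The split into a graph part and a prediction part can be illustrated with a single graph-convolution step followed by feature concatenation. The abstract does not specify the GNN architecture, so the sketch below assumes a standard GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), with fixed untrained weights; the toy network, feature values, and all function names are hypothetical.

```python
import math


def normalized_adjacency(n_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


def matmul(x, y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy molecular network: three genes connected in a path 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
a_hat = normalized_adjacency(3, edges)
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.2], [0.3, 0.8]]  # arbitrary fixed values, not trained

# Graph part: one embedding per gene.
graph_features = gcn_layer(a_hat, node_features, weights)

# Prediction part input: per-variant features joined with the embedding
# of the gene that carries the variant (here gene 1).
variant_features = [0.9, 0.1, 0.4]
variant_gene = 1
combined = variant_features + graph_features[variant_gene]
print(combined)
```

In a trained model the weights would be learned jointly with a classifier on the combined vector; the sketch only shows how network context enters the per-variant input.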
Recent effective therapies enable most rheumatoid arthritis (RA) patients to achieve remission; however, some patients experience relapse. We aimed to predict relapse in RA patients through machine learning (ML) using data from ultrasound (US) examinations and blood tests. Overall, 210 patients with RA in remission at baseline were dichotomized into remission (n = 150) and relapse (n = 60) based on the disease activity at 2-year follow-up. Three ML classifiers (Logistic Regression, Random Forest, and extreme gradient boosting (XGBoost)) and data on 73 features at baseline (14 US examination data, 54 blood test data, and five data on patient information) were used for predicting relapse. The best performance was obtained using the XGBoost classifier (area under the receiver operating characteristic curve (AUC) = 0.747), compared with Random Forest and Logistic Regression (AUC = 0.719 and 0.701, respectively). In the XGBoost classifier prediction, ten important features, including wrist/metatarsophalangeal superb microvascular imaging scores, were selected using the recursive feature elimination method. The performance was superior to that obtained with researcher-selected features, which are conventional prognostic markers. These results suggest that ML can provide an accurate prediction of relapse in RA patients, and the use of predictive algorithms may facilitate personalized treatment options.
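The AUC values used to compare the classifiers can be read as the probability that a randomly chosen relapse patient receives a higher predicted score than a randomly chosen remission patient. A minimal sketch of that pairwise definition follows; the labels and scores are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = relapse, 0 = sustained remission (illustrative only).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The double loop is quadratic in the number of patients, which is fine for a cohort of a few hundred; rank-based formulas give the same value more efficiently for larger sets.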
Previous studies have reported genome-wide mutation profile analyses in ovarian clear cell carcinomas (OCCCs). This study aims to identify specific novel molecular alterations by combined analyses of somatic mutation and copy number variation. We performed whole exome sequencing of 39 OCCC samples with 16 matching blood tissue samples. Four hundred twenty-six genes had recurrent somatic mutations. Among the 39 samples, ARID1A (62%) and PIK3CA (51%) were frequently mutated, as were genes such as KRAS (10%), PPP2R1A (10%), and PTEN (5%), that have been reported in previous OCCC studies. We also detected mutations in MLL3 (15%), ARID1B (10%), and PIK3R1 (8%), which are associations not previously reported. Gene interaction analysis and functional assessment revealed that mutated genes were clustered into groups pertaining to chromatin remodeling, cell proliferation, DNA repair and cell cycle checkpointing, and cytoskeletal organization. Copy number variation analysis identified frequent amplification in chr8q (64%), chr20q (54%), and chr17q (46%) loci as well as deletion in chr19p (41%), chr13q (28%), chr9q (21%), and chr18q (21%) loci. Integration of the analyses uncovered that frequently mutated or amplified/deleted genes were involved in the KRAS/phosphatidylinositol 3-kinase (82%) and MYC/retinoblastoma (75%) pathways as well as the critical chromatin remodeling complex switch/sucrose nonfermentable (85%). The individual and integrated analyses contribute details about the OCCC genomic landscape, which could lead to enhanced diagnostics and therapeutic options.
Tumors demonstrating deficient mismatch repair (dMMR) account for 12%–15% of colorectal cancers (CRCs), but their characteristics have not been fully elucidated. The aim of this study was to characterize dMMR CRCs in terms of clinicopathological findings and molecular alterations. Immunostaining for mismatch repair (MMR) proteins was performed to determine MMR status, and then MLH1 promoter methylation and genetic variants of 25 genes involved in colorectal carcinogenesis were analyzed by next-generation sequencing in dMMR tumors. Coexistence of precancerous lesions was histologically evaluated to characterize the type of precursors. Immunohistochemistry revealed 34 dMMR tumors in 492 CRCs. Among dMMR CRCs, there were 25 MLH1 methylation-positive, 16 BRAF V600E variant-positive, and 7 KRAS variant-positive tumors. Positive MLH1 methylation was associated with BRAF V600E, older age, and right-side tumor location. MLH1 methylated BRAF/KRAS wild-type tumors were distinct in that all 5 tumors possessed variants in ligand-independent WNT signaling genes including APC, AXIN2, and CTNNB1. Among 10 dMMR CRCs that presented with precancerous lesions, 4 BRAF variant-positive, 1 KRAS variant-positive, and 2 BRAF/KRAS wild-type MLH1 methylated tumors coexisted with serrated lesions, whereas 1 MLH1 methylated BRAF/KRAS wild-type tumor and 2 MLH1 unmethylated tumors accompanied conventional adenomas. The present study characterized distinct subgroups of dMMR CRCs based on molecular alterations including MLH1 methylation and variants in BRAF, KRAS, and ligand-independent WNT signaling genes. The existence of distinct precursor lesions including serrated lesion and conventional adenoma further illustrates the involvement of heterogeneous carcinogenetic pathways in the development of dMMR CRCs.
• Distinct molecular alterations characterize subgroups of dMMR colorectal cancers.
• Ligand-independent WNT signaling genes contribute differently among these subgroups.
• Heterogeneous pathways are involved in the carcinogenesis of dMMR colorectal cancers.
Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain, B. subtilis BEST195, and the laboratory standard strain B. subtilis 168, which is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which are related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from "Tua Nao" of Thailand traces a different evolutionary process from other strains.
De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food "natto." The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome.
We aimed to examine the association between homologous recombination repair (HRR)-related gene mutations and efficacy of oxaliplatin-based chemotherapy in patients with pancreatic ductal adenocarcinoma (PDAC).
Non-synonymous mutations in HRR-related genes were found in 13 patients and only one patient had a family history of pancreatic cancer. Eight patients with HRR-related gene mutations (group A) and nine without HRR-related gene mutations (group B) received oxaliplatin-based chemotherapy. Median progression-free survival after initiation of oxaliplatin-based chemotherapy was significantly longer in group A than in group B (20.8 months vs 1.7 months, p = 0.049). Interestingly, two patients with inactivating HRR-related gene mutations who received FOLFIRINOX as first-line treatment showed exceptional responses with respect to progression-free survival for > 24 months.
Complete coding exons of 12 HRR-related genes were sequenced using a Clinical Laboratory Improvement Amendments-certified multiplex next-generation sequencing assay. Thirty consecutive PDAC patients who underwent this assay between April 2015 and July 2017 were included.
Our results suggest that inactivating HRR-related gene mutations are predictive of response to oxaliplatin-based chemotherapy in patients with PDAC.
Machine learning methods are nowadays used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. In order to build a prediction model, a so-called training data set of molecules with measured target properties is needed. For many such problems the size of the training data set is limited, as measurements have to be performed in a wet lab. Furthermore, the considered problems are often complex, such that it is not clear which molecular descriptors (features) may be suitable to establish a strong correlation with the target property. In many applications all available descriptors are used. This can lead to difficult machine learning problems when thousands of descriptors are considered and only a few (e.g. fewer than one hundred) molecules are available for training.
The CoEPrA contest provides four data sets, which are typical for biological regression problems (few molecules in the training data set and thousands of descriptors). We applied the same two-step training procedure for all four regression tasks. In the first stage, we used optimized L1 regularization to select the most relevant features. Thus, the initial set of more than 6,000 features was reduced to about 50. In the second stage, we used only the selected features from the preceding stage applying a milder L2 regularization, which generally yielded further improvement of prediction performance. Our linear model employed a soft loss function which minimizes the influence of outliers.
The proposed two-step method showed good results on all four CoEPrA regression tasks. Thus, it may be useful for many other biological prediction problems where only a small number of molecules, each described by thousands of descriptors, are available for training.
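The two-step procedure (L1 regularization to select features, then a milder L2 regularization on the selected features) can be sketched with a small coordinate-descent solver. This is an illustration under stated assumptions, not the authors' implementation: it uses a plain squared loss instead of the outlier-tolerant soft loss mentioned above, the penalty strengths are arbitrary rather than optimized, and the data are synthetic.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_linear(x, y, l1=0.0, l2=0.0, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2."""
    n, d = len(x), len(x[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(d):
            col = [row[j] for row in x]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, l1) / (z + l2)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_w
    return w


# Synthetic data: more descriptors than molecules, only two are informative.
rng = random.Random(0)
n_samples, n_features = 40, 50
x = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in x]

# Stage 1: L1 penalty drives most weights to exactly zero.
w_stage1 = fit_linear(x, y, l1=0.2)
selected = [j for j, w in enumerate(w_stage1) if w != 0.0]

# Stage 2: refit on the selected descriptors with a mild L2 penalty only.
x_selected = [[row[j] for j in selected] for row in x]
w_stage2 = fit_linear(x_selected, y, l2=0.01)
final = dict(zip(selected, w_stage2))
print(len(selected), final)
```

Refitting without the L1 term removes the shrinkage bias that stage 1 imposes on the surviving weights, which is one plausible reason a second stage can improve prediction performance.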