• New technical developments should help make data FAIR.
• Implementation of common ontologies and data harmonization should enable data re-use.
• Environment and Health (E&H) data should be part of the European Open Science Cloud.
• Different research domains should be interconnected and synergized for optimal use of data.
• An EU-level E&H data research infrastructure is needed to support E&H research.
Management of datasets that include health information and other sensitive personal information on European study participants must comply with the General Data Protection Regulation (GDPR, Regulation (EU) 2016/679). In scientific research, the widely endorsed ‘FAIR’ data principles should apply, meaning that research data should be findable, accessible, interoperable and re-usable. Balancing FAIR data management, driven by the aims of open science, with GDPR-compliant safeguards for personal data is now a common challenge for many research projects dealing with (sensitive) personal data.
In December 2020, a workshop was held with representatives of several large EU research consortia and of the European Commission to reflect on how to apply the FAIR data principles in environment and health (E&H) research. Several recent data-intensive, EU-funded E&H research projects face this challenge and are working intensively to develop solutions to access, exchange, store, handle, share, process and use such sensitive personal data, with the aim of supporting European and transnational collaborations. As a result of the workshop, several recommendations, opportunities and current limitations were formulated.
New technical developments, such as federated data management and analysis systems, machine learning combined with advanced search software, harmonized ontologies and data quality standards, should in principle facilitate the FAIRification of data. To address ethical, legal, political and financial obstacles to the wider re-use of data for research purposes, both specific expertise and an underpinning infrastructure are needed. E&H research data need to find their place in the European Open Science Cloud. Communities using health and population data, environmental data and other publicly available data have to interconnect and synergize. To maximize the use and re-use of environment and health data, a dedicated European infrastructure effort that interacts with existing infrastructures is needed, such as the EIRENE research infrastructure on the ESFRI roadmap 2021.
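Federated analysis is one of the developments named above. A common pattern is that each site computes aggregate statistics locally and shares only those aggregates, so individual-level records never leave the site. The Python sketch below pools a mean and standard deviation this way; the function and field names are hypothetical, and it does not describe EIRENE or any specific infrastructure. Real deployments also need disclosure controls, for example suppressing aggregates from very small sites.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SiteSummary:
    """Aggregates one site shares; no participant-level values leave the site."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values):
    """Run at each site: reduce a list of participant values to three aggregates."""
    return SiteSummary(n=len(values), total=sum(values), total_sq=sum(v * v for v in values))


def pool(summaries):
    """Run at the coordinator: pooled mean and sample standard deviation."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from the pooled sums, with an (n - 1) denominator;
    # clamp tiny negative values caused by floating-point rounding.
    var = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, sqrt(var)


# Two hypothetical sites with toy exposure values.
mean, sd = pool([summarize_locally([1.0, 2.0, 3.0]), summarize_locally([4.0, 5.0])])
```

The sum-of-squares formula keeps the example short but can lose precision for large values; a production system might instead combine per-site means and variances with a numerically stabler pairwise update.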
The
genome comprises 263 Mb and 34,240 gene models organized in 20 chromosomes. To improve our understanding of gene function, we have generated an EMS mutant platform consisting of 3,751 independent M2 families. The quality of the collection was evaluated on the basis of phenotyping and whole-genome re-sequencing (WGS) results. Phenotypic evaluation of the whole platform at the seedling stage showed that the rate of variation for easily observable traits is more than 10%. The percentage of families with albino or chlorotic seedlings exceeded 3%, similar to or higher than that found in other EMS collections of cucurbit crops. A rapid screen of the library for the ethylene triple response in etiolated seedlings identified four ethylene-insensitive mutants, which were found to be semidominant (
,
, and
) or dominant (
). By evaluating four adult plants from 300 independent families, an apparent mutation rate of more than 28% was found for vegetative and reproductive traits, including plant vigor, leaf size and shape, sex expression and sex determination, and fruit set and development. Two pools of genomic DNA derived from 20 plants of two mutant families were subjected to WGS using NGS methodology to estimate the density, spectrum, distribution and impact of EMS-induced mutations. The numbers of EMS mutations in the genomes of families L1 and L2 were 1,704 and 859, respectively, representing densities of 11.8 and 6 mutations per Mb. As expected, the predominant EMS-induced mutations were C > T and G > A transitions (80.3% in L1 and 61% in L2), which were randomly distributed along the 20 chromosomes of
. Most mutations affected intergenic regions, but 7.9% and 6% of the EMS mutations identified in L1 and L2, respectively, were located in the exome, and 0.4 and 0.2% had a moderate and high putative impact on gene function. These results show the potential of the mutant platform for discovering novel alleles for both functional genomics and
breeding, using forward- or reverse-genetic approaches.
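The mutation density and spectrum figures above are simple ratios over a set of called variants. The Python sketch below computes them for one pool, assuming a hypothetical input of (chromosome, position, ref, alt) tuples and a user-supplied size of the callable genome. The denominator matters: dividing 1,704 mutations by the full 263 Mb assembly would give about 6.5 per Mb, so the reported 11.8 per Mb presumably refers to a smaller analyzed fraction of the genome, which the abstract does not specify.

```python
from collections import Counter

# EMS alkylates guanine, producing G:C -> A:T transitions, which appear on the
# reference strand as either C>T or G>A substitutions.
EMS_TRANSITIONS = {("C", "T"), ("G", "A")}


def ems_summary(variants, callable_mb):
    """Summarize single-base substitutions called in one mutant pool.

    variants: iterable of (chrom, pos, ref, alt) tuples (hypothetical format).
    callable_mb: size, in Mb, of the genome fraction where mutations could be called.
    Indels (ref or alt longer than one base) are skipped.
    """
    snvs = [(c, p, r.upper(), a.upper())
            for c, p, r, a in variants if len(r) == 1 and len(a) == 1]
    n = len(snvs)
    n_ems = sum((r, a) in EMS_TRANSITIONS for _, _, r, a in snvs)
    return {
        "n_snvs": n,
        "density_per_mb": n / callable_mb,
        "ems_fraction": n_ems / n if n else 0.0,
        "per_chromosome": dict(Counter(c for c, _, _, _ in snvs)),
    }


# Toy calls: three EMS-type transitions, one other transition, one indel.
toy = [("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"),
       ("chr2", 50, "A", "G"), ("chr2", 60, "AT", "A"), ("chr3", 10, "c", "t")]
summary = ems_summary(toy, callable_mb=2.0)
```

Testing whether mutations are randomly distributed along chromosomes would compare the per-chromosome counts with expectations proportional to chromosome length, which this sketch leaves out.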
Since 72% of rare diseases are genetic in origin and mostly paediatric, genetic newborn screening (gNBS) represents a diagnostic "window of opportunity". Therefore, many gNBS initiatives have started in different European countries. Screen4Care (S4C) is a research project resulting from a joint effort between the European Commission and the European Federation of Pharmaceutical Industries and Associations. It focuses on genetic newborn screening and artificial intelligence-based tools, which will be applied to a large European population of about 25,000 infants. The neonatal screening strategy will be based on targeted sequencing, while whole genome sequencing will be offered to all enrolled infants who show early symptoms but have tested negative in the targeted sequencing-based newborn screening. We will leverage artificial intelligence-based algorithms to identify patients using Electronic Health Records (EHR) and to build a repository of "symptom checkers" for patients and healthcare providers. S4C will design an equitable, ethical, and sustainable framework for genetic newborn screening and new digital tools, supported by extensive work addressing legal, ethical, and social complexities, with the intent of making the framework highly and flexibly translatable into the diverse European health systems.
The Spanish Pyrenees are particularly interesting for studying the demographic dynamics of European rural areas, given their orography, the predominantly traditional rural character of their population and the higher levels of consanguinity reported in the region. Previous genetic studies suggest a gradient of genetic continuity along the west-to-east axis of the area. However, it has been shown that micro-population substructure can be detected with high-quality NGS data and spatially explicit methods. In this work, we analyzed the genomes of 30 individuals sequenced at 40× from five different valleys in the Spanish Eastern Pyrenees (SEP), separated by less than 140 km along a west-to-east axis. Using haplotype-based methods and spatial analyses, we detected micro-population substructure within SEP not seen in previous studies. Linkage disequilibrium and autozygosity analyses suggest that the SEP populations have diverse demographic histories. Consistent with these results, demographic modeling by means of ABC-DL identifies heterogeneity in their effective population sizes despite their close geographic proximity, and suggests that the population substructure within SEP could have appeared around 2,500 years ago. Overall, these results suggest that each rural population of the Pyrenees could represent a unique entity.
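Autozygosity is often summarized as F_ROH, the fraction of the autosomal genome covered by runs of homozygosity (ROH) above a minimum length: F_ROH = (sum of ROH lengths) / (autosomal length analyzed). The abstract does not state which autozygosity measure or length thresholds were used, so the Python sketch below is only an illustration with hypothetical names and example thresholds. Splitting ROH into length classes is a common way to separate recent from older inbreeding signals.

```python
def f_roh(roh_segments, autosomal_length_bp, min_length_bp=1_000_000):
    """Genomic inbreeding coefficient F_ROH for one individual.

    roh_segments: list of (chrom, start, end) ROH calls, half-open [start, end).
    autosomal_length_bp: total autosomal length covered by the analysis.
    min_length_bp: shorter ROH are ignored (the threshold is an assumption).
    """
    covered = sum(end - start for _, start, end in roh_segments
                  if end - start >= min_length_bp)
    return covered / autosomal_length_bp


def roh_by_length_class(roh_segments, bounds_bp=(1_000_000, 5_000_000)):
    """Total ROH length per class: short, medium and long (bounds are illustrative).

    Long ROH tend to reflect recent shared ancestry of the parents, while many
    short ROH are more consistent with older background relatedness or a small
    long-term population size.
    """
    short_max, long_min = bounds_bp
    totals = {"short": 0, "medium": 0, "long": 0}
    for _, start, end in roh_segments:
        length = end - start
        if length < short_max:
            totals["short"] += length
        elif length < long_min:
            totals["medium"] += length
        else:
            totals["long"] += length
    return totals


# Toy ROH calls for one individual.
segments = [("1", 0, 2_000_000), ("2", 0, 500_000), ("3", 10_000_000, 16_000_000)]
```

With a 100 Mb toy autosome, the two segments of at least 1 Mb cover 8 Mb, giving F_ROH = 0.08.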
Somatic single-nucleotide variants (SNVs) arise every time a cell divides and appear at low frequencies even in healthy tissues. These mutations may accumulate as neutral variants during aging or eventually promote the development of neoplasia. Here, we present SP-ddPCR, a droplet digital PCR (ddPCR)-based approach that uses customized SuperSelective primers to quantify the proportion of rare SNVs. For that purpose, we selected five potentially pathogenic variants, identified by whole-exome sequencing (WES), occurring at low variant allele frequency (VAF) in healthy, at-risk colon mucosa of patients diagnosed with colorectal cancer or advanced adenoma. Additionally, two APC SNVs detected in two cancer lesions were included to validate WES-derived VAFs. SuperSelective primers were designed to quantify SNVs at low VAFs, both in silico and in clinical samples. In addition to the two APC SNVs in colonic lesions, SP-ddPCR confirmed the presence of three of the five selected SNVs in the normal colonic mucosa, with allele frequencies ≤ 5%. Moreover, SP-ddPCR showed the presence of two potentially pathogenic variants in the distal normal mucosa of patients with colorectal carcinoma. In summary, SP-ddPCR offers a rapid and feasible method to validate next-generation sequencing data and accurately quantify rare SNVs, thus providing a potential tool for the diagnosis and stratification of at-risk patients based on their mutational profiles.
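Digital PCR quantification usually rests on a Poisson correction: if a fraction p of droplets is positive for a target, the mean number of target copies per droplet is lambda = -ln(1 - p), and a VAF can be estimated as lambda_mut / (lambda_mut + lambda_wt). The Python sketch below applies this, assuming mutant and wild-type alleles are read in two channels of the same droplets. The abstract does not describe the SP-ddPCR readout or the primer design, so the names and inputs here are hypothetical.

```python
from math import log1p


def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet: lambda = -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    # log1p(-x) computes ln(1 - x) accurately when x is small.
    return -log1p(-positive / total)


def variant_allele_fraction(mut_positive, wt_positive, total):
    """VAF from mutant- and wild-type-positive droplet counts in one reaction."""
    lam_mut = copies_per_droplet(mut_positive, total)
    lam_wt = copies_per_droplet(wt_positive, total)
    if lam_mut + lam_wt == 0:
        raise ValueError("no positive droplets in either channel")
    return lam_mut / (lam_mut + lam_wt)


# Toy counts: 20,000 droplets, 100 mutant-positive, 4,000 wild-type-positive.
vaf = variant_allele_fraction(100, 4000, 20000)
```

Here the corrected VAF is about 0.022. The raw ratio 100 / (100 + 4,000), about 0.024, would overstate it, because many wild-type-positive droplets contain more than one template.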
Epimutations are rare alterations of the normal DNA methylation pattern at specific loci, which can lead to rare diseases. Methylation microarrays enable genome-wide epimutation detection, but technical limitations prevent their use in clinical settings: methods applied to rare disease data cannot easily be incorporated into standard analysis pipelines, while epimutation methods implemented in R packages (ramr) have not been validated for rare diseases. We have developed epimutacions, a Bioconductor package (https://bioconductor.org/packages/release/bioc/html/epimutacions.html). epimutacions implements two previously reported methods and four new statistical approaches to detect epimutations, along with functions to annotate and visualize them. Additionally, we have developed a user-friendly Shiny app (https://github.com/isglobal-brge/epimutacionsShiny) to facilitate epimutation detection by non-bioinformatician users. We first compared the performance of the epimutacions and ramr packages using three public datasets with experimentally validated epimutations. Methods in epimutacions performed well at low sample sizes and outperformed those in ramr. Second, we used two general-population children's cohorts (INMA and HELIX) to determine the technical and biological factors that affect epimutation detection, providing guidelines on how to design experiments and preprocess data. In these cohorts, most epimutations did not correlate with detectable regional gene expression changes. Finally, we illustrated how epimutacions can be used in a clinical context: we ran epimutacions on a cohort of children with autism disorder and identified novel recurrent epimutations in candidate genes for autism. Overall, we present epimutacions, a new Bioconductor package for incorporating epimutation detection into rare disease diagnosis, and provide guidelines for study design and data analysis.
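One simple way to picture epimutation calling is as an outlier search: compare one sample's methylation (beta) values with a reference panel at each CpG, and report runs of nearby CpGs that deviate strongly in the same direction. The Python sketch below does this with per-CpG z-scores. It is not one of the methods implemented in epimutacions (an R/Bioconductor package), and the thresholds and names are hypothetical.

```python
from statistics import mean, stdev


def candidate_epimutations(case_beta, control_betas, positions,
                           z_threshold=3.0, min_cpgs=3, max_gap=1000):
    """Flag runs of neighbouring CpGs where one sample is an outlier vs. controls.

    case_beta: beta values of the sample of interest, one per CpG.
    control_betas: per-CpG lists of control beta values, in the same CpG order.
    positions: CpG positions on one chromosome, sorted ascending.
    Returns a list of (start_pos, end_pos, "hyper" | "hypo") tuples.
    """
    z = []
    for b, ctrl in zip(case_beta, control_betas):
        sd = stdev(ctrl)
        z.append((b - mean(ctrl)) / sd if sd > 0 else 0.0)

    regions, run = [], []

    def close_run():
        # Keep the current run only if it spans enough CpGs.
        if len(run) >= min_cpgs:
            direction = "hyper" if z[run[0]] > 0 else "hypo"
            regions.append((positions[run[0]], positions[run[-1]], direction))

    for i, zi in enumerate(z):
        outlier = abs(zi) > z_threshold
        extends = (bool(run) and outlier
                   and positions[i] - positions[run[-1]] <= max_gap
                   and (zi > 0) == (z[run[-1]] > 0))
        if extends:
            run.append(i)
        else:
            close_run()
            run = [i] if outlier else []
    close_run()
    return regions


# Toy panel: five CpGs, three controls each, 100 bp apart.
controls = [[0.48, 0.50, 0.52]] * 5
pos = [100, 200, 300, 400, 500]
hyper = candidate_epimutations([0.5, 0.9, 0.9, 0.9, 0.5], controls, pos)
```

Requiring several adjacent CpGs in the same direction is what separates a regional epimutation from isolated noisy probes; real methods also model the skewed distribution of beta values rather than assuming normality.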
ABSTRACT
Inactivating mutations in the BCKDK gene, which encodes the kinase responsible for the negative regulation of the branched‐chain α‐keto acid dehydrogenase complex (BCKD), have recently been associated with a form of autism in three families. In this work, two novel exonic BCKDK mutations, c.520C>G/p.R174G and c.1166T>C/p.L389P, were identified in the homozygous state in two unrelated children with persistently reduced body fluid levels of branched‐chain amino acids (BCAAs), developmental delay, microcephaly, and neurobehavioral abnormalities. Functional analysis confirmed the missense character of the c.1166T>C change and showed that c.520C>G causes a splicing defect (r.520c>g;521_543del/p.R174Gfs1*) due to the creation of a new donor splice site. Mutation p.L389P resulted in a total loss of kinase activity. Moreover, patient‐derived fibroblasts showed undetectable (p.R174Gfs1*) or barely detectable (p.L389P) levels of BCKDK protein and its phosphorylated substrate (phospho‐E1α), resulting in increased BCKD activity and the very rapid BCAA catabolism reflected in the patients’ clinical phenotype. Based on these results, a protein‐rich diet plus oral BCAA supplementation was implemented in the patient homozygous for p.R174Gfs1*. This treatment normalized plasma BCAA levels and improved growth, developmental and behavioral variables. Our results demonstrate that BCKDK mutations can result in neurobehavioral deficits in humans and support the rationale for dietary intervention.
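The variant names above follow HGVS numbering, in which c.1 is the A of the ATG start codon. The arithmetic linking c. positions to codons, and a deletion's length to a frameshift, can be checked with a few lines of Python. This is a consistency check only: predicting the actual protein change (such as p.R174Gfs1*) requires the transcript sequence, which is not shown here, and the function names are hypothetical.

```python
def codon_number(c_pos):
    """Codon containing coding-DNA position c_pos (HGVS: c.1 is the A of ATG)."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1")
    return (c_pos - 1) // 3 + 1


def position_in_codon(c_pos):
    """Which base (1, 2 or 3) of its codon c_pos occupies."""
    return (c_pos - 1) % 3 + 1


def deletion_length(start, end):
    """Length of an inclusive deletion such as r.521_543del."""
    return end - start + 1


def is_frameshift(length):
    """A coding deletion shifts the reading frame unless its length is a multiple of 3."""
    return length % 3 != 0


# c.520C>G -> codon 174, first base (CGN, Arg, becomes GGN, Gly).
# c.1166T>C -> codon 389, second base (CTN, Leu, becomes CCN, Pro).
checks = {
    "R174": (codon_number(520), position_in_codon(520)),
    "L389": (codon_number(1166), position_in_codon(1166)),
    "r.521_543del frameshift": is_frameshift(deletion_length(521, 543)),
}
```

The 23-nucleotide deletion (r.521_543) is not a multiple of three, which is consistent with the frameshift notation in p.R174Gfs1*.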
Rare diseases are individually rare but globally affect around 6% of the population, and over 70% of cases are genetically determined. Their rarity translates into delayed diagnosis, with 25% of patients waiting 5 to 30 years for one. It is essential to make patients and clinicians aware of existing gene- and variant-specific therapeutics at the time of diagnosis, so that treatment delays do not add to the diagnostic odyssey of patients with rare diseases and their families.
This paper aims to provide guidance and detailed instructions on how to write homogeneous systematic reviews of rare disease treatments in a way that captures the results in a computer-accessible form. The published results need to comply with the FAIR guiding principles for scientific data management and stewardship to facilitate the extraction of datasets that are easily transposable into machine-actionable information. The ultimate purpose is the creation of a database of rare disease treatments ("Treatabolome") at gene and variant levels as part of the H2020 research project Solve-RD.
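To be machine-actionable, each treatment finding needs a fixed set of fields at gene and variant level that serialize cleanly. The Python sketch below shows one possible record layout with a JSON round trip; the field names, evidence-level labels and placeholder values are hypothetical and are not the Solve-RD or Treatabolome schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TreatmentRecord:
    """One machine-actionable row of a hypothetical Treatabolome-style dataset."""
    disease: str
    gene: str
    variant: str            # HGVS description; "" when the evidence is gene-level only
    treatment: str
    level_of_evidence: str  # label from the review protocol's grading scheme
    sources: list = field(default_factory=list)  # identifiers of the included papers

    def to_json(self):
        """Serialize with sorted keys so identical records give identical text."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


# Placeholder values only; no real disease, drug or paper is implied.
rec = TreatmentRecord(
    disease="example disease", gene="GENE1", variant="c.100A>G",
    treatment="example drug", level_of_evidence="level-placeholder",
    sources=["placeholder-id"],
)
```

Keeping the gene and variant as separate fields lets a database answer both gene-level and variant-level queries from the same rows.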
Each systematic review follows a written protocol addressing one or more rare diseases in which the authors are experts. The bibliographic search strategy requires detailed documentation to allow its replication. Data capture forms should be built to facilitate filling in a data capture spreadsheet and to record the application of the inclusion and exclusion criteria to each search result. A PRISMA flowchart is required to provide an overview of the search and paper selection processes. A separate table condenses the data collected during the systematic review, appraised according to their level of evidence.
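If the data capture spreadsheet records, for every search result, whether it was a duplicate and the reason (if any) it was excluded at each stage, the numbers for the PRISMA flowchart can be derived rather than counted by hand. The Python sketch below assumes hypothetical column names and a simplified identification, screening, full-text and inclusion sequence; the PRISMA 2020 flow diagram itself defines more boxes than shown here.

```python
from collections import Counter


def prisma_counts(records):
    """Derive PRISMA-style flow numbers from per-record screening decisions.

    Each record is a dict with hypothetical keys:
      "duplicate": bool,
      "screen_excluded": exclusion reason at title/abstract screening, or None,
      "fulltext_excluded": exclusion reason at full-text assessment, or None.
    """
    unique = [r for r in records if not r["duplicate"]]
    full_text = [r for r in unique if not r["screen_excluded"]]
    ft_excluded = [r for r in full_text if r["fulltext_excluded"]]
    return {
        "identified": len(records),
        "duplicates_removed": len(records) - len(unique),
        "screened": len(unique),
        "excluded_at_screening": len(unique) - len(full_text),
        "full_text_assessed": len(full_text),
        "full_text_exclusions": dict(Counter(r["fulltext_excluded"] for r in ft_excluded)),
        "included": len(full_text) - len(ft_excluded),
    }


def make_record(duplicate=False, screen=None, fulltext=None):
    """Build one toy spreadsheet row."""
    return {"duplicate": duplicate, "screen_excluded": screen, "fulltext_excluded": fulltext}


toy_records = [make_record(duplicate=True), make_record(screen="not a treatment study"),
               make_record(fulltext="no outcome data"), make_record(), make_record()]
counts = prisma_counts(toy_records)
```

Tallying exclusion reasons at the full-text stage matches what the PRISMA diagram asks authors to report for that box.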
This paper provides a template, including instructions, for writing FAIR-compliant systematic reviews of rare disease treatments, enabling the assembly of a Treatabolome database that complements existing diagnostic and management support tools with treatment awareness data.
Somatic mutations occur at early stages of adenoma and accumulate throughout colorectal cancer progression. The aim of this study was to characterize the mutational landscape of stage II tumors and to search for novel recurrent mutations likely implicated in colorectal cancer tumorigenesis.
The exomic DNA of 42 stage II, microsatellite-stable colon tumors and their paired mucosae was sequenced. Other molecular data available in the discovery dataset (gene expression, methylation and copy number variations, CNV) were used to further characterize these tumors. Additional datasets comprising 553 colorectal cancer samples were used to validate the discovered mutations.
As a result, 4,886 somatic single-nucleotide variants (SNVs) were found. Almost all SNVs were private changes, with few mutations shared by more than one tumor, revealing tumor-specific mutational landscapes. Nevertheless, these diverse mutations converged on common cellular pathways, such as the cell cycle or apoptosis. Amid this mutational heterogeneity, variants resulting in early stop codons in the AMER1 (also known as FAM123B or WTX) gene emerged as recurrent mutations in colorectal cancer. Loss of AMER1 through mechanisms other than mutation, such as methylation and copy number aberrations, was also found. Tumors lacking this tumor suppressor gene exhibited a mesenchymal phenotype characterized by inhibition of the canonical Wnt pathway.
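The distinction between private and recurrent mutations above comes down to counting, for each gene, how many distinct tumors carry a mutation and how many carry a truncating one. A minimal Python sketch follows; it assumes per-variant consequence labels such as the Sequence Ontology terms stop_gained and frameshift_variant produced by common annotators, and the toy input is hypothetical. A real analysis would also correct for gene length and background mutation rate.

```python
from collections import defaultdict

# Consequence labels treated as truncating (Sequence Ontology terms; an assumption
# about how the variants were annotated).
TRUNCATING = {"stop_gained", "frameshift_variant"}


def recurrent_genes(mutations, min_tumors=2):
    """Genes mutated in at least `min_tumors` distinct tumors.

    mutations: iterable of (tumor_id, gene, consequence) tuples.
    Returns {gene: (n_tumors_any, n_tumors_truncating)}, most frequent first.
    """
    any_hit, trunc_hit = defaultdict(set), defaultdict(set)
    for tumor, gene, consequence in mutations:
        any_hit[gene].add(tumor)
        if consequence in TRUNCATING:
            trunc_hit[gene].add(tumor)
    hits = {g: (len(t), len(trunc_hit[g]))
            for g, t in any_hit.items() if len(t) >= min_tumors}
    return dict(sorted(hits.items(), key=lambda kv: kv[1][0], reverse=True))


# Toy calls: GENE_A recurs in three tumors; GENE_C is hit twice but in one tumor only.
toy_mutations = [
    ("T1", "GENE_A", "stop_gained"), ("T2", "GENE_A", "stop_gained"),
    ("T3", "GENE_A", "missense_variant"), ("T1", "GENE_B", "missense_variant"),
    ("T2", "GENE_C", "missense_variant"), ("T2", "GENE_C", "synonymous_variant"),
]
result = recurrent_genes(toy_mutations)
```

Counting distinct tumors rather than variant rows keeps a gene with several hits in one tumor from looking recurrent.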
In silico and experimental validation in independent datasets confirmed the existence of functional AMER1 mutations in approximately 10% of the colorectal cancer tumors analyzed. Moreover, these tumors exhibited a characteristic phenotype.
Colorectal cancer (CRC) is one of the most frequent neoplasms and an important cause of mortality in the developed world. This cancer is caused by both genetic and environmental factors, with 35% of the variation in CRC susceptibility involving inherited genetic differences. Mendelian syndromes account for about 5% of the total burden of CRC, with Lynch syndrome and familial adenomatous polyposis being the most common forms. Excluding hereditary forms, an important fraction of CRC cases show familial aggregation of the disease with an unknown germline genetic cause. CRC can also be considered a complex disease under the common disease-common variant hypothesis, with a polygenic model of inheritance in which the genetic components of common complex diseases correspond mostly to variants of low or moderate effect. So far, 30 common, low-penetrance susceptibility variants have been identified for CRC. Recently, new sequencing technologies, including exome and whole-genome sequencing, have provided a new approach to facilitate the identification of new genes responsible for human disease predisposition. Using whole-genome sequencing, germline mutations in the POLE and POLD1 genes have been found to be responsible for a new form of CRC genetic predisposition called polymerase proofreading-associated polyposis.
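Under the polygenic model described above, many low-penetrance variants are often combined into an additive score: each risk allele contributes its per-allele log odds ratio, and exponentiating the sum gives odds relative to someone carrying none of the risk alleles, assuming multiplicative effects. The Python sketch below illustrates this; the variant identifiers and effect sizes are made up and do not correspond to any of the 30 CRC susceptibility variants.

```python
from math import exp


def polygenic_score(dosages, log_odds):
    """Additive polygenic score: sum of risk-allele dosage x per-allele ln(OR).

    dosages: {variant_id: risk alleles carried, 0, 1 or 2}.
    log_odds: {variant_id: per-allele log odds ratio}.
    Variants absent from `dosages` count as 0 risk alleles (a simplification;
    real pipelines often impute them).
    """
    return sum(dosages.get(v, 0) * beta for v, beta in log_odds.items())


def relative_odds(score):
    """Odds relative to carrying no risk alleles, assuming multiplicative effects."""
    return exp(score)


# Hypothetical variants with small per-allele effects, typical of low penetrance.
effects = {"var1": 0.10, "var2": 0.20, "var3": 0.05}
person = {"var1": 2, "var2": 1}
score = polygenic_score(person, effects)
```

Each variant alone shifts the odds only slightly, which is why such variants are informative mainly in combination.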