The main paper has nearly 450 authors, working from more than 30 institutions. Because of its complexity (see page 46), the project could not have worked in the same way as one involving just one or two laboratories. Early data-release policies focused on how data should be shared before publication, with clumsy etiquette-based restrictions on the first publications of global analysis, such as waiting for the authors who generated the data to publish their analyses before others can publish on the entire data set.
Quantifying the contribution of genetic and environmental effects to disease initiation and progression, as well as the shared genetics of different diseases, is vital for understanding the disease etiology of multimorbidities. In this study, we leverage nationwide Danish registries to provide a granular atlas of the genetic origin of disease phenotypes for a cohort of all Danes 1978–2018 with partially known pedigree (n = 6.3 million). We estimate the heritability and genetic correlation between thousands of disease phenotypes using a novel approach that can be scaled to nationwide data. Our findings confirm the importance of genetics for a number of known associations and increase the resolution of heritability by adding numerous associations, some of which point to a shared biological origin of different phenotypes. We also establish the heritability of disease trajectories and the importance of sex-specific genetic contributions. Results can be accessed at https://h2.cpr.ku.dk/.
Discovering the genetic roots of diseases is a major question in genetic research. Here, the authors shed light on pleiotropy across diseases using a new method, scaled to millions of individuals, applied to single diseases and disease trajectories.
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSSs can be precisely reflected by the difference in TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
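The abstract does not state the form of the models. As an illustration only, the Python sketch below assumes the simplest case: ordinary least squares relating each TSS's expression level to the binding signals of several TFs, with predictability scored as the fraction of variance explained (R²). The helpers `solve`, `fit_ols`, `predict` and `r_squared`, and the synthetic data, are hypothetical; this is not the study's own implementation.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(signals, expression):
    """Fit expression ~ b0 + sum_j b_j * signal_j via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(s) for s in signals]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, signals):
    """Predicted expression for each TSS given its TF-binding signals."""
    return [coef[0] + sum(c * s for c, s in zip(coef[1:], row)) for row in signals]


def r_squared(observed, predicted):
    """Fraction of the variance in observed expression explained by the predictions."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic example: 300 hypothetical TSSs, 3 hypothetical TFs, small Gaussian noise.
random.seed(1)
true_coef = [0.5, 1.0, -0.3, 0.8]
signals = [[random.uniform(0, 5) for _ in range(3)] for _ in range(300)]
expression = [y + random.gauss(0, 0.1) for y in predict(true_coef, signals)]
coef = fit_ols(signals, expression)
print([round(c, 2) for c in coef], round(r_squared(expression, predict(coef, signals)), 3))
```

Under this linear assumption, one way to express the differences in predictability reported above would be to compare R² across groups of TSSs, for example high-CpG versus low-CpG promoters.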
Despite the complete determination of the genome sequence of several higher eukaryotes, their proteomes remain relatively poorly defined. Information about proteins identified by different experimental and computational methods is stored in different databases, meaning that no single resource offers full coverage of known and predicted proteins. IPI (the International Protein Index) has been developed to address these issues and offers complete nonredundant data sets representing the human, mouse and rat proteomes, built from the Swiss‐Prot, TrEMBL, Ensembl and RefSeq databases.
...researchers and clinicians need to be able to consider both genetic and non-genetic risk factors (for type 2 diabetes, for example, these would encompass hundreds of genetic markers and measures of diet, exercise and socio-economic status alongside measures of current clinical state, such as glucose levels). ...the field needs to move away from its tendency to collapse all these rich, individual-level data into rigid clinical categories. ...to the rare, high-impact genetic variants that underlie diseases such as cystic fibrosis and sickle-cell anaemia, these generally have subtle effects that limit their clinical value when considered one at a time. ...rather than all women starting to have annual mammography screening at 45 years old (as currently recommended by the American Cancer Society), polygenic scores for breast cancer risk could be used to tailor schedules so that women with the highest genetic risk are screened earlier and more intensively than are those with below-average risk [6].
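The excerpt contrasts single variants of subtle effect with polygenic scores that combine many of them. A minimal Python sketch, assuming the common additive form: each variant contributes a weight (for instance a log odds ratio from a genome-wide association study) multiplied by the number of risk alleles a person carries (0, 1 or 2). The `polygenic_score` helper, the variant IDs and the weights are hypothetical, and real scores must also handle missing genotypes and calibration, which this sketch omits.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, int], weights: Mapping[str, float]) -> float:
    """Additive polygenic score: sum over variants of weight * risk-allele dosage.

    Assumes every weighted variant is genotyped; a missing variant raises KeyError.
    """
    score = 0.0
    for variant, weight in weights.items():
        dosage = dosages[variant]
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2, got {dosage}")
        score += weight * dosage
    return score


# Hypothetical per-variant weights and two hypothetical individuals.
weights = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}
person_1 = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
person_2 = {"variant_a": 0, "variant_b": 2, "variant_c": 1}
print(polygenic_score(person_1, weights), polygenic_score(person_2, weights))
```

Ranking people by such a score, rather than by each variant separately, is what would allow a screening schedule to be tailored to genetic risk; where the cut-offs sit is a clinical decision the sketch does not address.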
The extent to which variation in chromatin structure and transcription factor binding may influence gene expression, and thus underlie or contribute to variation in phenotype, is unknown. To address this question, we cataloged both individual-to-individual variation and differences between homologous chromosomes within the same individual (allele-specific variation) in chromatin structure and transcription factor binding in lymphoblastoid cells derived from individuals of geographically diverse ancestry. Ten percent of active chromatin sites were individual-specific; a similar proportion were allele-specific. Both individual-specific and allele-specific sites were commonly transmitted from parent to child, which suggests that they are heritable features of the human genome. Our study shows that heritable chromatin status and transcription factor binding differ as a result of genetic variation and may underlie phenotypic variation in humans.
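Allele-specific sites are typically identified by asking whether sequencing reads at a heterozygous position come from the two homologous chromosomes in unequal proportions. The abstract does not give the exact test; the Python sketch below assumes a common choice, an exact two-sided binomial test against an expected 50:50 split. The functions `binomial_two_sided_p` and `is_allele_specific` and the threshold `alpha` are hypothetical, and a genome-wide scan would also need a multiple-testing correction, which is not shown.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k out of n trials under success probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small relative tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def is_allele_specific(reads_allele_1: int, reads_allele_2: int, alpha: float = 0.01) -> bool:
    """Flag a heterozygous site whose reads depart from a 50:50 allelic split."""
    n = reads_allele_1 + reads_allele_2
    if n == 0:
        raise ValueError("no reads cover this site")
    return binomial_two_sided_p(reads_allele_1, n) < alpha


print(is_allele_specific(18, 2), is_allele_specific(11, 9))
```

With 18 of 20 reads from one allele the split is far from 50:50 and the site is flagged, whereas an 11:9 split is consistent with equal representation of both chromosomes.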
Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.
We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we achieved weighted median scaffold lengths (N50) above 1 Mbp in bacteria and above 100 kbp in more complex organisms. Using real datasets, we obtained an N50 of 96 kbp in Pseudomonas syringae and a single 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in mixed-length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.
These algorithms extend the utility of short-read-only assemblies to large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.
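The N50 figures quoted above follow the usual definition: sort scaffolds from longest to shortest and report the length at which the running total first reaches half of the assembly's total length. A short Python sketch of that calculation (the `n50` helper is hypothetical and is not code from Pebble or Velvet):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Because long scaffolds dominate the running total, the statistic is a length-weighted median rather than the plain median of scaffold lengths (which is 6 for the same example).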
The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics.
The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl.
Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.
The human retroviruses HTLV-1 (human T cell leukemia virus type 1) and HIV-1 persist in vivo as a reservoir of latently infected T cell clones. It is poorly understood what determines which clones survive in the reservoir. We compared >160,000 HTLV-1 integration sites (>40,000 HIV-1 sites) from T cells isolated ex vivo from naturally infected individuals with >230,000 HTLV-1 integration sites (>65,000 HIV-1 sites) from in vitro infection to identify genomic features that determine selective clonal survival. Three statistically independent factors together explained >40% of the observed variance in HTLV-1 clonal survival in vivo: the radial intranuclear position of the provirus, its genomic distance from the centromere, and the intensity of local host genome transcription. The radial intranuclear position of the provirus and its distance from the centromere also explained ~7% of clonal persistence of HIV-1 in vivo. Selection for the intranuclear and intrachromosomal location of the provirus and host transcription intensity favors clonal persistence of human retroviruses in vivo.