Family trees have vast applications in fields as diverse as genetics, anthropology, and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. We collected 86 million profiles from publicly available online data shared by genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of human longevity and to provide insights into the geographical dispersion of families. We also report a simple digital procedure to overlay other data sets with our resource.
The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
Fluorescent molecular probes have become valuable tools in protein research; however, the current methods for using these probes are less suitable for analysing specific populations of proteins in their native environment. In this study, we address this gap by developing a unimolecular fluorescent probe that combines the properties of small-molecule-based probes and cross-reactive sensor arrays (the so-called chemical 'noses/tongues'). On the one hand, the probe can detect different proteins by generating unique identification (ID) patterns, akin to cross-reactive arrays. On the other hand, its unimolecular scaffold and selective binding enable this ID-generating probe to identify combinations of specific protein families within complex mixtures and to discriminate among isoforms in living cells, which macroscopic arrays cannot access. The ability to recycle the molecular device and use it to track several binding interactions simultaneously further demonstrates how this approach could expand the fluorescent toolbox currently used to detect and image proteins.
Nuclear mechanotransduction has been implicated in the control of chromatin organization; however, its impact on functional contractile myofibers is unclear. We found that deleting components of the linker of nucleoskeleton and cytoskeleton (LINC) complex in larval muscles abolishes the controlled and synchronized DNA endoreplication, typical of nuclei across myofibers, resulting in increased and variable DNA content in myonuclei of individual myofibers. Moreover, perturbation of LINC-independent mechanical input after knockdown of β-Integrin in larval muscles similarly led to increased DNA content in myonuclei. Genome-wide RNA-polymerase II occupancy analysis in myofibers of the LINC mutant indicated an altered binding profile, including a significant decrease in the chromatin regulator barrier-to-autointegration factor (BAF) and the contractile regulator Troponin C. Importantly, muscle-specific knockdown of BAF led to increased DNA content in myonuclei, phenocopying the LINC mutant phenotype. We propose that mechanical stimuli transmitted via the LINC complex act via BAF to regulate synchronized cell-cycle progression of myonuclei across single myofibers.
Hemifacial microsomia (HFM) is the second most common facial anomaly after cleft lip and palate. The phenotype is highly variable and most cases are sporadic. We investigated the disorder in a large pedigree with five affected individuals spanning eight meioses. Whole-exome sequencing results indicated the absence of a pathogenic coding point mutation. A genome-wide survey of segmental variations identified a 1.3 Mb duplication of chromosome 14q22.3 in all affected individuals that was absent in more than 1000 chromosomes of ethnically matched controls. The duplication was absent in seven additional sporadic HFM cases, which is consistent with the known heterogeneity of the disorder. To find the critical gene in the duplicated region, we analyzed signatures of human craniofacial disease networks, mouse expression data, and predictions of dosage sensitivity. All of these approaches implicated OTX2 as the most likely causal gene. Moreover, OTX2 is a known oncogenic driver in medulloblastoma, a condition that was diagnosed in the proband during the course of the study. Our findings suggest a role for OTX2 dosage sensitivity in human craniofacial development and raise the possibility of a shared etiology between a subtype of hemifacial microsomia and medulloblastoma.
Primary microcephaly is a congenital neurodevelopmental disorder of reduced head circumference and brain volume, with fewer neurons in the cortex of the developing brain due to premature transition between symmetrical and asymmetrical cellular division of the neuronal stem cell layer during neurogenesis. We now show, through linkage analysis and whole exome sequencing, that a dominant mutation in ALFY, encoding an autophagy scaffold protein, causes human primary microcephaly. We demonstrate the dominant effect of the mutation in Drosophila: transgenic flies harboring the human mutant allele display small brain volume, recapitulating the disease phenotype. Moreover, eye-specific expression of human mutant ALFY causes a rough eye phenotype. In molecular terms, we demonstrate that ALFY normally attenuates the canonical Wnt signaling pathway via autophagy-dependent removal specifically of aggregates of DVL3 and not of DVL1 or DVL2. Thus, autophagic attenuation of Wnt signaling through removal of DVL3 aggregates by ALFY acts in determining human brain size.
The main bottleneck for genomic studies of tumors is the limited availability of fresh frozen (FF) samples collected from patients, coupled with comprehensive long-term clinical follow-up. This shortage could be alleviated by using existing large archives of routinely obtained and stored Formalin-Fixed Paraffin-Embedded (FFPE) tissues. However, since these samples are partially degraded, their RNA sequencing is technically challenging.
In an effort to establish a reliable and practical procedure, we compared three protocols for RNA sequencing using pairs of FF and FFPE samples, both taken from the same breast tumor. In contrast to previous studies, we compared the expression profiles obtained from the two matched sample types, using the same protocol for both. Three protocols were tested on low initial amounts of RNA, as little as 100 ng, to represent the possibly limited availability of clinical samples. For two of the three protocols tested, poly(A) selection (mRNA-seq) and ribosomal-depletion, the total gene expression profiles of matched FF and FFPE pairs were highly correlated. For both protocols, differential gene expression between two FFPE samples was in agreement with that between their matched FF samples. Notably, although expression levels of FFPE samples by mRNA-seq were mainly represented by the 3'-end of the transcript, they yielded very similar results to those obtained by the ribosomal-depletion protocol, which produces uniform coverage across the transcript. Further, focusing on clinically relevant genes, we showed that the high correlation between expression levels persists at higher resolutions.
Unexpectedly, the poly(A) protocol applied to FFPE samples exhibited similar efficiency to the ribosomal-depletion protocol, with the latter requiring much higher (2-3 fold) sequencing depth to compensate for the relatively low fraction of reads mapped to the transcriptome. The results indicate that standard poly(A)-based RNA sequencing of archived FFPE samples is a reliable and cost-effective alternative to mRNA-seq of FF samples. Expression profiling of FFPE samples by mRNA-seq can facilitate much needed extensive retrospective clinical genomic studies.
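The comparison of matched FF and FFPE expression profiles described above can be illustrated with a minimal sketch. It assumes per-gene read counts for one matched pair, a log2 transform with a pseudocount of 1, and Pearson correlation as the agreement measure; the abstract does not state which transform or correlation statistic was used, and the function names and toy counts below are hypothetical.

```python
import math


def log2_profile(counts, pseudocount=1.0):
    """Log2-transform raw counts; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy counts for five genes in one matched FF/FFPE pair
# (illustrative values only, not data from the study).
ff_counts = [120, 15, 980, 0, 45]
ffpe_counts = [100, 20, 870, 2, 39]

r = pearson(log2_profile(ff_counts), log2_profile(ffpe_counts))
print(f"FF vs FFPE correlation on log2 counts: {r:.3f}")
```

In practice, counts would first be normalized for sequencing depth, which matters here because the protocols differ in the fraction of reads mapped to the transcriptome.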
Rodent‐associated Bartonella species have shown a remarkable genetic diversity and pathogenic potential. To further explore the extent of the natural intraspecific genomic variation and its potential role as an evolutionary driver, we focused on a single genetically diverse Bartonella species, Bartonella krasnovii, which circulates among gerbils and their associated fleas. Twenty genomes from 16 different B. krasnovii genotypes were fully characterized through a genome sequencing assay (using short and long read sequencing), pulsed-field gel electrophoresis (PFGE), and PCR validation. Genomic analyses were performed in comparison to the B. krasnovii strain OE 1–1. While single nucleotide polymorphisms represented only 0.3% of the genome variation, structural diversity was identified in these genomes, with an average of 51 ± 24 structural variation (SV) events per genome. Interestingly, a large proportion of the SVs (>40%) was associated with prophages. Further analyses revealed that most of the SVs and prophage insertions were found at the chromosome replication termination site (ter), suggesting this site as a plastic zone of the B. krasnovii chromosome. Accordingly, six genomes were found to be unbalanced, and essential genes near the ter showed a shift between the leading and lagging strands, revealing the SV effect on these genomes. In summary, our findings demonstrate the extensive genomic diversity harbored by wild B. krasnovii strains and suggest that its diversification is initially promoted by structural changes, probably driven by phages. These events may constantly feed the system with novel genotypes that ultimately lead to inter‐ and intraspecies competition and adaptation.
Abnormal molecular processes occurring throughout the genome leave distinct somatic mutational patterns termed mutational signatures. Exploring the associations between mutational signatures and clinicopathological features can unravel potential mechanisms driving tumorigenic processes.
We analyzed whole genome sequencing (WGS) data of tumor and peripheral blood samples from 37 primary breast cancer (BC) patients receiving neoadjuvant chemotherapy. Comprehensive clinico‐pathologic features were correlated with genomic profiles and mutational signatures.
Somatic mutational landscapes were highly concordant with known BC data sets. Remarkably, we observed a divergence of dominant mutational signatures in association with BC subtype. Signature 5 was overrepresented in hormone receptor positive (HR+) patients, whereas triple‐negative tumors mostly lacked Signature 5, but expectedly overrepresented Signature 3. We validated these findings in a large WGS data set of BC, demonstrating dominance of Signature 5 in HR+ patients, mostly in luminal A subtype. We further investigated the association between Signature 5 and gene expression signatures, and identified potential networks, likely related to estrogen regulation.
Our results suggest that the as yet elusive Signature 5 represents an alternative mechanism for mutation accumulation in HR+ BC, independent of the homologous recombination repair machinery related to Signature 3. This study provides a theoretical basis for further elucidating the processes promoting hormonal breast carcinogenesis.
We investigated the pathophysiology of diet-induced diabetes in the Cohen diabetic rat (CDs/y) from its induction to its chronic phase, using a multi-layered integrated genomic approach. We identified by linkage analysis two diabetes-related quantitative trait loci on RNO4 and RNO13. We determined their functional contribution to diabetes by chromosomal substitution, using congenic and consomic strains. To identify within these loci genes of relevance to diabetes, we sequenced the genome of CDs/y and compared it to 25 other rat strains. Within the RNO4 locus, we detected a novel high impact deletion in the
gene that was unique to CDs/y. Within the RNO13 locus, we found multiple SNPs and INDELs that were unique to CDs/y but were unable to prioritize any of the genes. Genome wide screening identified a novel third locus not detected by linkage analysis that consisted of a novel high impact deletion on RNO11 that was unique to CDs/y and that involved the
gene. Using co-segregation analysis, we investigated in silico the relative contribution to the diabetic phenotype and the interaction between the three genomic loci on RNO4, RNO11 and RNO13. We found that the RNO4 locus plays a major role during the induction of diabetes, whereas the genomic loci on RNO13 and RNO11, while interacting with the RNO4 locus, contribute more significantly to the diabetic phenotype during the chronic phase of the disease. The mechanisms whereby the mutations on RNO4 and RNO11 and the RNO13 locus contribute to the development of diabetes are under continuing investigation.