Mammalian pre-implantation development is a complex process involving dramatic changes in the transcriptional architecture. We report here a comprehensive analysis of transcriptome dynamics from oocyte to morula in both human and mouse embryos, using single-cell RNA sequencing. Based on single-nucleotide variants in human blastomere messenger RNAs and paternal-specific single-nucleotide polymorphisms, we identify novel stage-specific monoallelic expression patterns for a significant portion of polymorphic gene transcripts (25 to 53%). By weighted gene co-expression network analysis, we find that each developmental stage can be delineated concisely by a small number of functional modules of co-expressed genes. This result indicates a sequential order of transcriptional changes in pathways of cell cycle, gene regulation, translation and metabolism, acting in a step-wise fashion from cleavage to morula. Cross-species comparisons with mouse pre-implantation embryos reveal that the majority of human stage-specific modules (7 out of 9) are notably preserved, but developmental specificity and timing differ between human and mouse. Furthermore, we identify conserved key members (or hub genes) of the human and mouse networks. These genes represent novel candidates that are likely to be key in driving mammalian pre-implantation development. Together, the results provide a valuable resource to dissect gene regulatory mechanisms underlying progressive development of early mammalian embryos.
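As a rough illustration of how hub genes are commonly ranked in weighted co-expression networks, the sketch below computes an unsigned soft-threshold adjacency, |cor|^β, and sums it per gene to obtain connectivity. The power β = 6, the toy expression profiles, and the function names are assumptions made for this example; they are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def connectivity(profiles, beta=6):
    """Sum of soft-thresholded adjacencies |cor|**beta for each gene.

    profiles maps a gene name to its expression values across samples.
    beta=6 is an assumed soft-threshold power for this toy example.
    """
    genes = list(profiles)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(profiles[gi], profiles[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k

# Hypothetical profiles: g1..g3 are tightly co-expressed, g4 is not.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.8],
    "g3": [5.0, 4.1, 2.9, 2.2, 1.0],
    "g4": [3.0, 1.0, 4.0, 1.0, 5.0],
}
k = connectivity(profiles)
hub = max(k, key=k.get)  # gene with the highest connectivity
```

Raising the correlation to a power suppresses weak correlations, so the loosely related gene contributes almost nothing to any other gene's connectivity.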
Genomic and other high-dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently, few collapsing functions have been developed and widely applied.
We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways.
The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.
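The probe-selection rule reported above (keep, for each gene, the probe with the highest average expression) can be sketched in a few lines. This is an illustrative Python analogue, not the collapseRows implementation; the function name, the data layout, and the toy values are assumptions.

```python
from statistics import mean

def collapse_max_mean(expr, probe_to_gene):
    """Pick, per gene, the probe with the highest mean expression.

    expr maps probe id -> list of expression values across samples.
    probe_to_gene maps probe id -> gene identifier; unmapped probes are skipped.
    Returns gene -> (chosen probe id, that probe's values).
    """
    best = {}  # gene -> (mean expression, probe id)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe)
    return {gene: (probe, expr[probe]) for gene, (_, probe) in best.items()}

# Hypothetical data: two probes for GENE_A, one for GENE_B.
expr = {
    "p1": [5.0, 6.0, 7.0],
    "p2": [8.0, 9.0, 10.0],
    "p3": [2.0, 2.5, 3.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
collapsed = collapse_max_mean(expr, probe_to_gene)
```

After collapsing, each data set has one row per gene, so independent studies can be merged by gene identifier.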
Symptoms of Major Depressive Disorder (MDD) are hypothesized to arise from dysfunction in brain networks linking the limbic system and cortical regions. Alterations in brain functional cortical connectivity in resting-state networks have been detected with functional imaging techniques, but neurophysiologic connectivity measures have not been systematically examined. We used weighted network analysis to examine resting state functional connectivity as measured by quantitative electroencephalographic (qEEG) coherence in 121 unmedicated subjects with MDD and 37 healthy controls. Subjects with MDD had significantly higher overall coherence as compared to controls in the delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), and beta (12-20 Hz) frequency bands. The frontopolar region contained the greatest number of "hub nodes" (surface recording locations) with high connectivity. MDD subjects expressed higher theta and alpha coherence primarily in longer distance connections between frontopolar and temporal or parietooccipital regions, and higher beta coherence primarily in connections within and between electrodes overlying the dorsolateral prefrontal cortical (DLPFC) or temporal regions. Nearest centroid analysis indicated that MDD subjects were best characterized by six alpha band connections primarily involving the prefrontal region. The present findings indicate a loss of selectivity in resting functional connectivity in MDD. The overall greater coherence observed in depressed subjects establishes a new context for the interpretation of previous studies showing differences in frontal alpha power and synchrony between subjects with MDD and normal controls. These results can inform the development of qEEG state and trait biomarkers for MDD.
Copy-number variants (CNVs) are a major contributor to the pathophysiology of autism spectrum disorders (ASDs), but the functional impact of CNVs remains largely unexplored. Because brain tissue is not available from most samples, we interrogated gene expression in lymphoblasts from 244 families with discordant siblings in the Simons Simplex Collection in order to identify potentially pathogenic variation. Our results reveal that the overall frequency of significantly misexpressed genes (which we refer to here as outliers) identified in probands and unaffected siblings does not differ. However, in probands, but not their unaffected siblings, the group of outlier genes is significantly enriched in neural-related pathways, including neuropeptide signaling, synaptogenesis, and cell adhesion. We demonstrate that outlier genes cluster within the most pathogenic CNVs (rare de novo CNVs) and can be used for the prioritization of rare CNVs of potentially unknown significance. Several nonrecurrent CNVs with significant gene-expression alterations are identified (these include deletions in chromosomal regions 3q27, 3p13, and 3p26 and duplications at 2p15), suggesting that these are potential candidate ASD loci. In addition, we identify distinct expression changes in 16p11.2 microdeletions, 16p11.2 microduplications, and 7q11.23 duplications, and we show that specific genes within the 16p CNV interval correlate with differences in head circumference, an ASD-relevant phenotype. This study provides evidence that pathogenic structural variants have a functional impact via transcriptome alterations in ASDs at a genome-wide level and demonstrates the utility of integrating gene expression with mutation data for the prioritization of genes disrupted by potentially pathogenic mutations.
Since human brain tissue is often unavailable for transcriptional profiling studies, blood expression data are frequently used as a substitute. The underlying hypothesis in such studies is that genes expressed in brain tissue leave a transcriptional footprint in blood. We tested this hypothesis by relating three human brain expression data sets (from cortex, cerebellum and caudate nucleus) to two large human blood expression data sets (comprising 1463 individuals).
We found that mean expression levels were weakly correlated between the brain and blood data (r range: 0.24 to 0.32). Further, we tested whether co-expression relationships were preserved between the three brain regions and blood. Only a handful of brain co-expression modules showed strong evidence of preservation, and these modules could be combined into a single large blood module. We also identified highly connected intramodular "hub" genes inside preserved modules. These preserved intramodular hub genes had the following properties: first, their expression levels tended to be significantly more heritable than those from non-preserved intramodular hub genes (p < 10⁻⁹⁰); second, they had highly significant positive correlations with the following cluster of differentiation genes: CD58, CD47, CD48, CD53 and CD164; third, a significant number of them were known to be involved in infection mechanisms, post-transcriptional and post-translational modification and other basic processes.
Overall, we find transcriptome organization is poorly preserved between brain and blood. However, the subset of preserved co-expression relationships characterized here may aid future efforts to identify blood biomarkers for neurological and neuropsychiatric diseases when brain tissue samples are unavailable.
Developing effective and green methods for food analysis and separation has become an urgent issue given the ever-increasing concern about food quality and safety. Ionic liquids (ILs) are a new chemical medium and soft functional material developed under the framework of green chemistry and possess many unique properties, such as low melting points, low-to-negligible vapor pressures, excellent solubility, structural designability and high thermal stability. Combining ILs with extraction techniques not only takes advantage of ILs but also overcomes the disadvantages of traditional extraction methods. This subject has attracted intensive research efforts recently. Here, we present a brief review of the current research status and latest developments regarding the application of IL-assisted microextraction, including dispersive liquid–liquid microextraction (DLLME) and solid-phase microextraction (SPME), in food analysis and separation. The practical applications of ILs in determining toxic and harmful substances in food specimens of quite different natures are summarized and discussed. The critical function of ILs and the advantages of IL-based microextraction techniques over conventional extraction techniques are discussed in detail. Additionally, the recovery of ILs using different approaches is also presented to comply with green analytical chemistry requirements.
Additively Homomorphic Encryption (AHE) has been widely used in various applications, such as federated learning, blockchain, and online auctions. Elliptic Curve (EC) based AHE has the advantages of efficient encryption, homomorphic addition, scalar multiplication algorithms, and short ciphertext length. However, EC-based AHE schemes require solving a small exponential Elliptic Curve Discrete Logarithm Problem (ECDLP) when running the decryption algorithm, i.e., recovering the plaintext m ∈ {0, 1}^ℓ from m·G. Therefore, the decryption of EC-based AHE schemes is inefficient when the plaintext length ℓ > 32. As a result, practitioners tend to prefer RSA-based AHE schemes over EC-based ones. This paper proposes an efficient algorithm called FastECDLP for solving the small exponential ECDLP at the 128-bit security level. We perform a series of deep optimizations from two points: computation and memory overhead. These optimizations ensure efficient decryption when the plaintext length ℓ is as long as possible in practice. Moreover, we also provide a concrete implementation and apply FastECDLP to some specific applications. Experimental results show that FastECDLP is far faster than previous work. For example, the decryption can be done in 0.35 ms with a single thread when ℓ = 40, which is about 30 times faster than that of Paillier. Furthermore, we experiment with ℓ from 27 to 54, whereas existing works generally only consider ℓ ≤ 32. The decryption only requires 1 second with 16 threads when ℓ = 54. In practical applications, we can speed up model training of existing vertical federated learning frameworks by 4 to 14 times. At the same time, the decryption efficiency is accelerated by about 140 times in a blockchain financial system (ESORICS 2021) with the same memory overhead.
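To make the decryption bottleneck concrete, the sketch below recovers a bounded exponent with the generic baby-step giant-step method. It is not FastECDLP, and it swaps the elliptic-curve group for the multiplicative group modulo a prime so that it runs with the standard library alone: the condition pow(g, m, p) == h stands in for m·G. The function name and the toy parameters are assumptions for illustration.

```python
from math import isqrt

def bsgs_small_exponent(g, h, p, bits):
    """Find m in [0, 2**bits) with pow(g, m, p) == h, or return None.

    Generic baby-step giant-step: about 2**(bits/2) time and memory.
    Requires Python 3.8+ for modular inverses via pow with a negative exponent.
    """
    bound = 1 << bits
    step = isqrt(bound - 1) + 1  # ceil(sqrt(bound))

    # Baby steps: remember the smallest j for each value g**j, 0 <= j < step.
    table = {}
    cur = 1
    for j in range(step):
        table.setdefault(cur, j)
        cur = cur * g % p

    # Giant steps: multiply h by g**(-step) until it lands in the table.
    stride = pow(g, -step, p)
    gamma = h % p
    for i in range(step):
        j = table.get(gamma)
        if j is not None:
            m = i * step + j
            if m < bound:
                return m
        gamma = gamma * stride % p
    return None

# Toy stand-in for decryption: recover a 20-bit "plaintext" from g**m mod p.
p = 2**61 - 1   # a Mersenne prime, used here only as a convenient modulus
g = 3
secret = 123_456
h = pow(g, secret, p)
recovered = bsgs_small_exponent(g, h, p, bits=20)
```

The table size grows as 2^(ℓ/2), which is why both computation and memory become the limiting factors as the plaintext length ℓ increases.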
Private Set Intersection (PSI) protocols can securely compute the intersection of the private sets on the server and the client without revealing additional data. This work introduces the concept of Privacy-Preserving Feature Retrieved Private Set Intersection (P²FRPSI). In P²FRPSI protocols, the client can obtain the intersection that satisfies a given predicate without revealing the predicate and additional data. We formally define the P²FRPSI protocol, including its inputs, outputs, functionality, and security. To achieve the privacy guarantee in P²FRPSI protocols, a new two-party protocol is designed, namely Secure Secret Shared Retrieval (S³R), which can be used to securely determine whether each item on the server satisfies the predicate. We construct an S³R protocol and prove its security in the semi-honest model. On the basis of this, we design an efficient OT-based P²FRPSI protocol and an easy-to-implement DH-based P²FRPSI protocol and prove that they are secure in the semi-honest model. Our implementation shows that the OT-based P²FRPSI protocol can perform the matching for about 1000K items in 3.8 seconds with a single thread.
Moreover, the DH-based P²FRPSI protocol can perform the matching for about 7000K items in one hour with four threads, with communication totaling 1456 MB, while the OT-based P²FRPSI protocol requires 1673 MB.
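For readers unfamiliar with the DH-based building block, the following sketch shows the classic Diffie-Hellman-style PSI idea: each party blinds hashed items with a secret exponent, and doubly blinded values match exactly when the underlying items are equal. It is not the P²FRPSI protocol (there is no predicate or feature retrieval), both parties are simulated in one process, and the modulus, the hash-to-group mapping, and all names are toy assumptions with no security claim.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus, chosen for convenience only (not secure)

def hash_to_group(item: str) -> int:
    """Map an item to a nonzero square modulo P (illustrative mapping)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)

def blind(values, key):
    """Raise every group element to a secret exponent."""
    return [pow(v, key, P) for v in values]

def dh_psi(client_items, server_items):
    """Return the client items that also appear in the server's set."""
    a = secrets.randbelow(P - 3) + 2  # client's secret exponent
    b = secrets.randbelow(P - 3) + 2  # server's secret exponent

    # Client -> server: H(x)^a for each client item x.
    client_blinded = blind([hash_to_group(x) for x in client_items], a)

    # Server -> client: (H(x)^a)^b in the same order, plus H(y)^b for its items.
    double_blinded = blind(client_blinded, b)
    server_blinded = blind([hash_to_group(y) for y in server_items], b)

    # Client: compute (H(y)^b)^a and compare with its doubly blinded values.
    server_double = set(blind(server_blinded, a))
    return {x for x, tag in zip(client_items, double_blinded)
            if tag in server_double}

common = dh_psi(["alice", "bob", "carol"], ["bob", "dave", "carol"])
```

The comparison works because exponentiation commutes: (H(x)^a)^b equals (H(x)^b)^a, so neither party needs to see the other's unblinded items.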
It has been debated whether human induced pluripotent stem cells (iPSCs) and embryonic stem cells (ESCs) express distinctive transcriptomes. By using the method of weighted gene co-expression network analysis, we showed here that iPSCs exhibit altered functional modules compared with ESCs. Notably, iPSCs and ESCs differentially express 17 modules that primarily function in transcription, metabolism, development, and immune response. These module activations (up- and downregulation) are highly conserved in a variety of iPSCs, and genes in each module are coherently co-expressed. Furthermore, the activation levels of these modular genes can be used as quantitative variables to discriminate iPSCs and ESCs with high accuracy (96%). Thus, differential activations of these functional modules are the conserved features distinguishing iPSCs from ESCs. Strikingly, the overall activation level of these modules is inversely correlated with the DNA methylation level, suggesting that DNA methylation may be one mechanism regulating the module differences. Overall, we conclude that human iPSCs and ESCs exhibit distinct gene expression networks, which are likely associated with different epigenetic reprogramming events during the derivation of iPSCs and ESCs.
Primary Sjögren's syndrome (pSS) is a chronic autoimmune disease with complex etiopathogenesis. Despite extensive studies to understand the disease process utilizing human and mouse models, the intersection between these species remains elusive. To address this gap, we utilized a novel systems biology approach to identify disease-related gene modules and signaling pathways that overlap between humans and mice.
Parotid gland tissues were harvested from 24 pSS and 16 non-pSS sicca patients and 25 controls. For mouse studies, salivary glands were harvested from C57BL/6.NOD-Aec1Aec2 mice at various times during development of pSS-like disease. RNA was analyzed with Affymetrix HG U133+2.0 arrays for human samples and with MOE430+2.0 arrays for mouse samples. The images were processed with Affymetrix software. Weighted-gene co-expression network analysis was used to identify disease-related and functional pathways.
Nineteen co-expression modules were identified in human parotid tissue, of which four were significantly upregulated and three were downregulated in pSS patients compared with non-pSS sicca patients and controls. Notably, one of the human disease-related modules was highly preserved in the mouse model, and was enriched with genes involved in immune and inflammatory responses. Further comparison between these two species led to the identification of genes associated with leukocyte recruitment and germinal center formation.
Our systems biology analysis of genome-wide expression data from salivary gland tissue of pSS patients and from a pSS mouse model identified common dysregulated biological pathways and molecular targets underlying critical molecular alterations in pSS pathogenesis.