This project examines how state broadband policies diffused among U.S. states over the past 30 years, using a network approach and the State Broadband Explorer dataset curated by the Pew Charitable Trusts’ Broadband Access Initiative. The 621 valid state broadband policies in the U.S. (through January 2021) have been categorized into six main themes: broadband programs, competition and regulation, definitions, funding and financing, infrastructure access, and legislative intent. Our analytical strategy follows a two-step process: (1) identify the latent network of broadband policy diffusion across the states using the NetInf algorithm; (2) identify the nodal and dyadic variables that predict the observed diffusion flows. In the second step, we test two competing hypotheses: the geographic learning model and the (co-)partisan learning model, which privilege geographic proximity and ideological affiliation, respectively, as the primary drivers of policy diffusion. The results show that geographic contiguity is the most significant factor predicting broadband policy diffusion, while political factors have low salience. Among nodal factors, only one, divided government in the sender state, is a significant predictor of a diffusion tie. Among dyadic factors, one variable supports political homophily as a significant predictor of diffusion flows: both states sharing the same type of legislative control. Partisanship appears to be much less of a driver of broadband policy in the U.S. context.
•This study is the first attempt to infer the latent network of state broadband policy diffusion in the U.S.
•It empirically examines the state broadband policy diffusion network, identifying key innovators, facilitators, and adopters.
•Geographic proximity remains the most significant factor predicting state broadband policy diffusion in the U.S.
•Partisanship appears to be much less of a driver of state broadband policy diffusion in the U.S. context.
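NetInf infers a hidden diffusion network from adoption times alone. As a rough illustration of its greedy idea, and not the authors' pipeline or the reference NetInf implementation, the sketch below treats each policy as a cascade mapping a state to its adoption time, assumes exponentially distributed transmission delays with a hypothetical rate `alpha`, lets every later adopter fall back on a low-probability ε-edge (`eps`) when no chosen edge explains it, and repeatedly adds the directed edge with the largest log-likelihood gain. The function names, parameters, and toy cascades are assumptions for illustration.

```python
import math

def transmission_loglik(dt, alpha=1.0):
    """Log-density of an exponential delay dt > 0 (an assumed transmission model)."""
    return math.log(alpha) - alpha * dt

def netinf_greedy(cascades, k, alpha=1.0, eps=1e-3):
    """Greedily pick up to k directed edges (sender, receiver).

    cascades: list of dicts mapping a state label to its adoption time for one policy.
    Every non-first adopter is explained by its best parent; without a chosen edge
    it falls back on an epsilon edge with log-likelihood log(eps).
    """
    eps_log = math.log(eps)
    # Current best parent log-likelihood for each (cascade index, later adopter).
    scores = {}
    for c, casc in enumerate(cascades):
        first = min(casc.values())
        for v, t in casc.items():
            if t > first:
                scores[(c, v)] = eps_log
    # Candidate edges: u adopted strictly before v in at least one cascade.
    candidates = {(u, v) for casc in cascades for u in casc for v in casc if casc[u] < casc[v]}
    chosen = []
    for _ in range(k):
        best_edge, best_gain = None, 0.0
        for u, v in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = 0.0
            for c, casc in enumerate(cascades):
                if u in casc and v in casc and casc[u] < casc[v]:
                    new = transmission_loglik(casc[v] - casc[u], alpha)
                    gain += max(0.0, new - scores[(c, v)])
            if gain > best_gain:
                best_edge, best_gain = (u, v), gain
        if best_edge is None:  # no remaining edge improves the likelihood
            break
        u, v = best_edge
        chosen.append(best_edge)
        candidates.discard(best_edge)
        for c, casc in enumerate(cascades):
            if u in casc and v in casc and casc[u] < casc[v]:
                new = transmission_loglik(casc[v] - casc[u], alpha)
                scores[(c, v)] = max(scores[(c, v)], new)
    return chosen

# Hypothetical relative adoption years for three policies across states A, B, C.
cascades = [{"A": 0, "B": 1, "C": 2}, {"A": 0, "B": 1}, {"B": 0, "C": 1}]
print(netinf_greedy(cascades, k=3))  # [('A', 'B'), ('B', 'C')]
```

Keeping only each adopter's single best parent in `scores` mirrors NetInf's approximation of a cascade by its most likely propagation tree, which is what makes the greedy gain cheap to evaluate.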
Soil aggregates are crucial for soil organic carbon (OC) accumulation. This study, utilizing a 32-year fertilization experiment, investigates whether the core microbiome can elucidate variations in carbon content and decomposition across different aggregate sizes more effectively than broader bacterial and fungal community analyses. Employing ensemble learning algorithms that integrate machine learning with network inference, we found that the core microbiome accounts for an average increase of 26 % and 20 % in the explained variance of PCoA and Adonis analyses, respectively, in response to fertilization. Compared to the control, inorganic and organic fertilizers decreased the decomposition index (DDI) by 31 % and 38 %, respectively. The fungal core microbiome predominantly influenced OC content and DDI in larger macroaggregates (>2000 μm), explaining over 35 % of the variance, whereas the bacterial core microbiome had a lesser impact, explaining <30 %. Conversely, in smaller aggregates (<2000 μm), the bacterial core microbiome significantly influenced DDI (R² > 0.2), and the fungal core microbiome more strongly affected OC content (R² > 0.3). Mantel tests showed that pH is the most significant environmental factor affecting core microbiome composition across all aggregate sizes (Mantel's r > 0.8, P < 0.01). Linear correlation analysis further confirmed that the community structure of the core microbiome could accurately predict OC content and DDI in aggregates (R² > 0.8, P < 0.05). Overall, our findings suggest that the core microbiome provides deeper insights into the variability of aggregate organic carbon content and decomposition, with the bacterial core microbiome playing a particularly pivotal role within soil aggregates.
•Ensemble algorithms accurately predict the relationships between organic carbon and core microbes.
•Core microbiomes can better predict organic carbon variability across soil aggregates.
•Bacterial core microbiomes explain organic carbon decomposition better than fungal core microbiomes.
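A Mantel test correlates two distance matrices defined over the same samples, for example pairwise differences in pH and pairwise dissimilarity in core-microbiome composition. The minimal sketch below assumes precomputed symmetric distance matrices and a one-sided permutation test; `mantel` is a hypothetical helper, and established statistics packages provide their own implementations with different interfaces.

```python
import numpy as np

def mantel(d1, d2, n_perm=999, seed=0):
    """One-sided Mantel test: Pearson r between the upper triangles of two
    symmetric distance matrices, with a permutation p-value."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)          # each pair (i < j) counted once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)            # relabel samples in d2 (rows and columns together)
        r = np.corrcoef(x, d2[np.ix_(p, p)][iu])[0, 1]
        if r >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting rows and columns together relabels whole samples instead of shuffling individual distances, which respects the dependence among all distances that share a sample.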
Background: Inference of Gene Regulatory Networks (GRNs) is a difficult and long-standing question in Systems Biology. Numerous approaches have been proposed, with the latest methods exploring the richness of single-cell data. One of the current difficulties is that many GRN inference methods do not yield a single GRN but a collection of plausible networks that need further refinement. In this work, we present a Design of Experiment strategy to be used as a second stage after the inference process. It is specifically designed to identify the next most informative experiment for deciding between multiple network topologies when the proposed GRNs are executable models. This strategy first performs a topological analysis to reduce the number of perturbations that need to be tested, then predicts the outcome of the retained perturbations by simulating the GRNs, and finally compares the predictions with new experimental data.
Results: We apply this method to the results of our divide-and-conquer algorithm, WASABI, adapt its gene expression model to produce perturbations, and compare our predictions with experimental results. We show that our networks were able to produce in silico predictions of the outcome of a gene knock-out, which were qualitatively validated for 48 of 49 genes. Finally, we eliminate as many as two thirds of the candidate networks, namely those for which we could identify an incorrect topology, thus greatly improving the accuracy of our predictions.
Conclusion: These results both confirm the inference accuracy of WASABI and show how executable gene expression models can be leveraged to further refine the topology of inferred GRNs. We hope this strategy will help systems biologists further explore their data and encourage the development of more executable GRN models.
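The experiment-selection step can be illustrated with a deliberately crude surrogate for simulation. Assuming each candidate GRN is a dictionary of signed edges (+1 activation, -1 repression), and that a knock-out is predicted only to lower the direct targets of activation and raise the direct targets of repression, the sketch below picks the knock-out whose predicted outcomes split the candidates most evenly (highest Shannon entropy over groups of identical predictions), then discards candidates inconsistent with the observed outcome. WASABI's executable models simulate full expression dynamics instead; every name here is hypothetical.

```python
import math
from collections import Counter

def predict_knockout(network, gene, genes):
    """Qualitative direct-effect prediction per gene: -1 down, +1 up, 0 unchanged."""
    return tuple(-network.get((gene, t), 0) for t in genes)

def most_informative_knockout(candidates, genes):
    """Return the knock-out whose predictions best discriminate the candidates."""
    best_gene, best_h = None, -1.0
    n = len(candidates)
    for g in genes:
        groups = Counter(predict_knockout(net, g, genes) for net in candidates)
        h = -sum(c / n * math.log2(c / n) for c in groups.values())
        if h > best_h:
            best_gene, best_h = g, h
    return best_gene, best_h

def eliminate(candidates, gene, observed, genes):
    """Keep only candidates whose prediction matches the observed knock-out outcome."""
    return [net for net in candidates if predict_knockout(net, gene, genes) == tuple(observed)]

genes = ["g1", "g2", "g3"]
net_a = {("g1", "g2"): 1, ("g2", "g3"): 1}
net_b = {("g1", "g2"): 1, ("g2", "g3"): -1}
net_c = {("g1", "g3"): 1, ("g2", "g1"): -1}
gene, h = most_informative_knockout([net_a, net_b, net_c], genes)
print(gene)  # g2
```

Knocking out `g2` gives a distinct prediction for each candidate here, so a single experiment would suffice to keep one topology; in practice several candidates usually share a prediction and the loop of selecting, testing, and eliminating repeats.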
Boolean network inference is essential for gaining insights into gene regulatory networks from multivariate gene expression time series. However, most existing algorithms cannot accurately reconstruct large-scale Boolean networks because of the complex and diverse relationships among genes and the overfitting problem. To address these problems, a novel inference algorithm using a mutual information-based fuzzy genetic programming approach (MIFuGP) is proposed to infer large-scale Boolean networks accurately. To represent complex regulatory relationships in Boolean networks, MIFuGP encodes Boolean functions as syntax tree programs. Taking the dependency between genes into account, MIFuGP fully extracts the mutual information from the syntax trees to alleviate the bloat problem. MIFuGP also provides a novel fitness function that makes full use of state transitions and topology information, together with a fuzzy logic control strategy to reduce overfitting. Extensive experiments show that MIFuGP significantly outperforms state-of-the-art algorithms on both real-world gene regulatory networks and artificial Boolean networks.
•A novel inference approach, MIFuGP, is proposed to reconstruct large-scale gene regulatory networks accurately.
•A novel mutual information metric is presented to fully capture the dependency relationships among genes.
•Unlike other existing algorithms, MIFuGP imposes no limit on the in-degree, allowing flexible solutions.
•A fuzzy logic control strategy is introduced for the first time to reduce overfitting in Boolean network inference.
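MIFuGP's own mutual information metric is computed over syntax-tree programs and is not specified in the abstract. As a simpler, hypothetical stand-in, the sketch below ranks candidate regulators of a target gene by the plug-in mutual information between the regulator's binary state at time $t$ and the target's state at $t+1$, the kind of dependency signal a Boolean-network learner can use to prioritize regulators before fitting update functions.

```python
import math
from collections import Counter

def lagged_mutual_information(x, y):
    """Plug-in I(x_t ; y_{t+1}) in bits for two discrete (e.g. 0/1) time series."""
    pairs = list(zip(x[:-1], y[1:]))
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2[ p(a,b) / (p(a) p(b)) ], written with raw counts
        mi += c / n * math.log2(c * n / (px[a] * py[b]))
    return mi

def rank_regulators(series, target):
    """Sort candidate regulators of `target` by lagged mutual information, highest first."""
    scores = {g: lagged_mutual_information(s, series[target])
              for g, s in series.items() if g != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Pairwise scores like these cannot detect regulation that only appears in combination (for example an XOR of two inputs), which is one motivation for scoring whole Boolean functions as MIFuGP does.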
Gene regulatory networks are graph models representing cellular transcription events. These networks are far from complete because experimental validation and curation of the interactions consume considerable time and resources. Previous assessments have shown the modest performance of the available network inference methods based on gene expression data. Here, we study several caveats in the inference of regulatory networks and in the assessment of methods, concerning the quality of the input data and gold standard and the assessment approach, with a focus on the global structure of the network. We used synthetic and biological data for the predictions and experimentally validated biological networks as the gold standard (ground truth). Standard performance metrics and graph structural properties suggest that methods inferring co-expression networks should no longer be assessed on the same footing as those inferring regulatory interactions. While methods inferring regulatory interactions perform better than co-expression-based methods in global regulatory network inference, the latter are better suited to inferring function-specific regulons and co-regulation networks. When merging expression data, the gain from the larger sample size should outweigh the additional noise, and graph structure should be considered when integrating the inferences. We conclude with guidelines for taking advantage of inference methods and their assessment based on the applications and available expression datasets.
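Assessment against an experimentally validated gold standard usually reduces to precision, recall, and precision-recall summaries over edges. In the sketch below (hypothetical gene and function names), the `directed` flag shows one reason co-expression and regulatory predictions should not be scored identically: an undirected co-expression edge matches a gold-standard regulatory edge in either orientation, which can inflate its apparent performance.

```python
def edge_metrics(predicted, gold, directed=True):
    """Precision, recall and F1 of a predicted edge set against a gold standard."""
    norm = (lambda e: tuple(e)) if directed else (lambda e: frozenset(e))
    p = {norm(e) for e in predicted}
    g = {norm(e) for e in gold}
    tp = len(p & g)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def average_precision(ranked_edges, gold):
    """Mean of the precision values at each rank where a gold (directed) edge is hit."""
    g = set(gold)
    hits, total = 0, 0.0
    for k, e in enumerate(ranked_edges, start=1):
        if e in g:
            hits += 1
            total += hits / k
    return total / len(g) if g else 0.0
```

For the same three predictions, scoring them as undirected can double the number of matches, so comparisons across method families should fix the edge semantics first.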
Reconstructing the dynamics of complex systems from sparse, incomplete time series data is a challenging problem with applications in various domains. Here, we develop an iterative heuristic method to infer the underlying network structure and parameters governed by Ising dynamics from incomplete spin configurations based on sparse, small-sized samples. Our method iterates between imputing missing spin states given the current coupling strengths and re-estimating the couplings from the completed spin state data. Central to our approach is the novel application of adaptive $l_1$ regularization to updating coupling strengths, which automatically adjusts the regularization strength throughout the iterative inference process. In doing so, we aim to prevent over-fitting and enforce the sparsity of couplings without access to ground-truth parameters. We demonstrate that this approach accurately recovers parameters and imputes missing spins even with substantial missing data and short time series, improving the inference of Ising model parameters even for relatively small sample sizes.
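The impute-then-refit loop can be sketched as follows, under assumptions that go beyond the abstract: parallel-update kinetic Ising dynamics without external fields, proximal-gradient (ISTA) updates for the $l_1$-penalized likelihood, a fixed penalty `lam` in place of the paper's adaptive regularization schedule (whose rule the abstract does not state), and imputation of each missing spin by its most likely value given the previous time step only. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, thr):
    """Proximal operator of the l1 penalty: shrink each entry toward zero by thr."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def fit_couplings(S, lam=0.01, n_steps=200, J0=None):
    """l1-penalised maximum likelihood for kinetic Ising dynamics,
    P(s_i(t+1) | s(t)) = exp(s_i(t+1) h_i(t)) / (2 cosh h_i(t)), h_i(t) = sum_j J_ij s_j(t),
    fitted by proximal gradient ascent."""
    T, N = S.shape
    J = np.zeros((N, N)) if J0 is None else J0.copy()
    X, Y = S[:-1], S[1:]
    # Safe step: the Hessian is bounded by X.T @ X / (T - 1), whose top eigenvalue is <= N.
    lr = 1.0 / N
    for _ in range(n_steps):
        H = X @ J.T                                # H[t, i] = h_i(t)
        grad = (Y - np.tanh(H)).T @ X / (T - 1)    # gradient of the mean log-likelihood
        J = soft_threshold(J + lr * grad, lr * lam)
    return J

def impute_and_fit(S_obs, mask, n_outer=10, lam=0.01, seed=0):
    """Alternate between re-estimating J and imputing missing spins.
    S_obs: (T, N) array of +/-1 values; mask[t, i] is True where S_obs[t, i] was observed."""
    rng = np.random.default_rng(seed)
    S = np.where(mask, S_obs, rng.choice([-1.0, 1.0], size=S_obs.shape))
    J = None
    for _ in range(n_outer):
        J = fit_couplings(S, lam=lam, J0=J)
        # Impute each missing s_i(t), t >= 1, by its most likely value given s(t-1).
        guess = np.where(S[:-1] @ J.T >= 0, 1.0, -1.0)
        S[1:] = np.where(mask[1:], S_obs[1:], guess)
    return J, S
```

Changing `lam` between outer iterations, for instance depending on how much `J` moved, is a natural place to hook an adaptive rule; that is a suggestion for experimentation, not the authors' scheme.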
Here, we present Scribe (https://github.com/aristoteleo/Scribe-py), a toolkit for detecting and visualizing causal regulatory interactions between genes, and explore the potential for single-cell experiments to power network reconstruction. Scribe employs restricted directed information to determine causality by estimating the strength of information transferred from a potential regulator to its downstream target. We apply Scribe and other leading approaches for causal network reconstruction to several types of single-cell measurements and show that there is a dramatic drop in performance for “pseudotime”-ordered single-cell data compared with true time-series data. We demonstrate that performing causal inference requires temporal coupling between measurements. We show that methods such as “RNA velocity” restore some degree of coupling through an analysis of chromaffin cell fate commitment. These analyses highlight a shortcoming in experimental and computational methods for analyzing gene regulation at single-cell resolution and suggest ways of overcoming it.
•Scribe detects causal regulatory networks between genes in diverse single-cell datasets
•Scribe uses restricted directed information to identify regulators and their targets
•Inferring causal regulatory networks requires temporal coupling between measurements
•RNA velocity outperforms pseudotime, but neither performs as well as true time-series data
Qiu et al. present Scribe (https://github.com/aristoteleo/Scribe-py), a toolkit for detecting and visualizing causal regulatory networks between genes in diverse single-cell datasets. They use Scribe to understand how causal network reconstruction depends on temporal coupling between measurements. They show that while pseudotime-ordered single-cell data fail to capture much of the information present in true temporal couplings, RNA velocity measurements restore much of this information.
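Restricted directed information asks, roughly, how much a regulator's past tells us about a target's present beyond what the target's own immediate past already does. A simplified stand-in is the lagged transfer entropy $I(x_{t-\text{lag}}; y_t \mid y_{t-1})$ on discretized expression values, sketched below with plug-in counts. This is an assumption-laden illustration: Scribe-py estimates its quantities on continuous data with its own estimators and API, which are not reproduced here, and `transfer_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def transfer_entropy(x, y, lag=1):
    """Plug-in estimate, in bits, of I(x_{t-lag} ; y_t | y_{t-1}) for discrete
    series x (candidate regulator) and y (target) on a shared time axis."""
    start = max(lag, 1)
    triples = [(x[t - lag], y[t], y[t - 1]) for t in range(start, len(y))]
    n = len(triples)
    c_abc = Counter(triples)
    c_ac = Counter((a, c) for a, _, c in triples)   # (regulator past, target past)
    c_bc = Counter((b, c) for _, b, c in triples)   # (target now, target past)
    c_c = Counter(c for _, _, c in triples)         # target past
    te = 0.0
    for (a, b, c), k in c_abc.items():
        # p(a,b,c) * log2[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ], written with raw counts
        te += k / n * math.log2(k * c_c[c] / (c_ac[(a, c)] * c_bc[(b, c)]))
    return te
```

The estimate only means something when `x[t]` and `y[t]` come from the same moment in time. If cells are instead ordered by an inferred pseudotime, neighbouring indices need not be causally linked, which is one way to read the performance drop for pseudotime-ordered data described above.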
Quantitatively identifying direct dependencies between variables is an important task in data analysis, in particular for reconstructing various types of networks and causal relations in science and engineering. One of the most widely used criteria is partial correlation, but it measures only linear direct associations and misses nonlinear ones. Conditional mutual information (CMI), which is based on conditional independence, can quantify nonlinear direct relationships among variables from observed data and is superior to linear measures, but it suffers from a serious underestimation problem, in particular for variables that are tightly associated in a network, which severely limits its applications. In this work, we propose a new concept, “partial independence,” with a new measure, “part mutual information” (PMI), which not only overcomes this problem of CMI but also retains the quantification properties of both mutual information (MI) and CMI. Specifically, we first defined PMI to measure nonlinear direct dependencies between variables and then derived its relations with MI and CMI. Finally, we used a number of simulated datasets as benchmark examples to numerically demonstrate the features of PMI, and real gene expression data from Escherichia coli and yeast to reconstruct gene regulatory networks; all of these validated the advantages of PMI for accurately quantifying nonlinear direct associations in networks.
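For discrete variables, PMI is commonly stated as $\mathrm{PMI}(X;Y\mid Z)=\sum_{x,y,z}p(x,y,z)\log\frac{p(x,y\mid z)}{p^{*}(x\mid z)\,p^{*}(y\mid z)}$, with $p^{*}(x\mid z)=\sum_{y}p(x\mid z,y)\,p(y)$ and $p^{*}(y\mid z)=\sum_{x}p(y\mid z,x)\,p(x)$, whereas CMI uses $p(x\mid z)$ and $p(y\mid z)$ in the denominator. The sketch below computes both from a joint probability table with plug-in values (natural log, so results are in nats). It follows the definition as we read it from the PMI literature; the function names and toy tables are illustrative assumptions.

```python
import numpy as np

def _safe_div(a, b):
    """Elementwise a / b with zeros wherever b == 0 (broadcasting a against b)."""
    return np.divide(a, b, out=np.zeros(np.broadcast(a, b).shape), where=b > 0)

def cmi_and_pmi(joint):
    """Return (CMI, PMI) in nats for a joint table joint[x, y, z]."""
    P = np.asarray(joint, dtype=float)
    P = P / P.sum()
    px, py, pz = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
    pxz, pyz = P.sum(axis=1), P.sum(axis=0)        # shapes (X, Z) and (Y, Z)
    nz = P > 0
    # CMI: sum p(x,y,z) log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
    ratio_cmi = _safe_div(P * pz[None, None, :], pxz[:, None, :] * pyz[None, :, :])
    cmi = float(np.sum(P[nz] * np.log(ratio_cmi[nz])))
    # p*(x|z) = sum_y p(x|y,z) p(y);  p*(y|z) = sum_x p(y|x,z) p(x)
    pstar_x = np.einsum("xyz,y->xz", _safe_div(P, pyz[None, :, :]), py)
    pstar_y = np.einsum("xyz,x->yz", _safe_div(P, pxz[:, None, :]), px)
    p_xy_given_z = _safe_div(P, pz[None, None, :])
    ratio_pmi = _safe_div(p_xy_given_z, pstar_x[:, None, :] * pstar_y[None, :, :])
    pmi = float(np.sum(P[nz] * np.log(ratio_pmi[nz])))
    return cmi, pmi
```

With $X=Y=Z$ (a perfectly tight association), this CMI is exactly zero because conditioning on $Z$ leaves nothing to explain, while this PMI stays positive ($\ln 4$ for binary variables), which is the underestimation problem described above.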
Constructing networks by filtering correlation matrices. Kojaku, Sadamori; Masuda, Naoki.
Proceedings of the Royal Society. A, Mathematical, physical, and engineering sciences, 11/2019, Volume 475, Issue 2231.
Journal Article. Peer-reviewed.
Network analysis has been applied to various kinds of correlation matrix data. Thresholding on the value of the pairwise correlation is probably the most straightforward and common method to create a network from a correlation matrix. However, this thresholding approach has been criticized, for example for its inability to filter out spurious correlations, which has led to proposals of alternative methods that overcome some of these problems. We propose a method to create networks from correlation matrices based on optimization with regularization, where we place an edge between a pair of nodes if and only if the edge is unexpected under a null model. The proposed algorithm is advantageous in that it can be combined with different types of null models. Moreover, the algorithm can select the most plausible null model from a set of candidate null models using a model selection criterion. For three economic datasets, we find that the configuration model for correlation matrices is often preferred to standard null models. For country-level product export data, the present method predicts the main products exported from countries better than sample correlation matrices do.
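The thresholding baseline criticized above takes a single comparison, while a null-model approach keeps an edge only when a correlation is larger than expected under some null. The sketch below contrasts the two using the simplest possible null, independent shuffling of each time series so that every pair is uncorrelated in expectation, and keeps edge $(i,j)$ when $|r_{ij}|$ exceeds the `q`-quantile of its shuffled values. This is a hypothetical illustration, not the authors' regularized optimization or their configuration model for correlation matrices, which preserve far more structure than a shuffle null.

```python
import numpy as np

def threshold_network(C, theta):
    """Baseline: edge (i, j) whenever |C_ij| >= theta."""
    A = np.abs(C) >= theta
    np.fill_diagonal(A, False)
    return A

def permutation_null_network(X, n_perm=200, q=0.95, seed=0):
    """X: (T, N) time series. Edge (i, j) if |r_ij| exceeds the q-quantile of |r_ij|
    computed after shuffling every column independently (a white-noise null)."""
    rng = np.random.default_rng(seed)
    C = np.corrcoef(X, rowvar=False)
    N = X.shape[1]
    null = np.empty((n_perm, N, N))
    for k in range(n_perm):
        Xs = np.column_stack([rng.permutation(X[:, i]) for i in range(N)])
        null[k] = np.abs(np.corrcoef(Xs, rowvar=False))
    cutoff = np.quantile(null, q, axis=0)
    A = np.abs(C) > cutoff
    np.fill_diagonal(A, False)
    return A
```

Only the way `null` is generated would change under a different null model, which echoes, in a much cruder form, the paper's point that the filtering step can be combined with different null models.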