The authors consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual’s covariates on his or her behavior or other measurable responses. The authors show that generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular, the authors demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual’s enduring traits and his or her choices, even when there is no intrinsic affinity between them. The authors also suggest some possible constructive responses to these results.
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution (the part of the distribution representing large but rare events) and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
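The combination of maximum-likelihood fitting and a KS distance mentioned in this abstract can be illustrated with a minimal sketch. It assumes a continuous power law with a known lower cutoff; the function names `fit_power_law` and `sample_power_law`, the fixed `xmin`, and the synthetic data are illustrative assumptions, not the authors' code, and the sketch omits their cutoff selection, p-value computation, and likelihood-ratio tests.

```python
import math
import random

def fit_power_law(data, xmin):
    """Fit p(x) proportional to x**(-alpha) for x >= xmin (continuous case).

    Returns the maximum-likelihood exponent and the KS distance between
    the empirical CDF of the tail and the fitted CDF.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    # Closed-form MLE for the continuous power-law exponent.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks

def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples by inverse-transform sampling (hypothetical helper)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)
data = sample_power_law(5000, 2.5, 1.0, rng)  # synthetic data, true alpha = 2.5
alpha_hat, ks = fit_power_law(data, 1.0)
print(f"alpha_hat = {alpha_hat:.3f}, KS distance = {ks:.4f}")
```

A small KS distance alone does not establish that a power law holds; in the framework the abstract describes, it feeds a goodness-of-fit test and is complemented by likelihood-ratio comparisons against alternative distributions.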
Philosophy and the practice of Bayesian statistics
Gelman, Andrew; Shalizi, Cosma Rohilla
British Journal of Mathematical & Statistical Psychology, February 2013, Volume 66, Issue 1
Journal Article
Peer reviewed
Open access
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
To identify pathways between stress indicators and adverse pregnancy outcomes, we applied a nonparametric graph-learning algorithm, PC-KCI, to data from an observational prospective cohort study. The Measurement of Maternal Stress study (MOMS) followed 744 women with a singleton intrauterine pregnancy recruited between June 2013 and May 2015. Infant adverse pregnancy outcomes were prematurity (<37 weeks' gestation), infant days spent in hospital after birth, and being small for gestational age (percentile gestational weight at birth). Maternal adverse pregnancy outcomes were pre-eclampsia, gestational diabetes, and gestational hypertension. PC-KCI replicated well-established pathways, such as the relationship between gestational weeks and preterm premature rupture of membranes. PC-KCI also identified previously unobserved pathways to adverse pregnancy outcomes, including 1) a link between hair cortisol levels (at 12-21 weeks of pregnancy) and pre-eclampsia; 2) two pathways to preterm birth depending on race, with one linking Hispanic race, pre-gestational diabetes and gestational weeks, and a second pathway linking black race, hair cortisol, pre-eclampsia, and gestational weeks; and 3) a relationship between maternal childhood trauma, perceived social stress in adulthood, and low weight for gestational age. Our approach confirmed previous findings and identified previously unobserved pathways to adverse pregnancy outcomes. It presents a method for a global assessment of a clinical problem for further study of possible causal pathways.
Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.
The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
In this paper we present a statistical method for inferring historical social networks from biographical documents as well as the scholarly aims for doing so. Existing scholarship on historical social networks is scattered across an unmanageable number of disparate books and articles. A researcher interested in how persons were connected to one another in our field of study, early modern Britain (c. 1500-1700), has no global, unified resource to which to turn. Manually building such a network is infeasible, since it would need to represent thousands of nodes and tens of millions of potential edges just to include the relations among the most prominent persons of the period. Our Six Degrees of Francis Bacon project takes up recent statistical techniques and digital tools to reconstruct and visualize the early modern social network. We describe in this paper the natural language processing tools and statistical graph learning techniques that we used to extract names and infer relations from the Oxford Dictionary of National Biography. We then explain the steps taken to test inferred relations against the knowledge of experts in order to improve the accuracy of the learning techniques. Our argument here is twofold: first, that the results of this process, a global visualization of Britain’s early modern social network, will be useful to scholars and students of the period; second, that the pipeline we have developed can, with local modifications, be reused by other scholars to generate networks for other historical or contemporary societies from biographical documents.
Approximate Methods for State-Space Models
Koyama, Shinsuke; Castellanos Pérez-Bolde, Lucia; Shalizi, Cosma Rohilla ...
Journal of the American Statistical Association, 03/2010, Volume 105, Issue 489
Journal Article
Peer reviewed
Open access
State-space models provide an important body of techniques for analyzing time series, but their use requires estimating unobserved states. The optimal estimate of the state is its conditional expectation given the observation histories, and computing this expectation is hard when there are nonlinearities. Existing filtering methods, including sequential Monte Carlo, tend to be either inaccurate or slow. In this paper, we study a nonlinear filter for nonlinear/non-Gaussian state-space models, which uses Laplace's method, an asymptotic series expansion, to approximate the state's conditional mean and variance, together with a Gaussian conditional distribution. This Laplace Gaussian filter (LGF) gives fast, recursive, deterministic state estimates, with an error which is set by the stochastic characteristics of the model and is, we show, stable over time. We illustrate the estimation ability of the LGF by applying it to the problem of neural decoding and compare it to sequential Monte Carlo both in simulations and with real data. We find that the LGF can deliver superior results in a small fraction of the computing time. This article has supplementary material online.
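The idea of replacing an intractable conditional distribution with a Gaussian obtained from Laplace's method can be sketched on a toy model. The example below is an assumption-laden illustration, not the LGF as specified in the article: it assumes a scalar random-walk state, Poisson spike counts with rate exp(x), and only the lowest-order approximation (posterior mode as the mean, inverse curvature at the mode as the variance). The names `laplace_update`, `filter_counts`, and `drift_var` are hypothetical.

```python
import math

def laplace_update(y, prior_mean, prior_var, max_iter=100, tol=1e-12):
    """Gaussian approximation to p(x | y) for y ~ Poisson(exp(x)),
    x ~ N(prior_mean, prior_var), centred at the posterior mode."""
    x = prior_mean
    for _ in range(max_iter):
        # Derivatives of the log posterior y*x - exp(x) - (x - m)^2 / (2 v).
        grad = y - math.exp(x) - (x - prior_mean) / prior_var
        hess = -math.exp(x) - 1.0 / prior_var
        step = grad / hess
        x -= step  # Newton step toward the mode
        if abs(step) < tol:
            break
    # Variance is the inverse of the negative curvature at the mode.
    post_var = 1.0 / (math.exp(x) + 1.0 / prior_var)
    return x, post_var

def filter_counts(counts, init_mean=0.0, init_var=1.0, drift_var=0.1):
    """Recursive filtering under an assumed random-walk state model."""
    mean, var = init_mean, init_var
    estimates = []
    for y in counts:
        mean, var = laplace_update(y, mean, var)  # measurement update
        estimates.append((mean, var))
        var = var + drift_var  # prediction step: x_t = x_{t-1} + noise
    return estimates

estimates = filter_counts([3, 4, 2, 5, 7])  # made-up spike counts
for t, (m, v) in enumerate(estimates):
    print(f"t={t}: mean={m:.3f}, var={v:.3f}")
```

Each step is deterministic and needs only a few Newton iterations, which is the kind of cost profile the abstract contrasts with sequential Monte Carlo.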
Fields like public health, public policy, and social science often want to quantify the degree of dependence between variables whose relationships take on unknown functional forms. Typically, in fact, researchers in these fields are attempting to evaluate causal theories, and so want to quantify dependence after conditioning on other variables that might explain, mediate or confound causal relations. One reason conditional mutual information is not more widely used for these tasks is the lack of estimators which can handle combinations of continuous and discrete random variables, common in applications. This article develops a new method for estimating mutual and conditional mutual information for data samples containing a mix of discrete and continuous variables. We prove that this estimator is consistent and show, via simulation, that it is more accurate than similar estimators.
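For readers unfamiliar with the quantities involved, the following minimal sketch computes plug-in (empirical frequency) estimates of mutual information and conditional mutual information. It covers only the purely discrete case and is not the estimator developed in the article, which is designed for mixed discrete and continuous data; the function names and the toy samples are illustrative assumptions.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), k in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], using raw counts.
        total += (k / n) * math.log(k * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return total

def mutual_information(xs, ys):
    """I(X; Y) as the special case of conditioning on a constant."""
    return conditional_mutual_information(xs, ys, [0] * len(xs))

x = [0, 0, 1, 1]
y_copy = [0, 0, 1, 1]    # identical to x: maximal dependence
y_indep = [0, 1, 0, 1]   # empirically independent of x

print(mutual_information(x, y_copy))
print(mutual_information(x, y_indep))
print(conditional_mutual_information(x, y_copy, x))
```

Conditioning on a variable that fully explains the dependence (here, `x` itself) drives the conditional estimate to zero, which is the behavior that makes conditional mutual information useful for evaluating the causal theories the abstract mentions.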