The replicability of some scientific findings has recently been called into question. To contribute data about replicability in economics, we replicated 18 studies published in the American Economic Review and the Quarterly Journal of Economics between 2011 and 2014. All of these replications followed predefined analysis plans that were made publicly available beforehand, and they all have a statistical power of at least 90% to detect the original effect size at the 5% significance level. We found a significant effect in the same direction as in the original study for 11 replications (61%); on average, the replicated effect size is 66% of the original. The replicability rate varies between 67% and 78% for four additional replicability indicators, including a prediction market measure of peer beliefs.
We measure how accurately replication of experimental results can be predicted by black-box statistical models. With data from four large-scale replication projects in experimental psychology and economics, and techniques from machine learning, we train predictive models and study which variables drive predictable replication. The models predict binary replication with a cross-validated accuracy rate of 70% (AUC of 0.77) and estimate relative effect sizes with a Spearman ρ of 0.38. The accuracy level is similar to market-aggregated beliefs of peer scientists [1, 2]. The predictive power is validated in a pre-registered out-of-sample test of the outcome of [3], where 71% (AUC of 0.73) of replications are predicted correctly and effect size correlations amount to ρ = 0.25. Basic features such as the sample and effect sizes in original papers, and whether reported effects are single-variable main effects or two-variable interactions, are predictive of successful replication. The models presented in this paper are simple tools to produce cheap, prognostic replicability metrics. These models could be useful in institutionalizing the process of evaluating new findings and guiding resources to those direct replications that are likely to be most informative.
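The evaluation metrics reported here (threshold accuracy, AUC, Spearman ρ) are simple to compute. A minimal sketch on toy data — the probabilities and outcomes below are illustrative, not numbers from the paper:

```python
def auc(scores, labels):
    """Probability that a randomly chosen replicated study is scored
    above a randomly chosen failed one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def spearman(x, y):
    """Spearman rho: Pearson correlation of ranks (no tie correction here)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for k, i in enumerate(order):
            r[i] = k + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Toy model outputs: predicted replication probabilities vs. observed outcomes.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
labels = [1, 1, 0, 1, 0, 0]
accuracy = sum((p > 0.5) == bool(y) for p, y in zip(probs, labels)) / len(labels)
```

In a real evaluation the probabilities would come from a cross-validated classifier; the metrics themselves are model-agnostic.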
Being able to replicate scientific findings is crucial for scientific progress. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
•Psychologists participated in prediction markets to predict replication outcomes.
•Prediction markets correctly predicted 75% of the replication outcomes.
•Prediction markets performed better than survey data in predicting replication outcomes.
•Survey data performed better in predicting relative effect size of the replications.
Understanding and improving reproducibility is crucial for scientific progress. Prediction markets and related methods of eliciting peer beliefs are promising tools to predict replication outcomes. We invited researchers in the field of psychology to judge the replicability of 24 studies replicated in the large-scale Many Labs 2 project. We elicited peer beliefs in prediction markets and surveys about two replication success metrics: the probability that the replication yields a statistically significant effect in the original direction (p < 0.001), and the relative effect size of the replication. The prediction markets correctly predicted 75% of the replication outcomes and were highly correlated with the replication outcomes. Survey beliefs were also significantly correlated with replication outcomes, but had larger prediction errors. The prediction markets for relative effect sizes attracted little trading and thus did not work well. The survey beliefs about relative effect sizes performed better and were significantly correlated with observed relative effect sizes. The results suggest that replication outcomes can be predicted and that the elicitation of peer beliefs can increase our knowledge about scientific reproducibility and the dynamics of hypothesis testing.
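The abstract does not specify the market mechanism, but an automated market maker such as Hanson's logarithmic market scoring rule (LMSR) is a common choice for replication prediction markets. A minimal sketch, assuming LMSR, of how trades on a binary "will it replicate" contract move the market probability (the liquidity parameter b = 100 is an arbitrary assumption):

```python
import math

B = 100.0  # LMSR liquidity parameter (arbitrary assumed value)

def cost(q_yes, q_no, b=B):
    """LMSR cost function over outstanding YES/NO share quantities."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

def price_yes(q_yes, q_no, b=B):
    """Instantaneous price of a YES share, readable as the market probability."""
    return math.exp(q_yes / b) / (math.exp(q_yes / b) + math.exp(q_no / b))

def buy_yes(q_yes, q_no, shares, b=B):
    """Amount a trader pays to buy `shares` YES shares at the current state."""
    return cost(q_yes + shares, q_no, b) - cost(q_yes, q_no, b)

# The market opens at even odds; a trader who expects the study to replicate
# buys 50 YES shares, which pushes the market probability upward.
paid = buy_yes(0, 0, 50)
p_after = price_yes(50, 0)
```

The final market price is then compared against the replication outcome, e.g. by thresholding at 0.5 to score the 75% accuracy figure reported above.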
•The paper uses micro data scraped from Google Play to estimate how consumers react to quality signals.
•We estimate the effects of (i) other users’ stated preferences (ratings) and (ii) revealed preferences (downloads).
•The data consists of daily information, for 42 consecutive days, on more than 500,000 apps.
•Action speaks louder than words: very strong reaction to other consumers’ revealed preferences.
•Talk is cheap: weaker reaction to other consumers’ stated preferences.
Knowledge of how consumers react to different signals is fundamental to understanding how markets work. The modern electronic marketplace has revolutionized the possibilities for consumers to gather detailed information about products and services before purchase. Specifically, a consumer can easily – through a host of online forums and evaluation sites – estimate a product’s quality based on either (i) what other users say about the product (stated preferences) or (ii) how many other users have bought the product (revealed preferences). In this paper, we compare the causal effects on demand of these two signals based on data from the biggest marketplace for Android apps, Google Play. The data consists of daily information, for 42 consecutive days, on more than 500,000 apps from the US version of Google Play. Our main result is that consumers are much more responsive to other consumers’ revealed preferences than to others’ stated preferences. A 10 percentile increase in displayed average rating increases downloads by only about 3%, while a 10 percentile increase in displayed number of downloads increases downloads by about 25%.
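The percentile-to-percent mapping reported here is the kind of quantity a log-linear regression delivers: with log downloads on the left-hand side, a coefficient b on a percentile rank implies that a 10-point percentile gain multiplies downloads by exp(10·b). A minimal sketch on noiseless synthetic data — the slope 0.022 is made up for illustration and is not the paper's estimate:

```python
import math

def ols_line(x, y):
    """Ordinary least squares for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Noiseless toy data: log downloads rise linearly in the displayed-downloads
# percentile, with a hypothetical slope of 0.022 per percentile point.
pct = [10, 30, 50, 70, 90]
log_q = [2.0 + 0.022 * p for p in pct]
a, b = ols_line(pct, log_q)
# Implied demand response to a 10-point percentile gain:
effect = math.exp(10 * b) - 1  # fractional increase in downloads
```

With the hypothetical slope above, a 10-point percentile gain implies roughly a 25% increase in downloads, the same order of magnitude as the revealed-preference effect the paper reports.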
Using a randomized field experiment in the Swedish pension system, we investigate whether receiving an information letter affects the take-up rate of the housing allowance for pensioners. We also investigate whether the framing of the information letter affects take-up. The results show that simple information letters had a significant effect on the application rate and subsequent take-up rate: The baseline application rate in the targeted control population was only 1.4%, while the corresponding rates in the different treatment groups were between 9.9% and 12.1%. However, while applications in the control group were accepted in almost 3 out of 4 cases, up to 50% of the applications in the treatment group were declined. The lower conditional acceptance rate in the treatment group seems to be largely driven by wealth, which the Pensions Agency cannot observe prior to submission. Information campaigns aimed at increasing benefit take-up therefore need careful design in situations with imperfect targeting.
Many older people with low incomes do not apply for the housing allowance for pensioners even though they may be entitled to it. An important question is therefore how more eligible people can be encouraged to apply. Together with the Swedish Pensions Agency (Pensionsmyndigheten), we conducted a randomized information experiment targeting the population of potentially eligible pensioners. Roughly one in ten pensioners who received a letter (the treatment group) applied for the housing allowance within four months, compared with just over one in a hundred among those who did not receive a letter (the control group). However, the rejection rate in the treatment group was somewhat higher.
Linear models are used in online decision making, such as in machine learning, policy algorithms, and experimentation platforms. Many engineering systems that use linear models achieve computational efficiency through distributed systems and expert configuration. While there are strengths to this approach, it is still difficult to have an environment that enables researchers to interactively iterate and explore data and models, as well as leverage analytics solutions from the open source community. Consequently, innovation can be blocked. Conditionally sufficient statistics is a unified data compression and estimation strategy that is useful for the model development process, as well as the engineering deployment process. The strategy estimates linear models from compressed data without loss on the estimated parameters and their covariances, even when errors are autocorrelated within clusters of observations. Additionally, the compression preserves almost all interactions with the original data, unlocking better productivity for both researchers and engineering systems.
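The core of the compression idea can be seen in ordinary least squares: β̂ = (XᵀX)⁻¹Xᵀy depends on the raw rows only through the cross-products XᵀX and Xᵀy, so disjoint chunks of data can be reduced to these statistics and summed without changing the estimate. A minimal sketch for a two-parameter model (point estimates only; the clustered covariances described in the abstract require additional within-cluster statistics):

```python
def cross_products(rows):
    """Accumulate XtX and Xty over rows of (x0, x1, y), x0 = intercept column."""
    xtx = [[0.0] * 2 for _ in range(2)]
    xty = [0.0, 0.0]
    for x0, x1, y in rows:
        x = (x0, x1)
        for i in range(2):
            xty[i] += x[i] * y
            for j in range(2):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def solve2(a, rhs):
    """Solve a 2x2 linear system a @ beta = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * rhs[0] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]

data = [(1, 0, 1.0), (1, 1, 3.1), (1, 2, 4.9), (1, 3, 7.2)]

# Full-data fit.
xtx, xty = cross_products(data)
beta_full = solve2(xtx, xty)

# "Compressed" fit: reduce each chunk to its cross-products, sum, solve once.
xtx_a, xty_a = cross_products(data[:2])
xtx_b, xty_b = cross_products(data[2:])
xtx_sum = [[xtx_a[i][j] + xtx_b[i][j] for j in range(2)] for i in range(2)]
xty_sum = [xty_a[i] + xty_b[i] for i in range(2)]
beta_comp = solve2(xtx_sum, xty_sum)
```

The two estimates agree exactly, which is the "without loss" property: only the small cross-product matrices, not the raw rows, need to move through the system.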
We demonstrate the effectiveness of democratization and efficient computation as key concepts of our experimentation platform (XP) by presenting four new models supported by the platform: 1) Weighted least squares, 2) Quantile bootstrapping, 3) Bayesian shrinkage, and 4) Dynamic treatment effects. Each model is motivated by a specific business problem but is generalizable and extensible. The modular structure of our platform allows independent innovation on statistical and computational methods. In practice, a technical symbiosis is created where increasingly advanced user contributions inspire innovations to the software that in turn enable further methodological improvements. This cycle adds further value to how the XP contributes to business solutions.
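As an illustration of the first of these models, weighted least squares for a single regressor has a closed form. This is a generic textbook sketch, not the platform's implementation:

```python
def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x; w are observation weights
    (e.g. inverse variances, or bucket sizes for pre-aggregated data)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / \
        sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return my - b * mx, b

# On exactly linear data the weights do not matter: any weighting recovers
# the same line, here y = 1 + 2x.
a, b = wls_line([0, 1, 2, 3], [1, 3, 5, 7], [1, 2, 1, 2])
```

With noisy data, the weights determine how much each observation influences the fit, which is what makes WLS useful when observations are pre-aggregated or heteroskedastic.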