The e-posterior. Grünwald, Peter D.
Philosophical Transactions of the Royal Society of London, Series A: Mathematical, Physical, and Engineering Sciences, 05/2023, Volume 381, Issue 2247
Journal Article
Peer-reviewed
We develop a representation of a decision maker's uncertainty based on e-variables. Like the Bayesian posterior, this representation allows for making predictions against arbitrary loss functions that need not be specified ex ante. Unlike the Bayesian posterior, it provides risk bounds that have frequentist validity irrespective of prior adequacy: if the e-collection (which plays a role analogous to the Bayesian prior) is chosen badly, the bounds become loose rather than wrong, making the resulting decision rules safer than Bayesian ones. The resulting quasi-conditional paradigm is illustrated by re-interpreting a previous influential partial Bayes-frequentist unification in terms of e-posteriors. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
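A minimal numerical sketch of the e-variable concept behind this abstract (our illustration, not the paper's construction): a likelihood ratio is the canonical e-variable, since its expectation under the null is 1, and Markov's inequality then gives the frequentist guarantee P(E >= 1/alpha) <= alpha no matter how the alternative was chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

def e_value(x, mu_alt=0.2):
    # Likelihood ratio of N(mu_alt, 1) against the null N(0, 1): a canonical
    # e-variable, because its expectation under the null equals 1.
    return np.exp(mu_alt * x.sum() - 0.5 * mu_alt**2 * len(x))

# Draw many samples under the null and check the defining property E[E] = 1
# and the implied guarantee P(E >= 1/alpha) <= alpha (Markov's inequality).
alpha = 0.05
es = np.array([e_value(rng.normal(size=10)) for _ in range(20000)])
print(round(es.mean(), 2), (es >= 1 / alpha).mean())
```

A badly chosen alternative (the analogue of a bad prior) only makes the e-value small, i.e. the evidence weak, which mirrors the abstract's point that bad choices make bounds loose rather than wrong.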
The no-free-lunch theorems promote the skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-driven. On this conception, every algorithm must carry an inherent inductive bias that itself requires justification. We argue that many standard learning algorithms should rather be understood as model-dependent: in each application they also require a model, representing a bias, as input. Being generic algorithms themselves, they can be given a model-relative justification.
Recently, optional stopping has been a subject of debate in the Bayesian psychology community. Rouder (Psychonomic Bulletin & Review, 21(2), 301–308, 2014) argues that optional stopping is no problem for Bayesians, and even recommends the use of optional stopping in practice, as do Wagenmakers, Wetzels, Borsboom, van der Maas and Kievit (Perspectives on Psychological Science, 7, 627–633, 2012). This article addresses the question of whether optional stopping is problematic for Bayesian methods, and specifies under which circumstances and in which sense it is and is not. By slightly varying and extending Rouder's (2014) experiments, we illustrate that, as soon as the parameters of interest are equipped with default or pragmatic priors (which is the case in most practical applications of Bayes factor hypothesis testing), resilience to optional stopping can break down. We distinguish between three types of default priors, each with its own specific issues with optional stopping, ranging from no problem at all (type 0 priors) to quite severe (type II priors).
Images document scientific discoveries and are prevalent in modern biomedical research. Microscopy imaging in particular is currently undergoing rapid technological advancements. However, for scientists wishing to publish obtained images and image-analysis results, there are currently no unified guidelines for best practices. Consequently, microscopy images and image data in publications may be unclear or difficult to interpret. Here, we present community-developed checklists for preparing light microscopy images and describing image analyses for publications. These checklists offer authors, readers and publishers key recommendations for image formatting and annotation, color selection, data availability and reporting image-analysis workflows. The goal of our guidelines is to increase the clarity and reproducibility of image figures and thereby to heighten the quality and explanatory power of microscopy data.
We describe and develop a close relationship between two problems that have customarily been regarded as distinct: that of maximizing entropy, and that of minimizing worst-case expected loss. Using a formulation grounded in the equilibrium theory of zero-sum games between Decision Maker and Nature, these two problems are shown to be dual to each other, the solution to each providing that to the other. Although Topsøe described this connection for the Shannon entropy over 20 years ago, it does not appear to be widely known even in that important special case. We here generalize this theory to apply to arbitrary decision problems and loss functions. We indicate how an appropriate generalized definition of entropy can be associated with such a problem, and we show that, subject to certain regularity conditions, the above-mentioned duality continues to apply in this extended context. This simultaneously provides a possible rationale for maximizing entropy and a tool for finding robust Bayes acts. We also describe the essential identity between the problem of maximizing entropy and that of minimizing a related discrepancy or divergence between distributions. This leads to an extension, to arbitrary discrepancies, of a well-known minimax theorem for the case of Kullback-Leibler divergence (the "redundancy-capacity theorem" of information theory). For the important case of families of distributions having certain mean values specified, we develop simple sufficient conditions and methods for identifying the desired solutions. We use this theory to introduce a new concept of "generalized exponential family" linked to the specific decision problem under consideration, and we demonstrate that this shares many of the properties of standard exponential families.
Finally, we show that the existence of an equilibrium in our game can be rephrased in terms of a "Pythagorean property" of the related divergence, thus generalizing previously announced results for Kullback-Leibler and Bregman divergences.
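The connection between mean-value constraints and exponential families can be illustrated with Jaynes' classic dice example (a standard special case, not code from the paper): the maximum-entropy distribution on {1, ..., 6} with a prescribed mean has the exponential-family form p_i proportional to exp(lambda * x_i), with the natural parameter lambda chosen to satisfy the constraint.

```python
import math

xs = list(range(1, 7))   # die faces
target_mean = 4.5        # moment constraint E[X] = 4.5

def tilted(lam):
    # Exponential-family form p_i ∝ exp(lam * x_i): the known shape of the
    # maximum-entropy distribution under a single mean constraint.
    w = [math.exp(lam * x) for x in xs]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p):
    return sum(pi * x for pi, x in zip(p, xs))

# The mean is increasing in lam, so bisection finds the matching parameter.
lo, hi = -5.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean(tilted(mid)) < target_mean:
        lo = mid
    else:
        hi = mid

p = tilted((lo + hi) / 2)
entropy = -sum(pi * math.log(pi) for pi in p)
print([round(pi, 3) for pi in p], round(entropy, 3))
```

With the mean pushed above 3.5, the solution tilts probability toward the higher faces, strictly less than the log 6 entropy of the unconstrained uniform maximizer.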
Consider the set of all sequences of n outcomes, each taking one of m values, whose frequency vectors satisfy a set of linear constraints. If m is fixed while n increases, most sequences that satisfy the constraints result in frequency vectors whose entropy approaches that of the maximum entropy vector satisfying the constraints. This well-known entropy concentration phenomenon underlies the maximum entropy method. Existing proofs of the concentration phenomenon are based on limits or asymptotics and unrealistically assume that constraints hold precisely, supporting maximum entropy inference more in principle than in practice. We present, for the first time, non-asymptotic, explicit lower bounds on n for a number of variants of the concentration result to hold to any prescribed accuracies, with the constraints holding up to any specified tolerance, accounting for the fact that allocations of discrete units can satisfy constraints only approximately. Again unlike earlier results, we measure concentration not by deviation from the maximum entropy value, but by the ℓ1 and ℓ2 distances from the maximum-entropy-achieving frequency vector. One of our results holds independently of the alphabet size m and is based on a novel proof technique using the multi-dimensional Berry-Esseen theorem. We illustrate and compare our results using various detailed examples.
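A toy enumeration (ours, and far cruder than the paper's explicit bounds) exhibits the concentration phenomenon directly: among all sequences over {0, 1, 2} whose empirical mean is exactly 1, the share of sequences whose frequency vector lies within a fixed ℓ1 distance of the maximum-entropy vector (1/3, 1/3, 1/3) grows toward 1 as n increases.

```python
from math import comb

def concentration(n, eps):
    # Outcomes {0, 1, 2}; linear constraint: empirical mean exactly 1,
    # i.e. k1 + 2*k2 == n. The max-entropy frequency vector under this
    # constraint is the uniform (1/3, 1/3, 1/3).
    total = near = 0
    for k2 in range(n + 1):
        k1 = n - 2 * k2
        if k1 < 0:
            continue
        k0 = n - k1 - k2
        # Number of sequences with type (k0, k1, k2): multinomial coefficient.
        count = comb(n, k0) * comb(n - k0, k1)
        total += count
        l1 = abs(k0 / n - 1 / 3) + abs(k1 / n - 1 / 3) + abs(k2 / n - 1 / 3)
        if l1 <= eps:
            near += count
    return near / total

# Share of constraint-satisfying sequences near the max-entropy vector,
# for a small and a moderate n at the same l1 tolerance.
print(round(concentration(30, 0.2), 3), round(concentration(300, 0.2), 3))
```

The paper's contribution, in these terms, is a non-asymptotic answer to how large n must be (and how much constraint tolerance is allowed) for such a share to exceed any prescribed level.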
Late-embryogenesis abundant (LEA) proteins are hydrophilic proteins that accumulate to a high level in desiccation-tolerant tissues and are thus prominent in seeds. They are expected to play a protective role during dehydration; however, functional evidence is scarce. We identified a LEA protein of group 3 (PsLEAm) that was localized within the matrix space of pea (Pisum sativum) seed mitochondria. PsLEAm revealed typical LEA features such as high hydrophilicity and repeated motifs, except for the N-terminal transit peptide. Most of the highly charged protein was predicted to fold into amphiphilic alpha-helices. PsLEAm was expressed during late seed development and remained in the dry seed and throughout germination. Application of the stress hormone abscisic acid was found to reinduce the expression of PsLEAm transcripts during germination. PsLEAm could not be detected in vegetative tissues; however, its expression could be reinduced in leaves by severe water stress. The recombinant PsLEAm was shown to protect two mitochondrial matrix enzymes, fumarase and rhodanese, during drying in an in vitro assay. The overall results constitute, to our knowledge, the first characterization of a LEA protein in mitochondria and experimental evidence for a beneficial role of a LEA protein with respect to proteins during desiccation.
Biochar application in combination with slurry might be an option to increase aggregate formation and organic carbon (OC) sequestration in agricultural soil. However, to assess the value of these management options for improving soil structure more precisely, naturally occurring effects of changing moisture on soil aggregation and feedbacks on organic matter (OM) decomposition need to be addressed. Therefore, we aimed to quantify the effects of biochar or slurry application on the amount of OC associated with macro-aggregates and OM decomposition under different moisture conditions. Four silty loam sites in Germany were sampled, and the soil macro-aggregates were crushed. We added biochar (53–250 μm) and slurry individually and in combination at two rates before incubating the samples under changing moisture conditions for 60 days. As well as monitoring CO2 fluxes, samples were analyzed for microbial biomass carbon, macro-aggregate yields and macro-aggregate-associated OC. Biochar application decreased macro-aggregate yields by 50–70%. However, the macro-aggregate-associated OC of treatments with biochar was similar to or greater than in treatments without, indicating biochar incorporation into these fractions. This was especially pronounced for biochar treatments with large volumes of slurry. Thus, slurry seems to promote the formation of biochar-mineral interactions. Drying and rewetting decreased macro-aggregate yields and associated OC, most pronouncedly for samples with biochar and slurry. In contrast to slurry, biochar typically did not increase macro-aggregate formation. However, the combination with slurry could further enhance the suitability of biochar for carbon sequestration, although this might be less pronounced in soils experiencing frequent drying-wetting cycles.
A large number of alignment-free techniques of graphical representation and numerical characterization (GRANCH) of bio-molecular sequences have been proposed in recent years, but the relative efficacy of these methods in determining the degree of similarities and dissimilarities of such sequences has not been ascertained.
Our objective is to make an assessment of the relative efficacy of these methods in determining the degree of similarities and dissimilarities of bio-molecular sequences.
We have chosen 7 published/communicated methods that represent various classes of GRANCH techniques and computed the descriptors that are expected to characterize similarities and dissimilarities in several sets of gene sequences. We critically appraise the different methods and determine which of these yield non-redundant structural information that could be used to compute different properties of the sequences, and which are correlated enough to one another that using the simplest representative of the group would suffice. We also perform a principal component analysis (PCA) to determine how the variances in the calculated sequence descriptors are explained by the computed principal components (PCs).
We found that some of the descriptors are strongly correlated, implying a commonality of structural information encoded by them, while others are distinctly separate. The PCA results show that the first three PCs explain >97% of the variances.
We found that some mathematical DNA descriptors calculated by a few of these techniques correlate strongly with one another, implying a redundancy in the structural information quantified by those descriptors; others are not strongly correlated with one another, suggesting that they encode non-redundant sequence information. From this and our PCA results, our recommendation would be to use a minimally correlated set of descriptors, or orthogonal descriptors such as PCs derived from the descriptor set, for the characterization of nucleic acid structure and function.
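The kind of redundancy analysis described can be sketched in a few lines (synthetic data and hypothetical descriptor columns, purely illustrative): correlate descriptor columns to flag redundant pairs, then read explained-variance ratios off a PCA computed via SVD of the centred matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for a descriptor table: rows = sequences, columns =
# descriptors from different GRANCH methods (illustrative data only).
base = rng.normal(size=(50, 3))
descriptors = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.05 * rng.normal(size=50),  # nearly redundant with col 0
    base[:, 1],
    base[:, 2],
])

# Pairwise correlations between descriptor columns flag redundant pairs.
corr = np.corrcoef(descriptors, rowvar=False)

# PCA via SVD of the centred matrix; squared singular values give the
# explained-variance ratio of each principal component.
centred = descriptors - descriptors.mean(axis=0)
s = np.linalg.svd(centred, compute_uv=False)
explained = s**2 / np.sum(s**2)

print(round(corr[0, 1], 2), round(float(explained[:3].sum()), 3))
```

Here the planted redundant pair shows up as a near-unit correlation, and the first three PCs capture almost all of the variance, mirroring the >97% figure reported above.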