Natural language is often seen as the single factor that explains the cognitive singularity of the human species. Instead, we propose that humans possess multiple internal languages of thought, akin to computer languages, which encode and compress structures in various domains (mathematics, music, shape…). These languages rely on cortical circuits distinct from classical language areas. Each is characterized by: (i) the discretization of a domain using a small set of symbols, and (ii) their recursive composition into mental programs that encode nested repetitions with variations. In various tasks of elementary shape or sequence perception, minimum description length in the proposed languages captures human behavior and brain activity, whereas non-human primate data are captured by simpler nonsymbolic models. Our research argues in favor of discrete symbolic models of human thought.
•Accounting for human spatial memory requires the postulation of a mental language that can recursively compose primitives of number, space, and repetition with variations.
•The same language accounts for the human perception of binary auditory sequences.
•Minimum description length, rather than actual sequence length, predicts human working memory for auditory and visual sequences.
•When perceiving geometric shapes, humans exhibit a strong geometric regularity effect, which is absent in non-human primates.
•Multiple languages with similar computational principles but distinct, parallel brain circuits coexist in the human brain.
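The claim that compressibility, not raw sequence length, limits memory can be made concrete with a minimal sketch. This is not the authors' language model; it scores a sequence by its shortest description in a toy grammar whose only primitives are literal symbols, concatenation, and repetition, so a regular string like ABABABAB compresses while an irregular one does not.

```python
# A toy minimum-description-length scorer (our illustration, not the paper's
# model): the description of a string is either its literal symbols, a split
# into two sub-programs, or "repeat(k, sub-program)" at a cost of the
# sub-program plus one slot for the repeat count.
from functools import lru_cache

def mdl(seq: str) -> int:
    """Length of the shortest description of seq in the toy grammar."""
    @lru_cache(maxsize=None)
    def cost(s: str) -> int:
        best = len(s)                          # fallback: spell out every symbol
        for plen in range(1, len(s) // 2 + 1):
            if len(s) % plen == 0 and s == s[:plen] * (len(s) // plen):
                best = min(best, cost(s[:plen]) + 1)   # repeat(k, program)
        for i in range(1, len(s)):             # concatenation of two programs
            best = min(best, cost(s[:i]) + cost(s[i:]))
        return best
    return cost(seq)

print(mdl("ABABABAB"))   # 3 = repeat(4, AB): far shorter than 8 literals
print(mdl("ABAABBBA"))   # near its raw length of 8: little structure to exploit
```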
Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally allows model complexity to be traded off against goodness of fit, effectively avoiding overfitting and the need for hyperparameter tuning. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion.
We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion.
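The selection criterion can be made concrete with a minimal sketch of a two-part MDL score (our simplification; Classy's actual encoding of rules and class distributions is more refined).

```python
# Two-part MDL score for a rule list: total bits = L(model) + L(data | model).
# An extra rule is kept only if it reduces the label-encoding cost by more
# than it adds in model cost, which is how overfitting is avoided.
import math

def leaf_code_length(counts):
    """Bits to encode the class labels covered by one rule, using the
    maximum-likelihood class distribution at that rule plus a crude
    0.5*log2(n) parameter cost per class."""
    n = sum(counts)
    if n == 0:
        return 0.0
    bits = 0.0
    for c in counts:
        if c > 0:
            bits -= c * math.log2(c / n)
        bits += 0.5 * math.log2(n)
    return bits

def rule_list_mdl(rules):
    """rules: list of (num_conditions, class_counts), default rule last.
    Model cost here is a naive 1 bit per condition; real encodings differ."""
    model_bits = sum(k for k, _ in rules)
    data_bits = sum(leaf_code_length(c) for _, c in rules)
    return model_bits + data_bits

# Two rules and a default rule over a 300-point, 2-class training set:
print(rule_list_mdl([(2, [70, 10]), (3, [15, 105]), (0, [30, 30])]))
```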
A Tutorial on Fisher Information. Ly, Alexander; Marsman, Maarten; Verhagen, Josine; et al. Journal of Mathematical Psychology, Volume 80, October 2017.
In many statistical applications that concern mathematical psychologists, the concept of Fisher information plays an important role. In this tutorial we clarify the concept of Fisher information as it manifests itself across three different statistical paradigms. First, in the frequentist paradigm, Fisher information is used to construct hypothesis tests and confidence intervals using maximum likelihood estimators; second, in the Bayesian paradigm, Fisher information is used to define a default prior; finally, in the minimum description length paradigm, Fisher information is used to measure model complexity.
•We illustrate the use of Fisher information in the three statistical paradigms: frequentist, Bayesian, and MDL.
•Fisher information is used to construct hypothesis tests and confidence intervals.
•Fisher information is used to construct Jeffreys's prior.
•Fisher information is used to measure model complexity.
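For reference, the three uses correspond to standard formulas; the sketch below assumes a scalar parameter θ, n i.i.d. observations, and, for the MDL complexity term, Rissanen's Fisher information approximation for a k-parameter model class.

```latex
% Fisher information of a single observation
I(\theta) = \mathbb{E}\left[\left(\tfrac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^{2}\right]

% Frequentist: asymptotic distribution of the MLE and a Wald-type interval
\hat{\theta}_n \;\dot\sim\; \mathcal{N}\!\left(\theta,\ \tfrac{1}{n\,I(\theta)}\right),
\qquad \hat{\theta}_n \pm z_{\alpha/2}\,\bigl(n\,I(\hat{\theta}_n)\bigr)^{-1/2}

% Bayesian: Jeffreys's default prior
\pi(\theta) \propto \sqrt{I(\theta)}

% MDL: complexity of a k-parameter model class (Fisher information approximation)
\mathrm{COMP}_n = \tfrac{k}{2}\log\tfrac{n}{2\pi}
  + \log\!\int_{\Theta}\!\sqrt{\det I(\theta)}\;d\theta
```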
We present a new image complexity metric. Existing complexity metrics cannot distinguish meaningful content from noise, and give a high score to white noise images, which contain no meaningful information. We use the minimum description length principle to determine the number of clusters and designate certain points as outliers and, hence, correctly assign white noise a low score. The presented method is a step towards humans’ ability to detect when data contain a meaningful pattern. It also has similarities to theoretical ideas for measuring meaningful complexity. We conduct experiments on seven different sets of images, which show that our method assigns the most accurate scores to all images considered. Additionally, comparing the different levels of the hierarchy of clusters can reveal how complexity manifests at different scales, from local detail to global structure. We then present ablation studies showing the contribution of the components of our method, and that it continues to assign reasonable scores when the inputs are modified in certain ways, including the addition of Gaussian noise and the lowering of the resolution. Code is available at https://github.com/Lou1sM/meaningful_image_complexity.
•A new technique for measuring the amount of meaningful complexity in an image.
•Such a measure should not respond to random, unstructured variation, e.g. white noise.
•Existing methods all give noise images a high score, but ours gives them a low score.
•It can also measure complexity at different scales, from local detail to global structure.
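As an illustration of the MDL selection step (a simplified sketch with our own encoding choices, assuming scikit-learn's KMeans; the paper's method also handles outliers and a hierarchy of scales), the code below picks the number of clusters by minimizing a two-part code, so white noise, which clustering cannot compress, gets the trivial one-cluster description.

```python
# MDL-based cluster-count selection: centroid parameters + per-point cluster
# indices + Gaussian-coded residuals. A constant quantization offset is
# omitted, so only differences between scores matter.
import numpy as np
from sklearn.cluster import KMeans

def mdl_score(X, labels, k):
    n, d = X.shape
    bits = 0.5 * k * d * np.log2(n)             # centroid parameter cost
    bits += n * np.log2(k) if k > 1 else 0.0    # which-cluster index per point
    for j in range(k):
        pts = X[labels == j]
        if len(pts) < 2:
            continue
        var = max(float(pts.var()), 1e-12)
        bits += 0.5 * len(pts) * d * np.log2(2 * np.pi * np.e * var)
    return bits

def best_k(X, kmax=8):
    scores = [mdl_score(X, KMeans(n_clusters=k, n_init=10).fit_predict(X), k)
              for k in range(1, kmax + 1)]
    return int(np.argmin(scores)) + 1

rng = np.random.default_rng(0)
noise = rng.normal(size=(500, 2))                       # unstructured data
blobs = np.vstack([rng.normal(m, 0.1, (250, 2)) for m in (0, 5)])
print(best_k(noise), best_k(blobs))                     # expect 1 and 2
```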
The α-expansion algorithm has had a significant impact in computer vision due to its generality, effectiveness, and speed. It is commonly used to minimize energies that involve unary, pairwise, and specialized higher-order terms. Our main algorithmic contribution is an extension of α-expansion that also optimizes “label costs” with well-characterized optimality bounds. Label costs penalize a solution based on the set of labels that appear in it, for example by simply penalizing the number of labels in the solution.
Our energy has a natural interpretation as minimizing description length (MDL) and sheds light on classical algorithms like K-means and expectation-maximization (EM). Label costs are useful for multi-model fitting, and we demonstrate several such applications: homography detection, motion segmentation, image segmentation, and compression. Our C++ and MATLAB code is publicly available at http://vision.csd.uwo.ca/code/.
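As a sketch in the standard notation for this line of work (not copied verbatim from the paper), the energy with label costs combines the three kinds of terms listed above, with an indicator for whether any label from a subset L is used:

```latex
E(f) = \sum_{p} D_p(f_p)
     + \sum_{(p,q) \in \mathcal{N}} V_{pq}(f_p, f_q)
     + \sum_{L \subseteq \mathcal{L}} h_L \, \delta_L(f),
\qquad
\delta_L(f) =
\begin{cases}
1 & \text{if } \exists\, p : f_p \in L,\\
0 & \text{otherwise.}
\end{cases}
```

Setting h_L = h for every singleton subset simply charges h per label used, which is the MDL reading: each label (model) appearing in the solution must pay for its own description.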
•The wavelet function and decomposition level are selected adaptively from offline analysis.
•A weighted version of Shannon entropy is proposed to find the best basis of the signal.
•A modified MDL algorithm is proposed to independently threshold at each node to denoise.
•A different threshold criterion is implemented for compression.
Denoising and compression of power system data from the measurement and monitoring instruments in the smart grid is an important topic. Compression is essential for transmitting and storing the large volumes of smart grid data sent over communication channels, and denoising is essential because noise produces erroneous results in subsequent analysis of the power system data. This paper presents a technique for denoising and lossy compression of data in smart grid communication using the wavelet packet transform (WPT). The paper proposes a weighted entropy to calculate the best basis of a signal from the complete WPT, and then applies the proposed best-basis algorithm to the denoising and compression of the signal. A modified minimum description length (MDL) algorithm is proposed that allows the denoising threshold to be adjusted without requiring a noise estimation formula. A set of real power system data recorded by various measurement and monitoring instruments in the smart grid is used to assess the effectiveness of the proposed technique. Both the denoising and compression results in simulation are promising, suggesting that the proposed algorithm is a potential technique for real-time noise removal and compression. Comparative results are presented against Shannon entropy-MDL based WPT, a wavelet transform (WT) based method, the fuzzy transform, and MATLAB's state-of-the-art technique.
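A rough sketch of the pipeline follows, assuming the PyWavelets package, with our placeholders standing in for the paper's specific choices: an energy-based Shannon entropy instead of the proposed weighted entropy, and the universal threshold instead of the modified MDL threshold.

```python
import numpy as np
import pywt

def node_entropy(c):
    # Placeholder for the paper's weighted Shannon entropy: entropy of the
    # normalized coefficient energies of one wavelet packet node.
    e = c ** 2 / max(float(np.sum(c ** 2)), 1e-30)
    e = e[e > 0]
    return float(-np.sum(e * np.log(e)))

def best_basis(wp):
    """Coifman-Wickerhauser-style search: keep a parent node if its entropy
    is lower than the sum of its children's, otherwise keep the children."""
    def search(path):
        node = wp[path] if path else wp
        if node.level == wp.maxlevel:
            return [path], node_entropy(node.data)
        paths, child_cost = [], 0.0
        for suffix in ("a", "d"):
            p, c = search(path + suffix)
            paths += p
            child_cost += c
        own = node_entropy(node.data) if path else float("inf")
        return ([path], own) if own <= child_cost else (paths, child_cost)
    return search("")[0]

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 1024)
x = np.sin(t) + 0.3 * rng.standard_normal(1024)

wp = pywt.WaveletPacket(data=x, wavelet="db4", mode="symmetric", maxlevel=4)
basis = best_basis(wp)

# Threshold each best-basis node independently (a stand-in for the paper's
# per-node modified-MDL threshold) and reconstruct the denoised signal.
clean = pywt.WaveletPacket(data=None, wavelet="db4", mode="symmetric")
for path in basis:
    c = wp[path].data
    sigma = np.median(np.abs(c)) / 0.6745           # robust noise scale
    thr = sigma * np.sqrt(2.0 * np.log(len(c)))     # universal threshold
    clean[path] = pywt.threshold(c, thr, mode="soft")
denoised = clean.reconstruct(update=False)
```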
Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity of processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations, which convey the information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt the feature space size, optimizing over the expected information gain and the classification capability, quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in the performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance at low SNR, as unnecessary measurements are discarded in our minimum description features.
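The feature extraction step can be illustrated with a short sketch (names, shapes, and the parameter k are our assumptions, not the paper's code): keep only the k strongest power measurements and their time bins instead of the full PDP.

```python
# Minimum-description-style feature extraction from a power delay profile:
# retain the k largest power measurements and their temporal bin indices.
import numpy as np

def min_desc_features(pdp: np.ndarray, k: int = 8):
    """pdp: (num_bins,) power delay profile. Returns (powers, bin_indices)
    of the k strongest measurements, kept in temporal order."""
    idx = np.argpartition(pdp, -k)[-k:]   # k strongest bins, O(n) selection
    idx = np.sort(idx)                    # preserve temporal ordering
    return pdp[idx], idx

rng = np.random.default_rng(1)
pdp = rng.exponential(0.05, 256)
pdp[[40, 41, 90]] += np.array([1.0, 0.6, 0.4])   # a few dominant paths
powers, bins = min_desc_features(pdp, k=8)
print(bins, powers.round(2))   # dominant-path bins appear among the selection
```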
This article considers the problem of modeling a class of count time series with multiple change-points using segmented generalized integer-valued autoregressive (S-GINAR) processes. The minimum description length (MDL) principle is applied to study statistical inference for the S-GINAR model, and consistency results for the MDL model selection procedure are established under both known and unknown numbers of change-points. To find the “best” combination of the number of change-points, the locations of the change-points, and the order and parameters of each segment, a genetic algorithm with simulated annealing is implemented to solve this difficult optimization problem. In particular, the simulated annealing step mitigates the premature convergence problem of the traditional genetic algorithm. Numerical results from simulation experiments and three real data examples show that the procedure has excellent empirical properties.
•The S-GINAR model is proposed to model a class of non-stationary count time series.
•Change-point estimation for the S-GINAR model based on the MDL criterion is studied.
•The consistency theorems of the parameters are discussed.
•The Auto-SGINARM-SA procedure is proposed to solve the optimization problems of change-point estimation.
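As a toy illustration of MDL change-point selection (the paper optimizes a richer S-GINAR criterion with a genetic algorithm plus simulated annealing; this sketch instead runs exact dynamic programming on a simple Poisson code length):

```python
# MDL change-point detection on counts: each segment pays its Poisson
# negative log-likelihood plus parameter and change-point location costs.
import math
import numpy as np

def seg_cost(seg):
    """Two-part code for one segment: Poisson NLL at the ML rate plus
    ~0.5*log(n) nats for the rate parameter."""
    n = len(seg)
    lam = max(float(np.mean(seg)), 1e-9)
    nll = (n * lam - float(np.sum(seg)) * math.log(lam)
           + sum(math.lgamma(x + 1) for x in seg))
    return nll + 0.5 * math.log(n)

def mdl_segmentation(x, min_len=10):
    """Exact DP over segmentations; each new segment also pays log(n) nats
    for the location of its change-point."""
    n = len(x)
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for t in range(min_len, n + 1):
        for s in range(0, t - min_len + 1):
            if not np.isfinite(best[s]):
                continue
            c = best[s] + seg_cost(x[s:t]) + math.log(n)
            if c < best[t]:
                best[t], back[t] = c, s
    cps, t = [], n
    while t > 0:
        cps.append(int(back[t]))
        t = back[t]
    return sorted(c for c in cps if c > 0)

rng = np.random.default_rng(2)
x = np.concatenate([rng.poisson(3, 100), rng.poisson(9, 100)])
print(mdl_segmentation(x))   # expect one change-point near index 100
```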