Stimuli are represented in the brain by the collective population responses of sensory neurons, and an object presented under varying conditions gives rise to a collection of neural population responses called an 'object manifold'. Changes in the object representation along a hierarchical sensory system are associated with changes in the geometry of those manifolds, and recent theoretical progress connects this geometry with 'classification capacity', a quantitative measure of the ability to support object classification. Deep neural networks trained on object classification tasks are a natural testbed for the applicability of this relation. We show how classification capacity improves along the hierarchies of deep neural networks with different architectures. We demonstrate that changes in the geometry of the associated object manifolds underlie this improved capacity, and shed light on the functional roles different levels in the hierarchy play to achieve it, through orchestrated reduction of manifolds' radius, dimensionality and inter-manifold correlations.
Firing patterns in the central nervous system often exhibit strong temporal irregularity and considerable heterogeneity in time-averaged response properties. Previous studies suggested that these properties are the outcome of the intrinsic chaotic dynamics of the neural circuits. Indeed, simplified rate-based neuronal networks with synaptic connections drawn from a Gaussian distribution and a sigmoidal nonlinearity are known to exhibit chaotic dynamics when the synaptic gain (i.e., connection variance) is sufficiently large. In the limit of an infinitely large network, there is a sharp transition from a fixed point to chaos, as the synaptic gain reaches a critical value. Near the onset, chaotic fluctuations are slow, analogous to the ubiquitous, slow irregular fluctuations observed in the firing rates of many cortical circuits. However, the existence of a transition from a fixed point to chaos in neuronal circuit models with more realistic architectures and firing dynamics has not been established. In this work, we investigate rate-based dynamics of neuronal circuits composed of several subpopulations with randomly diluted connections. Nonzero connections are either positive for excitatory neurons or negative for inhibitory ones, while single neuron output is strictly positive with output rates rising as a power law above threshold, in line with known constraints in many biological systems. Using dynamic mean field theory, we find the phase diagram depicting the regimes of stable fixed-point, unstable-dynamic, and chaotic-rate fluctuations. We focus on the latter and characterize the properties of systems near the transition to chaos. We show that dilute excitatory-inhibitory architectures exhibit the same onset to chaos as the single population with Gaussian connectivity. In these architectures, the large mean excitatory and inhibitory inputs dynamically balance each other, amplifying the effect of the residual fluctuations.
Importantly, the existence of a transition to chaos and its critical properties depend on the shape of the single-neuron nonlinear input-output transfer function, near firing threshold. In particular, for nonlinear transfer functions with a sharp rise near threshold, the transition to chaos disappears in the limit of a large network; instead, the system exhibits chaotic fluctuations even for small synaptic gain. Finally, we investigate transition to chaos in network models with spiking dynamics. We show that when synaptic time constants are slow relative to the mean inverse firing rates, the network undergoes a transition from fast spiking fluctuations with constant rates to a state where the firing rates exhibit chaotic fluctuations, similar to the transition predicted by rate-based dynamics. Systems with finite synaptic time constants and firing rates exhibit a smooth transition from a regime dominated by stationary firing rates to a regime of slow rate fluctuations. This smooth crossover obeys scaling properties, similar to crossover phenomena in statistical mechanics. The theoretical results are supported by computer simulations of several neuronal architectures and dynamics. Consequences for cortical circuit dynamics are discussed. These results advance our understanding of the properties of intrinsic dynamics in realistic neuronal networks and their functional consequences.
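The fixed-point-to-chaos transition of the simplified Gaussian rate network mentioned above can be illustrated with a short simulation. This is a minimal sketch, not the excitatory-inhibitory model of the paper: it assumes the dynamics dx/dt = -x + g J tanh(x) with J_ij drawn from N(0, 1/N), and the function names, network size, step size, and seed are illustrative choices.

```python
import numpy as np

def simulate_rate_network(g, n=200, t_max=100.0, dt=0.1, seed=0):
    """Euler-integrate dx/dt = -x + g * J @ tanh(x) (assumed model form)."""
    rng = np.random.default_rng(seed)
    # Gaussian couplings with variance 1/n, so g plays the role of synaptic gain
    J = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
    x = rng.normal(0.0, 1.0, size=n)
    steps = int(round(t_max / dt))
    history = np.empty((steps, n))
    for t in range(steps):
        x = x + dt * (-x + g * J @ np.tanh(x))
        history[t] = x
    return history

def late_mean_square(history):
    """Mean squared activity over the second half of the run."""
    half = history.shape[0] // 2
    return float(np.mean(history[half:] ** 2))

low_gain = late_mean_square(simulate_rate_network(g=0.5))
high_gain = late_mean_square(simulate_rate_network(g=2.0))
print("g = 0.5, late mean-square activity:", low_gain)
print("g = 2.0, late mean-square activity:", high_gain)
```

Below the critical gain the activity decays to the zero fixed point, while above it the activity stays of order one. Establishing that the high-gain state is actually chaotic would need a further diagnostic, such as a Lyapunov exponent estimate, which this sketch does not compute.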
The curse of dimensionality poses severe challenges to both technical and conceptual progress in neuroscience. In particular, it plagues our ability to acquire, process, and model high-dimensional data sets. Moreover, neural systems must cope with the challenge of processing data in high dimensions to learn and operate successfully within a complex world. We review recent mathematical advances that provide ways to combat dimensionality in specific situations. These advances shed light on two dual questions in neuroscience. First, how can we as neuroscientists rapidly acquire high-dimensional data from the brain and subsequently extract meaningful models from limited amounts of these data? And second, how do brains themselves process information in their intrinsically high-dimensional patterns of neural activity as well as learn meaningful, generalizable models of the external world from limited experience?
We perform an analysis of the average generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant “high-dimensional” regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and signal to noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks, and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations which protect against overtraining. We demonstrate that standard application of theories such as Rademacher complexity is inaccurate in predicting the generalization performance of deep neural networks, and derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
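The claim that overtraining peaks when the number of parameters matches the number of samples can be explored in a toy linear teacher-student setting. The sketch below is an assumption-laden illustration rather than the paper's analysis: the Gaussian inputs, noise level, learning rate, step count, and the "final error divided by best error" overtraining measure are all choices made here.

```python
import numpy as np

def generalization_curve(n_params, n_samples=100, noise_std=0.5,
                         lr=0.1, steps=5000, seed=0):
    """Excess generalization error of a linear student during gradient descent."""
    rng = np.random.default_rng(seed)
    # Teacher weights with norm close to 1; labels are corrupted by Gaussian noise
    w_teacher = rng.normal(size=n_params) / np.sqrt(n_params)
    X = rng.normal(size=(n_samples, n_params))
    y = X @ w_teacher + noise_std * rng.normal(size=n_samples)
    cov = X.T @ X / n_samples      # empirical input correlation matrix
    b = X.T @ y / n_samples
    w = np.zeros(n_params)         # small (zero) initial weights
    errors = np.empty(steps)
    for t in range(steps):
        w -= lr * (cov @ w - b)    # gradient of the mean squared training error / 2
        # For unit-variance Gaussian inputs the excess test error is |w - w_teacher|^2
        errors[t] = np.sum((w - w_teacher) ** 2)
    return errors

overtraining = {}
for n_params in (25, 100, 400):
    errors = generalization_curve(n_params)
    # Ratio of final error to the best error reached along the trajectory
    overtraining[n_params] = float(errors[-1] / errors.min())
    print(n_params, "parameters: final error / best error =", overtraining[n_params])
```

With 100 training samples, the ratio should be largest for the 100-parameter student, where the input correlation matrix has many small eigenvalues that slowly fit noise. In the 400-parameter case, weight directions orthogonal to the data receive no gradient, which is one simple way to see a frozen subspace.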
The groundbreaking success of deep learning in many real-world tasks has triggered an intense effort to theoretically understand the power and limitations of deep learning in the training and generalization of complex tasks, so far with limited progress. In this work, we study the statistical mechanics of learning in deep linear neural networks (DLNNs) in which the input-output function of an individual unit is linear. Despite the linearity of the units, learning in DLNNs is highly nonlinear; hence, studying its properties reveals some of the essential features of nonlinear deep neural networks (DNNs). Importantly, we exactly solve the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space. To do this, we introduce the backpropagating kernel renormalization (BPKR), which allows for the incremental integration of the network weights layer by layer starting from the network output layer and progressing backward until the first layer’s weights are integrated out. This procedure allows us to evaluate important network properties, such as its generalization error, the role of network width and depth, the impact of the size of the training set, and the effects of weight regularization and learning stochasticity. BPKR does not assume specific statistics of the input or the task’s output. Furthermore, by performing partial integration of the layers, the BPKR allows us to compute the emergent properties of the neural representations across the different hidden layers. We propose a heuristic extension of the BPKR to nonlinear DNNs with rectified linear units (ReLU). Surprisingly, our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks of modest depth, in a wide regime of parameters.
Our work is the first exact statistical mechanical study of learning in a family of deep neural networks, and the first successful theory of learning through the successive integration of degrees of freedom in the learned weight space.
A long-standing challenge in biological and artificial intelligence is to understand how new knowledge can be constructed from known building blocks in a way that is amenable to computation by neuronal circuits. Here we focus on the task of storage and recall of structured knowledge in long-term memory. Specifically, we ask how recurrent neuronal networks can store and retrieve multiple knowledge structures. We model each structure as a set of binary relations between events and attributes (attributes may represent, e.g., temporal order, spatial location, role in semantic structure), and map each structure to a distributed neuronal activity pattern using a vector symbolic architecture scheme. We then use associative memory plasticity rules to store the binarized patterns as fixed points in a recurrent network. By a combination of signal-to-noise analysis and numerical simulations, we demonstrate that our model allows for efficient storage of these knowledge structures, such that the memorized structures as well as their individual building blocks (e.g., events and attributes) can be subsequently retrieved from partial retrieving cues. We show that long-term memory of structured knowledge relies on a new principle of computation beyond the memory basins. Finally, we show that our model can be extended to store sequences of memories as single attractors.
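The pipeline described above (bind events to attributes, binarize, store as fixed points, retrieve from a partial cue, then read out a building block) can be sketched as follows. The abstract does not specify the vector symbolic scheme or the plasticity rule, so this sketch assumes element-wise multiplication of random ±1 vectors for binding and a standard Hebbian outer-product rule. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000            # number of neurons
n_structures = 5    # stored knowledge structures
n_roles = 3         # event-attribute pairs per structure (odd, so sums never tie)

# Random bipolar codes for events and attributes (assumed dictionaries)
events = rng.choice([-1, 1], size=(n_structures * n_roles, N))
attributes = rng.choice([-1, 1], size=(n_roles, N))

def encode(event_ids):
    """Bind each event to its attribute, superpose the pairs, and binarize."""
    bound = events[event_ids] * attributes
    return np.sign(bound.sum(axis=0))

# Structure p uses events 3p, 3p+1, 3p+2 in roles 0, 1, 2
structures = np.array([encode(np.arange(p * n_roles, (p + 1) * n_roles))
                       for p in range(n_structures)])

# Hebbian outer-product weights with no self-connections
weights = structures.T @ structures / N
np.fill_diagonal(weights, 0.0)

def recall(cue, n_iter=10):
    """Synchronous sign updates starting from the cue."""
    x = cue.copy()
    for _ in range(n_iter):
        x = np.where(weights @ x >= 0, 1, -1)
    return x

# Partial cue: structure 0 with 20% of its entries flipped
cue = structures[0].copy()
flipped = rng.choice(N, size=N // 5, replace=False)
cue[flipped] *= -1

recalled = recall(cue)
overlap = float(recalled @ structures[0] / N)

# Unbind role 1 and look up the closest event in the dictionary
query = recalled * attributes[1]
decoded_event = int(np.argmax(events @ query))
print("overlap with stored structure:", overlap)
print("event decoded for role 1:", decoded_event)
```

Because multiplication by a ±1 vector is its own inverse, multiplying the recalled pattern by an attribute code yields a noisy copy of the bound event, which the dictionary lookup then cleans up.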
Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination require classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to invariant object classification and recognition is a fundamental problem in brain theory as well as in machine learning. Here, we study the ability of a readout network to classify objects from their perceptual manifold representations. We develop a statistical mechanical theory for the linear classification of manifolds with arbitrary geometry, revealing a remarkable relation to the mathematics of conic decomposition. We show how special anchor points on the manifolds can be used to define novel geometrical measures of radius and dimension, which can explain the classification capacity for manifolds of various geometries. The general theory is demonstrated on a number of representative manifolds, including ℓ2 ellipsoids prototypical of strictly convex manifolds, ℓ1 balls representing polytopes with finite samples, and ring manifolds exhibiting nonconvex continuous structures that arise from modulating a continuous degree of freedom. The effects of label sparsity on the classification capacity of general manifolds are elucidated, displaying a universal scaling relation between label sparsity and the manifold radius. Theoretical predictions are corroborated by numerical simulations using recently developed algorithms to compute maximum margin solutions for manifold dichotomies. Our theory and its extensions provide a powerful and rich framework for applying statistical mechanics of linear classification to data arising from perceptual neuronal responses as well as to artificial deep networks trained for object recognition tasks.
We present a simple model for coherent, spatially correlated chaos in a recurrent neural network. Networks of randomly connected neurons exhibit chaotic fluctuations and have been studied as a model for capturing the temporal variability of cortical activity. The dynamics generated by such networks, however, are spatially uncorrelated and do not generate coherent fluctuations, which are commonly observed across spatial scales of the neocortex. In our model we introduce a structured component of connectivity, in addition to random connections, which effectively embeds a feedforward structure via unidirectional coupling between a pair of orthogonal modes. Local fluctuations driven by the random connectivity are summed by an output mode and drive coherent activity along an input mode. The orthogonality between input and output mode preserves chaotic fluctuations by preventing feedback loops. In the regime of weak structured connectivity we apply a perturbative approach to solve the dynamic mean-field equations, showing that in this regime coherent fluctuations are driven passively by the chaos of local residual fluctuations. When we introduce a row balance constraint on the random connectivity, stronger structured connectivity puts the network in a distinct dynamical regime of self-tuned coherent chaos. In this regime the coherent component of the dynamics self-adjusts intermittently to yield periods of slow, highly coherent chaos. The dynamics display longer time-scales and switching-like activity. We show how in this regime the dynamics depend qualitatively on the particular realization of the connectivity matrix: a complex leading eigenvalue can yield coherent oscillatory chaos while a real leading eigenvalue can yield chaos with broken symmetry. The level of coherence grows with increasing strength of structured connectivity until the dynamics are almost entirely constrained to a single spatial mode.
We examine the effects of network-size scaling and show that these results are not finite-size effects. Finally, we show that in the regime of weak structured connectivity, coherent chaos emerges also for a generalized structured connectivity with multiple input-output modes.
Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. To evaluate the impact of both input and output noise, we determine the robustness of single-neuron stimulus-selective responses, as well as the robustness of attractor states of networks of neurons performing memory tasks. We find that robustness to output noise requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. We evaluate the conditions required for this regime to exist and determine the properties of networks operating within it. A plausible synaptic plasticity rule for learning that balances weight configurations is presented. Our theory predicts an optimal ratio of the number of excitatory and inhibitory synapses for maximizing the encoding capacity of balanced networks for given statistics of afferent activations. Previous work has shown that balanced networks amplify spatiotemporal variability and account for observed asynchronous irregular states. Here we present a distinct type of balanced network that amplifies small changes in the impinging signals and emerges automatically from learning to perform neuronal and network functions robustly.
Memory traces in dynamical systems. Ganguli, Surya; Huh, Dongsung; Sompolinsky, Haim
Proceedings of the National Academy of Sciences - PNAS, 12/2008, Volume 105, Issue 48
Journal Article. Peer reviewed. Open access.
To perform nontrivial, real-time computations on a sensory input stream, biological systems must retain a short-term memory trace of their recent inputs. It has been proposed that generic high-dimensional dynamical systems could retain a memory trace for past inputs in their current state. This raises important questions about the fundamental limits of such memory traces and the properties required of dynamical systems to achieve these limits. We address these issues by applying Fisher information theory to dynamical systems driven by time-dependent signals corrupted by noise. We introduce the Fisher Memory Curve (FMC) as a measure of the signal-to-noise ratio (SNR) embedded in the dynamical state relative to the input SNR. The integrated FMC indicates the total memory capacity. We apply this theory to linear neuronal networks and show that the capacity of networks with normal connectivity matrices is exactly 1 and that of any network of N neurons is, at most, N. A nonnormal network achieving this bound is subject to stringent design constraints: It must have a hidden feedforward architecture that superlinearly amplifies its input for a time of order N, and the input connectivity must optimally match this architecture. The memory capacity of networks subject to saturating nonlinearities is further limited, and cannot exceed [Formula: see text]. This limit can be realized by feedforward structures with divergent fan out that distributes the signal across neurons, thereby avoiding saturation. We illustrate the generality of the theory by showing that memory in fluid systems can be sustained by transient nonnormal amplification due to convective instability or the onset of turbulence.
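The statements about total memory capacity can be checked numerically for small linear networks. The sketch below assumes a discrete-time model x(n) = W x(n-1) + v s(n) + z(n) with unit-variance white Gaussian noise z, for which the Fisher memory curve is taken to be J(k) = v^T (W^k)^T C^{-1} W^k v with C = sum_k W^k (W^k)^T. The abstract does not spell out this form, so it should be read as an assumption, as should the function name and the example matrices.

```python
import numpy as np

def fisher_memory_curve(W, v, k_max=400):
    """J(k) for k = 0..k_max-1, in units of the input SNR (assumed model form)."""
    n = W.shape[0]
    # Noise covariance of the state: C = sum_k W^k (W^k)^T, truncated at k_max terms
    C = np.zeros((n, n))
    P = np.eye(n)
    for _ in range(k_max):
        C += P @ P.T
        P = W @ P
    C_inv = np.linalg.inv(C)
    J = np.empty(k_max)
    u = v.astype(float)            # u holds W^k v
    for k in range(k_max):
        J[k] = u @ C_inv @ u
        u = W @ u
    return J

rng = np.random.default_rng(0)
n = 10

# Normal example: a symmetric matrix rescaled to spectral radius 0.9
A = rng.normal(size=(n, n))
S = (A + A.T) / 2
W_normal = 0.9 * S / np.max(np.abs(np.linalg.eigvalsh(S)))
v_random = rng.normal(size=n)
v_random /= np.linalg.norm(v_random)

# Nonnormal example: a feedforward chain (delay line) driven at its first neuron
W_chain = np.diag(np.ones(n - 1), k=-1)
v_first = np.zeros(n)
v_first[0] = 1.0

total_normal = float(fisher_memory_curve(W_normal, v_random).sum())
total_chain = float(fisher_memory_curve(W_chain, v_first).sum())
print("total capacity, normal network:", total_normal)
print("total capacity, feedforward chain:", total_chain)
```

Under these assumptions the symmetric network has total capacity 1. The unamplified chain gives J(k) = 1/(k+1), so its total is the harmonic sum 1 + 1/2 + ... + 1/n, which is above 1 but well below the upper bound n. Approaching the bound would require the amplification along the chain that the abstract describes, which this sketch does not include.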