The water maze is commonly used to assay spatial cognition, or, more generally, learning and memory in experimental rodent models. In the water maze, mice or rats are trained to navigate to a platform located below the water's surface. Spatial learning is then typically assessed in a probe test, where the platform is removed from the pool and the mouse or rat is allowed to search for it. Performance in the probe test may then be evaluated using either occupancy-based (percent time in a virtual quadrant Q or zone Z centered on the former platform location), error-based (mean proximity to the former platform location, P) or counting-based (platform crossings, X) measures. While these measures differ in their popularity, whether they differ in their ability to detect group differences is not known. To address this question we compiled five separate databases, containing more than 1600 mouse probe tests. Random selection of individual trials from the respective databases then allowed us to simulate experiments with varying sample and effect sizes. Using this Monte Carlo-based method, we found that the P measure consistently outperformed the Q, Z and X measures in its ability to detect group differences. This was the case regardless of sample or effect size, and using both parametric and non-parametric statistical analyses. The relative superiority of P over other commonly used measures suggests that it is the most appropriate measure to employ in both low- and high-throughput water maze screens.
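To make the simulation procedure concrete, here is a minimal Python sketch of the Monte Carlo power comparison described above, for a single measure. The arrays `control_P` and `mutant_P` are hypothetical stand-ins for per-trial proximity (P) scores drawn from the probe-test databases; the same loop would be repeated for the Q, Z and X measures and across sample sizes.

```python
# Hedged sketch of the Monte Carlo power comparison; `control_P` and
# `mutant_P` are hypothetical per-trial proximity (P) scores, not data
# from the actual databases.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def detection_rate(control, experimental, n, n_sims=10_000, alpha=0.05,
                   parametric=True):
    """Fraction of simulated n-vs-n experiments detecting a group difference."""
    hits = 0
    for _ in range(n_sims):
        a = rng.choice(control, size=n, replace=False)
        b = rng.choice(experimental, size=n, replace=False)
        if parametric:
            p = stats.ttest_ind(a, b).pvalue       # parametric test
        else:
            p = stats.mannwhitneyu(a, b).pvalue    # non-parametric test
        hits += p < alpha
    return hits / n_sims

# Hypothetical proximity scores (cm) for control and mutant trial pools.
control_P = rng.normal(35.0, 10.0, size=800)
mutant_P = rng.normal(45.0, 10.0, size=800)
print(detection_rate(control_P, mutant_P, n=10))   # estimated power at n = 10
```

A measure with more power will show a higher detection rate at the same sample and effect size, which is exactly the comparison the simulated experiments make.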
Although the hippocampus plays a crucial role in the formation of spatial memories, as these memories mature they may become additionally (or even exclusively) dependent on extrahippocampal structures. However, the identity of these extrahippocampal structures that support remote spatial memory is currently not known. Using a Morris water-maze task, we show that the anterior cingulate cortex (ACC) plays a key role in the expression of remote spatial memories in mice. To first evaluate whether the ACC is activated after the recall of spatial memory, we examined the expression of the immediate early gene c-fos in the ACC. Fos expression was elevated after expression of a remote (1 month old), but not recent (1 d old), water-maze memory, suggesting that the ACC plays an increasingly important role as a function of time. Consistent with the gene expression data, targeted pharmacological inactivation of the ACC with the sodium channel blocker lidocaine blocked expression of remote, but spared recent, spatial memory. In contrast, inactivation of the dorsal hippocampus disrupted expression of spatial memory regardless of its age. We further showed that inactivation of the ACC blocked expression of remote spatial memory in two different mouse strains, after training with either a hidden or a visible platform in a constant location, and using the AMPA receptor antagonist CNQX. Together, our data provide evidence that circuits supporting spatial memory are reorganized in a time-dependent manner, and establish that activity in neurons intrinsic to the ACC is critical for processing remote spatial memories.
Demand response (DR) in residential and small commercial buildings is estimated to account for as much as 65% of the total energy-savings potential of DR, and previous work shows that a fully automated energy management system (EMS) is a necessary prerequisite for DR in these areas. In this paper, we propose a novel EMS formulation for DR problems in these sectors. Specifically, we formulate a fully automated EMS's rescheduling problem as a reinforcement learning (RL) problem, and argue that this RL problem can be approximately solved by decomposing it over device clusters. Compared with existing formulations, our new formulation does not require explicitly modeling the user's dissatisfaction with job rescheduling, enables the EMS to self-initiate jobs, allows the user to initiate more flexible requests, and has a computational complexity that is linear in the number of device clusters. We also present simulation results from applying Q-learning, one of the most popular and classical RL algorithms, to a representative example.
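As a rough illustration of how Q-learning could be applied to one device cluster in such a decomposition, the following sketch uses an assumed hour-of-day state and a binary defer/run action with a hypothetical time-of-use price; the paper's actual state, action and reward definitions may differ.

```python
# Hedged sketch: tabular Q-learning for one device cluster. The hour-of-day
# state, defer/run actions, and time-of-use price are illustrative
# assumptions, not the paper's formulation; the decomposition would run one
# such learner per device cluster.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 24, 2                 # hour of day x {0: defer, 1: run}
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def price(hour):
    """Hypothetical time-of-use electricity price ($/kWh)."""
    return 0.30 if 16 <= hour < 21 else 0.10

for episode in range(5_000):
    s = int(rng.integers(n_states))
    for _ in range(24):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon \
            else int(Q[s].argmax())
        # reward: negative energy cost if the job runs, small deferral penalty
        r = -price(s) if a == 1 else -0.02
        s_next = (s + 1) % n_states
        # standard Q-learning update
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```

Because each cluster maintains its own table, the overall cost of this scheme grows linearly with the number of clusters, mirroring the complexity claim above.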
In the water maze, mice are trained to navigate to an escape platform located below the water's surface, and spatial learning is most commonly evaluated in a probe test in which the platform is removed from the pool. While contemporary tracking software provides precise positional information about mice for the duration of the probe test, existing performance measures (e.g., percent quadrant time, platform crossings) fail to fully exploit the richness of this positional data. Using the concept of entropy (H), here we develop a new measure that considers both how focused the search is and the degree to which searching is centered on the former platform location. To evaluate how H performs compared with existing measures of water maze performance, we compiled five separate databases containing more than 1600 mouse probe tests. Random selection of individual trials from the respective databases then allowed us to simulate experiments with varying sample and effect sizes. Using this Monte Carlo-based method, we found that H outperformed existing measures in its ability to detect group differences over a range of sample and effect sizes. Additionally, we validated the new measure using three models of experimentally induced hippocampal dysfunction: (1) complete hippocampal lesions, (2) genetic deletion of alphaCaMKII, a gene implicated in hippocampal behavioral and synaptic plasticity, and (3) a mouse model of Alzheimer's disease. Together, these data indicate that H offers greater sensitivity than existing measures, most likely because it exploits the richness of the precise positional information of the mouse throughout the probe test.
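One plausible instantiation of an entropy-style measure with these two ingredients is sketched below: the error term grows as searching drifts away from the former platform location, and the path term grows as the search becomes less focused. The exact definition of H used in the study may differ, so treat this as an illustration rather than the reference implementation.

```python
# Hedged sketch of an entropy-style probe-test measure in the spirit of H.
# It combines (i) an "error" term from the mean squared distance of the
# mouse to the former platform location and (ii) a "path" term from the
# spread of the swim path about its own centroid.
import numpy as np

def entropy_measure(xy, platform):
    """xy: (T, 2) array of tracked positions; platform: (2,) former location."""
    xy = np.asarray(xy, dtype=float)
    # error entropy: log mean squared distance from the former platform
    d2 = np.sum((xy - platform) ** 2, axis=1)
    h_error = np.log(d2.mean())
    # path entropy: log-determinant of the positional covariance
    # (how focused the search is, independent of where it is centered)
    cov = np.cov(xy.T)
    h_path = np.log(np.linalg.det(cov))
    return h_error + h_path

# Hypothetical usage: a focused search near the platform yields a lower H.
rng = np.random.default_rng(0)
platform = np.array([0.0, 0.0])
focused = rng.normal(platform, 5.0, size=(1200, 2))   # tight search at target
diffuse = rng.normal(platform, 30.0, size=(1200, 2))  # broad search
print(entropy_measure(focused, platform), entropy_measure(diffuse, platform))
```

Note how the measure consumes every tracked position rather than a summary count, which is the intuition behind its greater sensitivity.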
The brain is easily able to process and categorize complex time-varying signals. For example, the two sentences, “It is cold in London this time of year” and “It is hot in London this time of year,” have different meanings, even though the words “cold” and “hot” appear several seconds before the ends of the two sentences. Any network that can tell these sentences apart must therefore have a long temporal memory. In other words, the current state of the network must depend on events that happened several seconds ago. This is a difficult task, as neurons are dominated by relatively short time constants—tens to hundreds of milliseconds. Nevertheless, it was recently proposed that randomly connected networks could exhibit the long memories necessary for complex temporal processing. This is an attractive idea, both for its simplicity and because little tuning of recurrent synaptic weights is required. However, we show that when connectivity is high, as it is in the mammalian brain, randomly connected networks cannot exhibit temporal memory much longer than the time constants of their constituent neurons.
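A toy numerical experiment illustrates the question: inject a pulse into a densely connected random linear rate network and measure how long the trace survives relative to the single-neuron time constant. All parameters here (network size, connection probability, gain) are illustrative assumptions rather than values from the study.

```python
# Toy memory probe for a randomly connected linear rate network. A random
# initial state plays the role of a brief input pulse; we track how quickly
# the network's trace of it decays relative to the neuron time constant tau.
import numpy as np

rng = np.random.default_rng(0)
N, p_conn, tau, dt = 500, 0.5, 0.02, 0.001   # high connectivity, 20 ms neurons
g = 0.9                                       # gain kept below instability

# sparse random weights scaled so the spectral radius is roughly g
W = (rng.random((N, N)) < p_conn) * rng.normal(0.0, 1.0, (N, N))
W *= g / np.sqrt(N * p_conn)

x = rng.normal(0.0, 1.0, N)                  # the "pulse": random initial state
trace = []
for step in range(int(0.5 / dt)):            # simulate 500 ms
    x = x + dt / tau * (-x + W @ x)          # linear rate dynamics
    trace.append(np.linalg.norm(x))

# time for the trace to fall below 1/e of its initial size
t_mem = dt * next(i for i, v in enumerate(trace) if v < trace[0] / np.e)
print(f"memory time ~ {t_mem*1e3:.0f} ms vs neuron tau = {tau*1e3:.0f} ms")
```

In this linear toy model the trace decays on a timescale set largely by tau, consistent with the claim that dense random connectivity alone does not buy memory far beyond the neuronal time constant.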
We present the first class of policy-gradient algorithms that work with both state-value and policy function approximation and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse of dimensionality: with continuous or large action sets, it becomes infeasible to estimate state-action value functions (Q functions). Using state-value functions helps to lift this curse and, as a result, naturally turns our policy-gradient solution into a classical Actor-Critic architecture whose Actor uses the state-value function for its update. Our algorithms, Gradient Actor-Critic and Emphatic Actor-Critic, are derived from the exact gradient of an averaged state-value objective and are therefore guaranteed to converge to its optimal solution, while maintaining all the desirable properties of classical Actor-Critic methods and requiring no additional hyper-parameters. To our knowledge, this is the first time that convergent off-policy learning methods have been extended to classical Actor-Critic methods with function approximation.
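For orientation, the sketch below shows the classical one-step Actor-Critic update that these algorithms build on, in which the Critic learns a state-value function and its TD error drives the Actor's policy-gradient step. The off-policy corrections that distinguish Gradient and Emphatic Actor-Critic (importance ratios, emphatic weightings) are omitted.

```python
# Hedged sketch of the classical one-step Actor-Critic update these methods
# build on: a linear state-value Critic v(s) = w . phi(s) and a linear
# softmax Actor. The off-policy machinery that defines Gradient/Emphatic
# Actor-Critic is not included here.
import numpy as np

rng = np.random.default_rng(0)

def softmax_policy(theta, phi):
    prefs = theta @ phi
    probs = np.exp(prefs - prefs.max())
    return probs / probs.sum()

def actor_critic_step(w, theta, phi, phi_next, reward,
                      alpha_w=0.1, alpha_theta=0.01, gamma=0.99):
    """One on-policy update given successive feature vectors and a reward."""
    probs = softmax_policy(theta, phi)
    a = rng.choice(len(probs), p=probs)
    # TD error from the state-value Critic; no Q function is estimated
    delta = reward + gamma * w @ phi_next - w @ phi
    w = w + alpha_w * delta * phi                   # Critic update
    grad_log = -np.outer(probs, phi)                # grad of log softmax policy
    grad_log[a] += phi
    theta = theta + alpha_theta * delta * grad_log  # Actor update
    return w, theta, a
```

The key point the abstract emphasizes is that only v(s) is approximated, so the update cost does not grow with the size of the action set.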
Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose ...complexity scales only linearly in the size of the function approximator. Although their gradient temporal difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems where TD is convergent), calling into question its practical utility. In this paper we introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD. This algorithm appears to extend linear TD to off-policy learning with no penalty in performance while only doubling computational requirements.
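For reference, the TDC update described above can be sketched as follows; theta holds the value weights and w is the secondary weight vector whose correction term starts at zero, so early updates coincide with conventional linear TD. The importance-sampling ratio rho handles the off-policy case (set rho = 1 for on-policy use).

```python
# Sketch of the TDC ("linear TD with gradient correction") update. theta are
# the value weights; w is the secondary weight vector, initialized to zero,
# that makes the correction term vanish at the start of learning.
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, alpha, beta, gamma, rho=1.0):
    """One TDC step for a linear value function v(s) = theta . phi(s)."""
    delta = reward + gamma * theta @ phi_next - theta @ phi    # TD error
    # conventional TD term plus the gradient-correction term
    theta = theta + alpha * rho * (delta * phi
                                   - gamma * phi_next * (phi @ w))
    # secondary weights track the expected TD error per feature
    w = w + beta * rho * (delta - phi @ w) * phi
    return theta, w
```

Since each step touches only the feature vectors and two weight vectors, the per-step cost is roughly double that of conventional TD, matching the "only doubling computational requirements" remark.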
In this paper we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame. An important insight is that the tracking problem can be considered a sequential decision-making process in which historical semantics encode highly relevant information for future decisions. Based on this intuition, we formulate our model as a recurrent convolutional neural network agent that interacts with a video over time, and our model can be trained with reinforcement learning (RL) algorithms to learn good tracking policies that pay attention to continuous, inter-frame correlation and maximize tracking performance in the long run. The proposed tracking algorithm achieves state-of-the-art performance on an existing tracking benchmark and operates at frame rates faster than real time. To the best of our knowledge, our tracker is the first neural-network tracker that combines convolutional and recurrent networks with RL algorithms.
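As an illustration of the kind of architecture described, here is a minimal PyTorch sketch of a recurrent convolutional agent that emits one bounding box per frame; the layer sizes are assumptions, and the RL training loop (overlap-based rewards, policy-gradient updates) is omitted.

```python
# Hedged sketch of a recurrent convolutional tracking agent: a conv encoder
# feeds an LSTM core whose hidden state carries inter-frame memory, and a
# linear head emits a bounding box per frame. All sizes are illustrative.
import torch
import torch.nn as nn

class RecurrentTracker(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(            # per-frame visual features
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.core = nn.LSTM(64 * 16, hidden, batch_first=True)
        self.box_head = nn.Linear(hidden, 4)     # (x, y, w, h) per frame

    def forward(self, frames):
        """frames: (batch, time, 3, H, W) -> boxes: (batch, time, 4)."""
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        hidden_states, _ = self.core(feats)      # inter-frame memory
        return self.box_head(hidden_states)

# Hypothetical usage on an 8-frame clip:
boxes = RecurrentTracker()(torch.randn(1, 8, 3, 64, 64))
```

The recurrent core is what lets predictions at each frame depend on the whole history of the video, which is the sequential decision-making framing the abstract describes.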