The problem of off-policy evaluation (OPE) has long been regarded as one of the foremost challenges in reinforcement learning. Gradient-based and emphasis-based temporal-difference (TD) learning algorithms make up the major part of off-policy TD learning methods. In this work, we investigate the derivation of efficient OPE algorithms from a novel perspective that builds on the advantages of these two categories: the gradient-based framework is adopted, and the emphatic approach is used to improve convergence performance. We begin by proposing a new analogue of the on-policy objective, called the distribution-correction-based mean square projected Bellman error (DC-MSPBE). The key to the construction of DC-MSPBE is the use of emphatic weightings on the representable subspace of the original MSPBE. Based on this objective function, we propose the emphatic TD with lower-variance gradient correction (ETD-LVC) algorithm. Under standard off-policy and stochastic approximation conditions, we provide a convergence analysis of ETD-LVC in the case of linear function approximation, and we further generalize the algorithm to nonlinear smooth function approximation. Finally, we empirically demonstrate the improved performance of ETD-LVC on off-policy benchmarks. Taken together, we hope that our work can guide the future discovery of better alternatives within the off-policy TD learning algorithm family.
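To make the ingredients named in the abstract above concrete, the sketch below shows one way an emphatic (followon-trace) weighting can be combined with a TDC-style gradient correction under linear function approximation. It is only an illustrative sketch with assumed names, step sizes, and trace recursion; the paper's exact DC-MSPBE gradient and ETD-LVC update rules may differ.

```python
# Illustrative sketch: emphatic weighting combined with a TDC-style gradient
# correction under linear function approximation.  Variable names, step sizes,
# and the placement of the emphasis are assumptions, not the paper's ETD-LVC.
import numpy as np

def emphatic_gradient_correction_step(theta, w, F, phi, phi_next, reward, rho,
                                      gamma=0.99, interest=1.0,
                                      alpha=0.01, beta=0.05):
    """One off-policy TD step with emphasis and a gradient-correction term.

    theta    -- primary weights, value estimate v(s) ~= theta @ phi(s)
    w        -- secondary weights used by the correction term
    F        -- followon (emphasis) trace carried between steps
    rho      -- importance-sampling ratio pi(a|s) / b(a|s)
    """
    delta = reward + gamma * (theta @ phi_next) - (theta @ phi)   # TD error
    F = gamma * rho * F + interest     # simplified followon-trace recursion
    M = F                              # emphasis (lambda = 0 case)
    # TDC-style primary update, scaled by the emphatic weighting M:
    theta = theta + alpha * M * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    # Secondary (faster-timescale) update for the correction weights:
    w = w + beta * M * rho * (delta - (w @ phi)) * phi
    return theta, w, F
```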
Rethinking dopamine as generalized prediction error Gardner, Matthew P H; Schoenbaum, Geoffrey; Gershman, Samuel J
Proceedings of the Royal Society B: Biological Sciences, 11/2018, Volume 285, Issue 1891
Journal Article; Peer reviewed; Open access
Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.
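(For reference: in the standard TD account invoked here and in the records that follow, the RPE is the TD error δ_t = r_{t+1} + γV(s_{t+1}) − V(s_t), the gap between the reward plus discounted next-state value that is actually observed and the value that was predicted.)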
This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite-horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bounds with high-probability convergence guarantees that attain the optimal dependence on the tolerance level. We also exhibit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature map and the problem dimension.
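As a reminder of the two updates being compared, the following is a minimal sketch of linear TD(0) and two-timescale TDC; the step sizes and the handling of the importance ratio are illustrative assumptions rather than the paper's exact conditions.

```python
# Minimal sketches of linear TD(0) and TDC (TD with gradient correction).
import numpy as np

def td0_step(theta, phi, phi_next, reward, gamma=0.99, rho=1.0, alpha=0.01):
    """One step of (off-policy) linear TD(0); rho = 1 recovers on-policy TD."""
    delta = reward + gamma * (theta @ phi_next) - (theta @ phi)   # TD error
    return theta + alpha * rho * delta * phi

def tdc_step(theta, w, phi, phi_next, reward, gamma=0.99, rho=1.0,
             alpha=0.01, beta=0.05):
    """One step of two-timescale TDC.

    The secondary vector w supplies the correction term and is updated on the
    faster timescale (beta larger than alpha).
    """
    delta = reward + gamma * (theta @ phi_next) - (theta @ phi)   # TD error
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * rho * (delta - (w @ phi)) * phi
    return theta, w
```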
To behave adaptively, we must learn from the consequences of our actions. Studies using event-related potentials (ERPs) have been informative with respect to the question of how such learning occurs. These studies have revealed a frontocentral negativity termed the feedback-related negativity (FRN) that appears after negative feedback. According to one prominent theory, the FRN tracks the difference between the values of actual and expected outcomes, or reward prediction errors. As such, the FRN provides a tool for studying reward valuation and decision making. We begin this review by examining the neural significance of the FRN. We then examine its functional significance. To understand the cognitive processes that occur when the FRN is generated, we explore variables that influence its appearance and amplitude. Specifically, we evaluate four hypotheses: (1) the FRN encodes a quantitative reward prediction error; (2) the FRN is evoked by outcomes and by stimuli that predict outcomes; (3) the FRN and behavior change with experience; and (4) the system that produces the FRN is maximally engaged by volitional actions.
The operation of a community energy storage system (CESS) is challenging due to the volatility of photovoltaic distributed generation, electricity consumption, and energy prices. Selecting the optimal CESS setpoints during the day is a sequential decision problem under uncertainty, which can be solved using dynamic learning methods. This paper proposes a reinforcement learning (RL) technique based on temporal difference learning with eligibility traces (ET). It aims to minimize the day-ahead energy costs while maintaining the technical limits at the grid coupling point. The performance of the RL agent is compared against an oracle based on a deterministic mixed-integer second-order cone program (MISOCP). The use of ET boosts the RL agent's learning rate on the CESS operation problem: the traces effectively assign credit to the action sequences that bring the CESS to a high state of charge before the peak prices, reducing the training time. The case study shows that the proposed method learns to operate the CESS effectively and ten times faster than common RL algorithms applied to energy systems, such as tabular Q-learning and Fitted-Q. Moreover, the RL agent operates the CESS at 94% of the optimal performance, reducing the energy costs for the end-user by up to 12%.
•Reinforcement learning for energy storage operation to reduce energy costs.
•The operation satisfies the electrical distribution grid’s technical constraints.
•The technique uses a linear function approximator with eligibility traces.
•Discussion of advantages of using eligibility traces in energy storage operations.
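For context, the core mechanism credited in the abstract above, eligibility traces with a linear function approximator, can be sketched as a single semi-gradient TD(λ) update. The CESS state encoding and the control rule actually used in the paper are not specified here, so the names and hyperparameters below are illustrative.

```python
# Minimal sketch of one semi-gradient TD(lambda) update with an accumulating
# eligibility trace and linear features; names and hyperparameters are assumed.
import numpy as np

def td_lambda_step(theta, z, phi, phi_next, reward,
                   gamma=0.99, lam=0.9, alpha=0.01):
    """theta: value weights, z: eligibility trace, phi: current state features."""
    delta = reward + gamma * (theta @ phi_next) - (theta @ phi)   # TD error
    z = gamma * lam * z + phi          # decay the trace, then add current features
    theta = theta + alpha * delta * z  # credit every recently visited feature
    return theta, z
```

Because the trace z still carries the features of earlier charging decisions when a peak-price TD error arrives, those earlier decisions receive credit immediately, which is the faster credit assignment described in the abstract.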
Animals make predictions based on currently available information. In natural settings, sensory cues may not reveal complete information, requiring the animal to infer the “hidden state” of the environment. The brain structures important in hidden state inference remain unknown. A previous study showed that midbrain dopamine neurons exhibit distinct response patterns depending on whether reward is delivered in 100% (task 1) or 90% of trials (task 2) in a classical conditioning task. Here we found that inactivation of the medial prefrontal cortex (mPFC) affected dopaminergic signaling in task 2, in which the hidden state must be inferred (“will reward come or not?”), but not in task 1, where the state was known with certainty. Computational modeling suggests that the effects of inactivation are best explained by a circuit in which the mPFC conveys inference over hidden states to the dopamine system.
•Dopamine reward prediction errors (RPEs) reflect hidden state inference
•Medial prefrontal cortex (mPFC) shapes RPEs in a task involving hidden states
•mPFC is not needed to compute RPEs in a similar task when states are fully observable
•Modeling suggests that mPFC computes a probability distribution over hidden states
Dopamine neurons signal reward prediction errors, driving reinforcement learning. In ambiguous settings, dopamine signals incorporate hidden state inference. We demonstrate that the medial prefrontal cortex is required for hidden state inference to influence dopamine signals, illuminating the neural circuit governing reinforcement learning under state uncertainty.
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was (“reward prediction error,” RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model’s predictions. Consequently, dopamine responses did not simply reflect a stimulus’ average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys’ choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
•Reinforcement learning model with belief state to cope with perceptual uncertainty
•Model provides unified account of dopamine in perceptual and reward-guided choices
•Dopamine can act as a teaching signal during perceptual decision making as well
•Dopamine signals decision confidence prior to behavioral manifestation of choice
Lak et al. show that dopamine neuron responses during a visual decision task comply with predictions of a reinforcement learning model with a belief state signaling confidence. The results reveal that dopamine neurons encode teaching signals appropriate for learning perceptual decisions and respond early enough to impact impending decisions.
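The belief-state idea in the Lak et al. record can be sketched in a few lines: a noisy percept of the ambiguous stimulus yields a posterior belief, the belief gives a decision confidence, and the prediction error is taken against the confidence-weighted expected reward. The Gaussian noise model, the equal-prior assumption, and all names below are illustrative assumptions, not the authors' fitted model.

```python
# Illustrative belief-state trial: percept -> belief -> confidence -> RPE.
import numpy as np

def confidence_rpe(true_category, mu, sigma, reward, reward_size=1.0,
                   rng=np.random):
    """One trial of a belief-state choice with a confidence-weighted RPE.

    true_category -- +1 or -1, hidden identity of the ambiguous stimulus
    mu            -- stimulus strength (distance of each category from zero)
    sigma         -- perceptual noise standard deviation
    reward        -- reward actually delivered after the choice (e.g. 0 or 1)
    """
    percept = true_category * mu + sigma * rng.randn()       # noisy observation
    # Posterior that the stimulus belongs to the +1 category, assuming equal
    # priors and Gaussian perceptual noise:
    belief_pos = 1.0 / (1.0 + np.exp(-2.0 * mu * percept / sigma**2))
    choice = 1 if belief_pos >= 0.5 else -1
    confidence = max(belief_pos, 1.0 - belief_pos)           # P(choice is correct)
    expected_value = confidence * reward_size                # pre-outcome prediction
    rpe = reward - expected_value                            # outcome prediction error
    return choice, confidence, expected_value, rpe
```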
Natural actor–critic algorithms Bhatnagar, Shalabh; Sutton, Richard S.; Ghavamzadeh, Mohammad ...
Automatica (Oxford), 11/2009, Volume 45, Issue 11
Journal Article; Peer reviewed; Open access
We present four new reinforcement learning algorithms based on actor–critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor–critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor–critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor–critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
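The kind of incremental actor-critic step the abstract describes can be sketched with a linear TD(0) critic and a softmax (Gibbs) actor. The paper's four algorithms additionally differ in how the actor step is preconditioned (for example, by a natural-gradient estimate built from the compatible features); that machinery is omitted here, and all names below are illustrative.

```python
# Minimal sketch of one incremental actor-critic transition (vanilla gradient
# actor; the natural-gradient preconditioning of the paper is not shown).
import numpy as np

def softmax_policy(theta_actor, action_features):
    """Action probabilities of a Gibbs (softmax) policy over linear preferences."""
    prefs = np.array([theta_actor @ f for f in action_features])
    prefs -= prefs.max()                      # numerical stability
    expd = np.exp(prefs)
    return expd / expd.sum()

def actor_critic_step(v, theta_actor, phi, phi_next, reward, action_features,
                      action, gamma=0.99, alpha_critic=0.05, alpha_actor=0.01):
    """One transition: TD(0) critic update, then a policy-gradient actor update."""
    delta = reward + gamma * (v @ phi_next) - (v @ phi)   # TD error from the critic
    v = v + alpha_critic * delta * phi                    # critic update
    probs = softmax_policy(theta_actor, action_features)
    # grad log pi(a|s) for the softmax-linear ("compatible") parameterization:
    grad_log_pi = action_features[action] - probs @ np.array(action_features)
    theta_actor = theta_actor + alpha_actor * delta * grad_log_pi  # actor update
    return v, theta_actor
```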
This paper presents a computationally efficient smart home energy management system (SHEMS) using an approximate dynamic programming (ADP) approach with temporal difference learning for scheduling distributed energy resources. This approach improves the performance of an SHEMS by incorporating stochastic energy consumption and PV generation models over a horizon of several days, using only the computational power of existing smart meters. In this paper, we consider a PV-storage (thermal and battery) system; however, our method extends to multiple controllable devices without the exponential growth in computation that other methods, such as dynamic programming (DP) and stochastic mixed-integer linear programming (MILP), suffer from. Specifically, probability distributions associated with the PV output and demand are kernel-estimated from empirical data collected during the Smart Grid Smart City project in NSW, Australia. Our results show that ADP computes a solution much faster than both DP and stochastic MILP, with only a slight reduction in quality compared to the optimal DP solution. In addition, incorporating a thermal energy storage unit using the proposed ADP-based SHEMS reduces the daily electricity cost by up to 26.3% without a noticeable increase in the computational burden. Moreover, ADP with a two-day decision horizon reduces the average yearly electricity cost by 4.6% relative to a daily DP method, yet requires less than half of the computational effort.
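The general ADP-with-TD pattern referred to above can be sketched as follows: at each decision stage, choose the action that minimizes immediate cost plus an approximate cost-to-go, then nudge the value approximation toward the sampled one-step target. The state, cost, and transition interfaces and all names below are illustrative assumptions, not the paper's SHEMS formulation.

```python
# Illustrative ADP decision stage with a linear value approximation and a
# temporal-difference update toward the sampled one-step Bellman target.
import numpy as np

def adp_step(theta, featurize, state, candidate_actions, transition, cost,
             gamma=1.0, alpha=0.05):
    """One decision stage of ADP.

    transition(state, a) -> next_state   (sampled/simulated system model)
    cost(state, a)       -> immediate electricity cost of action a
    featurize(state)     -> feature vector for the linear approximation
    """
    next_states = [transition(state, a) for a in candidate_actions]
    # Greedy action: immediate cost plus approximate cost-to-go of the result.
    q = [cost(state, a) + gamma * (theta @ featurize(s_next))
         for a, s_next in zip(candidate_actions, next_states)]
    best = int(np.argmin(q))
    action, next_state = candidate_actions[best], next_states[best]
    # Temporal-difference update of the value weights toward the sampled target.
    phi = featurize(state)
    theta = theta + alpha * (q[best] - theta @ phi) * phi
    return theta, action, next_state
```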