Search results


hits: 242
1.
  • Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
    Cao, Jiaqing; Liu, Quan; Zhu, Fei et al. Information sciences, November 2021, Volume: 580
    Journal Article
    Peer reviewed

    The problem of off-policy evaluation (OPE) has long been advocated as one of the foremost challenges in reinforcement learning. Gradient-based and emphasis-based temporal-difference (TD) learning ...
Full text
2.
  • Rethinking dopamine as generalized prediction error
    Gardner, Matthew P. H.; Schoenbaum, Geoffrey; Gershman, Samuel J. Proceedings of the Royal Society B, Biological sciences, November 2018, Volume: 285, Issue: 1891
    Journal Article
    Peer reviewed
    Open access

    Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several ...
Full text

PDF
3.
  • High-probability sample complexities for policy evaluation with linear function approximation
    Li, Gen; Wu, Weichen; Chi, Yuejie et al. IEEE transactions on information theory, 2024
    Journal Article
    Peer reviewed

    This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities ...
Full text
4.
  • Learning from experience: Event-related potential correlates of reward processing, neural adaptation, and behavioral choice
    Walsh, Matthew M.; Anderson, John R. Neuroscience and biobehavioral reviews, September 2012, Volume: 36, Issue: 8
    Journal Article
    Peer reviewed
    Open access

    To behave adaptively, we must learn from the consequences of our actions. Studies using event-related potentials (ERPs) have been informative with respect to the question of how such learning occurs. ...
Full text

PDF
5.
  • Community energy storage operation via reinforcement learning with eligibility traces
    Salazar Duque, Edgar Mauricio; Giraldo, Juan S.; Vergara, Pedro P. et al. Electric power systems research, November 2022, Volume: 212
    Journal Article
    Peer reviewed
    Open access

    The operation of a community energy storage system (CESS) is challenging due to the volatility of photovoltaic distributed generation, electricity consumption, and energy prices. Selecting the ...
Full text
6.
  • The Medial Prefrontal Cortex Shapes Dopamine Reward Prediction Errors under State Uncertainty
    Starkweather, Clara Kwon; Gershman, Samuel J.; Uchida, Naoshige Neuron (Cambridge, Mass.), May 2018, Volume: 98, Issue: 3
    Journal Article
    Peer reviewed
    Open access

    Animals make predictions based on currently available information. In natural settings, sensory cues may not reveal complete information, requiring the animal to infer the “hidden state” of the ...
Full text

PDF
7.
  • Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision
    Lak, Armin; Nomoto, Kensaku; Keramati, Mehdi et al. Current biology, March 2017, Volume: 27, Issue: 6
    Journal Article
    Peer reviewed
    Open access

    Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect ...
Full text

PDF
8.
  • Natural actor–critic algorithms
    Bhatnagar, Shalabh; Sutton, Richard S.; Ghavamzadeh, Mohammad et al. Automatica (Oxford), November 2009, Volume: 45, Issue: 11
    Journal Article
    Peer reviewed
    Open access

    We present four new reinforcement learning algorithms based on actor–critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor–critic reinforcement ...
Full text

PDF
10.
  • A Fast Technique for Smart Home Management: ADP With Temporal Difference Learning
    Keerthisinghe, Chanaka; Verbic, Gregor; Chapman, Archie C. IEEE transactions on smart grid, July 2018, Volume: 9, Issue: 4
    Journal Article
    Peer reviewed

    This paper presents a computationally efficient smart home energy management system (SHEMS) using an approximate dynamic programming (ADP) approach with temporal difference learning for scheduling ...
Full text