Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
If a conditioned stimulus or response has been inconsistently ("partially") reinforced, conditioned responding will take longer to extinguish than if responding had been established by consistent ("continuous") reinforcement. This partial reinforcement extinction effect (PREE) is one of the best-known phenomena in associative learning but defies ready explanation by associative models, which assume that a partial reinforcement schedule will produce weaker conditioning that should be less resistant to extinction. The most popular explanation of the PREE is that, during partial reinforcement, animals learn that recent nonreinforced (N) trials are associated with subsequent reinforcement (R), and therefore the presence of N trials during extinction serves to promote generalization of conditioning to extinction. According to sequential theory (Capaldi, 1966), animals can encode whole sequences (runs) of N trials and associate their memory of the sequence with subsequent R. The length of these N sequences during conditioning affects how long the animal will continue to respond during extinction. The present experiment used Pavlovian magazine approach conditioning with rats to test two predictions of this theory. Consistent with sequential theory, the PREE was sensitive to the length of the N sequence: conditioning with long sequences (runs of 3-5 N trials) produced a stronger PREE than conditioning with short sequences (runs of 1 or 2), even when the total number of N and R trials was held constant. Surprisingly, there was no PREE among rats trained with the short sequences. Moreover, contrary to the theory's prediction, interrupting the long N sequences with reinforced trials of a different conditioned stimulus did not affect the subsequent PREE. I conclude that uncertainty about reinforcement, rather than the memory of N sequences per se, is a key factor in the development of the PREE.
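The tension this abstract describes, that simple associative models predict weaker conditioning (and therefore faster extinction) after partial reinforcement, can be illustrated with a toy Rescorla-Wagner simulation. All values below (learning rate, trial count, an alternating R/N schedule) are illustrative assumptions of mine, not parameters from the experiment.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Track associative strength V across a trial sequence.
    Each trial is True (reinforced, R) or False (nonreinforced, N)."""
    V = 0.0
    for reinforced in trials:
        target = lam if reinforced else 0.0
        V += alpha * (target - V)  # standard error-correction update
    return V

crf = [True] * 40                      # continuous reinforcement: all R
prf = [i % 2 == 0 for i in range(40)]  # partial: alternating R/N (runs of 1)

v_crf = rescorla_wagner(crf)
v_prf = rescorla_wagner(prf)

# Partial reinforcement yields a lower asymptotic strength, so the model
# predicts *less* resistance to extinction after PRF -- the opposite of
# the empirically observed PREE, which is the puzzle the abstract raises.
assert v_prf < v_crf
```

The sketch only shows why trial-by-trial strength models fail here; it does not implement sequential theory, which requires memory for whole N runs rather than a single scalar strength.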
Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. Over the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate future research in HRL. Furthermore, we outline a few suitable task domains for evaluating HRL approaches and a few interesting examples of practical applications of HRL in the Supplementary Material.
•A hierarchical coordinated reinforcement learning (HCRL) method is developed.
•A framework is proposed to optimise the maintenance of complex systems via the HCRL.
•The HCRL outperforms benchmark maintenance optimisation methods.
•An efficient method is developed to simulate interdependent degradation processes.
The Markov decision process (MDP) is a widely used method to optimise the maintenance of multicomponent systems, as it can provide a system-level maintenance action at each decision point to address various dependencies among components. However, the MDP suffers from the "curse of dimensionality" and can only handle small-scale systems. This paper develops a hierarchical coordinated reinforcement learning (HCRL) algorithm to optimise the maintenance of large-scale multicomponent systems. Both the parameters of agents and the coordination relationships among agents are designed based on system characteristics. Furthermore, the hierarchical structure of agents is established according to the structural importance measures of components. The effectiveness of the proposed HCRL algorithm is validated using two maintenance optimisation problems, one on a natural gas plant system and the other on a 12-component series system under dependent competing risks. Results show that the proposed HCRL outperforms methods in two recently published papers and other benchmark approaches, including the emerging deep reinforcement learning methods.
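The scaling problem that motivates the hierarchical approach can be made concrete with a toy flat-MDP example. The sketch below is my illustration, not the paper's HCRL algorithm, and its failure probability, costs, and discount factor are invented: exact value iteration on a 3-component series system, where the joint state space and the repair-action space each already contain 2³ = 8 elements and both grow exponentially with the number of components.

```python
import itertools

# Toy flat-MDP maintenance model (illustrative values, not the paper's).
# State: tuple of component conditions, 0 = working, 1 = failed.
# Action: subset of components to repair before the next period.
N, P_FAIL, GAMMA = 3, 0.1, 0.95
REPAIR_COST, DOWNTIME_COST = 1.0, 10.0

states = list(itertools.product((0, 1), repeat=N))
actions = [frozenset(c) for r in range(N + 1)
           for c in itertools.combinations(range(N), r)]

def cost(state, action):
    down = any(state)  # series system: any failed component stops it
    return REPAIR_COST * len(action) + (DOWNTIME_COST if down else 0.0)

def transitions(state, action):
    """Yield (probability, next_state): repairs restore components,
    then each working component fails independently with P_FAIL."""
    repaired = [0 if i in action else s for i, s in enumerate(state)]
    per_comp = [[(1 - P_FAIL, 0), (P_FAIL, 1)] if s == 0 else [(1.0, 1)]
                for s in repaired]
    for outcome in itertools.product(*per_comp):
        prob = 1.0
        for p, _ in outcome:
            prob *= p
        yield prob, tuple(s for _, s in outcome)

def q(s, a, V):
    return cost(s, a) + GAMMA * sum(p * V[n] for p, n in transitions(s, a))

V = {s: 0.0 for s in states}
for _ in range(200):  # value iteration, minimising discounted cost
    V = {s: min(q(s, a, V) for a in actions) for s in states}
policy = {s: min(actions, key=lambda a: q(s, a, V)) for s in states}

# With downtime far costlier than repair, the optimal policy fixes every
# failed component immediately and never touches working ones. The point
# is that |states| = |actions| = 2**n, which is the "curse of
# dimensionality" that a hierarchical decomposition avoids.
assert policy[(0, 0, 0)] == frozenset()
assert policy[(1, 1, 1)] == frozenset({0, 1, 2})
```

A per-component (or per-subsystem) agent, by contrast, reasons over a state space that grows linearly in the number of components, which is the scaling argument behind hierarchical coordination.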
Background and aims
Emerging evidence suggests that solitary drinking may be an important early risk marker for alcohol use disorder. The current paper is the first meta‐analysis and systematic review on adolescent and young adult solitary drinking to examine associations between solitary drinking and increased alcohol consumption, alcohol problems, and drinking to cope motives.
Methods
PsycINFO, PubMed, and Google Scholar were searched using the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) methodology and a pre‐registered International Prospective Register of Systematic Reviews (PROSPERO) protocol (no. CRD42020143449). Data from self‐report questionnaires regarding negative correlates of solitary drinking (e.g. alcohol problems) and solitary drinking motives (e.g. drinking to cope) were pooled across studies using random‐effects models. Studies included adolescents (aged 12–18 years) and young adults (mean age between 18 and 30 years or samples with the majority of participants aged 30 years or younger).
Results
Meta‐analytical results from 21 unique samples including 28,372 participants showed significant effects for the associations between solitary drinking and the following factors: increased alcohol consumption, r = 0.23, 95% confidence interval (CI) = 0.12, 0.33; drinking problems, r = 0.23, 95% CI = 0.13, 0.32; negative affect, r = 0.21, 95% CI = 0.16, 0.26; social discomfort, r = 0.17, 95% CI = 0.06, 0.27; negative reinforcement, r = 0.28, 95% CI = 0.24, 0.31; and positive reinforcement, r = 0.10, 95% CI = 0.03, 0.17. These associations were not moderated by age group (i.e. adolescent versus young adult), study quality, or differing solitary drinking definitions. Accounting for publication bias increased the effect sizes from r = 0.23 to 0.34 for alcohol consumption and from r = 0.23 to 0.30 for drinking problems, and lowered them from r = 0.10 to 0.06 for positive reinforcement and from r = 0.17 to 0.11 for social discomfort.
Conclusions
Solitary drinking among adolescents and young adults appears to be associated with psychosocial/alcohol problems and drinking to cope motives.
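The random-effects pooling of correlations reported above can be sketched with a standard DerSimonian-Laird model on Fisher-z-transformed correlations. The (r, n) study pairs below are invented for illustration; they are not data from this review, and the review itself may have used a different estimator.

```python
import math

# Hypothetical example studies: (correlation r, sample size n).
studies = [(0.25, 400), (0.18, 900), (0.30, 250), (0.21, 1200)]

z = [0.5 * math.log((1 + r) / (1 - r)) for r, _ in studies]  # Fisher z
w = [n - 3 for _, n in studies]  # inverse-variance weights, var(z) = 1/(n-3)

# Fixed-effect pooled z, then DerSimonian-Laird between-study variance.
z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
Q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
df = len(studies) - 1
C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights fold tau2 into each study's variance.
w_re = [1 / (1 / wi + tau2) for wi in w]
z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
se = math.sqrt(1 / sum(w_re))

r_pooled = math.tanh(z_re)  # back-transform to the correlation scale
ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
print(f"pooled r = {r_pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The pooled estimate necessarily lands between the smallest and largest study correlations; publication-bias adjustments such as those the abstract reports would be applied as a further step (e.g. trim-and-fill), which this sketch omits.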
Concrete is the most widely used engineering material. While strong in compression, concrete is weak in tension and exhibits low ductility due to its low crack growth resistance. With increasing compressive strength, concrete becomes even more brittle, hence requiring appropriate reinforcement to enhance its ductility. This paper presents a new method for increasing the ductility of ultra-high-performance concrete (UHPC) by reinforcing it with 3D printed polymeric lattices made of either polylactic acid (PLA) or acrylonitrile butadiene styrene (ABS). These lattice-reinforced concrete specimens were tested in compression and four-point bending. The effect of polymeric reinforcement ratios on mechanical properties was investigated by testing two lattice configurations. The lattices were very successful in transforming the brittle UHPC into a ductile material with strain hardening behavior; all flexural specimens revealed multiple cracking and strain hardening behavior up to peak load. Increasing the ABS reinforcing ratio from 19.2% to 33.7% resulted in a 22% reduction in average compressive strength. However, in flexure, increasing the PLA reinforcing ratio from 19.2% to 33.7% resulted in a 38% increase in average peak load. The compression results of all specimens, independent of their reinforcement ratio, revealed smooth softening behavior.
•Ultra-high-performance concrete was reinforced with 3D-printed polymeric lattices, resulting in greatly increased ductility.
•Ductility was optimized by deliberately orienting 3D printed polymer filaments in line with the expected tensile stresses.
•The ductility-enhancing mechanisms during flexure are associated with multiple cracking and tortuous crack paths.
•This fabrication method allows easy pouring of the mortar mixture, unlike polymer fiber-reinforced composites.
•This composite production method lends itself more readily to automated manufacturing than conventional steel rebar-reinforced concrete.
•3D concrete printing and structural testing of nine reinforced beams.
•Aligned interlayer fibres and steel cables as interlayer shear reinforcement.
•Full-field digital image correlation and precise analysis of crack kinematics.
•Development of a mechanical model for interlayer shear reinforcement in 3D printed beams.
3D concrete printing (3DCP) offers many new possibilities. This technology could increase the productivity of the construction industry and reduce its environmental impact by producing optimised structures more efficiently. Despite significant developments in materials science, little effort has been put into developing reinforcement strategies compatible with 3DCP or into characterising their structural behaviour. Consequently, 3DCP still lacks compliance with structural integrity requirements. This study presents an experimental investigation consisting of nine four-point bending tests on extrusion 3DCP beams reinforced with various types of reinforcement. As interlayer shear reinforcement, aligned end-hook fibres (0.3 and 0.6%) or steel cables (0.1%) placed between the layers of printed concrete were used. As longitudinal reinforcement, unbonded post-tensioning and conventional bonded passive reinforcement were explored. The crack patterns and their associated kinematics were tracked using digital image correlation. The results show that the post-tensioned beams failed in a brittle manner due to the crushing of concrete in bending, with deformations localised in a few bending cracks. In the beams with conventional bonded longitudinal reinforcement, both bending and shear cracks were generated, and the brittle failure of the interlayer shear reinforcement limited the ultimate load. Estimations based on the measured crack kinematics show that the interlayer shear reinforcement carried most of the applied shear force. Based on these results, a simple mechanical model is developed to explain the mechanical behaviour and to pre-design the required amount of interlayer shear reinforcement.
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark.
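One challenge of the kind this abstract formalizes, system delays, can be sketched as a generic environment wrapper. This is my own illustration of the idea, not the realworldrl-suite API: the action actually applied at step t is the one the agent selected d steps earlier, which violates the instantaneous-actuation assumption most RL algorithms make.

```python
from collections import deque

class ActionDelayWrapper:
    """Apply each chosen action d steps after the agent selects it.
    Until the queue fills, a no-op action is applied instead."""

    def __init__(self, env, delay, noop_action):
        self.env = env
        self.queue = deque([noop_action] * delay)

    def step(self, action):
        self.queue.append(action)       # newest choice enters the queue
        delayed = self.queue.popleft()  # oldest choice is applied now
        return self.env.step(delayed)

class EchoEnv:
    """Tiny deterministic environment whose 'observation' is simply
    the action that was actually applied, to make the delay visible."""

    def step(self, action):
        return action

env = ActionDelayWrapper(EchoEnv(), delay=2, noop_action=0)
applied = [env.step(a) for a in (1, 2, 3, 4)]
# With delay=2, the first two steps apply the no-op, then the agent's
# earlier choices arrive: [0, 0, 1, 2]
assert applied == [0, 0, 1, 2]
```

Formally, this turns the MDP into a partially observed one unless the pending action queue is appended to the state, which is exactly the kind of analysis the paper carries out per challenge.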
Three experiments explored how training reinforcement schedules and context influence the elimination and recovery of human operant behavior. In Experiment 1, participants learned a discriminated operant response in Context A before the response was eliminated with extinction in Context B. They then received a final test in each context. Groups were trained with a discriminative stimulus that predicted a reinforced response on either every trial (continuous reinforcement; CRF) or some of the trials (partial reinforcement; PRF). Extinction was slower following PRF training (a partial reinforcement extinction effect; PREE), and extinguished responding increased when tested in Context A ("ABA" renewal). Experiment 2 further confirmed that the PREE was obtained equally whether extinction occurred in the training context (Context A) or a new context (Context B), which is consistent with trial-based accounts of the PREE. Experiment 3 used the same design as Experiment 1 to evaluate the influence of training reinforcement on response elimination with an omission contingency. Across the omission training phase in Context B, the decrease in responding occurred more slowly in the PRF-trained group than in the CRF-trained group, perhaps the first demonstration of what might be termed a PRF omission effect. Again, ABA renewal was observed in Context A. Training reinforcement schedule therefore had a similar influence on response elimination with extinction and omission: elimination and recovery of human instrumental behavior, with extinction or omission, are influenced by both training reinforcement schedule and context.