We focus on prediction problems in reinforcement learning with linear function approximation. In particular, we study the ℓ1-regularized problem in least-squares temporal difference with gradient correction (LS-TDC). Because LS-TDC contains a gradient correction term, its convergence rate is higher than that of the least-squares temporal difference (LS-TD) algorithm. However, like LS-TD, LS-TDC may over-fit the data when the number of features is larger than the number of samples. We therefore study regularization and feature selection for LS-TDC. It is well known that ℓ1-regularization can produce sparse solutions and often serves as an automatic feature selection method in value function approximation. The ℓ1-regularized problem in LS-TDC adds a penalty term to the fixed-point function, but the augmented function cannot be solved analytically. We instead build the optimal solution incrementally, using an algorithm similar to the Least Angle Regression (LARS) and LARS-TD algorithms. The resulting ℓ1-regularized version of LS-TDC is named LARS-TDC. Experimental results show that LARS-TDC is an effective method for solving the ℓ1-regularized problem.
The task of learning the value function under a fixed policy in continuous Markov decision processes (MDPs) is considered. Although the extreme learning machine (ELM) learns quickly and avoids the tuning issues of traditional artificial neural networks (ANNs), the randomness of the ELM parameters results in fluctuating performance. In this paper, a least-squares temporal difference algorithm with eligibility traces based on the regularized extreme learning machine (RELM-LSTD(λ)) is proposed to overcome these problems caused by ELM in reinforcement learning. The proposed algorithm combines the LSTD(λ) algorithm with RELM: the RELM is used to approximate value functions, and the eligibility trace term is introduced to increase data efficiency. In experiments, the performance of the proposed algorithm is demonstrated and compared with that of LSTD and ELM-LSTD. Experimental results show that the proposed algorithm achieves more stable and better performance in approximating the value function under a fixed policy.
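As an illustration of the least-squares temporal difference computation with eligibility traces and a ridge penalty, the following is a minimal sketch and not the authors' implementation. It assumes the feature vectors are already computed (in an ELM setting they would be the activations of a fixed random hidden layer), that the regularization is a ridge term added to the diagonal of the LSTD matrix, and that traces are reset at the start of each episode. The names `solve_linear`, `regularized_lstd_lambda`, and `ridge` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def regularized_lstd_lambda(episodes, n_features, gamma=0.9, lam=0.5, ridge=1e-3):
    """Estimate value-function weights from (phi, reward, phi_next) transitions.

    Assumption: phi_next is the all-zero vector for terminal transitions.
    """
    n = n_features
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for episode in episodes:
        z = [0.0] * n  # eligibility trace, reset per episode (assumption)
        for phi, reward, phi_next in episode:
            z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
            diff = [p - gamma * pn for p, pn in zip(phi, phi_next)]
            for i in range(n):
                b[i] += z[i] * reward
                for j in range(n):
                    A[i][j] += z[i] * diff[j]
    for i in range(n):
        A[i][i] += ridge  # ridge (l2) regularization on the weights
    return solve_linear(A, b)


# Two-state chain with one-hot features: state 0 -> state 1 -> terminal (reward 1).
episode = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, [0.0, 0.0])]
theta = regularized_lstd_lambda([episode], n_features=2, ridge=1e-9)
print(theta)  # close to [0.9, 1.0], the discounted returns of the two states
```

A larger `ridge` shrinks the weights toward zero, which trades some bias for robustness to poorly conditioned feature matrices.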
An ℓ2-regularized policy evaluation algorithm, termed RRC (Regularized RC), is proposed for reinforcement learning problems. An RBF network is used to construct the value function approximation (VFA), and its weight vector is solved using the RC algorithm. An additional recursive step is used to achieve an effect that differs from traditional recursive least-squares-based ℓ2-regularized algorithms: the regularization term does not decrease throughout learning. Additionally, a fast counterpart algorithm with O(n²) complexity, termed Fast RRC (FRRC), is proposed; it is a more practical online algorithm than RRC. The convergence analysis and experimental results demonstrate the strong performance of RRC and FRRC.
In recent years, the policy gradient method in reinforcement learning has attracted wide attention for its good convergence performance. At the same time, the tuning of hyperparameters is also a matter of concern. Building on the advantages of the Actor-Critic (AC) structure, this article studies the Natural-Gradient Actor-Critic (NAC) algorithm in the discounted model. The Natural-Gradient Actor-Critic with ADADELTA (A-NAC) algorithm is then proposed. ADADELTA is used to adapt the learning rate in the actor network, which further improves the convergence speed of the NAC algorithm. Simulation results show that NAC and A-NAC have better learning efficiency and faster convergence than regular gradient AC methods.
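For reference, the standard ADADELTA update rule can be sketched as follows. This is a generic, stand-alone illustration on a toy quadratic objective, not the A-NAC actor update itself; the class name `Adadelta` and the default values `rho=0.95` and `eps=1e-6` are assumptions made for the example. In an actor-critic setting, the gradient passed in would be the (natural) policy gradient, with the sign chosen for ascent or descent as appropriate.

```python
import math


class Adadelta:
    """Per-parameter ADADELTA: step sizes adapt from running averages."""

    def __init__(self, n_params, rho=0.95, eps=1e-6):
        self.rho = rho
        self.eps = eps
        self.avg_sq_grad = [0.0] * n_params  # running average of g^2
        self.avg_sq_step = [0.0] * n_params  # running average of (delta x)^2

    def step(self, params, grads):
        """Return updated parameters after one descent step."""
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.avg_sq_grad[i] = self.rho * self.avg_sq_grad[i] + (1 - self.rho) * g * g
            # Ratio of RMS of past steps to RMS of gradients replaces a fixed learning rate.
            delta = -(math.sqrt(self.avg_sq_step[i] + self.eps)
                      / math.sqrt(self.avg_sq_grad[i] + self.eps)) * g
            self.avg_sq_step[i] = self.rho * self.avg_sq_step[i] + (1 - self.rho) * delta * delta
            new_params.append(p + delta)
        return new_params


# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
opt = Adadelta(n_params=1)
x = [1.0]
for _ in range(100):
    x = opt.step(x, [2.0 * x[0]])
print(x)
```

No global learning rate appears in the update, which is the property that makes the method attractive when hyperparameter tuning is a concern.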
We discuss the stability of fractional singular systems with time delay under state feedback. To handle the singularity, we decompose the system into two subsystems. Applying the Laplace transform and the inverse Laplace transform to the subsystems, we obtain time-domain expressions for the state variables. From the properties of the Mittag-Leffler function, we derive several inequalities that are important for stability. Finally, we obtain a new sufficient condition for the stability of fractional singular systems with time delay when the fractional order satisfies 1 < α < 2. Correspondingly, an appropriate state feedback matrix that makes the system stable can be selected under this condition. All results are proved, and numerical examples are provided to show the validity and feasibility of the proposed method.
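The Mittag-Leffler function mentioned above generalizes the exponential and is defined by the series E_{α,β}(z) = Σ_{k≥0} z^k / Γ(αk + β). A minimal numerical sketch follows; it assumes a truncated power series is acceptable, which holds only for moderate |z|, and the function name `mittag_leffler` is hypothetical. It is meant to make the definition concrete, not to reproduce the inequalities derived in the paper.

```python
import math


def mittag_leffler(z, alpha, beta=1.0, max_terms=200):
    """Truncated series for E_{alpha,beta}(z); suitable for moderate |z| only."""
    total = 0.0
    for k in range(max_terms):
        arg = alpha * k + beta
        if arg > 170.0:  # math.gamma overflows a float just above 171
            break
        term = z ** k / math.gamma(arg)
        total += term
        # Stop once terms no longer change the sum at double precision.
        if k > 0 and abs(term) < 1e-16 * max(1.0, abs(total)):
            break
    return total


# Known special cases: E_{1,1}(z) = exp(z) and E_{2,1}(-z^2) = cos(z).
print(mittag_leffler(1.0, alpha=1.0), math.e)
print(mittag_leffler(-1.0, alpha=2.0), math.cos(1.0))
# A fractional order in the range 1 < alpha < 2 considered in the abstract:
print(mittag_leffler(-1.0, alpha=1.5))
```

For large |z| the alternating series loses accuracy to cancellation, so asymptotic expansions or integral representations would be needed there.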
Reinforcement learning is considered one of the main approaches to general artificial intelligence, as it allows machines to learn by themselves through interaction with the environment. In this paper, a modified deep reinforcement learning algorithm based on the Actor-Critic framework is proposed. Unlike traditional update methods, the proposed algorithm adopts a special on-policy method, which we call the Accelerated Linear Approximation Method in Deep Actor-Critic Framework (ALA-AC). Once the network has been trained to a certain extent, the parameters of some layers are frozen, and the parameters of the remaining layers continue to be trained to obtain a better policy and faster training.
Hierarchical, structurally colored materials offer a wide variety of visual effects that cannot be achieved with standard pigments or dyes. However, their fabrication requires simultaneous control over multiple length-scales. Here we introduce a robust strategy for the fabrication of hierarchical photonic pigments via the confined self-assembly of bottlebrush block copolymers within emulsified microdroplets. The bottlebrush block copolymer self-assembles into highly ordered concentric lamellae, giving rise to a near perfect photonic multilayer in the solid state, with reflectivity up to 100%. The reflected color can be readily tuned across the whole visible spectrum by either altering the molecular weight or by blending the bottlebrush block copolymers. Furthermore, the developed photonic pigments are responsive, with a selective and reversible color change observed upon swelling in different solvents. Our system is particularly suited for the scalable production of photonic pigments, arising from their rapid self-assembly mechanism and size-independent color.
Photonic materials with angular‐independent structural color are highly desirable because they offer the broad viewing angles required for application as colorants in paints, cosmetics, textiles, or displays. However, they are challenging to fabricate as they require isotropic nanoscale architectures with only short‐range correlation. Here, porous microparticles with such a structure are produced in a single, scalable step from an amphiphilic bottlebrush block copolymer. This is achieved by exploiting a novel “controlled micellization” self‐assembly mechanism within an emulsified toluene‐in‐water droplet. By restricting water permeation through the droplet interface, the size of the pores can be precisely addressed, resulting in structurally colored pigments. Furthermore, the reflected color can be tuned to reflect across the full visible spectrum using only a single polymer (Mn = 290 kDa) by altering the initial emulsification conditions. Such “photonic pigments” have several key advantages over their crystalline analogues, as they provide isotropic structural coloration that suppresses iridescence and improves color purity without the need for either refractive index matching or the inclusion of a broadband absorber.
Photonic microparticles are produced in a single, scalable step from an amphiphilic, low‐molecular‐weight bottlebrush block copolymer. By controlling the formation, swelling, and subsequent self‐assembly of giant reverse micelles within an emulsified toluene‐in‐water droplet, a highly porous structure with short‐range correlation is produced. This isotropic inverse photonic architecture allows for a full spectrum of angular‐independent, structurally colored pigments.
The change detection of polarimetric synthetic aperture radar (PolSAR) images is a longstanding and challenging task, not only because of the speckle issue but also due to the complex texture, which generally appears highly heterogeneous. There are two widely used approaches for the change detection of PolSAR images: one is the post-classification comparison algorithm, and the other is the direct unsupervised change detection algorithm. In this paper, we focus on the latter and propose a region-based change detection method for PolSAR images by means of Wishart mixture models (WMMs). The WMMs fit the distribution of PolSAR images with fewer errors in both homogeneous and extremely heterogeneous areas. More precisely, two PolSAR images are first segmented into compact local regions using a customized simple-linear-iterative-clustering algorithm, and the WMMs are used to model each local region. To generate a difference map, statistical distribution differences measured by an information-theoretic divergence are then computed for corresponding local region pairs. The Cauchy-Schwarz divergence is adopted because its analytic expression can be derived for WMMs. Finally, the change detection results are obtained by the Kittler-Illingworth thresholding method with Markov random field-based smoothing. The proposed scheme is tested on different PolSAR data sets. Qualitative and quantitative evaluations show its superior performance compared to the traditional pixel-level approach.
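The final thresholding step can be illustrated with the classical Kittler-Illingworth minimum-error criterion. The sketch below is an assumption-laden illustration, not the paper's pipeline: it operates on a plain histogram of quantized difference-map values, models both classes as Gaussians, and omits the Markov random field smoothing. The function name `kittler_illingworth_threshold` and the toy histogram are hypothetical.

```python
import math


def kittler_illingworth_threshold(hist):
    """Return the bin index t minimizing the minimum-error criterion J(t).

    Bins 0..t form the first class and bins t+1.. form the second class.
    """
    total = float(sum(hist))
    n_bins = len(hist)
    best_t, best_j = None, math.inf
    for t in range(n_bins - 1):
        low, high = range(t + 1), range(t + 1, n_bins)
        p1 = sum(hist[g] for g in low) / total
        p2 = sum(hist[g] for g in high) / total
        if p1 <= 0.0 or p2 <= 0.0:
            continue  # one class is empty
        mu1 = sum(g * hist[g] for g in low) / (total * p1)
        mu2 = sum(g * hist[g] for g in high) / (total * p2)
        var1 = sum((g - mu1) ** 2 * hist[g] for g in low) / (total * p1)
        var2 = sum((g - mu2) ** 2 * hist[g] for g in high) / (total * p2)
        if var1 <= 0.0 or var2 <= 0.0:
            continue  # degenerate class, log of zero variance undefined
        j = (1.0
             + 2.0 * (p1 * math.log(math.sqrt(var1)) + p2 * math.log(math.sqrt(var2)))
             - 2.0 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = t, j
    return best_t


# Toy bimodal histogram: "unchanged" values near bin 4, "changed" near bin 14.
hist = [0, 1, 4, 9, 12, 9, 4, 1, 0, 0, 0, 1, 4, 9, 12, 9, 4, 1, 0, 0]
print(kittler_illingworth_threshold(hist))
```

Regions whose divergence value falls above the returned bin would be labelled as changed before any spatial smoothing is applied.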
Rye is a valuable food and forage crop, an important genetic resource for wheat and triticale improvement and an indispensable material for efficient comparative genomic studies in grasses. Here, we sequenced the genome of Weining rye, an elite Chinese rye variety. The assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size (7.86 Gb), with 93.67% of the contigs (7.25 Gb) assigned to seven chromosomes. Repetitive elements constituted 90.31% of the assembled genome. Compared to previously sequenced Triticeae genomes, Daniela, Sumaya and Sumana retrotransposons showed strong expansion in rye. Further analyses of the Weining assembly shed new light on genome-wide gene duplications and their impact on starch biosynthesis genes, physical organization of complex prolamin loci, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions and loci in rye. This genome sequence promises to accelerate genomic and breeding studies in rye and related cereal crops.