Offline reinforcement learning (RL) has seen notable advancements through return-conditioned supervised learning (RCSL) and value-based methods, yet each approach comes with its own set of practical challenges. To address these, we propose Value-Aided Conditional Supervised Learning (VCS), a method that combines the stability of RCSL with the stitching ability of value-based methods. Based on a Neural Tangent Kernel analysis that identifies instances where the value function may not lead to stable stitching, VCS injects the value aid into the RCSL loss function dynamically according to the trajectory return. Our empirical studies reveal that VCS not only significantly outperforms both RCSL and value-based methods but also consistently matches, and often surpasses, the highest trajectory returns across diverse offline RL benchmarks. These results open new directions in offline RL, pushing the limits of what can be achieved and fostering further innovations.
The recent success of the Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising Transformer-based model. However, we discovered that the attention module of DT is not well suited to capturing the inherent local dependence pattern in RL trajectories modeled as a Markov decision process. To overcome this limitation of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure for processing multiple entities in parallel and understanding the interrelationships among them. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
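To illustrate what "local convolution filtering as the token mixer" can mean, here is a minimal NumPy sketch of a causal, per-channel (depthwise) 1D convolution over a token sequence. It is not the authors' implementation: the function name `local_conv_token_mixer`, the window size `K`, and the causal zero-padding are all assumptions made for illustration.

```python
import numpy as np

def local_conv_token_mixer(tokens, kernels):
    """Mix each token with its few most recent predecessors.

    tokens:  (T, D) array, one row per trajectory token.
    kernels: (D, K) array, one length-K filter per embedding channel.
             K is a hypothetical window size, not a value from the paper.
    """
    T, D = tokens.shape
    K = kernels.shape[1]
    # Left-pad with zeros so position t only sees positions t-K+1 .. t.
    padded = np.vstack([np.zeros((K - 1, D)), tokens])
    out = np.zeros((T, D))
    for t in range(T):
        window = padded[t:t + K]  # (K, D), oldest row first
        # Each channel is filtered independently (depthwise convolution).
        out[t] = np.sum(window * kernels.T, axis=0)
    return out

# Example: a 3-tap moving average over 5 tokens with 4 channels.
tokens = np.arange(20.0).reshape(5, 4)
kernels = np.full((4, 3), 1.0 / 3.0)
mixed = local_conv_token_mixer(tokens, kernels)
```

Unlike attention, which lets every token weigh every other token, this mixer only combines a fixed local window, which is the kind of local dependence the abstract refers to.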
The problem of interference alignment in time-varying MIMO interference channels is considered. To reduce complexity, an adaptive algorithm for beam vector design is proposed based on matrix perturbation theory and our previous work on a least squares approach to beam design for interference alignment. The proposed algorithm calculates interference-aligning beam vectors by an additive update of the previous value, which reduces complexity significantly. Numerical results are provided to validate the proposed algorithm. It is shown that the proposed adaptive algorithm yields almost the same performance as a non-adaptive method that recalculates the interference-aligning beam vectors at every time step.
Novel triangular ring resonators combining an extremely small multimode-interference (MMI) coupler, low-loss total internal reflection (TIR) mirrors, and semiconductor optical amplifiers are reported for the first time. The MMI length of 90 μm is among the shortest reported. The incidence angle of the TIR mirror inside the resonator is 22 degrees. A free-spectral range of approximately 2 nm is observed near 1550 nm along with an on-off ratio of 17 dB. The triangular resonators with a sharp angle are very attractive components due to their promise of compact size and high levels of integration. Therefore, large numbers of resonators can be integrated on a chip to increase functionality in future optical wavelength division multiplexing systems.
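As a rough consistency check on the reported numbers, the standard ring-resonator relation FSR ≈ λ² / (n_g · L) can be inverted to estimate the product of group index and round-trip length. The sketch below does only that arithmetic; the group index value is an illustrative assumption for a semiconductor waveguide and is not given in the abstract.

```python
def group_index_times_length(wavelength_m, fsr_m):
    """Invert FSR ~= wavelength^2 / (n_g * L) to get n_g * L in metres."""
    return wavelength_m ** 2 / fsr_m

# Values from the abstract: FSR of about 2 nm near 1550 nm.
optical_length_m = group_index_times_length(1550e-9, 2e-9)  # about 1.2 mm

# Hypothetical group index, assumed for illustration only.
assumed_group_index = 3.5
round_trip_length_m = optical_length_m / assumed_group_index
```

Under that assumed group index, the round-trip length comes out at a few hundred micrometres, which is in line with the compact size the abstract emphasizes.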
Autoinflammatory Blau syndrome (BS) is associated with NOD2 gene mutations that lead to constitutive NFκB activation. NOD2 functions as an intracellular receptor for the muramyl dipeptide (MDP) component of peptidoglycan (PGN). The objectives of this study are to analyze whether NFκB activation in BS affects immune cell functions, and whether NOD2 and toll-like receptor (TLR) pathways interact. Peripheral blood mononuclear cells (MNCs) from a BS patient and three normal donors were analyzed for their ability to produce pro- and anti-inflammatory cytokines in the presence and absence of MDP, PGN, and lipopolysaccharide (LPS). The results showed that the basal TNF-α and IL-10 production by MNCs over 24 h of incubation was very low for both the patient and the normal donors. However, upon stimulation with MDP, LPS, and PGN, the cells from the BS patient produced much lower levels of TNF-α, IL-10, G-CSF, and IFN-γ than the normal donor cells. We conclude that the pathogenic mechanism responsible for the chronic inflammation that characterizes BS may relate to the impaired production of both pro- and anti-inflammatory cytokines in response to stimuli. The NOD2 pathway possibly interacts with either the TLR2 or TLR4 pathways.
Deep reinforcement learning (RL) has achieved remarkable success in solving complex tasks through its integration with deep neural networks (DNNs) as function approximators. However, the reliance on DNNs has introduced a new challenge called primacy bias, whereby these function approximators tend to prioritize early experiences, leading to overfitting. To mitigate this primacy bias, a reset method has been proposed, which periodically resets a portion or the entirety of a deep RL agent while preserving the replay buffer. However, the reset method can cause performance collapses immediately after a reset, which can be detrimental from the perspective of safe RL and regret minimization. In this paper, we propose a new reset-based method that leverages deep ensemble learning to address the limitations of the vanilla reset method and enhance sample efficiency. The proposed method is evaluated through various experiments, including those in the domain of safe RL. Numerical results show that it is effective in terms of both sample efficiency and safety.
A new widely tunable laser diode structure that requires only two tuning currents is proposed. The laser diode consists of a sampled grating distributed feedback (SGDFB) laser diode monolithically integrated with a sampled grating distributed Bragg reflector (SGDBR). Phase control sections are inserted between the grating bursts of the SGDBR and SGDFB sections for discrete and continuous tuning. To confirm the feasibility of the new structure, the split-step time domain model is used. The simulation result for a particular design shows that a tuning range as wide as 27 nm is possible with a side-mode suppression ratio exceeding 35 dB. Furthermore, the output power is larger than that from SGDBR laser diodes with similar parameters.
In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents' actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution. We evaluated VM3-AC on several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
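To make the regularizer concrete, the sketch below computes the mutual information between two agents' discrete actions from a joint probability table and adds it to a return with a weight. This is only an illustration of the quantity being regularized, not the variational lower bound or the algorithm from the paper; the names `mutual_information`, `regularized_objective`, and the weight `alpha` are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """I(A; B) in nats for a joint probability table over two agents' actions."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # normalize defensively
    p_a = joint.sum(axis=1, keepdims=True)     # marginal of agent 1, shape (n, 1)
    p_b = joint.sum(axis=0, keepdims=True)     # marginal of agent 2, shape (1, m)
    mask = joint > 0                           # skip zero cells: 0 * log 0 = 0
    ratio = joint[mask] / (p_a @ p_b)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

def regularized_objective(expected_return, joint, alpha=0.1):
    """Return plus alpha times the action mutual information (alpha is assumed)."""
    return expected_return + alpha * mutual_information(joint)

# Independent actions carry no mutual information; perfectly matched actions
# over two choices carry log(2) nats.
independent = np.outer([0.5, 0.5], [0.5, 0.5])
coordinated = np.array([[0.5, 0.0], [0.0, 0.5]])
```

With independent policies the regularizer vanishes, which is why the abstract introduces a latent variable to induce nonzero mutual information between the agents' actions.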