This paper studies the problem of finding
a priori shortest paths to guarantee a given likelihood of arriving on-time in a stochastic network. Such “reliable” paths help travelers better plan their ...trips to prepare for the risk of running late in the face of stochastic travel times. Optimal solutions to the problem can be obtained from
local-reliable paths, which are a set of
non-dominated paths under first-order stochastic dominance. We show that Bellman’s principle of optimality can be applied to construct local-reliable paths. Acyclicity of local-reliable paths is established and used for proving finite convergence of solution procedures. The connection between the
a priori path problem and the corresponding
adaptive routing problem is also revealed. A label-correcting algorithm is proposed and its complexity is analyzed. A pseudo-polynomial approximation is proposed based on extreme-dominance. An extension that allows travel time distribution functions to vary over time is also discussed. We show that the time-dependent problem is decomposable with respect to arrival times and therefore can be solved as easily as its static counterpart. Numerical results are provided using typical transportation networks.
The minimization of a multiobjective Lagrangian with non-constant discount is studied. The problem is embedded into a set-valued framework and a corresponding definition of the value function is ...given. Bellman's optimality principle and Hopf-Lax formula are derived. The value function is shown to be a solution of a set-valued Hamilton-Jacobi equation.
This paper presents the first framework (up to the authors' knowledge) to address time-varying objectives in finite-horizon Deep Reinforcement Learning (DeepRL), based on a switching control solution ...developed on the ground of Bellman's principle of optimality. By augmenting the state space of the system with information on its visit time, the DeepRL agent is able to solve problems in which its task dynamically changes within the same episode. To address the scalability problems caused by the state space augmentation, we propose a procedure to partition the episode length to define separate sub-problems that are then solved by specialised DeepRL agents. Contrary to standard solutions, with the proposed approach the DeepRL agents correctly estimate the value function at each time-step and are hence able to solve time-varying tasks. Numerical simulations validate the approach in a classic RL environment.
The complete-lattice approach to optimization problems with a vector- or even set-valued objective already produced a variety of new concepts and results and was successfully applied in finance, ...statistics and game theory. So far, it has only been applied to set-valued dynamic risk measures within a stochastic framework, but not to deterministic calculus of variations and optimal control problems. In this paper, a multi-objective calculus of variations problem is considered which is turned into a set-valued problem by a straightforward extension. A new set-valued value function is introduced, for which a Bellman's optimality principle holds. Also the classical result of the Hopf-Lax formula holds for the generalized value function. Finally, a derivative with respect to the time and a directional derivative with respect to the state variable of the value function are defined. The value function is proved to be a solution of a corresponding Hamilton-Jacobi equation.
This paper considers the optimal feedback planning problem of a point robot among polygonal obstacles in
. In this problem, the Euclidean distance traveled by the robot is minimized. The approximate ...optimal feedback plan is computed using a piecewise linear approximation of the cost-to-go function. The approximate cost-to-go function, in turn, satisfies the discrete version of dynamic programming principle defined using a simplicial decomposition of the environment. Adaptations of Dijkstra's and A
∗
algorithms are introduced that solve the nonlinear system of discrete dynamic programming equations. Interpolation methods are carefully designed and analyzed so that they are proven to converge numerically. As a result, the computed feedback plan produces approximately optimal trajectories. The methods are implemented and demonstrated on 2D and 3D examples. As expected, the simplicial A
∗
algorithm significantly improves performance over the simplicial Dijkstra's algorithm.
Starting with a time-0 coherent risk measure defined for "value processes" we also define risk measurement processes. Two other constructions of measurement processes are given in terms of sets of ...test probabilities. These latter constructions are identical and are related to the former construction when the sets fulfill a stability condition also met in multiperiod treatment of ambiguity as in decision-making. We finally deduce risk measurements for the final value of locked-in positions and repeat a warning concerning Tail-Value-at-Risk. PUBLICATION ABSTRACT
In this paper, optimal control problems for multi-stage and continuous-time linear singular systems are both considered. The singular systems are assumed to be regular and impulse-free. First, a ...recurrence equation is derived according to Bellman's principle of optimality in dynamic programming. Then, by applying the recurrence equation, bang-bang optimal controls for the control problems with linear objective functions subject to two types of multi-stage singular systems are obtained. Second, employing the principle of optimality, a equation of optimality for settling the optimal control problem subject to a class of continuous-time singular systems is proposed. The optimal control problem may become simpler through solving this equation of optimality. Two numerical examples and a dynamic input-output model are presented to show the effectiveness of the results obtained.
One of the main reasons for stratifying the population is to produce a gain in precision of the estimates, in the sample surveys. For achieving this, one of the problem is determination of optimum ...strata boundaries. The strata boundaries should be obtained in such a way, so that it can reasonably expect to reduce the cost of the survey as much as possible without sacrificing the accuracy or alternatively, reducing the margin of error to the greatest possible extent for the same expected cost. In this paper, we have discussed the way of obtaining optimum strata boundaries when the cost of every unit varies in the whole strata. The problem is formulated as non-linear programming problem which is solved by using Bellman’s principle of optimality. For numerical illustration an example is presented for uniformly distributed study variable.
Over-the-counter (OTC) markets for derivatives, collateralized debt obligations, and repurchase agreements played a significant role in the global financial crisis. Rather than being traded through a ...centralized institution such as a stock exchange, OTC trades are negotiated privately between market participants who may be unaware of prices that are currently available elsewhere in the market. In these relatively opaque markets, investors can be in the dark about the most attractive available terms and who might be offering them. This opaqueness exacerbated the financial crisis, as regulators and market participants were unable to quickly assess the risks and pricing of these instruments.