This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed by Carter (SIAM J Sci Stat Comput 14(2):368–388, 1993) and by Conn et al. (Trust-Region Methods. MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2000) as a tool for very high-performance computing, an area where allowing multi-precision computations is critical to keep energy dissipation under control. Numerical experiments are presented indicating that the considered method can bring substantial savings in the "energy costs" of objective-function and gradient evaluations by efficiently exploiting multi-precision computations.
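As an illustration of the dynamic-accuracy idea, the minimal sketch below evaluates the objective in single precision while the trust-region radius is large and switches to double precision only when the radius becomes small; the Rosenbrock test function, the precision-switching threshold, and the Cauchy-type step are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def f(x, dtype=np.float64):
    # Hypothetical test objective (Rosenbrock), evaluated in the requested
    # precision to mimic cheap (float32) versus expensive (float64) evaluations.
    x = x.astype(dtype)
    return ((1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2).item()

def grad(x, dtype=np.float64):
    x = x.astype(dtype)
    g = np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                  200 * (x[1] - x[0]**2)], dtype=dtype)
    return g.astype(np.float64)

x, radius = np.array([-1.2, 1.0]), 1.0
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-6:
        break
    # Dynamic accuracy: low precision while the radius (and hence the predicted
    # decrease) is large enough to dominate the evaluation error.
    dtype = np.float32 if radius > 1e-3 else np.float64
    step = -radius * g / np.linalg.norm(g)            # Cauchy-type step
    predicted = radius * np.linalg.norm(g)            # predicted linear decrease
    rho = (f(x, dtype) - f(x + step, dtype)) / predicted
    if rho > 0.1:
        x, radius = x + step, min(2 * radius, 10.0)   # accept, possibly enlarge
    else:
        radius *= 0.5                                  # reject, shrink radius
```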
An adaptive regularization algorithm using inexact function and derivative evaluations is proposed for the solution of composite nonsmooth nonconvex optimization. It is shown that this algorithm needs at most $\mathcal{O}(|\log(\epsilon)|\,\epsilon^{-2})$ evaluations of the problem's functions and their derivatives for finding an $\epsilon$-approximate first-order stationary point. This complexity bound therefore generalizes that provided by Bellavia et al. (Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information. arXiv:1808.06239, 2018) for inexact methods for smooth nonconvex problems, and is within a factor $|\log(\epsilon)|$ of the optimal bound known for smooth and nonsmooth nonconvex minimization with exact evaluations. A practically more restrictive variant of the algorithm with worst-case complexity $\mathcal{O}(|\log(\epsilon)| + \epsilon^{-2})$ is also presented.
An adaptive regularization algorithm (AR$1p$GN) for unconstrained nonlinear minimization is considered, which uses a model consisting of a Taylor expansion of arbitrary degree and a regularization term involving a possibly nonsmooth norm. It is shown that the nonsmoothness of the norm does not affect the ${\mathcal {O}}(\epsilon _1^{-(p+1)/p})$ upper bound on evaluation complexity for finding first-order $\epsilon _1$-approximate minimizers using $p$ derivatives, and that this result does not hinge on the equivalence of norms in $\mathbb {R}^n$. It is also shown that, if $p=2$, the bound of ${\mathcal {O}}(\epsilon _2^{-3})$ evaluations for finding second-order $\epsilon _2$-approximate minimizers still holds for a variant of AR$1p$GN named AR2GN, despite the possibly nonsmooth nature of the regularization term. Moreover, adapting the existing theory to handle the nonsmoothness results in an interesting modification of the subproblem termination rules, leading to an even more compact complexity analysis. In particular, it is shown when the Newton step is acceptable for an adaptive regularization method. The approximate minimization of quadratic polynomials regularized with nonsmooth norms is then discussed, and a new approximate second-order necessary optimality condition is derived for this case. A specialized algorithm is then proposed to enforce first- and second-order conditions that are strong enough to ensure the existence of a suitable step in AR$1p$GN (when $p=2$) and in AR2GN, and its iteration complexity is analyzed. A final section discusses how practical approximate curvature measures may lead to weaker second-order optimality guarantees.
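The kind of regularized subproblem such methods face can be illustrated with a toy subgradient scheme for a quadratic model plus the nonsmooth regularizer $\sigma\|s\|_1^3/3$; this sketch is only an assumption-laden stand-in for the specialized algorithm proposed in the paper.

```python
import numpy as np

def model(g, H, sigma, s):
    # m(s) = g^T s + 1/2 s^T H s + (sigma/3) ||s||_1^3, with a nonsmooth l1 norm
    return g @ s + 0.5 * s @ H @ s + sigma / 3 * np.sum(np.abs(s))**3

def minimize_model(g, H, sigma, iters=500):
    # Plain subgradient descent with diminishing steps; keeps the best iterate.
    s = np.zeros_like(g)
    best_s, best_m = s.copy(), model(g, H, sigma, s)
    for t in range(1, iters + 1):
        n1 = np.sum(np.abs(s))
        sub = g + H @ s + sigma * n1**2 * np.sign(s)   # one valid subgradient
        s = s - sub / (np.sqrt(t) * (np.linalg.norm(sub) + 1e-12))
        m = model(g, H, sigma, s)
        if m < best_m:
            best_s, best_m = s.copy(), m
    return best_s

# Indefinite Hessian: the cubic l1 term still makes the model bounded below.
s = minimize_model(np.array([1.0, -2.0]), np.diag([2.0, -1.0]), sigma=1.0)
```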
This paper considers optimization of nonconvex functionals in smooth infinite-dimensional spaces. It is first proved that functionals in a class containing multivariate polynomials augmented with a sufficiently smooth regularization can be minimized by a simple linesearch-based algorithm. Sufficient smoothness depends on gradients satisfying a novel two-term generalized Lipschitz condition. A first-order adaptive regularization method applicable to functionals with β-Hölder continuous derivatives is then proposed, which uses the linesearch approach to compute a suitable trial step. It is shown to find an ϵ-approximate first-order point in at most
$\mathcal{O}\big(\epsilon^{-(p+\beta)/(p+\beta-1)}\big)$ evaluations of the functional and its first $p$ derivatives.
In order to be provably convergent towards a second-order stationary point, optimization methods applied to nonconvex problems must necessarily exploit both first- and second-order information. However, as revealed by recent complexity analyses of some of these methods, the overall effort to reach second-order points is significantly larger than that needed to approach first-order ones. On the other hand, there are other algorithmic schemes, initially designed with first-order convergence in mind, that do not appear to maintain the same first-order performance when modified to take second-order information into account. In this paper, we propose a technique that separately computes first- and second-order steps and that globally converges to second-order stationary points: it consists in better connecting the steps to be taken and the stationarity criteria, potentially guaranteeing larger steps and larger decreases in the objective. Our approach is shown to lead to an improvement of the corresponding complexity bound with respect to the first-order optimality tolerance, while having a positive impact on the practical behavior. Although the applicability of our ideas is wider, we focus the presentation on trust-region methods with and without derivatives, and motivate in both cases the interest of our strategy.
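A hedged sketch of the step-separation idea: a first-order (gradient) step is taken while the gradient criterion is violated, and a second-order (negative-curvature) step only once it is nearly satisfied. The thresholds and the eigen-decomposition route are illustrative choices, not the paper's exact mechanism.

```python
import numpy as np

def separated_step(g, H, radius, eps1, eps2):
    lam, V = np.linalg.eigh(H)                 # eigenpairs, lam sorted ascending
    if np.linalg.norm(g) >= eps1:
        d = -g / np.linalg.norm(g)             # first-order (gradient) step
    elif lam[0] <= -eps2:
        v = V[:, 0]                            # most negative curvature direction
        d = v if g @ v <= 0 else -v            # orient it for descent
    else:
        return None                            # approximately second-order stationary
    return radius * d

# Gradient nearly zero but curvature strongly negative: a second-order step is taken.
s = separated_step(np.array([1e-9, 0.0]), np.diag([1.0, -0.5]),
                   radius=0.1, eps1=1e-6, eps2=1e-3)
```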
The Gauss-Newton algorithm is an iterative method regularly used for solving nonlinear least squares problems. It is particularly well suited to the treatment of very large scale variational data ...assimilation problems that arise in atmosphere and ocean forecasting. The procedure consists of a sequence of linear least squares approximations to the nonlinear problem, each of which is solved by an "inner" direct or iterative process. In comparison with Newton's method and its variants, the algorithm is attractive because it does not require the evaluation of second-order derivatives in the Hessian of the objective function. In practice the exact Gauss-Newton method is too expensive to apply operationally in meteorological forecasting, and various approximations are made in order to reduce computational costs and to solve the problems in real time. Here we investigate the effects on the convergence of the Gauss-Newton method of two types of approximation used commonly in data assimilation. First, we examine "truncated" Gauss-Newton methods where the inner linear least squares problem is not solved exactly, and second, we examine "perturbed" Gauss-Newton methods where the true linearized inner problem is approximated by a simplified, or perturbed, linear least squares problem. We give conditions ensuring that the truncated and perturbed Gauss-Newton methods converge and also derive rates of convergence for the iterations. The results are illustrated by a simple numerical example. A practical application to the problem of data assimilation in a typical meteorological system is presented.
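A minimal sketch of a truncated Gauss-Newton iteration, assuming hypothetical residual and Jacobian callables `res` and `jac`: the inner linearized least squares problem is solved only loosely by a few conjugate-gradient iterations on the normal equations. Globalization and the per-iteration control of the inner tolerance studied in the paper are omitted.

```python
import numpy as np

def cg(matvec, b, rel_tol, maxiter=50):
    # Plain conjugate gradients; stopping early realizes the "truncation".
    x, r = np.zeros_like(b), b.copy()
    if np.linalg.norm(b) == 0.0:
        return x
    p, rs = r.copy(), r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x, r = x + alpha * p, r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= rel_tol * np.linalg.norm(b):
            break
        p, rs = r + (rs_new / rs) * p, rs_new
    return x

def truncated_gauss_newton(res, jac, x, inner_tol=0.1, iters=20):
    # Outer loop: each step solves min_s ||J s + r||_2 inexactly via CG applied
    # to the normal equations J^T J s = -J^T r.
    for _ in range(iters):
        r, J = res(x), jac(x)
        s = cg(lambda v: J.T @ (J @ v), -J.T @ r, inner_tol)
        x = x + s
    return x
```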
We consider solving unconstrained optimization problems by means of two popular globalization techniques: trust-region (TR) algorithms and the adaptive regularization framework using cubics (ARC). Both techniques require the solution of a so-called "subproblem" in which a trial step is computed by solving an optimization problem involving an approximation of the objective function, called "the model". The latter is supposed to be adequate in a neighborhood of the current iterate. In this paper, we address an important practical question related to the choice of the norm for defining the neighborhood. More precisely, assuming here that the Hessian $B$ of the model is symmetric positive definite, we propose the use of the so-called "energy norm", defined by $\|x\|_B=\sqrt{x^TBx}$ for all $x\in\mathbb{R}^n$, in both TR and ARC techniques. We show that the use of this norm induces remarkable relations between the trial steps of both methods that can be used to obtain efficient practical algorithms. We furthermore consider the use of truncated Krylov subspace methods to obtain an approximate trial step for large-scale optimization. Within the energy norm, we obtain line search algorithms along the Newton direction, with a special backtracking strategy and an acceptability condition in the spirit of TR/ARC methods. The new line search algorithm, derived from ARC, enjoys a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-3/2})$. We show the good potential of the energy norm on a set of numerical experiments.
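A rough sketch, under simplifying assumptions, of a backtracking line search along the Newton direction with an ARC-flavoured acceptance test measured in the energy norm; the cubic decrease constant $\sigma/6$ is illustrative rather than the paper's exact acceptability condition.

```python
import numpy as np

def energy_norm(B, x):
    # ||x||_B = sqrt(x^T B x), well defined since B is symmetric positive definite
    return np.sqrt(x @ B @ x)

def newton_backtracking(f, g, B, x, sigma=1.0, max_bt=30):
    d = np.linalg.solve(B, -g(x))              # Newton direction for the model
    t, fx = 1.0, f(x)
    for _ in range(max_bt):
        s = t * d
        # ARC-style test: require a decrease cubic in the energy norm of the step.
        if f(x + s) <= fx - (sigma / 6.0) * energy_norm(B, s)**3:
            return x + s
        t *= 0.5                                # backtrack along the Newton direction
    return x
```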
In this paper, we propose a new multilevel Levenberg-Marquardt optimizer for the training of artificial neural networks with quadratic loss function. This setting allows us to get further insight into the potential of multilevel optimization methods. Indeed, when the least squares problem arises from the training of artificial neural networks, the variables subject to optimization are not related by any geometrical constraints and the standard interpolation and restriction operators can no longer be employed. A heuristic, inspired by algebraic multigrid methods, is then proposed to construct the multilevel transfer operators. We test the new optimizer on an important application: the approximate solution of partial differential equations by means of artificial neural networks. The learning problem is formulated as a least squares problem, choosing the nonlinear residual of the equation as a loss function, whereas the multilevel method is employed as a training method. Numerical experiments show encouraging results related to the efficiency of the new multilevel optimization method compared to the corresponding one-level procedure in this context.
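A toy sketch of the algebraic-multigrid-inspired heuristic: fine variables are assigned to aggregates to form a piecewise-constant prolongation $P$, and a damped coarse Levenberg-Marquardt step is prolonged back to the fine space; the random aggregation and the fixed damping parameter are placeholders for the heuristic actually proposed in the paper.

```python
import numpy as np

def aggregation_prolongation(n, n_coarse, rng):
    # Each fine variable joins one aggregate: a piecewise-constant prolongation
    # that needs no geometric grid, in the spirit of algebraic multigrid.
    P = np.zeros((n, n_coarse))
    P[np.arange(n), rng.integers(0, n_coarse, size=n)] = 1.0
    return P

def coarse_lm_step(J, r, P, mu=1e-2):
    # Restrict the linearized least squares problem through P, solve the damped
    # (Levenberg-Marquardt) normal equations on the coarse space, prolong back.
    Jc = J @ P
    sc = np.linalg.solve(Jc.T @ Jc + mu * np.eye(P.shape[1]), -Jc.T @ r)
    return P @ sc

rng = np.random.default_rng(0)
P = aggregation_prolongation(n=8, n_coarse=3, rng=rng)
```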
In this paper it is proposed to equip direct-search methods with a general procedure to minimize an objective function, possibly nonsmooth, without using derivatives and subject to constraints on the variables. One aims at considering constraints, most likely nonlinear or nonsmooth, for which the derivatives of the corresponding functions are also unavailable. The novelty of this contribution lies mostly in how relaxable constraints are handled. Such constraints, which can be relaxed during the course of the optimization, are taken care of by a merit function and, if necessary, by a restoration procedure. Constraints that are unrelaxable, when present, are treated by an extreme barrier approach. One is able to show that the resulting merit function direct-search algorithm exhibits global convergence properties to first-order stationary points. As in the progressive barrier method of C. Audet and J. E. Dennis Jr. (SIAM J. Optim., 20 (2009), pp. 445--472), we provide a mechanism to indicate the transfer of constraints from the relaxable set to the unrelaxable one.
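A hedged sketch of the two constraint treatments: relaxable constraints enter a penalty-type merit function while unrelaxable ones are screened by an extreme barrier; the fixed penalty weight and coordinate poll set are illustrative simplifications, and the restoration procedure is omitted.

```python
import numpy as np

def merit_direct_search(f, c_relax, c_unrelax, x0, h0=1.0, mu=10.0, tol=1e-6):
    def merit(x):
        if np.any(c_unrelax(x) > 0):            # extreme barrier: reject outright
            return np.inf
        return f(x) + mu * np.sum(np.maximum(c_relax(x), 0.0))  # penalized merit
    x, h = np.asarray(x0, float), h0
    while h > tol:
        improved = False
        for d in np.vstack([np.eye(x.size), -np.eye(x.size)]):  # poll directions
            if merit(x + h * d) < merit(x):
                x, improved = x + h * d, True
                break
        if not improved:
            h /= 2                               # unsuccessful poll: refine the mesh
    return x
```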