General Unconstrained Minimization¶
Modeling¶
FirstOrderFunction
¶
-
class
FirstOrderFunction
¶ Instances of
FirstOrderFunction
implement the evaluation of a function and its gradient.class FirstOrderFunction { public: virtual ~FirstOrderFunction() {} virtual bool Evaluate(const double* const parameters, double* cost, double* gradient) const = 0; virtual int NumParameters() const = 0; };
-
bool
FirstOrderFunction::
Evaluate
(const double *parameters, double *cost, double *gradient) const¶ Evaluate the cost/value of the function. If
gradient
is notNULL
then evaluate the gradient too. If evaluation is successful return,true
else returnfalse
.cost
guaranteed to be neverNULL
,gradient
can beNULL
.
-
int
FirstOrderFunction::
NumParameters
() const¶ Number of parameters in the domain of the function.
GradientProblem
¶
-
class
GradientProblem
¶
class GradientProblem {
public:
explicit GradientProblem(FirstOrderFunction* function);
GradientProblem(FirstOrderFunction* function,
LocalParameterization* parameterization);
int NumParameters() const;
int NumLocalParameters() const;
bool Evaluate(const double* parameters, double* cost, double* gradient) const;
bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
};
Instances of GradientProblem
represent general non-linear
optimization problems that must be solved using just the value of the
objective function and its gradient. Unlike the Problem
class, which can only be used to model non-linear least squares
problems, instances of GradientProblem
not restricted in the
form of the objective function.
Structurally GradientProblem
is a composition of a
FirstOrderFunction
and optionally a
LocalParameterization
.
The FirstOrderFunction
is responsible for evaluating the cost
and gradient of the objective function.
The LocalParameterization
is responsible for going back and
forth between the ambient space and the local tangent space. When a
LocalParameterization
is not provided, then the tangent space
is assumed to coincide with the ambient Euclidean space that the
gradient vector lives in.
The constructor takes ownership of the FirstOrderFunction
and
LocalParamterization
objects passed to it.
-
void
Solve
(const GradientProblemSolver::Options &options, const GradientProblem &problem, double *parameters, GradientProblemSolver::Summary *summary)¶ Solve the given
GradientProblem
using the values inparameters
as the initial guess of the solution.
Solving¶
GradientProblemSolver::Options
¶
-
class
GradientProblemSolver::
Options
¶ GradientProblemSolver::Options
controls the overall behavior of the solver. We list the various settings and their default values below.
-
bool
GradientProblemSolver::Options::
IsValid
(string *error) const¶ Validate the values in the options struct and returns true on success. If there is a problem, the method returns false with
error
containing a textual description of the cause.
-
LineSearchDirectionType
GradientProblemSolver::Options::
line_search_direction_type
¶ Default:
LBFGS
Choices are
STEEPEST_DESCENT
,NONLINEAR_CONJUGATE_GRADIENT
,BFGS
andLBFGS
.
-
LineSearchType
GradientProblemSolver::Options::
line_search_type
¶ Default:
WOLFE
Choices are
ARMIJO
andWOLFE
(strong Wolfe conditions). Note that in order for the assumptions underlying theBFGS
andLBFGS
line search direction algorithms to be guaranteed to be satisifed, theWOLFE
line search should be used.
-
NonlinearConjugateGradientType
GradientProblemSolver::Options::
nonlinear_conjugate_gradient_type
¶ Default:
FLETCHER_REEVES
Choices are
FLETCHER_REEVES
,POLAK_RIBIERE
andHESTENES_STIEFEL
.
-
int
GradientProblemSolver::Options::
max_lbfs_rank
¶ Default: 20
The L-BFGS hessian approximation is a low rank approximation to the inverse of the Hessian matrix. The rank of the approximation determines (linearly) the space and time complexity of using the approximation. Higher the rank, the better is the quality of the approximation. The increase in quality is however is bounded for a number of reasons.
- The method only uses secant information and not actual derivatives.
- The Hessian approximation is constrained to be positive definite.
So increasing this rank to a large number will cost time and space complexity without the corresponding increase in solution quality. There are no hard and fast rules for choosing the maximum rank. The best choice usually requires some problem specific experimentation.
-
bool
GradientProblemSolver::Options::
use_approximate_eigenvalue_bfgs_scaling
¶ Default:
false
As part of the
BFGS
update step /LBFGS
right-multiply step, the initial inverse Hessian approximation is taken to be the Identity. However, [Oren] showed that using instead \(I * \gamma\), where \(\gamma\) is a scalar chosen to approximate an eigenvalue of the true inverse Hessian can result in improved convergence in a wide variety of cases. Settinguse_approximate_eigenvalue_bfgs_scaling
to true enables this scaling inBFGS
(before first iteration) andLBFGS
(at each iteration).Precisely, approximate eigenvalue scaling equates to
\[\gamma = \frac{y_k' s_k}{y_k' y_k}\]With:
\[y_k = \nabla f_{k+1} - \nabla f_k\]\[s_k = x_{k+1} - x_k\]Where \(f()\) is the line search objective and \(x\) the vector of parameter values [NocedalWright].
It is important to note that approximate eigenvalue scaling does not always improve convergence, and that it can in fact significantly degrade performance for certain classes of problem, which is why it is disabled by default. In particular it can degrade performance when the sensitivity of the problem to different parameters varies significantly, as in this case a single scalar factor fails to capture this variation and detrimentally downscales parts of the Jacobian approximation which correspond to low-sensitivity parameters. It can also reduce the robustness of the solution to errors in the Jacobians.
-
LineSearchIterpolationType
GradientProblemSolver::Options::
line_search_interpolation_type
¶ Default:
CUBIC
Degree of the polynomial used to approximate the objective function. Valid values are
BISECTION
,QUADRATIC
andCUBIC
.
-
double
GradientProblemSolver::Options::
min_line_search_step_size
¶ The line search terminates if:
\[\begin{split}\|\Delta x_k\|_\infty < \text{min_line_search_step_size}\end{split}\]where \(\|\cdot\|_\infty\) refers to the max norm, and \(\Delta x_k\) is the step change in the parameter values at the \(k\)-th iteration.
-
double
GradientProblemSolver::Options::
line_search_sufficient_function_decrease
¶ Default:
1e-4
Solving the line search problem exactly is computationally prohibitive. Fortunately, line search based optimization algorithms can still guarantee convergence if instead of an exact solution, the line search algorithm returns a solution which decreases the value of the objective function sufficiently. More precisely, we are looking for a step size s.t.
\[f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]\]This condition is known as the Armijo condition.
-
double
GradientProblemSolver::Options::
max_line_search_step_contraction
¶ Default:
1e-3
In each iteration of the line search,
\[\text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}\]Note that by definition, for contraction:
\[\begin{split}0 < \text{max_step_contraction} < \text{min_step_contraction} < 1\end{split}\]
-
double
GradientProblemSolver::Options::
min_line_search_step_contraction
¶ Default:
0.6
In each iteration of the line search,
\[\text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}\]Note that by definition, for contraction:
\[\begin{split}0 < \text{max_step_contraction} < \text{min_step_contraction} < 1\end{split}\]
-
int
GradientProblemSolver::Options::
max_num_line_search_step_size_iterations
¶ Default:
20
Maximum number of trial step size iterations during each line search, if a step size satisfying the search conditions cannot be found within this number of trials, the line search will stop.
As this is an ‘artificial’ constraint (one imposed by the user, not the underlying math), if
WOLFE
line search is being used, and points satisfying the Armijo sufficient (function) decrease condition have been found during the current search (in \(\leq\)max_num_line_search_step_size_iterations
). Then, the step size with the lowest function value which satisfies the Armijo condition will be returned as the new valid step, even though it does not satisfy the strong Wolfe conditions. This behaviour protects against early termination of the optimizer at a sub-optimal point.
-
int
GradientProblemSolver::Options::
max_num_line_search_direction_restarts
¶ Default:
5
Maximum number of restarts of the line search direction algorithm before terminating the optimization. Restarts of the line search direction algorithm occur when the current algorithm fails to produce a new descent direction. This typically indicates a numerical failure, or a breakdown in the validity of the approximations used.
-
double
GradientProblemSolver::Options::
line_search_sufficient_curvature_decrease
¶ Default:
0.9
The strong Wolfe conditions consist of the Armijo sufficient decrease condition, and an additional requirement that the step size be chosen s.t. the magnitude (‘strong’ Wolfe conditions) of the gradient along the search direction decreases sufficiently. Precisely, this second condition is that we seek a step size s.t.
\[\|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|\]Where \(f()\) is the line search objective and \(f'()\) is the derivative of \(f\) with respect to the step size: \(\frac{d f}{d~\text{step size}}\).
-
double
GradientProblemSolver::Options::
max_line_search_step_expansion
¶ Default:
10.0
During the bracketing phase of a Wolfe line search, the step size is increased until either a point satisfying the Wolfe conditions is found, or an upper bound for a bracket containing a point satisfying the conditions is found. Precisely, at each iteration of the expansion:
\[\text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}\]By definition for expansion
\[\begin{split}\text{max_step_expansion} > 1.0\end{split}\]
-
int
GradientProblemSolver::Options::
max_num_iterations
¶ Default:
50
Maximum number of iterations for which the solver should run.
-
double
GradientProblemSolver::Options::
max_solver_time_in_seconds
¶ Default:
1e6
Maximum amount of time for which the solver should run.
-
double
GradientProblemSolver::Options::
function_tolerance
¶ Default:
1e-6
Solver terminates if
\[\frac{|\Delta \text{cost}|}{\text{cost}} \leq \text{function_tolerance}\]where, \(\Delta \text{cost}\) is the change in objective function value (up or down) in the current iteration of the line search.
-
double
GradientProblemSolver::Options::
gradient_tolerance
¶ Default:
1e-10
Solver terminates if
\[\|x - \Pi \boxplus(x, -g(x))\|_\infty \leq \text{gradient_tolerance}\]where \(\|\cdot\|_\infty\) refers to the max norm, \(\Pi\) is projection onto the bounds constraints and \(\boxplus\) is Plus operation for the overall local parameterization associated with the parameter vector.
-
double
GradientProblemSolver::Options::
parameter_tolerance
¶ Default:
1e-8
Solver terminates if
\[\|\Delta x\| \leq (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}\]where \(\Delta x\) is the step computed by the linear solver in the current iteration of the line search.
-
LoggingType
GradientProblemSolver::Options::
logging_type
¶ Default:
PER_MINIMIZER_ITERATION
-
bool
GradientProblemSolver::Options::
minimizer_progress_to_stdout
¶ Default:
false
By default the
Minimizer
progress is logged toSTDERR
depending on thevlog
level. If this flag is set to true, andGradientProblemSolver::Options::logging_type
is notSILENT
, the logging output is sent toSTDOUT
.The progress display looks like
0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02 1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01 2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01
Here
f
is the value of the objective function.d
is the change in the value of the objective function if the step computed in this iteration is accepted.g
is the max norm of the gradient.h
is the change in the parameter vector.s
is the optimal step length computed by the line search.it
is the time take by the current iteration.tt
is the total time taken by the minimizer.
-
vector<IterationCallback>
GradientProblemSolver::Options::
callbacks
¶ Callbacks that are executed at the end of each iteration of the
Minimizer
. They are executed in the order that they are specified in this vector. See the documentation forIterationCallback
for more details.The solver does NOT take ownership of these pointers.
GradientProblemSolver::Summary
¶
-
class
GradientProblemSolver::
Summary
¶ Summary of the various stages of the solver after termination.
-
string
GradientProblemSolver::Summary::
BriefReport
() const¶ A brief one line description of the state of the solver after termination.
-
string
GradientProblemSolver::Summary::
FullReport
() const¶ A full multiline description of the state of the solver after termination.
-
bool
GradientProblemSolver::Summary::
IsSolutionUsable
() const¶ Whether the solution returned by the optimization algorithm can be relied on to be numerically sane. This will be the case if GradientProblemSolver::Summary:termination_type is set to CONVERGENCE, USER_SUCCESS or NO_CONVERGENCE, i.e., either the solver converged by meeting one of the convergence tolerances or because the user indicated that it had converged or it ran to the maximum number of iterations or time.
-
TerminationType
GradientProblemSolver::Summary::
termination_type
¶ The cause of the minimizer terminating.
-
string
GradientProblemSolver::Summary::
message
¶ Reason why the solver terminated.
-
double
GradientProblemSolver::Summary::
initial_cost
¶ Cost of the problem (value of the objective function) before the optimization.
-
double
GradientProblemSolver::Summary::
final_cost
¶ Cost of the problem (value of the objective function) after the optimization.
-
vector<IterationSummary>
GradientProblemSolver::Summary::
iterations
¶ IterationSummary
for each minimizer iteration in order.
-
double
GradientProblemSolver::Summary::
total_time_in_seconds
¶ Time (in seconds) spent in the solver.
-
double
GradientProblemSolver::Summary::
cost_evaluation_time_in_seconds
¶ Time (in seconds) spent evaluating the cost vector.
-
double
GradientProblemSolver::Summary::
gradient_evaluation_time_in_seconds
¶ Time (in seconds) spent evaluating the gradient vector.
-
int
GradientProblemSolver::Summary::
num_parameters
¶ Number of parameters in the problem.
-
int
GradientProblemSolver::Summary::
num_local_parameters
¶ Dimension of the tangent space of the problem. This is different from
GradientProblemSolver::Summary::num_parameters
if aLocalParameterization
object is used.
-
LineSearchDirectionType
GradientProblemSolver::Summary::
line_search_direction_type
¶ Type of line search direction used.
-
LineSearchType
GradientProblemSolver::Summary::
line_search_type
¶ Type of the line search algorithm used.
-
LineSearchInterpolationType
GradientProblemSolver::Summary::
line_search_interpolation_type
¶ When performing line search, the degree of the polynomial used to approximate the objective function.
-
NonlinearConjugateGradientType
GradientProblemSolver::Summary::
nonlinear_conjugate_gradient_type
¶ If the line search direction is NONLINEAR_CONJUGATE_GRADIENT, then this indicates the particular variant of non-linear conjugate gradient used.
-
int
GradientProblemSolver::Summary::
max_lbfgs_rank
¶ If the type of the line search direction is LBFGS, then this indicates the rank of the Hessian approximation.