
optimizers ⚓︎

Gradient acceleration.

AdaMax ⚓︎

Bases: Adam

AdaMax optimizer (kingma2014).
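AdaMax replaces Adam's second raw-moment estimate with an exponentially weighted infinity norm of the gradient. A minimal NumPy sketch of one AdaMax step; `adamax_step` and its default values are illustrative, not part of this class's API:

```python
import numpy as np

def adamax_step(x, g, m, u, t, alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax update (illustrative sketch, not this library's implementation)."""
    m = beta1 * m + (1 - beta1) * g        # first moment, as in Adam
    u = np.maximum(beta2 * u, np.abs(g))   # infinity-norm second moment
    x = x - (alpha / (1 - beta1**t)) * m / (u + eps)
    return x, m, u
```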

Adam ⚓︎

A class implementing the Adam optimizer for gradient-based optimization (kingma2014).

The Adam update equation for the control x using gradient g, iteration t, and small constants ε is given by:

m_t = β1 * m_{t-1} + (1 - β1) * g

v_t = β2 * v_{t-1} + (1 - β2) * g^2

m_t_hat = m_t / (1 - β1^t)

v_t_hat = v_t / (1 - β2^t)

x_{t+1} = x_t - α * m_t_hat / (sqrt(v_t_hat) + ε)
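The equations above can be sketched directly in NumPy. This is a minimal illustration of the update, not this class's implementation; `adam_step` and its defaults are assumed names:

```python
import numpy as np

def adam_step(x, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following the equations above (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * g       # m_t
    v = beta2 * v + (1 - beta2) * g**2    # v_t
    m_hat = m / (1 - beta1**t)            # bias-corrected first moment
    v_hat = v / (1 - beta2**t)            # bias-corrected second moment
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```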

Attributes:

step_size (float): The initial step size provided during initialization.
beta1 (float): The exponential decay rate for the first moment estimates.
beta2 (float): The exponential decay rate for the second moment estimates.
vel1 (1-D array_like): First moment estimate.
vel2 (1-D array_like): Second moment estimate.
eps (float): Small constant to prevent division by zero.
_step_size (float): Private attribute for temporarily modifying the step size.
temp_vel1 (1-D array_like): Temporary first moment estimate.
temp_vel2 (1-D array_like): Temporary second moment estimate.

Methods:

apply_update: Apply an Adam update to the control parameter.
apply_backtracking: Apply backtracking by reducing the step size temporarily.
restore_parameters: Restore the original step size.

__init__(step_size, beta1=0.9, beta2=0.999) ⚓︎

A class implementing the Adam optimizer for gradient-based optimization. The Adam update equation for the control x using gradient g, iteration t, and small constants ε is given by:

m_t = β1 * m_{t-1} + (1 - β1) * g

v_t = β2 * v_{t-1} + (1 - β2) * g^2

m_t_hat = m_t / (1 - β1^t)

v_t_hat = v_t / (1 - β2^t)

x_{t+1} = x_t - α * m_t_hat / (sqrt(v_t_hat) + ε)

Parameters:

step_size (float, required): The step size (learning rate) for the optimization.
beta1 (float, default 0.9): The exponential decay rate for the first moment estimates.
beta2 (float, default 0.999): The exponential decay rate for the second moment estimates.

apply_backtracking() ⚓︎

Apply backtracking by reducing step size temporarily.

apply_update(control, gradient, **kwargs) ⚓︎

Apply an Adam update to the control parameter.

Note

The update has the descent form x_new = x_old - x_step, where x_step is the Adam step.

Parameters:

control (array_like, required): The current value of the parameter being optimized.
gradient (array_like, required): The gradient of the objective function with respect to the control parameter.
**kwargs (dict): Additional keyword arguments, including 'iter' for the current iteration.

Returns:

new_control, temp_velocity (tuple): The new value of the control parameter after the update, and the current state step.

restore_parameters() ⚓︎

Restore the original step size.

GradientAscent ⚓︎

A class for performing gradient ascent optimization with momentum and backtracking. For control x and gradient g, the momentum update is given by:

\[ \begin{align} v_t &= \beta \, v_{t-1} + \alpha \, g \\ x_t &= x_{t-1} - v_t \end{align} \]
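The two-line update above can be sketched in NumPy; `momentum_step` and its defaults are illustrative, not this class's API:

```python
import numpy as np

def momentum_step(x, g, v, alpha=0.01, beta=0.9):
    """One momentum update matching the equations above (illustrative sketch)."""
    v = beta * v + alpha * g  # v_t: decayed velocity plus scaled gradient
    x = x - v                 # x_t: step along the accumulated velocity
    return x, v
```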

Attributes:

step_size (float): The initial step size provided during initialization.
momentum (float): The initial momentum factor provided during initialization.
velocity (array_like): Current velocity of the optimization process.
temp_velocity (array_like): Temporary velocity.
_step_size (float): Private attribute for temporarily modifying the step size.
_momentum (float): Private attribute for temporarily modifying the momentum.

Methods:

apply_update: Apply a gradient update to the control parameter.
apply_backtracking: Apply backtracking by reducing the step size and momentum temporarily.
restore_parameters: Restore the original step size and momentum values.

__init__(step_size, momentum) ⚓︎

Parameters:

step_size (float, required): The step size (learning rate) for the gradient ascent.
momentum (float, required): The momentum factor to apply during updates.

apply_backtracking() ⚓︎

Apply backtracking by reducing step size and momentum temporarily.

apply_smc_update(control, gradient, **kwargs) ⚓︎

Apply a gradient update to the control parameter.

Parameters:

control (array_like, required): The current value of the parameter being optimized.
gradient (array_like, required): The gradient of the objective function with respect to the control parameter.
**kwargs (dict): Additional keyword arguments.

Returns:

new_control (ndarray): The new value of the control parameter after the update.

apply_update(control, gradient, **kwargs) ⚓︎

Apply a gradient update to the control parameter.

Note

This is the steepest descent update: x_new = x_old - x_step.

Parameters:

control (array_like, required): The current value of the parameter being optimized.
gradient (array_like, required): The gradient of the objective function with respect to the control parameter.
**kwargs (dict): Additional keyword arguments.

Returns:

new_control, temp_velocity (tuple): The new value of the control parameter after the update, and the current state step.

restore_parameters() ⚓︎

Restore the original step size and momentum value.

Steihaug ⚓︎

A class implementing the Steihaug conjugate-gradient trust-region optimizer. This code is based on the minfx optimization library, https://gna.org/projects/minfx

__init__(maxiter=1000000.0, epsilon=1e-08, delta_max=100000.0, delta0=1.0) ⚓︎

Page 75 from 'Numerical Optimization' by Jorge Nocedal and Stephen J. Wright, 1999, 2nd ed. The CG-Steihaug algorithm is:

  • epsilon > 0
  • p_0 = 0, r_0 = g, d_0 = -r_0
  • if ||r_0|| < epsilon:
    • return p = p_0
  • while 1:
    • if d_j^T B d_j <= 0:
      • find tau such that p = p_j + tau d_j minimises m(p) in (4.9) and satisfies ||p|| = delta
      • return p
    • a_j = r_j^T r_j / (d_j^T B d_j)
    • p_{j+1} = p_j + a_j d_j
    • if ||p_{j+1}|| >= delta:
      • find tau such that p = p_j + tau d_j satisfies ||p|| = delta
      • return p
    • r_{j+1} = r_j + a_j B d_j
    • if ||r_{j+1}|| < epsilon ||r_0||:
      • return p = p_{j+1}
    • b_{j+1} = r_{j+1}^T r_{j+1} / (r_j^T r_j)
    • d_{j+1} = -r_{j+1} + b_{j+1} d_j
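The algorithm above can be sketched in NumPy. The function names `cg_steihaug` and `get_tau` here are illustrative stand-ins, not this class's methods:

```python
import numpy as np

def get_tau(p, d, delta):
    """Positive root of ||p + tau*d|| = delta (a quadratic in tau)."""
    a = d @ d
    b = 2.0 * (p @ d)
    c = p @ p - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def cg_steihaug(g, B, delta, epsilon=1e-8, maxiter=1000):
    """Sketch of CG-Steihaug: approximately minimise g.p + 0.5 p.B.p over ||p|| <= delta."""
    p = np.zeros_like(g)
    r = g.copy()
    d = -r
    if np.linalg.norm(r) < epsilon:
        return p
    for _ in range(int(maxiter)):
        dBd = d @ B @ d
        if dBd <= 0.0:
            # negative curvature: follow d to the trust-region boundary
            return p + get_tau(p, d, delta) * d
        alpha = (r @ r) / dBd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:
            # step would leave the trust region: stop on the boundary
            return p + get_tau(p, d, delta) * d
        r_next = r + alpha * (B @ d)
        if np.linalg.norm(r_next) < epsilon * np.linalg.norm(g):
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p
```

With a positive-definite B and a large delta, the result matches the unconstrained Newton step -B⁻¹g; with a small delta it stops on the boundary.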

Parameters:

maxiter (float, default 1000000.0): Maximum number of iterations.
epsilon (float, default 1e-08): Tolerance for iterations.
delta_max (float, default 100000.0): Maximum trust region size.
delta0 (float, default 1.0): Initial trust region size.

apply_backtracking() ⚓︎

Apply backtracking by reducing step size temporarily.

apply_update(xk, dfk, **kwargs) ⚓︎

Apply a Steihaug update to the control vector.

Parameters:

xk (array_like, required): The current value of the parameter being optimized.
dfk (array_like, required): The gradient of the objective function with respect to the control parameter.
**kwargs (dict): Additional keyword arguments, including the Hessian of the objective function with respect to the control parameter.

Returns:

new_control, step (tuple): The new value of the control parameter after the update, and the current state step.

get_tau(pj, dj) ⚓︎

Find tau such that p = p_j + tau d_j satisfies ||p|| = delta.
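Expanding the boundary condition gives a scalar quadratic in tau; a sketch of the algebra behind this function (the implementation may differ in details):

\[ \begin{align} \|p_j + \tau d_j\|^2 = \delta^2 \iff (d_j^\top d_j)\,\tau^2 + 2\,(p_j^\top d_j)\,\tau + (p_j^\top p_j - \delta^2) = 0 \end{align} \]

which is solved by the quadratic formula, taking the positive root so that the step continues forward along d_j.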

restore_parameters() ⚓︎

Restore the original step size.