optimizers⚓︎
Gradient acceleration.
AdaMax
⚓︎
    A variant of the Adam optimizer based on the infinity norm (Kingma & Ba, 2014).
Adam
⚓︎
    A class implementing the Adam optimizer for gradient-based optimization (Kingma & Ba, 2014).
The Adam update equation for the control x using gradient g, iteration t, and small constants ε is given by:
m_t = β1 * m_{t-1} + (1 - β1) * g
v_t = β2 * v_{t-1} + (1 - β2) * g^2
m_t_hat = m_t / (1 - β1^t)
v_t_hat = v_t / (1 - β2^t)
x_{t+1} = x_t - α * m_t_hat / (sqrt(v_t_hat) + ε)
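As a minimal plain-Python sketch of these update equations (illustrative only, not the class's implementation; the function name and scalar signature are assumptions for demonstration):

```python
import math

def adam_step(x, g, m, v, t, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on a scalar control x with gradient g; t is 1-based."""
    m = beta1 * m + (1 - beta1) * g          # first moment estimate m_t
    v = beta2 * v + (1 - beta2) * g ** 2     # second moment estimate v_t
    m_hat = m / (1 - beta1 ** t)             # bias-corrected m_t_hat
    v_hat = v / (1 - beta2 ** t)             # bias-corrected v_t_hat
    x = x - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v
```

The bias-correction terms matter most in early iterations, when m and v are still close to their zero initialization.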
Attributes:
| Name | Type | Description | 
|---|---|---|
| step_size | float | The initial step size provided during initialization. | 
| beta1 | float | The exponential decay rate for the first moment estimates. | 
| beta2 | float | The exponential decay rate for the second moment estimates. | 
| vel1 | 1-D array_like | First moment estimate. | 
| vel2 | 1-D array_like | Second moment estimate. | 
| eps | float | Small constant to prevent division by zero. | 
| _step_size | float | Private attribute for temporarily modifying step size. | 
| temp_vel1 | 1-D array_like | Temporary first moment estimate. | 
| temp_vel2 | 1-D array_like | Temporary second moment estimate. | 
Methods:
| Name | Description | 
|---|---|
| apply_update | Apply an Adam update to the control parameter. | 
| apply_backtracking | Apply backtracking by reducing step size temporarily. | 
| restore_parameters | Restore the original step size. | 
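The backtracking and restore methods follow a simple pattern: shrink a working copy of the step size when an update is rejected, then restore the original afterwards. A minimal sketch of that pattern (the class name and the reduction factor of 0.5 are assumptions, not the library's actual values):

```python
class StepController:
    """Illustrative sketch of the backtracking/restore pattern."""

    def __init__(self, step_size):
        self.step_size = step_size        # original value, kept for restore
        self._step_size = step_size       # working value used in updates

    def apply_backtracking(self):
        self._step_size *= 0.5            # shrink step after a rejected update (factor assumed)

    def restore_parameters(self):
        self._step_size = self.step_size  # undo the temporary reduction
```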
__init__(step_size, beta1=0.9, beta2=0.999)
⚓︎
    A class implementing the Adam optimizer for gradient-based optimization. The Adam update equation for the control x using gradient g, iteration t, and small constants ε is given by:
m_t = β1 * m_{t-1} + (1 - β1) * g
v_t = β2 * v_{t-1} + (1 - β2) * g^2
m_t_hat = m_t / (1 - β1^t)
v_t_hat = v_t / (1 - β2^t)
x_{t+1} = x_t - α * m_t_hat / (sqrt(v_t_hat) + ε)
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| step_size | float | The step size (learning rate) for the optimization. | required | 
| beta1 | float | The exponential decay rate for the first moment estimates (default is 0.9). | 0.9 | 
| beta2 | float | The exponential decay rate for the second moment estimates (default is 0.999). | 0.999 | 
apply_backtracking()
⚓︎
    Apply backtracking by reducing step size temporarily.
apply_update(control, gradient, **kwargs)
⚓︎
    Apply an Adam update to the control parameter.
Note
This is the steepest descent update: x_new = x_old - x_step.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| control | array_like | The current value of the parameter being optimized. | required | 
| gradient | array_like | The gradient of the objective function with respect to the control parameter. | required | 
| **kwargs | dict | Additional keyword arguments, including 'iter' for the current iteration. | {} | 
Returns:
| Type | Description | 
|---|---|
| tuple (new_control, temp_velocity) | The new value of the control parameter after the update, and the current state step. | 
restore_parameters()
⚓︎
    Restore the original step size.
GradientAscent
⚓︎
    A class for performing gradient ascent optimization with momentum and backtracking. The gradient descent update equation with momentum, for control x, gradient g, step size α, and momentum factor β, is given by:
v_t = β * v_{t-1} + α * g
x_{t+1} = x_t - v_t
Attributes:
| Name | Type | Description | 
|---|---|---|
| step_size | float | The initial step size provided during initialization. | 
| momentum | float | The initial momentum factor provided during initialization. | 
| velocity | array_like | Current velocity of the optimization process. | 
| temp_velocity | array_like | Temporary velocity. | 
| _step_size | float | Private attribute for temporarily modifying step size. | 
| _momentum | float | Private attribute for temporarily modifying momentum. | 
Methods:
| Name | Description | 
|---|---|
| apply_update | Apply a gradient update to the control parameter. | 
| apply_backtracking | Apply backtracking by reducing step size and momentum temporarily. | 
| restore_parameters | Restore the original step size and momentum values. | 
__init__(step_size, momentum)
⚓︎
    Initialize the optimizer with the given step size and momentum factor.
apply_backtracking()
⚓︎
    Apply backtracking by reducing step size and momentum temporarily.
apply_smc_update(control, gradient, **kwargs)
⚓︎
    Apply a gradient update to the control parameter.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| control | array_like | The current value of the parameter being optimized. | required | 
| gradient | array_like | The gradient of the objective function with respect to the control parameter. | required | 
| **kwargs | dict | Additional keyword arguments. | {} | 
Returns:
| Name | Type | Description | 
|---|---|---|
| new_control | ndarray | The new value of the control parameter after the update. | 
apply_update(control, gradient, **kwargs)
⚓︎
    Apply a gradient update to the control parameter.
Note
This is the steepest descent update: x_new = x_old - x_step.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| control | array_like | The current value of the parameter being optimized. | required | 
| gradient | array_like | The gradient of the objective function with respect to the control parameter. | required | 
| **kwargs | dict | Additional keyword arguments. | {} | 
Returns:
| Type | Description | 
|---|---|
| tuple (new_control, temp_velocity) | The new value of the control parameter after the update, and the current state step. | 
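A plain-Python sketch of this momentum update (illustrative; the function name is assumed):

```python
def momentum_step(x, g, velocity, step_size=0.1, momentum=0.9):
    """One steepest-descent step with momentum: x_new = x_old - x_step."""
    velocity = momentum * velocity + step_size * g   # accumulate gradient history
    return x - velocity, velocity                    # new control and current step
```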
restore_parameters()
⚓︎
    Restore the original step size and momentum value.
Steihaug
⚓︎
    A class implementing the Steihaug conjugate-gradient trust region optimizer. This code is based on the minfx optimisation library, https://gna.org/projects/minfx
__init__(maxiter=1000000.0, epsilon=1e-08, delta_max=100000.0, delta0=1.0)
⚓︎
    Page 75 from 'Numerical Optimization' by Jorge Nocedal and Stephen J. Wright (1999). The CG-Steihaug algorithm is:
- Given epsilon > 0
- p0 = 0, r0 = g, d0 = -r0
- if ||r0|| < epsilon: return p = p0
- for j = 0, 1, 2, ...:
    - if djT.B.dj <= 0:
        - Find tau such that p = pj + tau.dj minimises m(p) in (4.9) and satisfies ||p|| = delta
        - return p
    - aj = rjT.rj / djT.B.dj
    - pj+1 = pj + aj.dj
    - if ||pj+1|| >= delta:
        - Find tau >= 0 such that p = pj + tau.dj satisfies ||p|| = delta
        - return p
    - rj+1 = rj + aj.B.dj
    - if ||rj+1|| < epsilon.||r0||: return p = pj+1
    - bj+1 = rj+1T.rj+1 / rjT.rj
    - dj+1 = -rj+1 + bj+1.dj
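The loop above can be sketched in NumPy as follows. This is a simplified illustration, not the class's implementation; in the negative-curvature case it takes the positive boundary root rather than comparing both roots of the model m(p):

```python
import numpy as np

def boundary_tau(p, d, delta):
    """Positive root of ||p + tau.d|| = delta (quadratic in tau)."""
    a = d @ d
    b = 2 * (p @ d)
    c = p @ p - delta ** 2
    return (-b + np.sqrt(b ** 2 - 4 * a * c)) / (2 * a)

def cg_steihaug(B, g, delta, epsilon=1e-8, maxiter=100):
    """Approximately minimise m(p) = g.p + 0.5 p.B.p subject to ||p|| <= delta."""
    p = np.zeros_like(g)
    r = g.copy()
    d = -r
    if np.linalg.norm(r) < epsilon:
        return p
    for _ in range(maxiter):
        dBd = d @ B @ d
        if dBd <= 0:                                  # negative curvature:
            return p + boundary_tau(p, d, delta) * d  # step to the boundary
        a = (r @ r) / dBd
        p_next = p + a * d
        if np.linalg.norm(p_next) >= delta:           # leaving the region:
            return p + boundary_tau(p, d, delta) * d  # clip to the boundary
        r_next = r + a * (B @ d)
        if np.linalg.norm(r_next) < epsilon * np.linalg.norm(g):
            return p_next                             # residual converged
        b = (r_next @ r_next) / (r @ r)
        d = -r_next + b * d                           # new conjugate direction
        p, r = p_next, r_next
    return p
```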
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| maxiter | float | Maximum number of iterations. | 1000000.0 | 
| epsilon | float | Tolerance for iterations. | 1e-08 | 
| delta_max | float | Maximum trust region size. | 100000.0 | 
| delta0 | float | Initial trust region size. | 1.0 | 
apply_backtracking()
⚓︎
    Apply backtracking by reducing step size temporarily.
apply_update(xk, dfk, **kwargs)
⚓︎
    Apply a Steihaug update to the control vector.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| xk | array_like | The current value of the parameter being optimized. | required | 
| dfk | array_like | The gradient of the objective function with respect to the control parameter. | required | 
| **kwargs | dict | Additional keyword arguments, including the hessian of the objective function with respect to the control parameter. | {} | 
Returns:
| Type | Description | 
|---|---|
| tuple (new_control, step) | The new value of the control parameter after the update, and the current state step. | 
get_tau(pj, dj)
⚓︎
    Function to find tau such that p = pj + tau.dj, and ||p|| = delta.
restore_parameters()
⚓︎
    Restore the original step size.