optimizers
Gradient acceleration.
AdaMax
Adam
A class implementing the Adam optimizer for gradient-based optimization (Kingma & Ba, 2014).
The Adam update equations for the control x, using gradient g, step size α, decay rates β1 and β2, iteration t, and a small constant ε, are:
m_t = β1 * m_{t-1} + (1 - β1) * g
v_t = β2 * v_{t-1} + (1 - β2) * g^2
m_t_hat = m_t / (1 - β1^t)
v_t_hat = v_t / (1 - β2^t)
x_{t+1} = x_t - α * m_t_hat / (sqrt(v_t_hat) + ε)
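For concreteness, a minimal NumPy sketch of these equations (the function name and the toy objective are illustrative, not taken from the implementation documented below):

```python
import numpy as np

def adam_step(x, g, m, v, t, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, following the equations above (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * g          # first moment estimate
    v = beta2 * v + (1 - beta2) * g**2       # second moment estimate
    m_hat = m / (1 - beta1**t)               # bias-corrected first moment
    v_hat = v / (1 - beta2**t)               # bias-corrected second moment
    x_new = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x_new, m, v

# A few steps on the toy objective f(x) = ||x||^2, whose gradient is 2x.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 6):
    x, m, v = adam_step(x, 2 * x, m, v, t, alpha=0.1)
```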
Attributes:
| Name | Type | Description |
|---|---|---|
| step_size | float | The initial step size provided during initialization. |
| beta1 | float | The exponential decay rate for the first moment estimates. |
| beta2 | float | The exponential decay rate for the second moment estimates. |
| vel1 | 1-D array_like | First moment estimate. |
| vel2 | 1-D array_like | Second moment estimate. |
| eps | float | Small constant to prevent division by zero. |
| _step_size | float | Private attribute for temporarily modifying the step size. |
| temp_vel1 | 1-D array_like | Temporary first moment estimate. |
| temp_vel2 | 1-D array_like | Temporary second moment estimate. |
Methods:
| Name | Description |
|---|---|
| apply_update | Apply an Adam update to the control parameter. |
| apply_backtracking | Apply backtracking by reducing the step size temporarily. |
| restore_parameters | Restore the original step size. |
__init__(step_size, beta1=0.9, beta2=0.999)
Initialize the Adam optimizer with the given step size and exponential decay rates. The Adam update equations for the control x are listed in the class description above.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step_size | float | The step size (learning rate) for the optimization. | required |
| beta1 | float | The exponential decay rate for the first moment estimates. | 0.9 |
| beta2 | float | The exponential decay rate for the second moment estimates. | 0.999 |
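A minimal construction sketch, assuming the class is importable from this module (the import path is an assumption based on the page title):

```python
from optimizers import Adam  # import path assumed, for illustration

optimizer = Adam(step_size=0.1, beta1=0.9, beta2=0.999)
```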
apply_backtracking()
Apply backtracking by reducing step size temporarily.
apply_update(control, gradient, **kwargs)
Apply an Adam update to the control parameter.
Note
This is the steepest descent update: x_new = x_old - x_step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| control | array_like | The current value of the parameter being optimized. | required |
| gradient | array_like | The gradient of the objective function with respect to the control parameter. | required |
| **kwargs | dict | Additional keyword arguments, including 'iter' for the current iteration. | {} |
Returns:
| Type | Description |
|---|---|
| new_control, temp_velocity: tuple | The new value of the control parameter after the update, and the current state step. |
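A sketch of how apply_update might be driven in a loop, using the documented 'iter' keyword and the (new_control, step) return value; the toy objective and the import path are illustrative assumptions:

```python
import numpy as np
from optimizers import Adam  # import path assumed, for illustration

x = np.array([1.0, -2.0, 0.5])            # control vector for a toy problem
optimizer = Adam(step_size=0.1)

for it in range(1, 51):
    grad = 2 * x                          # gradient of the toy objective ||x||^2
    # 'iter' supplies the iteration counter used in the bias correction.
    x, step = optimizer.apply_update(x, grad, iter=it)
```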
restore_parameters()
Restore the original step size.
GradientAscent
A class for performing gradient ascent optimization with momentum and backtracking. The control is advanced with a momentum-accelerated gradient step; a sketch of the standard form is given below.
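A minimal sketch of a classical momentum step, written with the same descent convention as the apply_update note below (x_new = x_old - x_step); the exact scaling and sign conventions used by this class are not shown on this page, so treat this as an assumed standard form:

```python
import numpy as np

def momentum_step(x, g, velocity, step_size=0.1, momentum=0.9):
    """One classical momentum update (illustrative; conventions assumed)."""
    velocity = momentum * velocity + step_size * g   # decaying accumulation of gradients
    x_new = x - velocity                             # descent convention: x_new = x_old - x_step
    return x_new, velocity
```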
Attributes:
| Name | Type | Description |
|---|---|---|
| step_size | float | The initial step size provided during initialization. |
| momentum | float | The initial momentum factor provided during initialization. |
| velocity | array_like | Current velocity of the optimization process. |
| temp_velocity | array_like | Temporary velocity. |
| _step_size | float | Private attribute for temporarily modifying step size. |
| _momentum | float | Private attribute for temporarily modifying momentum. |
Methods:
| Name | Description |
|---|---|
| apply_update | Apply a gradient update to the control parameter. |
| apply_backtracking | Apply backtracking by reducing step size and momentum temporarily. |
| restore_parameters | Restore the original step size and momentum values. |
__init__(step_size, momentum)
apply_backtracking()
Apply backtracking by reducing step size and momentum temporarily.
apply_smc_update(control, gradient, **kwargs)
Apply a gradient update to the control parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| control | array_like | The current value of the parameter being optimized. | required |
| gradient | array_like | The gradient of the objective function with respect to the control parameter. | required |
| **kwargs | dict | Additional keyword arguments. | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| new_control | ndarray | The new value of the control parameter after the update. |
apply_update(control, gradient, **kwargs)
Apply a gradient update to the control parameter.
Note
This is the steepest descent update: x_new = x_old - x_step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| control | array_like | The current value of the parameter being optimized. | required |
| gradient | array_like | The gradient of the objective function with respect to the control parameter. | required |
| **kwargs | dict | Additional keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| new_control, temp_velocity: tuple | The new value of the control parameter after the update, and the current state step. |
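A sketch of how the update, backtracking, and restore methods might be combined; the toy objective, the import path, and the retry policy are illustrative assumptions, not taken from this page:

```python
import numpy as np
from optimizers import GradientAscent  # import path assumed, for illustration

def objective(x):
    return np.sum(x**2)                # toy objective, minimized here

x = np.array([1.0, -2.0])
optimizer = GradientAscent(step_size=0.1, momentum=0.9)

for _ in range(20):
    grad = 2 * x
    new_x, step = optimizer.apply_update(x, grad)
    if objective(new_x) > objective(x):
        # Trial point is worse: shrink step size and momentum, retry, then restore.
        optimizer.apply_backtracking()
        new_x, step = optimizer.apply_update(x, grad)
        optimizer.restore_parameters()
    x = new_x
```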
restore_parameters()
Restore the original step size and momentum value.
Steihaug
A class implementing the Steihaug conjugate-gradient trust region optimizer. This code is based on the minfx optimisation library, https://gna.org/projects/minfx
__init__(maxiter=1000000.0, epsilon=1e-08, delta_max=100000.0, delta0=1.0)
Page 75 from 'Numerical Optimization' by Jorge Nocedal and Stephen J. Wright, 1999, 2nd ed. The CG-Steihaug algorithm is:
- Given epsilon > 0
- p_0 = 0, r_0 = g, d_0 = -r_0
- if ||r_0|| < epsilon:
    - return p = p_0
- while 1:
    - if d_j^T B d_j <= 0:
        - Find tau such that p = p_j + tau d_j minimises m(p) in (4.9) and satisfies ||p|| = delta
        - return p
    - a_j = r_j^T r_j / (d_j^T B d_j)
    - p_{j+1} = p_j + a_j d_j
    - if ||p_{j+1}|| >= delta:
        - Find tau >= 0 such that p = p_j + tau d_j satisfies ||p|| = delta
        - return p
    - r_{j+1} = r_j + a_j B d_j
    - if ||r_{j+1}|| < epsilon ||r_0||:
        - return p = p_{j+1}
    - b_{j+1} = r_{j+1}^T r_{j+1} / (r_j^T r_j)
    - d_{j+1} = -r_{j+1} + b_{j+1} d_j
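A compact NumPy sketch of the inner CG-Steihaug loop listed above; this is an illustrative implementation of the pseudocode, not the code of this class:

```python
import numpy as np

def cg_steihaug(g, B, delta, epsilon=1e-8, maxiter=1000):
    """Approximately minimise m(p) = g.p + 0.5 p.B.p subject to ||p|| <= delta."""
    p = np.zeros_like(g)
    r = g.copy()
    d = -r
    if np.linalg.norm(r) < epsilon:
        return p
    for _ in range(maxiter):
        Bd = B @ d
        dBd = d @ Bd
        if dBd <= 0:
            # Negative curvature: stop on the trust-region boundary along d
            # (simplified: the reference picks the tau that minimises m(p);
            # here the positive root is used).
            return p + boundary_tau(p, d, delta) * d
        alpha = (r @ r) / dBd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:
            # Step would leave the trust region: stop on the boundary.
            return p + boundary_tau(p, d, delta) * d
        r_next = r + alpha * Bd
        if np.linalg.norm(r_next) < epsilon * np.linalg.norm(g):
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def boundary_tau(p, d, delta):
    """Positive root of ||p + tau*d|| = delta (cf. get_tau below)."""
    a = d @ d
    b = 2.0 * (p @ d)
    c = p @ p - delta**2
    return (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
```

In a full trust-region method this inner solve is wrapped in the usual radius-update logic: compare predicted and actual reduction, then shrink or grow delta between delta0 and delta_max.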
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| maxiter | float | Maximum number of iterations. | 1000000.0 |
| epsilon | float | Tolerance for iterations. | 1e-08 |
| delta_max | float | Maximum trust region size. | 100000.0 |
| delta0 | float | Initial trust region size. | 1.0 |
apply_backtracking()
Apply backtracking by reducing step size temporarily.
apply_update(xk, dfk, **kwargs)
Apply a Steihaug update to the control vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| xk | array_like | The current value of the parameter being optimized. | required |
| dfk | array_like | The gradient of the objective function with respect to the control parameter. | required |
| **kwargs | dict | Additional keyword arguments, including the Hessian of the objective function with respect to the control parameter. | {} |
Returns:
| Type | Description |
|---|---|
| new_control, step: tuple | The new value of the control parameter after the update, and the current state step. |
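A usage sketch; how the Hessian is passed through **kwargs is not shown on this page, so the keyword name 'hessian' below is a hypothetical placeholder, as are the quadratic objective and the import path:

```python
import numpy as np
from optimizers import Steihaug  # import path assumed, for illustration

# Toy quadratic objective 0.5 x.A.x - b.x with gradient A x - b.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])

xk = np.zeros(2)
optimizer = Steihaug(delta0=1.0)
for _ in range(10):
    dfk = A @ xk - b
    xk, step = optimizer.apply_update(xk, dfk, hessian=A)  # keyword name assumed
```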
get_tau(pj, dj)
Find tau such that p = p_j + tau d_j and ||p|| = delta.
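Writing out this condition shows that tau is the root of a scalar quadratic; the positive root reaches the boundary from a point inside the region:
||p_j + tau d_j||^2 = delta^2
(d_j . d_j) tau^2 + 2 (p_j . d_j) tau + (p_j . p_j - delta^2) = 0
tau = ( -(p_j . d_j) + sqrt( (p_j . d_j)^2 - (d_j . d_j)(p_j . p_j - delta^2) ) ) / (d_j . d_j)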
restore_parameters()
Restore the original step size.