Java 8 Neural Networks with CuDNN and Aparapi
These notebooks document the components related to optimization and training.
Description: Experimental. The idea behind this class is to track, for each row, the change in value of an objective function over the course of a single training epoch. Rows which are observed to have a high delta are inferred to be "interesting" rows, subject to retention when re-sampling training data between epochs.
Description: The type Holdover stochastic gradient descent run.
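As a rough illustration of the holdover idea (the class and method names below are hypothetical, not this library's API), the sketch scores each row by the absolute change in its per-row loss over the epoch and retains the highest-delta rows when building the next epoch's sample:

```java
import java.util.*;
import java.util.stream.*;

public class HoldoverSampler {
  // Hypothetical sketch: retain the rows whose loss changed most during the
  // epoch, then fill the remainder of the sample at random.
  // Assumes sampleSize <= lossBefore.length.
  public static int[] resample(double[] lossBefore, double[] lossAfter,
                               int sampleSize, double holdoverFraction, Random rng) {
    int n = lossBefore.length;
    // Sort row indices by descending |delta| of the per-row loss.
    Integer[] byDelta = IntStream.range(0, n).boxed().toArray(Integer[]::new);
    Arrays.sort(byDelta, Comparator.comparingDouble(
        i -> -Math.abs(lossAfter[i] - lossBefore[i])));
    int keep = (int) (sampleSize * holdoverFraction);
    Set<Integer> chosen = new LinkedHashSet<>();
    for (int i = 0; i < keep; i++) chosen.add(byDelta[i]);        // "interesting" rows
    while (chosen.size() < sampleSize) chosen.add(rng.nextInt(n)); // random fill
    return chosen.stream().mapToInt(Integer::intValue).toArray();
  }
}
```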
Description: Abstract base class for a trainable wrapper that adds per-layer L1 and L2 normalization constants. It allows the implementing class to choose the coefficients for each layer.
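A minimal sketch of the per-layer penalty idea (hypothetical names, not the library's API): each layer contributes an L1 and an L2 term, weighted by its own coefficients, on top of the inner loss.

```java
// Hypothetical sketch of a per-layer L1/L2 penalty added to an inner loss value.
public class L12Normalizer {
  public static double penalize(double innerLoss, double[][] layerWeights,
                                double[] l1, double[] l2) {
    double total = innerLoss;
    for (int layer = 0; layer < layerWeights.length; layer++) {
      for (double w : layerWeights[layer]) {
        total += l1[layer] * Math.abs(w)   // per-layer L1 term
               + l2[layer] * w * w;        // per-layer L2 term
      }
    }
    return total;
  }
}
```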
Description: This type handles the data selection part of stochastic gradient descent training. Between epochs, a "reset" method is called to re-sample the training data and pass it to the inner Trainable implementation.
Description: The type Simple stochastic gradient descent run.
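The wrapper pattern can be sketched roughly as below (the Trainable interface and names here are hypothetical stand-ins, not the library's actual types): the outer object owns the full dataset, and each reset() hands a fresh random sample to the inner trainable.

```java
import java.util.Random;

// Hypothetical sketch: re-sample a subset of the training data between epochs
// and hand it to an inner trainable.
public class StochasticTrainable {
  public interface Trainable { void setData(double[][] rows); double measure(); }

  private final double[][] allRows;
  private final int sampleSize;
  private final Trainable inner;
  private final Random rng = new Random();

  public StochasticTrainable(double[][] allRows, int sampleSize, Trainable inner) {
    this.allRows = allRows; this.sampleSize = sampleSize; this.inner = inner;
  }

  // Called between epochs: draw a fresh random sample for the inner trainable.
  public void reset() {
    double[][] sample = new double[sampleSize][];
    for (int i = 0; i < sampleSize; i++) sample[i] = allRows[rng.nextInt(allRows.length)];
    inner.setData(sample);
  }
}
```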
Description: An exact line search method which ignores the magnitude of the derivative, using only its sign. Signs are sufficient to find and detect bracketing conditions. When the solution is bracketed, the next iteration always tests the midpoint.
Description: The type Bisection line search run.
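For illustration, a minimal sign-only bisection over the directional derivative might look like this (hypothetical names; the bracketing-discovery phase is omitted and assumed as a precondition):

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: given a bracket where the directional derivative
// changes sign, repeatedly test the midpoint using only the sign of f'.
public class BisectionSearch {
  public static double minimize(DoubleUnaryOperator derivative,
                                double lo, double hi, int iterations) {
    // Precondition (bracketing): derivative(lo) < 0 and derivative(hi) > 0.
    for (int i = 0; i < iterations; i++) {
      double mid = 0.5 * (lo + hi);            // always test the midpoint
      if (derivative.applyAsDouble(mid) < 0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
  }
}
```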
Description: This exact line search method uses a linear interpolation of the derivative to find the extrema, where dy/dx = 0. Bracketing conditions are established with logic that largely ignores derivatives, due to heuristic observations.
Description: The type Quadratic line search run.
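Linearly interpolating the derivative and stepping to its zero is a secant iteration on f'; a minimal sketch under that reading (hypothetical names, bracketing logic omitted):

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: linearly interpolate the derivative between two points
// and step to where the interpolant crosses zero (a secant step on f').
public class QuadraticSearch {
  public static double minimize(DoubleUnaryOperator derivative,
                                double a, double b, int iterations) {
    double da = derivative.applyAsDouble(a);
    double db = derivative.applyAsDouble(b);
    for (int i = 0; i < iterations && da != db; i++) {
      double c = a - da * (b - a) / (db - da); // zero of the linear interpolant
      a = b; da = db;
      b = c; db = derivative.applyAsDouble(c);
    }
    return b;
  }
}
```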
Description: A very basic line search which uses a static rate, searching lower rates when iterations do not result in improvement.
Description: The type Static rate run.
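The behavior can be summarized with a small sketch (hypothetical names; the halving factor is an assumption, not documented here):

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: try a fixed rate; if the objective does not improve,
// search lower rates by halving and retrying.
public class StaticRateSearch {
  public static double step(DoubleUnaryOperator lossAtRate, double rate, int maxTries) {
    double base = lossAtRate.applyAsDouble(0.0);
    for (int i = 0; i < maxTries; i++) {
      if (lossAtRate.applyAsDouble(rate) < base) return rate; // improvement found
      rate *= 0.5;                                            // try a lower rate
    }
    return 0.0; // no improving rate found
  }
}
```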
Description: The most basic type of orientation, which uses the raw function gradient.
Description: The type Gd run.
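For completeness, the raw-gradient orientation amounts to negating the gradient (a trivial sketch with hypothetical names):

```java
// Hypothetical sketch: the gradient-descent "orientation" is simply the
// negated raw function gradient.
public class GradientDescentOrientation {
  public static double[] direction(double[] gradient) {
    double[] d = new double[gradient.length];
    for (int i = 0; i < gradient.length; i++) d[i] = -gradient[i];
    return d;
  }
}
```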
Description: An implementation of the Limited-Memory Broyden–Fletcher–Goldfarb–Shanno algorithm. See https://en.m.wikipedia.org/wiki/Limited-memory_BFGS
Description: The type Lbfgs run.
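The core of any L-BFGS variant is the standard two-loop recursion, sketched below (hypothetical class and helper names; history is assumed oldest-first, and this is a generic textbook form, not this library's exact implementation):

```java
import java.util.*;

// Hypothetical sketch of the standard L-BFGS two-loop recursion: build a
// search direction from the last m (s, y) = (position-delta, gradient-delta)
// pairs, stored oldest-first.
public class Lbfgs {
  public static double[] direction(double[] grad, Deque<double[]> sHist, Deque<double[]> yHist) {
    double[] q = grad.clone();
    List<double[]> s = new ArrayList<>(sHist), y = new ArrayList<>(yHist);
    int m = s.size();
    double[] alpha = new double[m], rho = new double[m];
    for (int i = m - 1; i >= 0; i--) {                  // first loop: newest to oldest
      rho[i] = 1.0 / dot(y.get(i), s.get(i));
      alpha[i] = rho[i] * dot(s.get(i), q);
      axpy(-alpha[i], y.get(i), q);
    }
    double gamma = m > 0
        ? dot(s.get(m - 1), y.get(m - 1)) / dot(y.get(m - 1), y.get(m - 1)) : 1.0;
    for (int i = 0; i < q.length; i++) q[i] *= gamma;   // initial Hessian scaling
    for (int i = 0; i < m; i++) {                       // second loop: oldest to newest
      double beta = rho[i] * dot(y.get(i), q);
      axpy(alpha[i] - beta, s.get(i), q);
    }
    for (int i = 0; i < q.length; i++) q[i] = -q[i];    // descend
    return q;
  }
  static double dot(double[] a, double[] b) {
    double r = 0; for (int i = 0; i < a.length; i++) r += a[i] * b[i]; return r;
  }
  static void axpy(double a, double[] x, double[] y) {
    for (int i = 0; i < x.length; i++) y[i] += a * x[i];
  }
}
```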
Description: A simple momentum module which uses a cumulative decay algorithm to add a momentum term to any orientation strategy (if it yields a SimpleLineSearch cursor).
Description: The type Momentum run.
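A cumulative-decay momentum term can be sketched as a decayed running sum of past directions (hypothetical names; the decay constant is an assumption):

```java
// Hypothetical sketch: maintain a decayed running sum of past directions and
// use it as the momentum-augmented direction.
public class MomentumStrategy {
  private final double decay;   // e.g. 0.9 (assumed, not from the source)
  private double[] carry;       // cumulative momentum term

  public MomentumStrategy(double decay) { this.decay = decay; }

  public double[] orient(double[] innerDirection) {
    if (carry == null) carry = new double[innerDirection.length];
    double[] out = new double[innerDirection.length];
    for (int i = 0; i < out.length; i++) {
      carry[i] = carry[i] * decay + innerDirection[i]; // cumulative decay
      out[i] = carry[i];
    }
    return out;
  }
}
```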
Description: Orthant-Wise Limited-memory Quasi-Newton optimization. This is a modified L-BFGS algorithm which uses orthant trust regions to bound the cursor path during the line search phase of each iteration.
Description: The type Owlqn run.
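The orthant constraint at the heart of OWL-QN can be illustrated by a projection that clamps any coordinate leaving the trust orthant to exactly zero (hypothetical names; the surrounding L-BFGS machinery is omitted):

```java
// Hypothetical sketch of the orthant constraint used during the line-search
// phase: any coordinate whose sign differs from the trust orthant is clamped
// to exactly zero. (Coordinates with orthant sign 0 are also clamped.)
public class OrthantProjection {
  public static double[] project(double[] point, double[] orthant) {
    double[] out = point.clone();
    for (int i = 0; i < out.length; i++) {
      if (Math.signum(out[i]) != Math.signum(orthant[i])) out[i] = 0.0;
    }
    return out;
  }
}
```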
Description: Quadratic Quasi-Newton optimization. This method hybridizes pure gradient descent with higher-order quasi-Newton implementations such as L-BFGS. During each iteration, a quadratic curve is interpolated which aligns with the gradient's direction prediction and intersects with the quasi-Newton's optimal point prediction. A simple parametric quadratic function blends both inner cursors into a single nonlinear path, which should combine the stability of both methods.
Description: The type Qqn run.
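One parametric quadratic that satisfies the stated conditions is p(t) = t·gd + t²·(qn − gd): its tangent at t = 0 is the gradient-descent step gd, and at t = 1 it reaches the quasi-Newton point qn. A sketch under that assumption (hypothetical names, not necessarily the library's exact parameterization):

```java
// Hypothetical sketch of the blended quadratic path: at t=0 the path's tangent
// is the gradient-descent step, and at t=1 it reaches the quasi-Newton point.
public class QqnPath {
  // p(t) = t * gd + t^2 * (qn - gd), for t in [0, 1]
  public static double[] position(double[] gd, double[] qn, double t) {
    double[] p = new double[gd.length];
    for (int i = 0; i < gd.length; i++) {
      p[i] = t * gd[i] + t * t * (qn[i] - gd[i]);
    }
    return p;
  }
}
```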
Description: A recursive optimization strategy which projects the current space into a reduced-dimensional subspace for a sub-optimization batch run.
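A rough sketch of the projection idea (hypothetical names; a random Gaussian basis is one simple choice of subspace, assumed here rather than taken from the source): the sub-optimizer works on low-dimensional coefficients, which are then lifted back to the full space.

```java
import java.util.Random;

// Hypothetical sketch: project the full parameter space into a small random
// subspace, let a sub-optimizer work on the coefficients, then map back.
public class SubspaceProjection {
  public static double[][] randomBasis(int fullDim, int subDim, long seed) {
    Random rng = new Random(seed);
    double[][] basis = new double[subDim][fullDim];
    for (double[] row : basis)
      for (int i = 0; i < fullDim; i++) row[i] = rng.nextGaussian() / Math.sqrt(fullDim);
    return basis;
  }
  // Map subspace coefficients back into the full parameter space.
  public static double[] lift(double[][] basis, double[] coeffs) {
    double[] full = new double[basis[0].length];
    for (int j = 0; j < basis.length; j++)
      for (int i = 0; i < full.length; i++) full[i] += coeffs[j] * basis[j][i];
    return full;
  }
}
```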
Description: This trust region uses recent position history to define an ellipsoid volume for the (n+1)th line search.
Description: The type Trust sphere run.
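As a simplified spherical illustration (hypothetical names; deriving the radius from the largest recent step norm is an assumption, and the real region is an ellipsoid):

```java
// Hypothetical sketch: bound the next step to a sphere whose radius is derived
// from the norms of recent position deltas.
public class TrustSphere {
  public static double[] clip(double[] step, double[] recentStepNorms) {
    double radius = 0;
    for (double n : recentStepNorms) radius = Math.max(radius, n);
    double norm = 0;
    for (double s : step) norm += s * s;
    norm = Math.sqrt(norm);
    if (norm <= radius || radius == 0) return step;
    double scale = radius / norm;                 // project back onto the sphere
    double[] out = new double[step.length];
    for (int i = 0; i < step.length; i++) out[i] = step[i] * scale;
    return out;
  }
}
```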
Description: This constrains a weight vector based on a single hyperplane which prevents immediate increases to the L1 magnitude. (Note: This region can allow effective L1 increases, if at least one weight changes sign; this allows our entire search space to be reachable.)
Description: The type Linear sum constraint run.
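The hyperplane follows from the first-order expansion of the L1 norm: the immediate change under a step d is sign(w)·d, so constraining sign(w)·d ≤ 0 prevents an immediate increase. A minimal sketch (hypothetical names) that projects a violating step onto that hyperplane:

```java
// Hypothetical sketch: the first-order change in L1 magnitude is sign(w) . d;
// if a proposed delta would increase it, project the delta onto the
// hyperplane sign(w) . d = 0.
public class LinearSumConstraint {
  public static double[] constrain(double[] weights, double[] delta) {
    double[] s = new double[weights.length];
    double dot = 0, ss = 0;
    for (int i = 0; i < weights.length; i++) {
      s[i] = Math.signum(weights[i]);
      dot += s[i] * delta[i];
      ss += s[i] * s[i];
    }
    if (dot <= 0 || ss == 0) return delta;         // already non-increasing
    double[] out = delta.clone();
    for (int i = 0; i < out.length; i++) out[i] -= (dot / ss) * s[i];
    return out;
  }
}
```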
Description: A single-orthant trust region. These are used in OWL-QN, to proven effect, in training sparse models where an exact value of zero for many weights is desired.
Description: The type Single orthant trust region run.
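A minimal sketch of the single-orthant projection (hypothetical names): any weight whose sign would flip relative to the previous point is set to exactly zero, which is precisely what produces exact zeros in sparse models.

```java
// Hypothetical sketch: keep the new point in the orthant of the old point by
// setting any sign-flipped weight to exactly zero.
public class SingleOrthantRegion {
  public static double[] project(double[] previous, double[] proposed) {
    double[] out = proposed.clone();
    for (int i = 0; i < out.length; i++) {
      if (previous[i] * out[i] < 0) out[i] = 0.0;  // crossed zero: clamp
    }
    return out;
  }
}
```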