Sunday, September 23, 2018

Backpropagation

UAT

The Universal Approximation Theorem (UAT) concerns the simplest version of the structure of weights and non-linear functions used in deep learning. The UAT states, roughly, that this equation can be solved to an approximation:
logit(numbers[] * input_weights[][]) * hidden_weights[][] ~= other_numbers[]
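
To make this concrete, here is a rough sketch of that form in NumPy. The shapes, and the use of the sigmoid as the "logit" non-linearity, are illustrative assumptions, not part of the theorem:

import numpy as np

def logit(x):
    # The logistic (sigmoid) non-linearity, standing in for "logit" above.
    return 1.0 / (1.0 + np.exp(-x))

numbers = np.random.rand(4)             # input vector
input_weights = np.random.rand(4, 8)    # input-to-hidden weight matrix
hidden_weights = np.random.rand(8, 3)   # hidden-to-output weight matrix

# One non-linear hidden layer followed by a linear readout:
other_numbers = logit(numbers @ input_weights) @ hidden_weights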

Backpropagation

The UAT does not tell us how to do this, only that it is possible. Later, the backpropagation (BP) algorithm was created to realize the possibility of the UAT. BP, as created, is only intended to solve one equation of N inputs and M unknowns. Backpropagation uses "hill-climbing", which assumes that the solution of an equation is at the top of a hill: wherever it starts, if it keeps climbing uphill, it will eventually reach the top.
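
Here is a rough sketch of that hill-climbing on a single equation of the above form, written as gradient descent on the error. The layer sizes, learning rate, and step count are illustrative assumptions; the gradients are just the chain rule applied to the squared error:

import numpy as np

def logit(x):
    return 1.0 / (1.0 + np.exp(-x))     # the same sigmoid stand-in as above

def solve_one_equation(x, target, steps=2000, lr=0.05):
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(x.size, 8))        # input weights
    W2 = rng.normal(scale=0.5, size=(8, target.size))   # hidden weights
    for _ in range(steps):
        h = logit(x @ W1)               # hidden activations
        y = h @ W2                      # current left-hand side
        err = y - target                # distance from the right-hand side
        # Chain-rule gradients of the squared error: backpropagation in miniature.
        grad_W2 = np.outer(h, err)
        grad_W1 = np.outer(x, (W2 @ err) * h * (1 - h))
        W1 -= lr * grad_W1              # step downhill on the error,
        W2 -= lr * grad_W2              # i.e. uphill toward the solution
    return W1, W2

# For example, pull one equation toward the target [1, 0, 0]:
# W1, W2 = solve_one_equation(np.random.rand(4), np.array([1.0, 0.0, 0.0]))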

In a typical Deep Learning application, we take thousands of similar input samples and turn each of them into an equation of the above form. A simple image classifier is a typical example; the right-hand side corresponds to whether the image is a picture of a cat, a building, or a car.
logit(image_pixels[] * input_weight_matrix[][]) * hidden_weight_matrix[][] ~= [1,0,0]
Wait a minute! In the math of equation-solving, each of these equations defines its own separate subspace, to be solved independently.

How did we get from approximating one equation (UAT) to approximating thousands of equations? The key idea of Deep Learning is to pretend that all of these equations live in one subspace, and that there is a mythical equation that is the centroid of all of them.
Deep Learning takes many (thousands of) sample equations which lie in mathematically different subspaces, and essentially averages the result of applying BP to each equation. To put it another way, each equation tugs the BP algorithm toward itself.
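
A rough sketch of that averaging, continuing the NumPy sketches above: each row of X and T is one sample equation, and every sample's chain-rule gradient is averaged into a single weight update. Again, the shapes, learning rate, and step count are illustrative assumptions:

import numpy as np

def logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_on_many_equations(X, T, steps=2000, lr=0.05):
    # X: one sample per row (e.g. flattened image pixels).
    # T: one right-hand side per row (e.g. [1, 0, 0] for "cat").
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], 8))
    W2 = rng.normal(scale=0.5, size=(8, T.shape[1]))
    for _ in range(steps):
        H = logit(X @ W1)          # hidden activations, one row per sample
        Err = H @ W2 - T           # each sample equation's "tug"
        # Average every sample's gradient into one update:
        grad_W2 = H.T @ Err / len(X)
        grad_W1 = X.T @ ((Err @ W2.T) * H * (1 - H)) / len(X)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2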

This seems like a misuse of BP, but it is a misuse that has proven very, very fruitful.

References

See the Wikipedia pages for more info on the UAT, Backpropagation, and Gradient Descent:

https://en.wikipedia.org/wiki/Universal_approximation_theorem
https://en.wikipedia.org/wiki/Backpropagation
https://en.wikipedia.org/wiki/Gradient_descent







