Q: Shape of error function in neural networks (or others optimizations) (No Answer, 1 Comment)
Question  
Subject: Shape of error function in neural networks (or others optimizations)
Category: Science > Math
Asked by: jccq-ga
List Price: $25.00
Posted: 13 Mar 2003 09:46 PST
Expires: 12 Apr 2003 10:46 PDT
Question ID: 175677
Most of the optimization performed in engineering follows the
"gradient" of the error function, which is a function of the
parameters involved. I would like to know whether there exists a
paper or other reference treating the properties (such as shape) of
the error functions in a variety of optimization problems, or from a
theoretical point of view.
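
As a minimal sketch of what "following the gradient" means, here is
gradient descent on a toy quadratic error function; the quadratic and
the step size are illustrative choices, not anything from the question:

import numpy as np

def E(w):
    return np.sum((w - 1.0) ** 2)   # toy error function of the parameters

def grad_E(w):
    return 2.0 * (w - 1.0)          # its gradient

w = np.array([3.0, -2.0])           # initial parameters
lr = 0.1                            # learning rate (step size)
for _ in range(100):
    w -= lr * grad_E(w)             # step against the gradient

print(w)                            # converges to the unique minimum at (1, 1)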
My main interest is how the shape changes as more layers are added to
a neural network, and how "local minima" form according to the
topology.
For example, a single perceptron has a hyperparabolic error function,
which means it has only one minimum, but when perceptrons are
cascaded (an MLP), problems arise. I do not have a clear
understanding of how this happens (although it obviously "makes
sense"); I think it has to do with a "variable basis approximation"
idea: you are trying to project something onto a basis that is itself
moving.
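
A small numpy sketch of this, assuming squared error and a toy 1-2-1
tanh network (both illustrative choices): the target is generated by
the network itself, so the two permuted weight settings are exact
global minima, and the higher error at their midpoint shows the error
function is no longer convex:

import numpy as np

x = np.linspace(-2, 2, 50)
y = np.tanh(2 * x) + np.tanh(-3 * x)   # toy target, generated by weights (2, -3)

def E_lin(w):                          # single linear neuron: quadratic in w,
    return np.mean((w * x - y) ** 2)   # a parabola with exactly one minimum

def E(w1, w2):                         # squared error of the 1-2-1 tanh net
    return np.mean((np.tanh(w1 * x) + np.tanh(w2 * x) - y) ** 2)

print(E(2, -3))                        # 0.0 -> a global minimum
print(E(-3, 2))                        # 0.0 -> swapping the hidden units gives another
print(E(-0.5, -0.5))                   # > 0 -> the midpoint between them is worse,
                                       #        so the error function is not convex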

Request for Question Clarification by mathtalk-ga on 14 Mar 2003 15:12 PST
Actually there are some well-known techniques in numerical analysis
(engineering, math or statistics) which supplement the "gradient" with
other "search directions".

I'm not sure how you might expect these methods to be presented in
relationship to "neural networks".  The subject line adds "or others
optimizations", which suggests to me that a general analysis or
discussion might be of interest.  Please let me know if discussion and
references for some of these general optimization techniques would be
of interest to you.

regards, mathtalk-ga

Clarification of Question by jccq-ga on 14 Mar 2003 16:52 PST
Hi there. Yes, I know about the existence of conjugate gradients,
higher-order methods, and tricks like tabu search to speed up
convergence, but what I am looking for is to understand the "shape"
of the error function in relationship to the topology of a neural
network.

I said "or other techniques" because the error gradient with respect
to each parameter of an optimization is clearly not something that
applies only to NNs (GAs don't work on gradients, though in a sense
they may follow a stochastic one), but let's say I limit the scope of
interest here to neural networks.

How does the error function "complicate" itself as layers are added?
Is the kind of nonlinearity used important in determining the
appearance of long, wide, difficult-to-handle plateaus? I mean, if
the error function has frequent and deep holes, it is very unlikely
that one would reach a global minimum. Of course, talking about the
shape of a function of N variables (where N is usually rather large)
must be difficult.
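
One concrete mechanism behind such plateaus can be sketched with an
assumed one-weight sigmoid unit and squared error (illustrative
choices): once the unit saturates, the error gradient collapses even
though the error itself stays high:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x, t = 1.0, 0.0                       # one training pair, target on the "wrong" side
for w in [0.5, 2.0, 5.0, 20.0]:
    yhat = sigmoid(w * x)
    err = 0.5 * (yhat - t) ** 2
    grad = (yhat - t) * yhat * (1 - yhat) * x   # dE/dw for squared error
    print(f"w={w:5.1f}  error={err:.4f}  gradient={grad:.6f}")
# the error approaches 0.5 (its worst value here) while the gradient
# collapses toward zero: a wide, flat plateau caused purely by
# saturation of the sigmoid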

I suspect the shape of the error function is also tied to the
dimensionality of the data and of the network itself. Say the network
has a large number of units compared to the data it has to learn: it
is going to overfit by learning all the data by heart. While this is
undesirable, it means that the net will reach a global minimum,
actually one of the MANY MANY ones that are equivalent to each other.
On the other hand, if fewer neurons are available to adapt, the error
function will have fewer minima and probably a "smoother" error
surface; the extreme case would be a single unit, a perceptron, so
the error surface would be just a hyperparabola.
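
The "many equivalent minima" intuition can be made concrete:
permuting hidden units (together with their outgoing weights) leaves
the network function, and hence the error, unchanged, so every
minimum of a net with H hidden units appears in at least H! copies.
A sketch with an assumed random 1-3-1 tanh net (an illustrative
choice):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 20).reshape(-1, 1)
W1 = rng.normal(size=(1, 3))          # input -> 3 hidden units
v = rng.normal(size=(3, 1))           # hidden -> output

def f(x, W1, v):
    return np.tanh(x @ W1) @ v

perm = [2, 0, 1]                      # relabel the hidden units
print(np.allclose(f(x, W1, v),
                  f(x, W1[:, perm], v[perm, :])))   # True: same function,
# so the error surface repeats each minimum at least 3! = 6 times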

To be completely honest, this might be a multipart question with the
second question being: is the first one well posed? :)
Answer  
There is no answer at this time.

Comments  
Subject: Re: Shape of error function in neural networks (or others optimizations)
From: babior-ga on 08 Apr 2003 06:58 PDT
 
Answer:
The shape of the cost function is affected dramatically by the
activation function. If a smooth activation function such as the
sigmoid, output = 1/(1 + exp(-input)), is used, the actual shape of
the activation function (and of the cost function too) is affected by
the magnitudes of the weights. While training a single-layer
perceptron starting from small initial weights, the components of the
weight vector increase. Large weights, however, decrease the gradient
and the learning speed; therefore, as the number of training epochs
increases, the learning process slows down. Consequently, in a
single-layer perceptron one can obtain seven different classification
algorithms; if one solves a regression task, one can obtain six
different regressions.
In multilayer perceptron training, the magnitudes of the hidden-layer
weights also affect the "actual shape" of the activation function
and, if a classification task is being solved, the non-linearity of
the decision boundaries. We have a similar phenomenon when applying
an MLP to solve regression (prediction) tasks. In the MLP, as well as
in the SLP, large weights make local minima of the cost function
flat. The "weights phenomenon" is also useful in analyzing the aging
problem in artificial and natural life.
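
A small sketch of the point about weight magnitudes, using an assumed
sigmoid unit on a few sample inputs (illustrative values): small
weights put the unit in its nearly linear regime, large weights make
it nearly a hard threshold:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-1, 1, 5)
for w in [0.1, 1.0, 10.0]:
    print(f"w={w:4.1f}:", np.round(sigmoid(w * x), 3))
# w=0.1 -> outputs hug 0.5: effectively the linear regime
# w=10  -> outputs jump between ~0 and ~1: a near step function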
 
References:
S. Raudys (2000). Evolution and generalization of a single neurone.
III. Primitive, regularized, standard, robust and minimax
regressions. Neural Networks, Vol. 13(3/4), pp. 507-523.
Pergamon-Elsevier Science Ltd, Oxford. ISSN 0893-6080.
S. Raudys (2000). Classifier's complexity control while training
multilayer perceptrons. Lecture Notes in Computer Science, Vol. 1876,
pp. 32-44. Springer-Verlag, Berlin/Heidelberg/NY. ISSN 0302-9743.
S. Raudys (1998). Evolution and generalization of a single neurone.
I. SLP as seven statistical classifiers. Neural Networks, Vol. 11(2),
pp. 283-296. Pergamon-Elsevier Science Ltd, Oxford. ISSN 0893-6080.
S. Raudys (2001). Statistical and Neural Classifiers: An Integrated
Approach to Design. Springer-Verlag, NY. ISBN 1-85233-297-2, 312 pp.
S. Raudys (2002). An adaptation model for simulation of aging
process. International Journal of Modern Physics C, Vol. 13,
pp. 1075-1086. World Scientific Publishing Co, Singapore.
ISSN 0129-1831.
