• s Amato, B Apolloni, G Caporali, U Madesani and A Zanaboni (1991): “Simulated annealing approach in backpropagation”, Neurocomputing 3, 207-220
• A J Annema, K Hoen and H Wallinga (1994): “Learning behaviour and temporary minima of two-layer neural networks”. Neural Networks, 1 (9), 1387-1404
• J Barhen, A Fijany and N Toomarian (1994): "Globally optimal neural learning". Proceedings ofWCNN ‘94, San Diego, June 1994, III 370 III 375
• E Barnard (1992): “Optimization for training neural nets”, IEEE Transactions on Neural Networks, 3 (2), March 1992, 232-240
• R Battiti and F Masulli (1990): “BFGS optimization for faster and automated supervised learning”. Proceedings of the International Neural Network Conference (INNC 90), Paris, France, 757-760
• R Battiti (1992): “First- and second-order methods for learning: between steepest descent and Newton’s method”. Neural Computation, 4 (2), 141-166
• R K Belew, J Mclnemey and N N Schraudolph (1991): "Evolving networks: using the genetic algorithm with connectionist learning", in C G Langton, C Taylor, J D Farmer and S Rasmussen, eds.. Artificial Life II, SFI Studies in the Sciences of Complexity, vol. X, Addison-Wesley, Reading, Mass., 511-547
• M Berggren (n.d.): “An efficient method of training feed-forward neural networks using conjugate gradients”. Laboratoire de L’Accélérateur Linéaire, Centre d’Orsay, Orsay, France, pre-print
• C Bishop (1992): “Exact calculation of the Hessian matrix for the mutilayer perceptron”. Neural Computation, 4, 494-501
• R P Brent (1973): Algorithms for Minimization Without Derivatives, Prentice-Hall, Inc., Englewood Cliffs, New Jersey
• R P Brent (1991): “Fast training algorithms for multilayer neural nets”, IEEE Transactions on Neural Networks, 2 (3), May 1991, 346-354
• R M Burton and G J Mpitsos (1992): “Event-dependent control of noise enhances learning in neural networks”, Neural Networks, 5, 627-637
• E C Cetin, J Barhen and J W Burdick (1993): "Terminal repeller unconstrained subenergy tunneling (TRUST) for fast global optimization", Journal o f Optimization Theory and Applications, 77 (1), April 1993, 97-126
• C Darken, J Chang and J Moody (1992): “Learning rate schedules for faster stochastic gradient search”, in S Y Kung, F Fallside, J A Sorensen and C A Kamm, eds.. Neural Networks for Signal Processing, 2, IFFF Workshop, EEFF Press, 3-13
• J F Dennis and R B Schnabel (1983): Numerical Methods fo r Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Inc., Englewood Cliffs, New Jersey
• L C W Dixon (1972): “The choice of step length, a crucial factor in the performance of variable metric algorithms”, in Lootsma, ed., 149-170
• W Finhoff and H G Zimmerman (1992): "Homotopy methods in circuit analysis", pre print, Siemens, Munich
• R Fletcher (1980): Practical Methods o f Optimization, vol. 1, Unconstrained Optimization, John Wiley & Sons, New York
• P F Gill, W Murray and M H Wright (1981): Practical Optimization, Academic Press, London
• M Gori and A Tesi (1992): “On the problem of local minima in backpropagation”, IEEE Transactions on Pattern Analysis and Machine Learning, 14 (1), 76-86 • D Gorse (1992): “Classical and stochastic search in conjugate gradient algorithms”.
Proceedings o f IJCNN ‘92, Beijing, November 1992, 435-440
• D Gorse and A Shepherd (1992): “Adding stochastic search to conjugate gradient algorithms”. Neural Network World, 2, 599-605
• D Gorse, A Shepherd and J Taylor (1993a): “Avoiding local minima using a range expansion algorithm”. Neural Network World, 5, 503-510
• D Gorse, A Shepherd and J Taylor (1993b); “Tracking global minima by progressive range expansion”, Proceedings ofIJCNN ‘93, Portland, Oregon, July 1993, IV-350- rV-353
• D Gorse, A Shepherd and J Taylor (1994a): “Avoiding local minima by a classical range expansion algorithm”. Proceedings of ICANN ‘94, Sorrento, 1, May 1994, 525- 528 (RN/94/14)
• D Gorse, A Shepherd and J Taylor (1994b): “A classical algorithm for avoiding local minima”. Proceedings ofWCNN ‘94, San Diego, June 1994, III 364 HI 369 (RN/94/15)
• D Gorse, A Shepherd and J Taylor (1995): "The new ERA in supervised learning", submitted to Neural Networks
• L G Hamey (1995): "Analysis of the error surface of the XOR network with two hidden nodes". Computing Report 95/167C, Department of Computing, Macqarie University NSW 2109 Australia
• R Hecht-Nielsen (1990): Neurocomputing, Addison-Wesley, Reading, Mass.
• D M Himmelblau (1972): “A uniform evaluation of unconstrained optimization techniques”, in Lootsma, ed., 69-97
• Y Hirose, K Yamashita and S Hijiya (1991): “Back-propagation algorithm which varies the number of hidden units”. Neural Networks, 4, 61-66
• J Holland (1975): Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor MI
• S Huang and Y Huang (91): “Bounds on the number of hidden neurons in multilayer perceptrons”, IEEE Transactions on Neural Networks, 2 (1), January 1991, 47-55
• D Jacobs, ed. (1977).' The State of the Art in Numerical Analysis: Proceedings o f the Conference on The State o f the Art in Numerical Analysis held at the University o f York, April 1976, organized by The Institute o f Mathematics and its Applications, Academic Press, London
• E M Johansson, F U Dowla and D M Goodman (1992): “Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method”, International Journal of Neural Systems, 2 (4), 291-301
• B Kappen and T Heskes (1992): “Learning rules, stochastic processes, and local minima”, in I Aleksander and J G Taylor, eds.. Artificial Neural Networks, 2: Proceedings o f the 1992 International Conference on Artificial Neural Networks (ICANN-92) Brighton, United Kingdom, 4-7 September, 1992, 2 vols., Elsevier Science Publishers B.V., Amsterdam, 1-71-1-77
• J A Kinsella (1992): “Comparison and evaluation of variants of the conjugate gradient method for efficient learning in feed-forward neural networks with backward error propagation”. Network, 3, February 1992, 27-36
• S Kirkpatrick, C D Gelatt, Jr. and M P Vecchi (1983): “Optimization by simulated annealing”. Science, 220, 671-680
• D E Knuth (1981): Semi-Numerical Algorithms, 2nd edition, vol. 2 of The Art of Computer Programming, Addison-Wesley, Reading, Mass.
• J F Kolen and J B Pollack (1990): “Backpropagation is sensitive to initial conditions”. Complex Systems, 4, 269-280
• S Kollias and D Anastassiou (1989): “An adaptive least squares algorithm for the efficient training of artificial neural networks”, IEEE Transactions on Circuit and Systems, 36 (8), August 1989, 1092-1101
• P J G Lisboa and S J Perantonis (1991): “Complete solution of the local minima in the XOR problem”. Network, 2, February 1991, 119-124
• P J G Lisboa, ed. (1992): Neural Networks: Current Applications, Chapman & Hall, London
• F A Lootsma, ed. (1972): Numerical Methods fo r Non-linear Optimization, Academic Press, New York
• D G Luenberger (1984): Linear and Nonlinear Programming, 2nd edition, Addison- Wesley, Reading, Mass.
• J M Mclnemey, K G Haines, S Biafore and R Hecht-Nielsen (1989): “Error surfaces of multi-layer networks can have local minima”. Technical Report No. CS89-157, Department of Computer Science and Engineering, University of California
• M M0 ller (1993a): “A scaled conjugate gradient algorithm for fast supervised learning”. Neural Networks, 6 (4), June 1993, 525-533 (reprinted in [Mpller, 93e], appendix A)
• M M0 ller (1993b): “Supervised learning on large redundant training sets”. International Journal o f Neural Systems, 4(1), 15-25 (reprinted in [M0ller, 93e], appendix B)
• M M0ller (1993c): “Exact calculation of the product of the Hessian matrix of feed forward network error functions and a vector in 0(N) time”. Technical Report, Daimi PB-432, Computer Science Department, Aarhus University, 1993 (reprinted in [M0 ller, 93e], appendix C)
• M M0 ller (1993d): “Adaptive Preconditioning of the Hessian Matrix”, submitted to Neural Computation (reprinted in [M0ller, 93e], Appendix D)
M M0 ller (1993e): Efficient Training o f Feed-Forward Neural Networks, PhD Thesis, Daimi PB-464, Computer Science Department, Aarhus University, December
1993
J J More (1983): “Recent developments in algorithms and software for tmst region methods”, in A Bachem, M Grotschel and B Korte, eds.. Mathematical programming: the state o f the art, Bonn 1982, Springer-Verlag, Berlin, 258-287
W Murray, ed. (1972): Numerical Methods for Unconstrained Optimization, Academic Press, London
J C Nash (1990): Compact Numerical Methods fo r Computers: Linear Algebra and Function Minimisation, Adam Hilger, Bristol
M R Osborne (1976): “Nonlinear least squares - the Levenberg algorithm revisited”. Journal o f the Australian Mathematical Society, 19 (Series B), 343-357
• D B Parker (1987): “Optimal algorithms for adaptive networks: second order back propagation, second order direct propagation, and second order Hebbian learning”. Proceedings o f the First IEEE International Conference on Neural Networks, IEEE Press, New York, June 1987, II-593-II-600
• C S Pattichis, C Charalambous, C N Schizas and L T Middleton (1991): “EMG diagnosis using conjugate gradient backpropagation neural network learning algorithm”, in T Kohonen, K Makisara, O Simula and J Kangas, eds.. Artificial Neural Networks: Proceedings o f the 1991 International Conference on Artificial Neural Networks (ICANN-9I) Espoo, Finland, 24-28 June, I99I, 2 vols., Elsevier Science Publishers B.V., Amsterdam, 1621-1624
• T Poston, C N Lee, Y Choie and Y Kwon (1991): “Local minima and back
propagation”, in Proceedings ofIJCNN ‘91, Seattle, WA, July 1991, II-173-11-176
• M J D Powell (1977): “Restart procedures for the conjugate gradient method”. Mathematical Programming, 12, April 1977, 241-254
• W H Press, B P Flannery, S A Teukolsky and W T Vetterling (1988): Numerical Recipes in C, Cambridge University Press, Cambridge
• A K Rigler, J M Irvine and T P Vogl (1991): “Rescaling of variables in back propagation learning”. Neural Networks, 4, 225-230
• D E Rumelhart, G E Hinton and R J Williams (1986): “Learning internal
representations by error propagation”, in D E Rumelhart and J L McClelland, eds.. Parallel Distributed Processing: Explorations in the Microstructure o f Cognition, vol. 1, MIT Press, Cambridge MA, 318-362
• S Santini (1992): “The bearable lightness of being: reducing the number of weights in backpropagation networks”, in I Alexander and J G Taylor eds., Artificial Neural Networks, 2: Proceedings o f the 1992 International Conference on Artificial Neural Networks (ICANN-92) Brighton, United Kingdom, 4-7 September, 1992, 2 vols., Elsevier Science Publishers B.V., Amsterdam, I-139-1-142
• R W H Sargent and D J Sebastian (1972): “Numerical experience with algorithms for unconstrained minimization”, in Lootsma, ed., 45-68
• A Shepherd (1992); “Towards a hybrid conjugate gradient/ backpropagation algorithm for training feed-forward neural networks”, MSc Thesis, Department of Computer Science, University College London
• S A Solla, E Levin and M Fleisher (1988): "Accelerated learning in layered neural networks". Complex Systems, 2, 625-639
• H Szu and R Hartley (1987): “Fast simulated annealing”. Physics Letters, 122, 157- 162
• P P van der Smagt (1990): “Neural implementation of Powell’s conjugate gradient minimisation”. Report UvA-sma-2-9012-2, Department of Computer Science and Mathematics, University of Amsterdam
• P P van der Smagt (1994): “Minimisation methods for training feedforward neural Neural Networks, 7(1), 1-11
• P D Wasserman (1989): Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York
• R L Watrous (1987): “Learning algorithms for connectionist networks: applied gradient methods of nonlinear optimization”. Proceedings o f the International Conference on Neural Networks, IEEE Press, New York, July 1987, II-619-11-627 • L E A Wessels and E Barnard (1992): “Avoiding false local minima by proper
initialization of connections”, IEEE Transactions on Neural Networks, 3 (6 ), November 1992, 899-905
• L E A Wessels, E Barnard and E van Rooyen (1990): “The physical correlates of local minima”, in Proceedings o f the International Neural Network Conference (Paris), 985
• M A Wolfe (1978): Numerical Methods for Unconstrained Optimization: an introduction. Van Nostrand Reinhold, New York
• X Yu (1992): “Can backpropagation error surface not have local minima” (sic), IEEE Transactions on Neural Networks, 3 (6), November 1992, 1019-1021