Methods for controlling the bias/variance tradeoff typically assume that overfitting or overtraining is a global phenomenon. For multi-layer perceptron (MLP) neural networks, global parameters such as the training time (e.g., based on validation tests), network size, or the amount of weight decay are commonly used to control the bias/variance tradeoff. However, the degree of overfitting can vary significantly throughout the input space of the model. We show that selecting more degrees of freedom than necessary for an MLP trained with backpropagation can improve the approximation in regions of underfitting, while not significantly overfitting in other regions. This can be a significant advantage over other models. Furthermore, we show that “better” learning algorithms such as conjugate gradient can in fact lead to worse generalization, because they can be more prone to creating varying degrees of overfitting in different regions of the input space. While experimental results cannot cover all practical situations, our results do help to explain common behavior that does not agree with theoretical expectations. Our results suggest one important reason for the relative success of MLPs, and they bring into question common beliefs about neural network training regarding training algorithms, overfitting, and optimal network size. They also suggest alternative guidelines for practical use (in terms of the training algorithm and network size selection) and help to direct future work (e.g., regarding the importance of the MLP/BP training bias, the possibility of worse performance for “better” training algorithms, local “smoothness” criteria, and further investigation of localized overfitting).
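The central claim is that overfitting need not be uniform across the input space. One simple way to look for this in a fitted model is sketched below. It is a hypothetical illustration and not the authors' procedure. The sketch splits a 1-D input range into equal-width bins and compares test and training mean squared error within each bin. A large positive gap suggests a locally overfit region. The function name `per_region_gap`, the equal-width binning, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def per_region_gap(train, test, n_bins, lo, hi):
    """Hypothetical helper: per-bin generalization gap (test MSE - train MSE).

    train, test: iterables of (x, y_true, y_pred) with scalar x in [lo, hi].
    Returns {bin_index: gap} for bins that contain both train and test points.
    """
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")

    def bin_of(x):
        # Equal-width binning; clamp so that x == hi falls into the last bin.
        i = int((x - lo) / (hi - lo) * n_bins)
        return min(max(i, 0), n_bins - 1)

    def mse_by_bin(rows):
        # Accumulate squared error and point counts separately for each bin.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for x, y, y_hat in rows:
            b = bin_of(x)
            sums[b] += (y - y_hat) ** 2
            counts[b] += 1
        return {b: sums[b] / counts[b] for b in sums}

    train_mse = mse_by_bin(train)
    test_mse = mse_by_bin(test)
    shared = sorted(train_mse.keys() & test_mse.keys())
    return {b: test_mse[b] - train_mse[b] for b in shared}


# Toy data (invented for illustration): the model fits bin 0's training
# points exactly but misses its test point badly, while bin 1 has similar
# small errors on both sets.
train = [(0.1, 1.0, 1.0), (0.2, 0.0, 0.0), (0.8, 1.0, 0.9), (0.9, 0.0, 0.1)]
test = [(0.15, 1.0, 0.2), (0.85, 1.0, 0.9)]
gaps = per_region_gap(train, test, n_bins=2, lo=0.0, hi=1.0)
print(gaps)
```

In this toy case, bin 0 shows a large gap and bin 1 shows essentially none. A single global validation error would average these regions together and hide the difference.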
Citation:
Steve Lawrence, C. Lee Giles, "Overfitting and Neural Networks: Conjugate Gradient and Backpropagation," IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00), vol. 1, pp. 1114, 2000.