Chapter 9: The Man Who Set Back Deep Learning (Not Really)
Cybenko rigorously proved that a neural network with a single hidden layer, equipped with sufficient neurons and a non-linear activation function such as the sigmoid, can approximate any continuous function with arbitrary precision. Though now recognized as a foundational theoretical result, the theorem was frequently misinterpreted during its initial reception: some researchers concluded that shallow networks were theoretically sufficient and deeper architectures therefore unnecessary, a misunderstanding that ironically may have delayed the practical advancement of deep learning systems.

The chapter delves into the mathematical machinery underlying this proof, including the treatment of functions as vectors inhabiting infinite-dimensional spaces, the role of sigmoid-based neuron assemblies in constructing function approximations, and the elegant logical structure of Cybenko's proof by contradiction within functional analysis.

Ananthaswamy uses this historical narrative to illuminate a persistent tension that characterizes modern machine learning: the divergence between theoretical predictions and empirical observations. Classical theory suggests that networks should succumb to overfitting and suffer from the curse of dimensionality in high-dimensional spaces, yet contemporary deep neural networks consistently achieve remarkable generalization despite containing millions of parameters and operating in extremely high-dimensional settings. By tracing how one influential theorem both inspired and confused generations of researchers, this chapter reveals the intricate relationship between mathematical abstraction and practical innovation in the development of modern artificial intelligence.
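The approximation guarantee can be illustrated numerically. The sketch below (an assumption of this summary, not code from the book) builds a one-hidden-layer sigmoid network in NumPy with randomly drawn hidden weights and biases, fits only the output weights by least squares, and shows that the worst-case error in approximating a continuous target function shrinks as the hidden layer grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # The non-linear activation Cybenko's theorem assumes.
    return 1.0 / (1.0 + np.exp(-z))

# Target: an arbitrary continuous function on [0, 1].
def f(x):
    return np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 200)

def approx_error(n_hidden):
    """Max |network - target| for a 1-hidden-layer sigmoid net.

    Hidden weights/biases are drawn at random; only the output
    layer is fit (by least squares), which is enough to see the
    error fall as neurons are added.
    """
    w = rng.normal(scale=20.0, size=n_hidden)    # hidden weights
    b = rng.uniform(-20.0, 20.0, size=n_hidden)  # hidden biases
    H = sigmoid(np.outer(x, w) + b)              # hidden activations
    c, *_ = np.linalg.lstsq(H, f(x), rcond=None) # output weights
    return np.max(np.abs(H @ c - f(x)))

for n in (5, 25, 100):
    print(f"{n:4d} hidden neurons -> max error {approx_error(n):.5f}")
```

This only demonstrates the theorem's conclusion for one target function; the proof itself, as the chapter explains, works in an infinite-dimensional function space and argues by contradiction rather than by explicit construction.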