# Neural Network with Unbounded Activations is Universal Approximator

```bibtex
@article{Sonoda2015NeuralNW,
  title   = {Neural Network with Unbounded Activations is Universal Approximator},
  author  = {Sho Sonoda and N. Murata},
  journal = {ArXiv},
  year    = {2015},
  volume  = {abs/1505.03654}
}
```

This paper investigates the approximation property of neural networks with unbounded activation functions, such as the rectified linear unit (ReLU), which is the new de-facto standard of deep learning. The ReLU network can be analyzed by the ridgelet transform with respect to Lizorkin distributions, which is introduced in this paper. By showing two reconstruction formulas using the Fourier slice theorem and the Radon transform, it is shown that the neural network with unbounded activations…
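As a quick empirical illustration of the universal-approximation property discussed above (a minimal sketch, not the paper's ridgelet construction; the target function, network width, and seed are arbitrary choices):

```python
import numpy as np

# Minimal sketch: a single-hidden-layer ReLU network with random inner
# weights and least-squares output weights closely fits a smooth 1-D target.
# All sizes/seeds here are illustrative, not from the paper.
rng = np.random.default_rng(0)
n_units = 200
x = np.linspace(-3.0, 3.0, 400)
target = np.sin(x)

# Random inner weights/biases; only the output layer is fitted (a linear
# least-squares problem once the hidden layer is fixed).
w = rng.normal(size=n_units)
b = rng.uniform(-3.0, 3.0, size=n_units)
features = np.maximum(0.0, np.outer(x, w) + b)   # ReLU(w*x + b), shape (400, 200)

coef, *_ = np.linalg.lstsq(features, target, rcond=None)
mse = np.mean((features @ coef - target) ** 2)
print(f"MSE of ReLU random-feature fit: {mse:.2e}")
```

With a few hundred random ReLU units the fitted network matches the target essentially to numerical precision on this interval, consistent with the density results the paper establishes.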

#### 111 Citations

Integral representation of shallow neural network that attains the global minimum

- Mathematics, Computer Science
- 2018

The modified ridgelet transform has an explicit expression that can be computed by numerical integration, which suggests that the global minimizer of the backpropagation objective can be obtained without running backpropagation.

Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions

- Computer Science
- Neural Networks
- 2021

It is proved that a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer can outperform fully connected shallow networks in approximating radial functions with Q(x) = |x|^2, when the dimension d of data from R^d is large.

Integral representation of the global minimizer

- Mathematics, Computer Science
- ArXiv
- 2018

The obtained integral representation provides an explicit expression of the global minimizer, without linearity-like assumptions, such as partial linearity and monotonicity, and indicates that the ordinary ridgelet transform provides the minimum norm solution to the original training equation.

Deep Convolutional Neural Nets

- 2015

Neural nets are a class of predictors that have been shown empirically to achieve very good performance on tasks whose inputs are images, speech, or audio signals. They have also been applied to…

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

- Mathematics, Computer Science
- 2019

It is demonstrated that the separability assumption based on a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel, and a refined convergence analysis of gradient descent for two-layer networks with smooth activations is provided.

The global optimum of shallow neural network is attained by ridgelet transform

- Computer Science
- 2018

By introducing a continuous model of neural networks, this work reduces the training problem to a convex optimization in an infinite dimensional Hilbert space, and obtains the explicit expression of the global optimizer via the ridgelet transform.

Fast generalization error bound of deep learning without scale invariance of activation functions

- Medicine, Computer Science
- Neural Networks
- 2020

It is shown that scale invariance of the activation functions is not essential to obtain a fast rate of convergence, and it is concluded that the theoretical framework proposed by Suzuki (2018) can be widely applied to the analysis of deep learning with general activation functions.

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

- Computer Science, Mathematics
- ICLR
- 2020

This paper characterizes the norm required to realize a function as a single-hidden-layer ReLU network with an unbounded number of units, but where the Euclidean norm of the weights is bounded, including a precise characterization of which functions can be realized with finite norm.

Greedy Shallow Networks: A New Approach for Constructing and Training Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2019

A novel greedy approach to obtain a single layer neural network approximation to a target function with the use of a ReLU activation function and an integral representation of the network, based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence promotes feasibility of the proposed method.

Effect of Activation Functions on the Training of Overparametrized Neural Nets

- Computer Science, Mathematics
- ICLR
- 2020

This paper provides theoretical results about the effect of activation function on the training of highly overparametrized 2-layer neural networks and discusses a number of extensions and applications of these results.

#### References

Showing 1–10 of 65 references

Harmonic Analysis of Neural Networks

- Mathematics
- 1999

It is known that superpositions of ridge functions (single hidden-layer feedforward neural networks) may give good approximations to certain kinds of multivariate functions. It remains…

Universal approximation bounds for superpositions of a sigmoidal function

- Mathematics, Computer Science
- IEEE Trans. Inf. Theory
- 1993

The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly for functions satisfying the same smoothness assumption.
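For context, the two bounds this snippet refers to can be written out (my paraphrase of Barron's 1993 result, not text from this page): for a function $f$ whose Fourier transform $\hat f$ has finite first moment $C_f = \int \|\omega\| \, |\hat f(\omega)| \, d\omega$, there exists an $n$-unit sigmoidal network $f_n$ with

$$
\int_{B_r} \bigl(f(x) - f_n(x)\bigr)^2 \, \mu(dx) \;\le\; \frac{(2 r C_f)^2}{n},
$$

a rate independent of the input dimension $d$, whereas for linear combinations of any $n$ fixed basis functions the worst-case integrated squared error over the same function class is at least of order $(1/n)^{2/d}$.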

On the approximate realization of continuous mappings by neural networks

- Mathematics, Computer Science
- Neural Networks
- 1989

It is proved that any continuous mapping can be approximately realized by Rumelhart-Hinton-Williams' multilayer neural networks with at least one hidden layer whose output functions are sigmoid functions.

Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function

- Mathematics, Computer Science
- Neural Networks
- 1993

It is shown that a standard multilayer feedforward network can approximate any continuous function to any degree of accuracy if and only if the network's activation functions are not polynomial.
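The "only if" direction of this result can be seen numerically (a hedged illustration of the general phenomenon, not taken from the cited paper; the target, interval, and seed are arbitrary): with a polynomial activation such as σ(t) = t², every one-hidden-layer network is itself a polynomial of degree 2 in the input, so adding units cannot push the error of fitting sin(x) below the best-quadratic residual.

```python
import numpy as np

# With activation sigma(t) = t^2, each unit (w*x + b)^2 lies in
# span{1, x, x^2}, so the whole network does too, no matter how wide.
# The least-squares error therefore plateaus instead of vanishing.
rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 400)
target = np.sin(x)

for n_units in (10, 100, 1000):
    w = rng.normal(size=n_units)
    b = rng.uniform(-3.0, 3.0, size=n_units)
    feats = (np.outer(x, w) + b) ** 2            # polynomial activation
    coef, *_ = np.linalg.lstsq(feats, target, rcond=None)
    mse = np.mean((feats @ coef - target) ** 2)
    print(f"{n_units:5d} units: MSE = {mse:.3f}")  # plateaus, does not vanish
```

Repeating the same experiment with any non-polynomial activation (ReLU, sigmoid) drives the error toward zero as the width grows, which is exactly the dichotomy the theorem states.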

An Integral Representation of Functions Using Three-layered Networks and Their Approximation Bounds

- Mathematics, Medicine
- Neural Networks
- 1996

A new theorem on an integral transform of ridge functions is presented, from which an approximation bound relating the approximation accuracy to the number of elements in the hidden layer can be obtained.

Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory

- Mathematics, Computer Science
- Neural Networks
- 1991

The starting point of this article is the inversion formula of the Radon transform; the article aims to contribute to the theory of three-layered neural networks. Let H be the Heaviside…

On rectified linear units for speech processing

- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013

This work shows that it can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units.

ImageNet classification with deep convolutional neural networks

- Computer Science
- Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Approximation theory of the MLP model in neural networks

- Computer Science
- 1999

In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. The MLP model is one of the more popular and…