Abstract

This paper concerns the universality of the two-layer neural network with the $k$-rectified linear unit activation function, equipped with a suitable norm, without any restriction on the shape of the domain in the real line. This type of result is called global universality; it extends the previous result for ReLU by the present authors. As an application of the fundamental result on $k$-rectified linear unit functions, this paper also covers $k$-sigmoidal functions.

1. Introduction

The goal of this note is to specify the closure of linear subspaces generated by the $k$-rectified linear unit functions under various norms. As in [1], for $k \in \mathbb{N}$, we set
$$\mathrm{ReLU}^k(t) \equiv \max(t, 0)^k \quad (t \in \mathbb{R}).$$

The function $\mathrm{ReLU}^k$ is called the $k$-rectified linear unit ($k$-ReLU for short); it is introduced to compensate for properties that ReLU does not have. Our approach will be a completely mathematical one. Recently, increasing attention has been paid to the $k$-ReLU function as well as the original ReLU function. For example, if $k \ge 2$, the function $\mathrm{ReLU}^k$ is in the class $C^{k-1}(\mathbb{R})$, so that it is smoother than the ReLU function. When we study neural networks, the function $k$-ReLU is called an activation function. As in [2], $k$-ReLU functions are used to reduce the amount of computation. Using this smoothness property, Siegel and Xu investigated the error estimates of the approximation [1]. Mhaskar and Micchelli worked on compact sets, while in the present work, we consider the approximation on the whole real line $\mathbb{R}$.
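For reference, here is a minimal numerical sketch of the activation, assuming the standard definition $\mathrm{ReLU}^k(t) = \max(t,0)^k$; the helper functions below are illustrative and not taken from the paper.

```python
def relu_k(t: float, k: int) -> float:
    """k-ReLU: the k-th power of the positive part of t (assumed definition)."""
    return max(t, 0.0) ** k

def relu_k_derivative(t: float, k: int) -> float:
    """One-sided derivative: k * t^(k-1) for t > 0, and 0 for t <= 0.

    For k >= 2 this derivative is continuous at t = 0, reflecting the fact
    that ReLU^k is of class C^{k-1}, smoother than plain ReLU.
    """
    return k * t ** (k - 1) if t > 0 else 0.0
```

For instance, `relu_k(2.0, 3)` returns `8.0`, while the derivative tends to `0` from both sides of the origin when $k \ge 2$.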

A problem arises when we deal with $k$-ReLU as a function over the whole line: the function $\mathrm{ReLU}^k$ is not bounded on $\mathbb{R}$. Our goal in this paper is to propose a Banach space that allows us to handle such unbounded functions. Actually, for $k \in \mathbb{N}$, we let
$$\mathcal{X}^k \equiv \Bigl\{ f \in C(\mathbb{R}) : \lim_{t \to +\infty} \frac{f(t)}{1+|t|^k} \text{ and } \lim_{t \to -\infty} \frac{f(t)}{1+|t|^k} \text{ exist} \Bigr\},$$
equipped with the norm
$$\|f\|_{\mathcal{X}^k} \equiv \sup_{t \in \mathbb{R}} \frac{|f(t)|}{1+|t|^k},$$
and define
$$V^k \equiv \mathrm{Span}(\{\mathrm{ReLU}^k(a \cdot + b) : a, b \in \mathbb{R}\}).$$
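As a sanity check, assuming the weighted norm $\|f\| = \sup_{t \in \mathbb{R}} |f(t)|/(1+|t|^k)$, the supremum can be approximated on a finite grid; the grid-based maximum below is a rough numerical stand-in for the supremum over $\mathbb{R}$.

```python
def weighted_norm(f, k, grid):
    """Approximate sup_t |f(t)| / (1 + |t|^k) over a finite grid of points."""
    return max(abs(f(t)) / (1.0 + abs(t) ** k) for t in grid)

# On [-1000, 1000], the weighted norm of ReLU^3 is close to 1,
# the supremum being approached as t -> +infinity.
grid = [i / 10.0 for i in range(-10000, 10001)]
norm_relu3 = weighted_norm(lambda t: max(t, 0.0) ** 3, 3, grid)
```

This illustrates how the weight $1+|t|^k$ tames the polynomial growth of $k$-ReLU: the unbounded function acquires a finite norm.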

Note that any element in $\mathcal{X}^k$, divided by $1+|\cdot|^k$, extends to a continuous function over $[-\infty, +\infty]$. Our main result in this paper is as follows:

Theorem 1. The linear subspace $V^k$ is dense in $\mathcal{X}^k$.

Understanding the structure of $V^k$ has been important in the field of machine learning over the last decade; we refer to [4, 5] for example. Furthermore, dealing with unbounded activation functions is important from the viewpoint of applications (see [6]). Remark that the approximation over bounded domains has a long history (see [7]).

As is seen from the definition of the norm $\|\cdot\|_{\mathcal{X}^k}$, when we have a function $f \in \mathcal{X}^k$, with ease we can find a function $g \in V^k$ such that $f - g$ is small outside a large compact interval. However, after choosing such a function $g$, we have to look for a way to control $f - g$ inside any compact interval by a function in $V^k$. Although $V^k$ consists of unbounded functions, we can manage to do so by induction on $k$. Actually, once we are given a compact interval, we will find a compactly supported function in $V^k$ that makes the remaining error sufficiently small.

Theorem 1 says that the space $\mathcal{X}^k$ is mathematically suitable when we consider the activation function $k$-ReLU. We compare Theorem 1 with the following fundamental result by Cybenko. For a function space $X$ over the real line and an open set $\Omega \subset \mathbb{R}$, $X(\Omega)$ stands for the restriction of each element to $\Omega$, that is, $X(\Omega) \equiv \{f|_\Omega : f \in X\}$, and the norm is given by
$$\|f\|_{X(\Omega)} \equiv \inf\{\|g\|_X : g \in X,\ g|_\Omega = f\}.$$

Theorem 2 (see Cybenko [8]). Let $K \subset \mathbb{R}$ be a compact set and $\sigma$ be a continuous sigmoidal function. Then, for all $f \in C(K)$ and $\varepsilon > 0$, there exists $g \in \mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$ such that $\|f - g\|_{C(K)} < \varepsilon$. We remark that Theorem 1 is not a direct consequence of Theorem 2: Theorem 2 concerns the uniform approximation over compact intervals, while Theorem 1 deals with the uniform approximation over the whole real line. We will prove Theorem 1 without using Theorem 2.
Let $k \in \mathbb{N} \cup \{0\}$. Our results can readily be carried over to the case of $k$-sigmoidal functions. As in Definition 4.1 in [7], a continuous function $\sigma : \mathbb{R} \to \mathbb{R}$ is $k$-sigmoidal if
$$\lim_{t \to -\infty} \frac{\sigma(t)}{t^k} = 0, \qquad \lim_{t \to +\infty} \frac{\sigma(t)}{t^k} = 1.$$
Needless to say, $\mathrm{ReLU}^k$ is $k$-sigmoidal. If $k = 0$, then we say that $\sigma$ is a continuous sigmoidal function. As a corollary of Theorem 1, we extend this theorem to the case of $k$-sigmoidal functions.

Theorem 3. If $\sigma$ is $k$-sigmoidal, then the linear subspace $\mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$ is dense in $\mathcal{X}^k$.
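A concrete $k$-sigmoidal function to keep in mind, chosen here purely for illustration and not taken from the paper, is $\sigma(t) = t^k s(t)$ with $s$ the logistic sigmoid; the two defining limits $\sigma(t)/t^k \to 1$ ($t \to +\infty$) and $\sigma(t)/t^k \to 0$ ($t \to -\infty$) can be checked numerically.

```python
import math

def logistic(t: float) -> float:
    # Guard against overflow of exp() for very negative arguments.
    return 1.0 / (1.0 + math.exp(-t)) if t > -700 else 0.0

def sigma_k(t: float, k: int) -> float:
    """Hypothetical k-sigmoidal function: sigma(t) = t^k * logistic(t)."""
    return t ** k * logistic(t)
```

Since $\sigma(t)/t^k = s(t)$, the quotient tends to $1$ at $+\infty$ and to $0$ at $-\infty$, as the definition requires.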

We can transplant Theorem 3 to various Banach lattices over any open set $\Omega$ on the real line $\mathbb{R}$. Here and below, $L^0(\Omega)$ denotes the set of all Lebesgue measurable functions from $\Omega$ to $\mathbb{C}$. Let $X$ be a Banach space contained in $L^0(\Omega)$ endowed with the norm $\|\cdot\|_X$. We say that $X$ is a Banach lattice if for any $f \in X$ and $g \in L^0(\Omega)$ satisfying the estimate $|g| \le |f|$, i.e., $|g(t)| \le |f(t)|$ for almost every $t \in \Omega$, we have $g \in X$, and the estimate $\|g\|_X \le \|f\|_X$ holds. We refer to [3] for the case where $X$ is the variable exponent Lebesgue space. See [9] for the function spaces to which Theorem 1 is applicable.

We write $\overline{Y}^{X(\Omega)}$ for the closure of a subset $Y \subset X(\Omega)$ with respect to the norm $\|\cdot\|_{X(\Omega)}$.

Theorem 4 (Universality on Banach lattices). Let $\Omega \subset \mathbb{R}$ be an open set, and let $X(\Omega)$ be a Banach lattice. Assume that $X(\Omega)$ is continuously embedded into $L^0(\Omega)$. Assume that $(1+|\cdot|^k)|_\Omega \in X(\Omega)$. Then,
$$\overline{V^k|_\Omega}^{X(\Omega)} = \overline{\mathcal{X}^k|_\Omega}^{X(\Omega)}.$$

It is noteworthy that we can deal with the case of unbounded open sets $\Omega$, in particular $\Omega = \mathbb{R}$.

Remark 5. (1) The condition that $(1+|\cdot|^k)|_\Omega \in X(\Omega)$ is a natural condition, since each generator $\mathrm{ReLU}^k(a \cdot + b)$ is dominated by a constant multiple of $1+|\cdot|^k$ and must belong to $X(\Omega)$. (2) If $k = 1$, then we saw in [9] that our result recaptures the result by Funahashi [10]. So, our result includes a further extension of his result.

Remark 6. Let $X(\Omega)$ be a Banach lattice as in Theorem 4, and let $\sigma$ be a $k$-sigmoidal function. We put $V^k_\sigma \equiv \mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$. Then, by the result for the case of $\sigma = \mathrm{ReLU}^k$, the same conclusion as in Theorem 4 holds with $V^k$ replaced by $V^k_\sigma$.

2. Proof of Theorem 1

We need the following lemmas. First, we embed $\mathcal{X}^k$ into a function space over the extended real line $[-\infty, +\infty]$.

Lemma 7. The operator $\iota : \mathcal{X}^k \to C([-\infty, +\infty])$, $\iota f \equiv \dfrac{f}{1+|\cdot|^k}$, is an isomorphism.

If $k = 1$, then this can be found in Lemma 3 in [9].

Proof. Observe that the inverse is given for $g \in C([-\infty, +\infty])$ as follows:
$$\iota^{-1} g \equiv (1+|\cdot|^k)\, g|_{\mathbb{R}}.$$
Since the operator $\iota$ preserves the norms, that is, $\|\iota f\|_{C([-\infty, +\infty])} = \|f\|_{\mathcal{X}^k}$, we see that this operator is an isomorphism.
We set $e_k(t) \equiv t^k$ for $t \in \mathbb{R}$. We will use the following algebraic relation for $\mathrm{ReLU}^k$.

Lemma 8. Let $k \in \mathbb{N}$. Then, for all $t \in \mathbb{R}$,
$$\mathrm{ReLU}^k(t) + (-1)^k\, \mathrm{ReLU}^k(-t) = e_k(t).$$
In particular, $e_k \in V^k$.

Proof of Lemma 8. We may reduce the matter to the proof of the equality in the two cases $t \ge 0$ and $t < 0$. If $t \ge 0$, we compute $\mathrm{ReLU}^k(t) = t^k$ and $\mathrm{ReLU}^k(-t) = 0$, and then the left-hand side equals $t^k$. If $t < 0$, then $\mathrm{ReLU}^k(t) = 0$ and $(-1)^k\, \mathrm{ReLU}^k(-t) = (-1)^k (-t)^k = t^k$. Hence, the equality holds for each $t \in \mathbb{R}$.
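The algebraic relation $\mathrm{ReLU}^k(t) + (-1)^k\,\mathrm{ReLU}^k(-t) = t^k$, one elementary identity of the kind used here, can be verified numerically; the helper below assumes the standard definition of $\mathrm{ReLU}^k$.

```python
def relu_k(t: float, k: int) -> float:
    return max(t, 0.0) ** k

def identity_gap(t: float, k: int) -> float:
    """Residual of ReLU^k(t) + (-1)^k * ReLU^k(-t) - t^k; zero for every real t."""
    return relu_k(t, k) + (-1) ** k * relu_k(-t, k) - t ** k
```

The residual vanishes for all $t$ and $k$, matching the case analysis $t \ge 0$ versus $t < 0$.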

Although $\mathrm{ReLU}^k$ is unbounded, if we consider suitable linear combinations, we can approximate any function in $C_{\mathrm{c}}(\mathbb{R})$, the space of compactly supported continuous functions on $\mathbb{R}$.

Lemma 9. Any function in $C_{\mathrm{c}}(\mathbb{R})$ can be approximated uniformly over $\mathbb{R}$ by the functions in $V^k$. More precisely, if a function $f \in C_{\mathrm{c}}(\mathbb{R})$ has its support contained in an interval $[-R, R]$ and $\varepsilon > 0$, then there exists $g \in V^k$ such that $\mathrm{supp}(g) \subset [-R-k, R+k]$ and that $\sup_{t \in \mathbb{R}} |f(t) - g(t)| \le \varepsilon$.

For the proof, we will use the following observation: if $g \in \mathcal{X}^k$ is bounded, then, by the definition of $\|\cdot\|_{\mathcal{X}^k}$,
$$\|g\|_{\mathcal{X}^k} \le \sup_{t \in \mathbb{R}} |g(t)|.$$

Proof. We induct on $k$. The base case $k = 1$ was proved already in [9]. Suppose that the assertion holds for $k-1$; in fact, we can then approximate with the functions in $V^{k-1}$ supported in $[-R-k+1, R+k-1]$. Let $\varepsilon > 0$ be given. By mollification and dilation, we may assume that $f \in C^1(\mathbb{R})$ and $\mathrm{supp}(f) \subset [-R, R]$. By the induction assumption applied to $f'$, there exists $g_0 \in V^{k-1}$ with $\mathrm{supp}(g_0) \subset [-R-k+1, R+k-1]$ such that
$$\sup_{t \in \mathbb{R}} |f'(t) - g_0(t)| \le \varepsilon_0, \quad \text{where } \varepsilon_0 \equiv \frac{\varepsilon}{2(2R+2k)}.$$
Note that the primitive
$$G(t) \equiv \int_{-\infty}^{t} g_0(s)\, ds$$
is a spline of degree $k$ of class $C^{k-1}(\mathbb{R})$ which vanishes on $(-\infty, -R-k+1]$ and equals the constant $c \equiv G(R+k-1)$ on $[R+k-1, \infty)$. Integrating the estimate above, we obtain
$$|f(t) - G(t)| \le (2R+2k)\, \varepsilon_0 = \frac{\varepsilon}{2}$$
for $t \in [-R-k+1, R+k-1]$. In particular, since $f(R+k-1) = 0$,
$$|c| \le \frac{\varepsilon}{2}.$$
Thus, $G$ nearly agrees with $f$ on this interval and is constant outside it. Using Lemma 8, the dilation, and the translation, we choose $\psi \in V^k$, which depends on $R$, $k$, and $c$, such that $\psi$ vanishes on $(-\infty, R+k-1]$, agrees with the constant $c$ over $[R+k, \infty)$, and ranges between $0$ and $c$. If $t \le -R-k+1$, then $G(t) - \psi(t) = 0 = f(t)$. If $-R-k+1 \le t \le R+k-1$, then $|f(t) - (G(t) - \psi(t))| = |f(t) - G(t)| \le \varepsilon/2$. Finally, if $t \ge R+k-1$, then $f(t) = 0$ and $|G(t) - \psi(t)| = |c - \psi(t)| \le |c| \le \varepsilon/2$. Therefore, the function $g \equiv G - \psi$ is a compactly supported spline of degree $k$ of class $C^{k-1}(\mathbb{R})$, hence a function in $V^k$, satisfying $\mathrm{supp}(g) \subset [-R-k, R+k]$ and $\sup_{t \in \mathbb{R}} |f(t) - g(t)| \le \varepsilon$.
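Compactly supported functions do exist in the span of $k$-ReLU translates; a standard construction, assumed here for illustration and not quoted from the proof, takes the $(k+1)$-st finite difference of the truncated power, which yields a B-spline of degree $k$ supported in $[0, k+1]$.

```python
from math import comb

def relu_k(t: float, k: int) -> float:
    return max(t, 0.0) ** k

def bspline_like(t: float, k: int) -> float:
    """Finite difference sum_{j=0}^{k+1} (-1)^j C(k+1, j) ReLU^k(t - j).

    Vanishes for t <= 0 (every term is zero) and for t >= k + 1 (the
    (k+1)-st finite difference of a degree-k polynomial is zero), yet
    it is a finite linear combination of translates of ReLU^k.
    """
    return sum((-1) ** j * comb(k + 1, j) * relu_k(t - j, k) for j in range(k + 2))
```

Dilating and translating such bumps gives compactly supported building blocks inside the span, which is the phenomenon the lemma exploits.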

We will prove Theorems 1 and 3.

Proof of Theorem 1. We identify $\mathcal{X}^k$ with $C([-\infty, +\infty])$ as in Lemma 7. By the Hahn–Banach theorem and the Riesz representation theorem, we have to show that any finite Borel measure $\mu$ on $[-\infty, +\infty]$ which annihilates $\iota[V^k]$ is zero. Since $C_{\mathrm{c}}(\mathbb{R})$ is contained in the closure of the space $V^k$, as we have seen in Lemma 9, $\mu$ is not supported on $\mathbb{R}$. Therefore, we have only to show that $\mu(\{+\infty\}) = 0$ and that $\mu(\{-\infty\}) = 0$. However, since we have shown that $\mu$ is not supported on $\mathbb{R}$, this is a direct consequence of the following observations:
$$0 = \int_{[-\infty, +\infty]} \iota[\mathrm{ReLU}^k]\, d\mu = \mu(\{+\infty\}), \qquad 0 = \int_{[-\infty, +\infty]} \iota[\mathrm{ReLU}^k(-\cdot)]\, d\mu = \mu(\{-\infty\}).$$
Thus, $\mu = 0$.

Proof of Theorem 3. We identify $\mathcal{X}^k$ with $C([-\infty, +\infty])$ as in Lemma 7 once again. Then, to show that $\mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$ is dense in $\mathcal{X}^k$ under this identification, it suffices to show that any finite Borel measure $\mu$ over $[-\infty, +\infty]$ is zero if it annihilates $\iota[\mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})]$.
Assuming that $\mu$ annihilates this subspace, we see that
$$\int_{[-\infty, +\infty]} \iota[a^{-k} \sigma(a(\cdot - s))]\, d\mu = 0 \tag{32}$$
for any $a > 0$ and $s \in \mathbb{R}$. Since $\sigma$ is $k$-sigmoidal,
$$\lim_{a \to \infty} \frac{\sigma(a(t-s))}{a^k (1+|t|^k)} = \iota[\mathrm{ReLU}^k(\cdot - s)](t)$$
for any fixed $t \in [-\infty, +\infty]$. Furthermore, $\sup_{a \ge 1} \|\iota[a^{-k} \sigma(a(\cdot - s))]\|_{C([-\infty, +\infty])} < \infty$, since $|\sigma| \le C(1+|\cdot|^k)$ for some constant $C > 0$. Therefore, by the Lebesgue convergence theorem, letting $a \to \infty$ in (32), we have
$$\int_{[-\infty, +\infty]} \iota[\mathrm{ReLU}^k(\cdot - s)]\, d\mu = 0.$$
Likewise, letting $a \to -\infty$, we see that $\mu$ annihilates $\iota[\mathrm{ReLU}^k(-(\cdot - s))]$ as well. Since $\mathrm{ReLU}^k$ is homogeneous of degree $k$ on each half-line, $\mu$ annihilates $\iota[\mathrm{ReLU}^k(a \cdot + b)]$ for every $a \ne 0$ and $b \in \mathbb{R}$; the proof of Theorem 1 used only such generators. Thus, by Theorem 1, $\mu = 0$.
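The limit underlying this proof can be observed numerically with the hypothetical $k$-sigmoidal function $\sigma(t) = t^k/(1+e^{-t})$, chosen only for illustration: the rescalings $a^{-k}\sigma(at)$ approach $\mathrm{ReLU}^k(t)$ pointwise as $a \to \infty$.

```python
import math

def sigma(t: float, k: int) -> float:
    """Hypothetical k-sigmoidal function t^k * logistic(t), guarded against overflow."""
    return t ** k / (1.0 + math.exp(-t)) if t > -700 else 0.0

def rescaled(t: float, k: int, a: float) -> float:
    """a^{-k} * sigma(a t), which tends to ReLU^k(t) = max(t, 0)^k as a -> infinity."""
    return sigma(a * t, k) / a ** k
```

For $t > 0$ the rescaled values approach $t^k$, while for $t < 0$ they approach $0$, exactly the pointwise limit fed into the dominated convergence argument.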

3. Proof of Theorem 4—Application of Theorem 1

We first show that
$$\mathcal{X}^k|_\Omega \subseteq \overline{V^k|_\Omega}^{X(\Omega)}. \tag{36}$$

We have $C_{\mathrm{c}}(\mathbb{R}) \subseteq \overline{V^k}^{\mathcal{X}^k}$ by Lemma 9. Hence, combining this with Theorem 1, every $f \in \mathcal{X}^k$ can be approximated by functions in $V^k$ in the norm $\|\cdot\|_{\mathcal{X}^k}$.

Since $V^k \subseteq \mathcal{X}^k$, we have the trivial inclusion $\overline{V^k|_\Omega}^{X(\Omega)} \subseteq \overline{\mathcal{X}^k|_\Omega}^{X(\Omega)}$. Thus, it remains to prove the opposite inclusion, which follows from (36).

For any $g \in V^k$, there exists $R > 0$ such that $g$ is a polynomial of degree at most $k$ both on $(-\infty, -R]$ and on $[R, \infty)$; in particular, $|g| \le C_g (1+|\cdot|^k)$ on $\mathbb{R}$ for some constant $C_g > 0$, so that $g|_\Omega \in X(\Omega)$ by the lattice property and the assumption $(1+|\cdot|^k)|_\Omega \in X(\Omega)$. Fix $f \in \mathcal{X}^k$ and $\varepsilon > 0$ for the time being. Then, by Theorem 1, we have $g \in V^k$ satisfying
$$|f(t) - g(t)| \le \varepsilon\, (1+|t|^k) \quad (t \in \mathbb{R}).$$

By the lattice property of $X(\Omega)$, this pointwise estimate gives
$$\|f|_\Omega - g|_\Omega\|_{X(\Omega)} \le \varepsilon\, \|(1+|\cdot|^k)|_\Omega\|_{X(\Omega)}.$$

Since $\varepsilon > 0$ is arbitrary and $(1+|\cdot|^k)|_\Omega \in X(\Omega)$, we obtain (36).

From (36), we deduce
$$\overline{\mathcal{X}^k|_\Omega}^{X(\Omega)} \subseteq \overline{V^k|_\Omega}^{X(\Omega)}.$$

Thus, the proof is complete if $f \in \mathcal{X}^k$. For general Banach lattices $X(\Omega)$, we use a routine approximation procedure. Let $f \in \overline{\mathcal{X}^k|_\Omega}^{X(\Omega)}$ and $\varepsilon > 0$. Then there exists $h \in \mathcal{X}^k$ such that $\|f - h|_\Omega\|_{X(\Omega)} \le \varepsilon$, and, by (36), there exists $g \in V^k$ such that $\|h|_\Omega - g|_\Omega\|_{X(\Omega)} \le \varepsilon$. Hence, for such $h$ and $g$, we have $\|f - g|_\Omega\|_{X(\Omega)} \le 2\varepsilon$. Since we assume that $X(\Omega)$ is continuously embedded into $L^0(\Omega)$, convergence in $X(\Omega)$ implies convergence in measure, so the identification of the continuous functions above with their equivalence classes causes no ambiguity. Therefore, we obtain the desired equality, and the proof of Theorem 4 is complete.

4. Conclusion

We specified the closure of $V^k = \mathrm{Span}(\{\mathrm{ReLU}^k(a \cdot + b) : a, b \in \mathbb{R}\})$ under the norm $\|\cdot\|_{\mathcal{X}^k}$. This is useful when we consider the approximation by functions in a function space $X(\Omega)$. We illustrated this situation using Banach lattices. Our result contains the existing result on the approximation in variable exponent Lebesgue spaces. It is also remarkable that our attempt can be regarded as an attempt at understanding neural networks. For example, Carroll and Dickinson used the Radon transform [11], and other research employed some other topologies (see [12, 13]).

We remark that this note has been posted as the preprint https://arxiv.org/abs/2212.13713.

5. Discussion

So far, we can manage to handle the case where $k$ is a non-negative integer. Our discussion heavily depended on algebraic relations such as Lemma 8. So, we do not know how to handle the case where $k$ is not an integer. Even for the simplest non-integer values of $k$, the problem is difficult.

Data Availability

No data or materials were used to support this study.

Disclosure

This paper is posted at https://export.arxiv.org/pdf/2212.13713 (see [14]).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

The four authors contributed equally to this paper. All of them read the whole manuscript and approved the content of the paper.

Acknowledgments

This work was supported by a JST CREST Grant (Number JPMJCR1913, Japan). This work was also supported by the RIKEN Junior Research Associate Program. The second author was supported by a Grant-in-Aid for Young Scientists Research (No. 19K14581), Japan Society for the Promotion of Science. The fourth author was supported by a Grant-in-Aid for Scientific Research (C) (19K03546), Japan Society for the Promotion of Science.