Abstract

This paper concerns the universality of the two-layer neural network with the $k$-rectified linear unit activation function, equipped with a suitable norm, without any restriction on the shape of the domain in the real line. This type of result is called global universality; it extends the previous result for ReLU by the present authors. As an application of the fundamental result on $k$-rectified linear unit functions, this paper also covers $k$-sigmoidal functions.

1. Introduction

The goal of this note is to specify the closure of linear subspaces generated by the $k$-rectified linear unit functions under various norms. As in [1], for $k \in \mathbb{N}$, we set
$$\mathrm{ReLU}^k(t) \equiv \max(t, 0)^k \quad (t \in \mathbb{R}).$$

The function $\mathrm{ReLU}^k$ is called the $k$-rectified linear unit ($k$-ReLU for short); it is introduced to compensate for properties that ReLU does not have. Our approach will be a completely mathematical one. Recently, increasing attention has been paid to the $k$-ReLU function as well as the original ReLU function. For example, if $k \ge 2$, the function $\mathrm{ReLU}^k$ is in the class $C^{k-1}(\mathbb{R})$, so that it is smoother than the ReLU function. When we study neural networks, the function $k$-ReLU is called an activation function. As in [2], $k$-ReLU functions are used to reduce the amount of computation. Using this smoothness property, Siegel and Xu investigated the error estimates of the approximation [1]. Mhaskar and Micchelli worked on compact sets, while in the present work, we consider the approximation on the whole real line $\mathbb{R}$.
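For reference, here is a minimal numerical sketch of the activation, assuming the standard definition $\mathrm{ReLU}^k(t) = \max(t,0)^k$; the helper functions below are illustrative and not taken from the paper.

```python
def relu_k(t: float, k: int) -> float:
    """k-ReLU: the k-th power of the positive part of t (assumed definition)."""
    return max(t, 0.0) ** k

def relu_k_derivative(t: float, k: int) -> float:
    """One-sided derivative: k * t^(k-1) for t > 0, and 0 for t <= 0.

    For k >= 2 this derivative is continuous at t = 0, reflecting the fact
    that ReLU^k is of class C^{k-1}, smoother than plain ReLU.
    """
    return k * t ** (k - 1) if t > 0 else 0.0
```

For instance, `relu_k(2.0, 3)` returns `8.0`, while the derivative tends to `0` from both sides of the origin when $k \ge 2$.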

A problem arises when we deal with $k$-ReLU as a function over the whole line: the function $\mathrm{ReLU}^k$ is not bounded on $\mathbb{R}$. Our goal in this paper is to propose a Banach space that allows us to handle such unbounded functions. Actually, for $k \in \mathbb{N}$, we let
$$\mathcal{X}^k \equiv \Bigl\{ f \in C(\mathbb{R}) : \lim_{t \to +\infty} \frac{f(t)}{1+|t|^k} \text{ and } \lim_{t \to -\infty} \frac{f(t)}{1+|t|^k} \text{ exist} \Bigr\},$$
equipped with the norm
$$\|f\|_{\mathcal{X}^k} \equiv \sup_{t \in \mathbb{R}} \frac{|f(t)|}{1+|t|^k},$$
and define
$$V^k \equiv \mathrm{Span}(\{\mathrm{ReLU}^k(a \cdot + b) : a, b \in \mathbb{R}\}).$$
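As a sanity check, assuming the weighted norm $\|f\| = \sup_{t \in \mathbb{R}} |f(t)|/(1+|t|^k)$, the supremum can be approximated on a finite grid; the grid-based maximum below is a rough numerical stand-in for the supremum over $\mathbb{R}$.

```python
def weighted_norm(f, k, grid):
    """Approximate sup_t |f(t)| / (1 + |t|^k) over a finite grid of points."""
    return max(abs(f(t)) / (1.0 + abs(t) ** k) for t in grid)

# On [-1000, 1000], the weighted norm of ReLU^3 is close to 1,
# the supremum being approached as t -> +infinity.
grid = [i / 10.0 for i in range(-10000, 10001)]
norm_relu3 = weighted_norm(lambda t: max(t, 0.0) ** 3, 3, grid)
```

This illustrates how the weight $1+|t|^k$ tames the polynomial growth of $k$-ReLU: the unbounded function acquires a finite norm.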

Note that any element in $\mathcal{X}^k$, divided by $1+|\cdot|^k$, extends to a continuous function over $[-\infty, +\infty]$. Our main result in this paper is as follows:

Theorem 1. The linear subspace $V^k$ is dense in $\mathcal{X}^k$.

Understanding the structure of $V^k$ has been important in the field of machine learning over the last decade; we refer to [4, 5] for example. Furthermore, dealing with unbounded activation functions is important from the viewpoint of applications (see [6]). Remark that the approximation over bounded domains has a long history (see [7]).

As is seen from the definition of the norm $\|\cdot\|_{\mathcal{X}^k}$, when we have a function $f \in \mathcal{X}^k$, with ease we can find a function $g \in V^k$ such that $f - g$ is small outside a large compact interval. However, after choosing such a function $g$, we have to look for a way to control $f - g$ inside any compact interval by a function in $V^k$. Although $V^k$ consists of unbounded functions, we can manage to do so by induction on $k$. Actually, once we are given a compact interval, we will find a compactly supported function in $V^k$ that makes the remaining error sufficiently small.

Theorem 1 says that the space $\mathcal{X}^k$ is mathematically suitable when we consider the activation function $k$-ReLU. We compare Theorem 1 with the following fundamental result by Cybenko. For a function space $X$ over the real line and an open set $\Omega \subset \mathbb{R}$, $X(\Omega)$ stands for the restriction of each element to $\Omega$, that is, $X(\Omega) \equiv \{f|_\Omega : f \in X\}$, and the norm is given by
$$\|f\|_{X(\Omega)} \equiv \inf\{\|g\|_X : g \in X,\ g|_\Omega = f\}.$$

Theorem 2 (see Cybenko [8]). Let $K \subset \mathbb{R}$ be a compact set and $\sigma$ be a continuous sigmoidal function. Then, for all $f \in C(K)$ and $\varepsilon > 0$, there exists $g \in \mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$ such that $\|f - g\|_{C(K)} < \varepsilon$. We remark that Theorem 1 is not a direct consequence of Theorem 2: Theorem 2 concerns the uniform approximation over compact intervals, while Theorem 1 deals with the uniform approximation over the whole real line. We will prove Theorem 1 without using Theorem 2.
Let $k \in \mathbb{N} \cup \{0\}$. Our results can readily be carried over to the case of $k$-sigmoidal functions. As in Definition 4.1 in [7], a continuous function $\sigma : \mathbb{R} \to \mathbb{R}$ is $k$-sigmoidal if
$$\lim_{t \to -\infty} \frac{\sigma(t)}{t^k} = 0, \qquad \lim_{t \to +\infty} \frac{\sigma(t)}{t^k} = 1.$$
Needless to say, $\mathrm{ReLU}^k$ is $k$-sigmoidal. If $k = 0$, then we say that $\sigma$ is a continuous sigmoidal function. As a corollary of Theorem 1, we extend this theorem to the case of $k$-sigmoidal functions.

Theorem 3. If $\sigma$ is $k$-sigmoidal, then the linear subspace $\mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$ is dense in $\mathcal{X}^k$.
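A concrete $k$-sigmoidal function to keep in mind, chosen here purely for illustration and not taken from the paper, is $\sigma(t) = t^k s(t)$ with $s$ the logistic sigmoid; the two defining limits $\sigma(t)/t^k \to 1$ ($t \to +\infty$) and $\sigma(t)/t^k \to 0$ ($t \to -\infty$) can be checked numerically.

```python
import math

def logistic(t: float) -> float:
    # Guard against overflow of exp() for very negative arguments.
    return 1.0 / (1.0 + math.exp(-t)) if t > -700 else 0.0

def sigma_k(t: float, k: int) -> float:
    """Hypothetical k-sigmoidal function: sigma(t) = t^k * logistic(t)."""
    return t ** k * logistic(t)
```

Since $\sigma(t)/t^k = s(t)$, the quotient tends to $1$ at $+\infty$ and to $0$ at $-\infty$, as the definition requires.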

We can transplant Theorem 3 to various Banach lattices over any open set $\Omega$ on the real line $\mathbb{R}$. Here and below, $L^0(\Omega)$ denotes the set of all Lebesgue measurable functions from $\Omega$ to $\mathbb{C}$. Let $X$ be a Banach space contained in $L^0(\Omega)$ endowed with the norm $\|\cdot\|_X$. We say that $X$ is a Banach lattice if for any $f \in X$ and $g \in L^0(\Omega)$ satisfying the estimate $|g| \le |f|$, i.e., $|g(t)| \le |f(t)|$ for almost every $t \in \Omega$, we have $g \in X$, and the estimate $\|g\|_X \le \|f\|_X$ holds. We refer to [3] for the case where $X$ is the variable exponent Lebesgue space. See [9] for the function spaces to which Theorem 1 is applicable.

We write $\overline{Y}^{X(\Omega)}$ for the closure of a subset $Y \subset X(\Omega)$ with respect to the norm $\|\cdot\|_{X(\Omega)}$.

Theorem 4 (Universality on Banach lattices). Let $\Omega \subset \mathbb{R}$ be an open set, and let $X(\Omega)$ be a Banach lattice. Assume that $X(\Omega)$ is continuously embedded into $L^0(\Omega)$. Assume that $(1+|\cdot|^k)|_\Omega \in X(\Omega)$. Then,
$$\overline{V^k|_\Omega}^{X(\Omega)} = \overline{\mathcal{X}^k|_\Omega}^{X(\Omega)}.$$

It is noteworthy that we can deal with the case of unbounded open sets $\Omega$, in particular $\Omega = \mathbb{R}$.

Remark 5. (1) The condition that $(1+|\cdot|^k)|_\Omega \in X(\Omega)$ is a natural condition, since each generator $\mathrm{ReLU}^k(a \cdot + b)$ is dominated by a constant multiple of $1+|\cdot|^k$ and must belong to $X(\Omega)$. (2) If $k = 1$, then we saw in [9] that our result recaptures the result by Funahashi [10]. So, our result includes a further extension of his result.

Remark 6. Let $X(\Omega)$ be a Banach lattice as in Theorem 4, and let $\sigma$ be a $k$-sigmoidal function. We put $V^k_\sigma \equiv \mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$. Then, by the result for the case of $\sigma = \mathrm{ReLU}^k$, the same conclusion as in Theorem 4 holds with $V^k$ replaced by $V^k_\sigma$.

2. Proof of Theorem 1

We need the following lemmas. First, we embed $\mathcal{X}^k$ into a function space over the extended real line $[-\infty, +\infty]$.

Lemma 7. The operator $\iota : \mathcal{X}^k \to C([-\infty, +\infty])$, $\iota f \equiv \dfrac{f}{1+|\cdot|^k}$, is an isomorphism.

If $k = 1$, then this can be found in Lemma 3 in [9].

Proof. Observe that the inverse is given for $g \in C([-\infty, +\infty])$ as follows:
$$\iota^{-1} g \equiv (1+|\cdot|^k)\, g|_{\mathbb{R}}.$$
Since the operator $\iota$ preserves the norms, that is, $\|\iota f\|_{C([-\infty, +\infty])} = \|f\|_{\mathcal{X}^k}$, we see that this operator is an isomorphism.
We set $e_k(t) \equiv t^k$ for $t \in \mathbb{R}$. We will use the following algebraic relation for $\mathrm{ReLU}^k$.

Lemma 8. Let $k \in \mathbb{N}$. Then, for all $t \in \mathbb{R}$,
$$\mathrm{ReLU}^k(t) + (-1)^k\, \mathrm{ReLU}^k(-t) = e_k(t).$$
In particular, $e_k \in V^k$.

Proof of Lemma 8. We may reduce the matter to the proof of the equality in the two cases $t \ge 0$ and $t < 0$. If $t \ge 0$, we compute $\mathrm{ReLU}^k(t) = t^k$ and $\mathrm{ReLU}^k(-t) = 0$, and then the left-hand side equals $t^k$. If $t < 0$, then $\mathrm{ReLU}^k(t) = 0$ and $(-1)^k\, \mathrm{ReLU}^k(-t) = (-1)^k (-t)^k = t^k$. Hence, the equality holds for each $t \in \mathbb{R}$.
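The algebraic relation $\mathrm{ReLU}^k(t) + (-1)^k\,\mathrm{ReLU}^k(-t) = t^k$, one elementary identity of the kind used here, can be verified numerically; the helper below assumes the standard definition of $\mathrm{ReLU}^k$.

```python
def relu_k(t: float, k: int) -> float:
    return max(t, 0.0) ** k

def identity_gap(t: float, k: int) -> float:
    """Residual of ReLU^k(t) + (-1)^k * ReLU^k(-t) - t^k; zero for every real t."""
    return relu_k(t, k) + (-1) ** k * relu_k(-t, k) - t ** k
```

The residual vanishes for all $t$ and $k$, matching the case analysis $t \ge 0$ versus $t < 0$.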

Although $\mathrm{ReLU}^k$ is unbounded, if we consider suitable linear combinations, we can approximate any function in $C_{\mathrm{c}}(\mathbb{R})$, the space of compactly supported continuous functions on $\mathbb{R}$.

Lemma 9. Any function in $C_{\mathrm{c}}(\mathbb{R})$ can be approximated uniformly over $\mathbb{R}$ by the functions in $V^k$. More precisely, if a function $f \in C_{\mathrm{c}}(\mathbb{R})$ has its support contained in an interval $[-R, R]$ and $\varepsilon > 0$, then there exists $g \in V^k$ such that $\mathrm{supp}(g) \subset [-R-k, R+k]$ and that $\sup_{t \in \mathbb{R}} |f(t) - g(t)| \le \varepsilon$.

For the proof, we will use the following observation: if $g \in \mathcal{X}^k$ is bounded, then, by the definition of $\|\cdot\|_{\mathcal{X}^k}$,
$$\|g\|_{\mathcal{X}^k} \le \sup_{t \in \mathbb{R}} |g(t)|.$$

Proof. We induct on $k$. The base case $k = 1$ was proved already in [9]. Suppose that the assertion holds for $k-1$; in fact, we can then approximate with the functions in $V^{k-1}$ supported in $[-R-k+1, R+k-1]$. Let $\varepsilon > 0$ be given. By mollification and dilation, we may assume that $f \in C^1(\mathbb{R})$ and $\mathrm{supp}(f) \subset [-R, R]$. By the induction assumption applied to $f'$, there exists $g_0 \in V^{k-1}$ with $\mathrm{supp}(g_0) \subset [-R-k+1, R+k-1]$ such that
$$\sup_{t \in \mathbb{R}} |f'(t) - g_0(t)| \le \varepsilon_0, \quad \text{where } \varepsilon_0 \equiv \frac{\varepsilon}{2(2R+2k)}.$$
Note that the primitive
$$G(t) \equiv \int_{-\infty}^{t} g_0(s)\, ds$$
is a spline of degree $k$ of class $C^{k-1}(\mathbb{R})$ which vanishes on $(-\infty, -R-k+1]$ and equals the constant $c \equiv G(R+k-1)$ on $[R+k-1, \infty)$. Integrating the estimate above, we obtain
$$|f(t) - G(t)| \le (2R+2k)\, \varepsilon_0 = \frac{\varepsilon}{2}$$
for $t \in [-R-k+1, R+k-1]$. In particular, since $f(R+k-1) = 0$,
$$|c| \le \frac{\varepsilon}{2}.$$
Thus, $G$ nearly agrees with $f$ on this interval and is constant outside it. Using Lemma 8, the dilation, and the translation, we choose $\psi \in V^k$, which depends on $R$, $k$, and $c$, such that $\psi$ vanishes on $(-\infty, R+k-1]$, agrees with the constant $c$ over $[R+k, \infty)$, and ranges between $0$ and $c$. If $t \le -R-k+1$, then $G(t) - \psi(t) = 0 = f(t)$. If $-R-k+1 \le t \le R+k-1$, then $|f(t) - (G(t) - \psi(t))| = |f(t) - G(t)| \le \varepsilon/2$. Finally, if $t \ge R+k-1$, then $f(t) = 0$ and $|G(t) - \psi(t)| = |c - \psi(t)| \le |c| \le \varepsilon/2$. Therefore, the function $g \equiv G - \psi$ is a compactly supported spline of degree $k$ of class $C^{k-1}(\mathbb{R})$, hence a function in $V^k$, satisfying $\mathrm{supp}(g) \subset [-R-k, R+k]$ and $\sup_{t \in \mathbb{R}} |f(t) - g(t)| \le \varepsilon$.
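Compactly supported functions do exist in the span of $k$-ReLU translates; a standard construction, assumed here for illustration and not quoted from the proof, takes the $(k+1)$-st finite difference of the truncated power, which yields a B-spline of degree $k$ supported in $[0, k+1]$.

```python
from math import comb

def relu_k(t: float, k: int) -> float:
    return max(t, 0.0) ** k

def bspline_like(t: float, k: int) -> float:
    """Finite difference sum_{j=0}^{k+1} (-1)^j C(k+1, j) ReLU^k(t - j).

    Vanishes for t <= 0 (every term is zero) and for t >= k + 1 (the
    (k+1)-st finite difference of a degree-k polynomial is zero), yet
    it is a finite linear combination of translates of ReLU^k.
    """
    return sum((-1) ** j * comb(k + 1, j) * relu_k(t - j, k) for j in range(k + 2))
```

Dilating and translating such bumps gives compactly supported building blocks inside the span, which is the phenomenon the lemma exploits.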

We will prove Theorems 1 and 3.

Proof of Theorem 1. We identify $\mathcal{X}^k$ with $C([-\infty, +\infty])$ as in Lemma 7. By the Hahn–Banach theorem and the Riesz representation theorem, we have to show that any finite Borel measure $\mu$ on $[-\infty, +\infty]$ which annihilates $\iota[V^k]$ is zero. Since $C_{\mathrm{c}}(\mathbb{R})$ is contained in the closure of the space $V^k$, as we have seen in Lemma 9, $\mu$ is not supported on $\mathbb{R}$. Therefore, we have only to show that $\mu(\{+\infty\}) = 0$ and that $\mu(\{-\infty\}) = 0$. However, since we have shown that $\mu$ is not supported on $\mathbb{R}$, this is a direct consequence of the following observations:
$$0 = \int_{[-\infty, +\infty]} \iota[\mathrm{ReLU}^k]\, d\mu = \mu(\{+\infty\}), \qquad 0 = \int_{[-\infty, +\infty]} \iota[\mathrm{ReLU}^k(-\cdot)]\, d\mu = \mu(\{-\infty\}).$$
Thus, $\mu = 0$.

Proof of Theorem 3. We identify $\mathcal{X}^k$ with $C([-\infty, +\infty])$ as in Lemma 7 once again. Then, to show that $\mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})$ is dense in $\mathcal{X}^k$ under this identification, it suffices to show that any finite Borel measure $\mu$ over $[-\infty, +\infty]$ is zero if it annihilates $\iota[\mathrm{Span}(\{\sigma(a \cdot + b) : a, b \in \mathbb{R}\})]$.
Assuming that $\mu$ annihilates this subspace, we see that
$$\int_{[-\infty, +\infty]} \iota[a^{-k} \sigma(a(\cdot - s))]\, d\mu = 0 \tag{32}$$
for any $a > 0$ and $s \in \mathbb{R}$. Since $\sigma$ is $k$-sigmoidal,
$$\lim_{a \to \infty} \frac{\sigma(a(t-s))}{a^k (1+|t|^k)} = \iota[\mathrm{ReLU}^k(\cdot - s)](t)$$
for any fixed $t \in [-\infty, +\infty]$. Furthermore, $\sup_{a \ge 1} \|\iota[a^{-k} \sigma(a(\cdot - s))]\|_{C([-\infty, +\infty])} < \infty$, since $|\sigma| \le C(1+|\cdot|^k)$ for some constant $C > 0$. Therefore, by the Lebesgue convergence theorem, letting $a \to \infty$ in (32), we have
$$\int_{[-\infty, +\infty]} \iota[\mathrm{ReLU}^k(\cdot - s)]\, d\mu = 0.$$
Likewise, letting $a \to -\infty$, we see that $\mu$ annihilates $\iota[\mathrm{ReLU}^k(-(\cdot - s))]$ as well. Since $\mathrm{ReLU}^k$ is homogeneous of degree $k$ on each half-line, $\mu$ annihilates $\iota[\mathrm{ReLU}^k(a \cdot + b)]$ for every $a \ne 0$ and $b \in \mathbb{R}$; the proof of Theorem 1 used only such generators. Thus, by Theorem 1, $\mu = 0$.
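The limit underlying this proof can be observed numerically with the hypothetical $k$-sigmoidal function $\sigma(t) = t^k/(1+e^{-t})$, chosen only for illustration: the rescalings $a^{-k}\sigma(at)$ approach $\mathrm{ReLU}^k(t)$ pointwise as $a \to \infty$.

```python
import math

def sigma(t: float, k: int) -> float:
    """Hypothetical k-sigmoidal function t^k * logistic(t), guarded against overflow."""
    return t ** k / (1.0 + math.exp(-t)) if t > -700 else 0.0

def rescaled(t: float, k: int, a: float) -> float:
    """a^{-k} * sigma(a t), which tends to ReLU^k(t) = max(t, 0)^k as a -> infinity."""
    return sigma(a * t, k) / a ** k
```

For $t > 0$ the rescaled values approach $t^k$, while for $t < 0$ they approach $0$, exactly the pointwise limit fed into the dominated convergence argument.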

3. Proof of Theorem 4—Application of Theorem 1

We first show that
$$\mathcal{X}^k|_\Omega \subseteq \overline{V^k|_\Omega}^{X(\Omega)}. \tag{36}$$

We have $C_{\mathrm{c}}(\mathbb{R}) \subseteq \overline{V^k}^{\mathcal{X}^k}$ by Lemma 9. Hence, combining this with Theorem 1, every $f \in \mathcal{X}^k$ can be approximated by functions in $V^k$ in the norm $\|\cdot\|_{\mathcal{X}^k}$.

Since $V^k \subseteq \mathcal{X}^k$, we have the trivial inclusion $\overline{V^k|_\Omega}^{X(\Omega)} \subseteq \overline{\mathcal{X}^k|_\Omega}^{X(\Omega)}$. Thus, it remains to prove the opposite inclusion, which follows from (36).

For any $g \in V^k$, there exists $R > 0$ such that $g$ is a polynomial of degree at most $k$ both on $(-\infty, -R]$ and on $[R, \infty)$; in particular, $|g| \le C_g (1+|\cdot|^k)$ on $\mathbb{R}$ for some constant $C_g > 0$, so that $g|_\Omega \in X(\Omega)$ by the lattice property and the assumption $(1+|\cdot|^k)|_\Omega \in X(\Omega)$. Fix $f \in \mathcal{X}^k$ and $\varepsilon > 0$ for the time being. Then, by Theorem 1, we have $g \in V^k$ satisfying
$$|f(t) - g(t)| \le \varepsilon\, (1+|t|^k) \quad (t \in \mathbb{R}).$$

By the lattice property of $X(\Omega)$, this pointwise estimate gives
$$\|f|_\Omega - g|_\Omega\|_{X(\Omega)} \le \varepsilon\, \|(1+|\cdot|^k)|_\Omega\|_{X(\Omega)}.$$

Since $\varepsilon > 0$ is arbitrary and $(1+|\cdot|^k)|_\Omega \in X(\Omega)$, we obtain (36).

From (36), we deduce
$$\overline{\mathcal{X}^k|_\Omega}^{X(\Omega)} \subseteq \overline{V^k|_\Omega}^{X(\Omega)}.$$

Thus, the proof is complete if $f \in \mathcal{X}^k$. For general Banach lattices $X(\Omega)$, we use a routine approximation procedure. Let $f \in \overline{\mathcal{X}^k|_\Omega}^{X(\Omega)}$ and $\varepsilon > 0$. Then there exists $h \in \mathcal{X}^k$ such that $\|f - h|_\Omega\|_{X(\Omega)} \le \varepsilon$, and, by (36), there exists $g \in V^k$ such that $\|h|_\Omega - g|_\Omega\|_{X(\Omega)} \le \varepsilon$. Hence, for such $h$ and $g$, we have $\|f - g|_\Omega\|_{X(\Omega)} \le 2\varepsilon$. Since we assume that $X(\Omega)$ is continuously embedded into $L^0(\Omega)$, convergence in $X(\Omega)$ implies convergence in measure, so the identification of the continuous functions above with their equivalence classes causes no ambiguity. Therefore, we obtain the desired equality, and the proof of Theorem 4 is complete.

4. Conclusion

We specified the closure of $V^k = \mathrm{Span}(\{\mathrm{ReLU}^k(a \cdot + b) : a, b \in \mathbb{R}\})$ under the norm $\|\cdot\|_{\mathcal{X}^k}$. This is useful when we consider the approximation by functions in a function space $X(\Omega)$. We illustrated this situation using Banach lattices. Our result contains the existing result on the approximation in variable exponent Lebesgue spaces. It is also remarkable that our attempt can be regarded as an attempt at understanding neural networks. For example, Carroll and Dickinson used the Radon transform [11], and other research employed some other topologies (see [12, 13]).

We remark that this note has been posted as the preprint https://arxiv.org/abs/2212.13713.

5. Discussion

So far, we can manage to handle the case where $k$ is a non-negative integer. Our discussion heavily depended on algebraic relations such as Lemma 8. So, we do not know how to handle the case where $k$ is not an integer. Even for the simplest non-integer values of $k$, the problem is difficult.

Data Availability

No data or materials were used to support this study.

Disclosure

This paper is posted at https://export.arxiv.org/pdf/2212.13713 (see [14]).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

The four authors contributed equally to this paper. All of them read the whole manuscript and approved the content of the paper.

Acknowledgments

This work was supported by a JST CREST Grant (Number JPMJCR1913, Japan). This work was also supported by the RIKEN Junior Research Associate Program. The second author was supported by a Grant-in-Aid for Young Scientists Research (No. 19K14581), Japan Society for the Promotion of Science. The fourth author was supported by a Grant-in-Aid for Scientific Research (C) (19K03546), Japan Society for the Promotion of Science.