Kernel Selection in Nonparametric Regression

Abstract

In the regression model \(Y=b(X)+\sigma(X)\varepsilon\), where \(X\) has a density \(f\), this paper deals with an oracle inequality for an estimator of \(bf\), involving a kernel in the sense of Lerasle et al. [13], selected via the PCO method. In addition to the bandwidth selection for kernel-based estimators already studied in Lacour et al. [12] and Comte and Marie [3], the dimension selection for anisotropic projection estimators of \(f\) and \(bf\) is covered.


REFERENCES

  1. G. Chagny, ‘‘Warped Bases for Conditional Density Estimation,’’ Math. Methods Statist. 22, 253–282 (2013).

  2. F. Comte, Estimation non-paramétrique (Spartacus IDH, 2014).

  3. F. Comte and N. Marie, ‘‘Bandwidth Selection for the Wolverton–Wagner Estimator,’’ Journal of Statistical Planning and Inference 207, 198–214 (2020).

  4. F. Comte and N. Marie, ‘‘On a Nadaraya–Watson Estimator with Two Bandwidths,’’ Submitted (2020).

  5. F. Comte and T. Rebafka, ‘‘Nonparametric Weighted Estimators for Biased Data,’’ Journal of Statistical Planning and Inference 174, 104–128 (2016).

  6. R. A. DeVore and G. G. Lorentz, Constructive Approximation (Springer-Verlag, 1993).

  7. U. Einmahl and D. M. Mason, ‘‘An Empirical Process Approach to the Uniform Consistency of Kernel-Type Function Estimators,’’ Journal of Theoretical Probability 13, 1–37 (2000).

  8. U. Einmahl and D. M. Mason, ‘‘Uniform in Bandwidth Consistency of Kernel-Type Function Estimators,’’ Annals of Statistics 33, 1380–1403 (2005).

  9. E. Giné and R. Nickl, Mathematical Foundations of Infinite-Dimensional Statistical Models (Cambridge University Press, 2015).

  10. A. Goldenshluger and O. Lepski, ‘‘Bandwidth Selection in Kernel Density Estimation: Oracle Inequalities and Adaptive Minimax Optimality,’’ The Annals of Statistics 39, 1608–1632 (2011).

  11. C. Houdré and P. Reynaud-Bouret, ‘‘Exponential Inequalities, with Constants, for U-statistics of Order Two,’’ in Stochastic Inequalities and Applications, Vol. 56 of Progress in Probability (Birkhäuser, 2003), pp. 55–69.

  12. C. Lacour, P. Massart, and V. Rivoirard, ‘‘Estimator Selection: A New Method with Applications to Kernel Density Estimation,’’ Sankhya A 79 (2), 298–335 (2017).

  13. M. Lerasle, N. M. Magalhaes, and P. Reynaud-Bouret, ‘‘Optimal Kernel Selection for Density Estimation,’’ in High Dimensional Probability VII: The Cargèse Volume, Vol. 71 of Progress in Probability (Birkhäuser, 2016), pp. 435–460.

  14. P. Massart, Concentration Inequalities and Model Selection, Lecture Notes in Mathematics 1896 (Springer, 2007).

  15. E. A. Nadaraya, ‘‘On a Regression Estimate,’’ Verojatnost. i Primenen. 9, 157–159 (1964) (in Russian).

  16. E. Parzen, ‘‘On the Estimation of a Probability Density Function and the Mode,’’ The Annals of Mathematical Statistics 33, 1065–1076 (1962).

  17. M. Rosenblatt, ‘‘Remarks on Some Nonparametric Estimates of a Density Function,’’ The Annals of Mathematical Statistics 27, 832–837 (1956).

  18. A. Tsybakov, Introduction to Nonparametric Estimation (Springer, 2009).

  19. S. Varet, C. Lacour, P. Massart, and V. Rivoirard, ‘‘Numerical Performance of Penalized Comparison to Overfitting for Multivariate Density Estimation,’’ Preprint (2020).

  20. G. S. Watson, ‘‘Smooth Regression Analysis,’’ Sankhya A 26, 359–372 (1964).

Author information

Correspondence to H. Halconruy or N. Marie.

APPENDIX A.

A. DETAILS ON KERNEL SETS: PROOFS OF PROPOSITIONS 2.2, 2.3, 2.6, AND 3.6

A.1. Proof of Proposition 2.2

Consider \(K,K^{\prime}\in\mathcal{K}_{k}(h_{\textrm{min}})\). Then, there exist \(h,h^{\prime}\in\mathcal{H}(h_{\textrm{min}})^{d}\) such that

$$K(x^{\prime},x)=k_{h}(x^{\prime}-x)\quad\textrm{and}\quad K^{\prime}(x^{\prime},x)=k_{h^{\prime}}(x^{\prime}-x)$$

for every \(x,x^{\prime}\in\mathbb{R}^{d}\), where

$$k_{h}(x):=\prod_{q=1}^{d}\frac{1}{h_{q}}k\left(\frac{x_{q}}{h_{q}}\right);\quad\forall x\in\mathbb{R}^{d}.$$

(1) For every \(x^{\prime}\in\mathbb{R}^{d}\), since \(nh_{\textrm{min}}^{d}\geqslant 1\),

$$||K(x^{\prime},.)||_{2}^{2}=\left(\prod_{q=1}^{d}\frac{1}{h_{q}^{2}}\right)\left[\int\limits_{\mathbb{R}^{d}}\prod_{q=1}^{d}k\left(\frac{x_{q}^{\prime}-x_{q}}{h_{q}}\right)^{2}\lambda_{d}(dx)\right]=||k||_{2}^{2d}\prod_{q=1}^{d}\frac{1}{h_{q}}$$

$${}\leqslant||k||_{2}^{2d}\frac{1}{h_{\textrm{min}}^{d}}\leqslant||k||_{2}^{2d}n.$$
(A.1)

(2) Since \(s_{K,\ell}=K\ast s\) and by Young’s inequality, \(||s_{K,\ell}||_{2}^{2}\leqslant||k||_{1}^{2d}||s||_{2}^{2}\).
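
In detail, recalling that \(s_{K,\ell}(x)=\mathbb{E}(K(X_{1},x)\ell(Y_{1}))\), the relation \(s_{K,\ell}=K\ast s\) reads, for every \(x\in\mathbb{R}^{d}\),

$$s_{K,\ell}(x)=\int\limits_{\mathbb{R}^{d}}k_{h}(x^{\prime}-x)s(x^{\prime})\lambda_{d}(dx^{\prime}),$$

so Young's inequality for convolutions gives \(||s_{K,\ell}||_{2}\leqslant||k_{h}||_{1}||s||_{2}=||k||_{1}^{d}||s||_{2}\), and the stated bound follows by squaring.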

(3) On the one hand, thanks to Eq. (A.1),

$$\overline{s}_{K^{\prime},\ell}=\mathbb{E}(||K^{\prime}(X_{1},.)\ell(Y_{1})||_{2}^{2})=||k||_{2}^{2d}\mathbb{E}(\ell(Y_{1})^{2})\prod_{q=1}^{d}\frac{1}{h_{q}^{\prime}}.$$

On the other hand, for every \(x,x^{\prime}\in\mathbb{R}^{d}\),

$$\langle K(x,.),K^{\prime}(x^{\prime},.)\rangle_{2}=\int\limits_{\mathbb{R}^{d}}k_{h}(x-x^{\prime\prime})k_{h^{\prime}}(x^{\prime}-x^{\prime\prime})\lambda_{d}(dx^{\prime\prime})=(k_{h}\ast k_{h^{\prime}})(x-x^{\prime}).$$

Then,

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})=\mathbb{E}((k_{h}\ast k_{h^{\prime}})(X_{1}-X_{2})^{2}\ell(Y_{2})^{2})$$

$${}=\int\limits_{\mathbb{R}^{d+1}}\left[\ell(y)^{2}\int\limits_{\mathbb{R}^{d}}(k_{h}\ast k_{h^{\prime}})(x^{\prime}-x)^{2}f(x^{\prime})\lambda_{d}(dx^{\prime})\right]\mathbb{P}_{(X_{2},Y_{2})}(dx,dy)$$

$${}\leqslant||f||_{\infty}||k_{h}\ast k_{h^{\prime}}||_{2}^{2}\mathbb{E}(\ell(Y_{2})^{2})\leqslant||f||_{\infty}||k||_{1}^{2d}\overline{s}_{K^{\prime},\ell}.$$

(4) For every \(\psi\in\mathbb{L}^{2}(\mathbb{R}^{d})\),

$$\mathbb{E}(\langle K(X_{1},.),\psi\rangle_{2}^{2})=\mathbb{E}((k_{h}\ast\psi)(X_{1})^{2})$$

$${}\leqslant||f||_{\infty}||k_{h}\ast\psi||_{2}^{2}\leqslant||f||_{\infty}||k||_{1}^{2d}||\psi||_{2}^{2}.$$
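
To fix ideas, here is a minimal numerical sketch (not part of the proof) of a kernel \(K\in\mathcal{K}_{k}(h_{\textrm{min}})\) and of the associated estimator \(\widehat{s}_{K,\ell}(n;x)=\frac{1}{n}\sum_{i=1}^{n}K(X_{i},x)\ell(Y_{i})\) of \(s=bf\). The Gaussian choice of \(k\), the map \(\ell(y)=y\), the bandwidth and the data-generating model below are placeholders.

    import numpy as np

    def k_h(u, h):
        # anisotropic product kernel k_h(u) = prod_q k(u_q / h_q) / h_q, with a Gaussian k (assumption)
        z = u / h
        return np.prod(np.exp(-z ** 2 / 2) / (np.sqrt(2 * np.pi) * h), axis=-1)

    def s_hat(x, X, Y, h):
        # estimator s_hat_{K,l}(n; x) = (1/n) sum_i k_h(X_i - x) l(Y_i), here with l(y) = y
        return np.array([np.mean(k_h(X - xx, h) * Y) for xx in x])

    rng = np.random.default_rng(0)
    n, d = 500, 2
    X = rng.uniform(-1.0, 1.0, size=(n, d))                     # X_i with density f (uniform placeholder)
    Y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)  # Y_i = b(X_i) + sigma eps_i (placeholders)
    h = np.array([0.2, 0.3])                                    # bandwidth in H(h_min)^d with n h_min^d >= 1
    grid = np.stack([np.linspace(-1.0, 1.0, 5), np.zeros(5)], axis=1)
    print(s_hat(grid, X, Y, h))                                 # pointwise estimates of (bf)(x) on the grid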

A.2. Proof of Proposition 2.3

Consider \(K,K^{\prime}\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})\). Then, there exist \(m,m^{\prime}\in\{1,\dots,m_{\textrm{max}}\}^{d}\) such that

$$K(x^{\prime},x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q})\varphi_{j}^{m_{q}}(x_{q}^{\prime})\quad\textrm{and}\quad K^{\prime}(x^{\prime},x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}^{\prime}}\varphi_{j}^{m_{q}^{\prime}}(x_{q})\varphi_{j}^{m_{q}^{\prime}}(x_{q}^{\prime})$$

for every \(x,x^{\prime}\in\mathbb{R}^{d}\).

(1) For every \(x^{\prime}\in\mathbb{R}^{d}\), since \(m_{\textrm{max}}^{d}\leqslant n\),

$$||K(x^{\prime},.)||_{2}^{2}=\prod_{q=1}^{d}\sum_{j,j^{\prime}=1}^{m_{q}}\varphi_{j^{\prime}}^{m_{q}}(x_{q}^{\prime})\varphi_{j}^{m_{q}}(x_{q}^{\prime})\int\limits_{-\infty}^{\infty}\varphi_{j^{\prime}}^{m_{q}}(x)\varphi_{j}^{m_{q}}(x)dx=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q}^{\prime})^{2}$$

$${}\leqslant\mathfrak{m}_{\mathcal{B}}^{d}\prod_{q=1}^{d}m_{q}\leqslant\mathfrak{m}_{\mathcal{B}}^{d}n.$$
(A.2)

(2) Since

$$s_{K,\ell}(.)=\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\langle s,\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}}\rangle_{2}(\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}})(.)$$

by Pythagoras’ theorem, \(||s_{K,\ell}||_{2}^{2}\leqslant||s||_{2}^{2}\).

(3) First of all, thanks to Eq. (A.2),

$$\overline{s}_{K^{\prime},\ell}=\mathbb{E}\left[\ell(Y_{1})^{2}\prod_{q=1}^{d}\sum_{j=1}^{m_{q}^{\prime}}\varphi_{j}^{m_{q}^{\prime}}(X_{1,q})^{2}\right]\leqslant\mathfrak{m}_{\mathcal{B}}^{d}\mathbb{E}(\ell(Y_{1})^{2})\prod_{q=1}^{d}m_{q}^{\prime}.$$

On the one hand, under condition (5) on \(\mathcal{B}_{1},\dots,\mathcal{B}_{n}\), for any \(j\in\{1,\dots,m\}\), \(\varphi_{j}^{m}\) does not depend on \(m\) and can therefore be denoted by \(\varphi_{j}\); then

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})=\int\limits_{\mathbb{R}^{d}}\mathbb{E}\left[\left(\prod_{q=1}^{d}\sum_{j=1}^{m_{q}\wedge m_{q}^{\prime}}\varphi_{j}(x_{q}^{\prime})\varphi_{j}(X_{2,q})\right)^{2}\ell(Y_{2})^{2}\right]f(x^{\prime})\lambda_{d}(dx^{\prime})$$

$${}\leqslant||f||_{\infty}\mathbb{E}\left[\ell(Y_{2})^{2}\prod_{q=1}^{d}\sum_{j,j^{\prime}=1}^{m_{q}\wedge m_{q}^{\prime}}\varphi_{j^{\prime}}(X_{2,q})\varphi_{j}(X_{2,q})\int\limits_{-\infty}^{\infty}\varphi_{j^{\prime}}(x^{\prime})\varphi_{j}(x^{\prime})dx^{\prime}\right]$$

$${}\leqslant||f||_{\infty}\overline{s}_{K^{\prime},\ell}.$$

On the other hand, under condition (6) on \(\mathcal{B}_{1},\dots,\mathcal{B}_{n}\), since \(X_{1}\) and \((X_{2},Y_{2})\) are independent, and since \(K(x,x)\geqslant 0\) for every \(x\in\mathbb{R}^{d}\),

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})\leqslant\mathbb{E}(||K(X_{1},.)||_{2}^{2}||K^{\prime}(X_{2},.)||_{2}^{2}\ell(Y_{2})^{2})$$

$${}=\mathbb{E}(K(X_{1},X_{1}))\mathbb{E}(||K^{\prime}(X_{2},.)||_{2}^{2}\ell(Y_{2})^{2})\leqslant\overline{\mathfrak{m}}_{\mathcal{B}}\overline{s}_{K^{\prime},\ell}.$$

(4) For every \(\psi\in\mathbb{L}^{2}(\mathbb{R}^{d})\),

$$\mathbb{E}(\langle K(X_{1},.),\psi\rangle_{2}^{2})=\mathbb{E}\left[\left|\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\langle\psi,\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}}\rangle_{2}(\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}})(X_{1})\right|^{2}\right]$$

$${}\leqslant||f||_{\infty}\left|\left|\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\langle\psi,\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}}\rangle_{2}(\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}})(.)\right|\right|_{2}^{2}\leqslant||f||_{\infty}||\psi||_{2}^{2}.$$
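
As an illustration only, equality (A.2) can be checked numerically for \(d=1\) with the trigonometric basis of \(\mathbb{L}^{2}([0,1])\), taken here as an assumed example of orthonormal basis; the grid size and the point \(x^{\prime}\) below are arbitrary.

    import numpy as np

    # illustration of equality (A.2) for d = 1: ||K(x', .)||_2^2 = sum_j phi_j(x')^2, with the
    # trigonometric basis of L^2([0, 1]) taken as an assumed example of orthonormal basis
    def basis(m, x):
        # phi_1 = 1, phi_{2j} = sqrt(2) cos(2 pi j x), phi_{2j+1} = sqrt(2) sin(2 pi j x)
        cols, j = [np.ones_like(x)], 1
        while len(cols) < m:
            cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * j * x))
            if len(cols) < m:
                cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * j * x))
            j += 1
        return np.stack(cols, axis=-1)

    m, x_prime = 7, 0.3
    x = (np.arange(200_000) + 0.5) / 200_000                 # midpoint grid on [0, 1]
    phi_xp = basis(m, np.array([x_prime]))[0]
    K = basis(m, x) @ phi_xp                                  # K(x', x) = sum_j phi_j(x) phi_j(x')
    print(np.mean(K ** 2), np.sum(phi_xp ** 2))               # both close to m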

A.3. Proof of Proposition 2.6

For the sake of readability, assume that \(d=1\). Consider \(m\in\{1,\dots,m_{\textrm{max}}\}\). Since each Legendre polynomial is uniformly bounded by \(1\) on \([-1,1]\),

$$\left|\mathbb{E}\left[\sum_{j=1}^{m}\xi_{j}^{m}(X_{1})\xi_{j}^{m}(x^{\prime})\right]\right|\leqslant\sum_{j=1}^{m}\frac{2j+1}{2}\left|\int\limits_{-1}^{1}Q_{j}(x)f(x)dx\right|.$$

Moreover, since \(Q_{j}\) is a solution to Legendre’s differential equation for any \(j\in\{1,\dots,m\}\), thanks to the integration by parts formula,

$$\int\limits_{-1}^{1}Q_{j}(x)f(x)dx=-\frac{1}{j(j+1)}\int\limits_{-1}^{1}\frac{d}{dx}[(1-x^{2})Q_{j}^{\prime}(x)]f(x)dx$$

$${}=-\frac{1}{j(j+1)}[(1-x^{2})Q_{j}^{\prime}(x)f(x)]_{-1}^{1}+\frac{1}{j(j+1)}\int\limits_{-1}^{1}(1-x^{2})Q_{j}^{\prime}(x)f^{\prime}(x)dx$$

$${}=-\frac{1}{j(j+1)}\int\limits_{-1}^{1}Q_{j}(x)\frac{d}{dx}[(1-x^{2})f^{\prime}(x)]dx.$$

Then, since

$$\left|\frac{d}{dx}[(1-x^{2})f^{\prime}(x)]\right|\leqslant 2|x|\cdot||f^{\prime}||_{\infty}+(1-x^{2})||f^{\prime\prime}||_{\infty}\leqslant(|x|+1-x^{2})\mathfrak{c}_{1}\leqslant\frac{5}{4}\mathfrak{c}_{1}\quad\textrm{on }[-1,1]$$

with \(\mathfrak{c}_{1}=\max\{2||f^{\prime}||_{\infty},||f^{\prime\prime}||_{\infty}\}\), so that \(||\frac{d}{dx}[(1-x^{2})f^{\prime}]||_{2}\leqslant\frac{5\sqrt{2}}{4}\mathfrak{c}_{1}\leqslant 2\mathfrak{c}_{1}\), the Cauchy–Schwarz inequality and \(||Q_{j}||_{2}=(2/(2j+1))^{1/2}\) give

$$\left|\int\limits_{-1}^{1}Q_{j}(x)f(x)dx\right|\leqslant\frac{||Q_{j}||_{2}}{j(j+1)}\left|\left|\frac{d}{dx}[(1-x^{2})f^{\prime}]\right|\right|_{2}\leqslant\frac{2\mathfrak{c}_{1}}{j(j+1)}||Q_{j}||_{2}=\frac{2\sqrt{2}\mathfrak{c}_{1}}{j(j+1)(2j+1)^{1/2}}.$$

So,

$$\left|\mathbb{E}\left[\sum_{j=1}^{m}\xi_{j}^{m}(X_{1})\xi_{j}^{m}(x^{\prime})\right]\right|\leqslant 2\mathfrak{c}_{1}\sum_{j=1}^{m}\frac{1}{j^{3/2}}\leqslant 2\mathfrak{c}_{1}\zeta\left(\frac{3}{2}\right),$$

where \(\zeta\) is Riemann’s zeta function. Thus, the Legendre basis satisfies condition (6).
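
As a numerical sanity check (not part of the proof), the bound on \(|\int_{-1}^{1}Q_{j}(x)f(x)dx|\) obtained above can be verified for a placeholder smooth density, here \(f(x)=1/2+\sin(\pi x)/4\) on \([-1,1]\):

    import numpy as np
    from numpy.polynomial import legendre as leg

    # sanity check of the bound on |int_{-1}^1 Q_j(x) f(x) dx| for the placeholder density
    # f(x) = 1/2 + sin(pi x)/4 on [-1, 1], with c_1 = max{2 ||f'||_inf, ||f''||_inf}
    f = lambda x: 0.5 + 0.25 * np.sin(np.pi * x)
    c1 = max(2 * 0.25 * np.pi, 0.25 * np.pi ** 2)
    nodes, weights = leg.leggauss(200)                        # Gauss-Legendre quadrature on [-1, 1]
    for j in range(1, 30):
        Qj = leg.Legendre.basis(j)                            # Legendre polynomial Q_j
        lhs = abs(np.sum(weights * Qj(nodes) * f(nodes)))
        rhs = 2 * np.sqrt(2) * c1 / (j * (j + 1) * np.sqrt(2 * j + 1))
        assert lhs <= rhs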

A.4. Proof of Proposition 3.6

The proof of Proposition 3.6 relies on the following technical lemma.

Lemma A.1. For every \(x\in[0,2\pi]\) and \(p,q\in\mathbb{N}^{*}\) such that \(q>p\),

$$\left|\sum_{j=p+1}^{q}\frac{\sin(jx)}{j}\right|\leqslant\frac{2}{(1+p)\sin(x/2)}.$$

See Subsubsection A.4.1. for a proof.

For the sake of readability, assume that \(d=1\). Consider \(K,K^{\prime}\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})\). Then, there exist \(m,m^{\prime}\in\{1,\dots,m_{\textrm{max}}\}\) such that

$$K(x^{\prime},x)=\sum_{j=1}^{m}\chi_{j}(x)\chi_{j}(x^{\prime})\quad\textrm{and}\quad K^{\prime}(x^{\prime},x)=\sum_{j=1}^{m^{\prime}}\chi_{j}(x)\chi_{j}(x^{\prime});\quad\forall x,x^{\prime}\in\mathbb{R}.$$

First, there exist \(\mathfrak{m}_{1}(m,m^{\prime})\in\{0,\dots,n\}\) and \(\mathfrak{c}_{1}>0\), not depending on \(n\), \(K\) and \(K^{\prime}\), such that for any \(x^{\prime}\in[0,1]\),

$$|\langle K(x^{\prime},.),s_{K^{\prime},\ell}\rangle_{2}|=\left|\sum_{j=1}^{m\wedge m^{\prime}}\mathbb{E}(\ell(Y_{1})\chi_{j}(X_{1}))\chi_{j}(x^{\prime})\right|$$

$${}\leqslant\mathfrak{c}_{1}+2\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\mathbb{E}(\ell(Y_{1})(\cos(2\pi jX_{1})\cos(2\pi jx^{\prime})+\sin(2\pi jX_{1})\sin(2\pi jx^{\prime}))\mathbf{1}_{[0,1]}(X_{1}))\right|$$

$${}=\mathfrak{c}_{1}+2\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\mathbb{E}(\ell(Y_{1})\cos(2\pi j(X_{1}-x^{\prime}))\mathbf{1}_{[0,1]}(X_{1}))\right|.$$

Moreover, for any \(j\in\{2,\dots,\mathfrak{m}_{1}(m,m^{\prime})\}\),

$$\mathbb{E}(\ell(Y_{1})\cos(2\pi j(X_{1}-x^{\prime}))\mathbf{1}_{[0,1]}(X_{1}))=\int\limits_{0}^{1}\cos(2\pi j(x-x^{\prime}))s(x)dx$$

$${}=\frac{1}{j}\left[\frac{\sin(2\pi j(x-x^{\prime}))}{2\pi}s(x)\right]_{0}^{1}$$

$${}+\frac{1}{j^{2}}\left[\frac{\cos(2\pi j(x-x^{\prime}))}{4\pi^{2}}s^{\prime}(x)\right]_{0}^{1}-\frac{1}{j^{2}}\int\limits_{0}^{1}\frac{\cos(2\pi j(x-x^{\prime}))}{4\pi^{2}}s^{\prime\prime}(x)dx$$

$${}=\frac{s(0)-s(1)}{2\pi}\frac{\alpha_{j}(x^{\prime})}{j}+\frac{\beta_{j}(x^{\prime})}{j^{2}}$$

where \(\alpha_{j}(x^{\prime}):=\sin(2\pi jx^{\prime})\) and

$$\beta_{j}(x^{\prime}):=\frac{1}{4\pi^{2}}\left((s^{\prime}(1)-s^{\prime}(0))\cos(2\pi jx^{\prime})-\int\limits_{0}^{1}\cos(2\pi j(x-x^{\prime}))s^{\prime\prime}(x)dx\right).$$

Then, there exists a deterministic constant \(\mathfrak{c}_{2}>0\), not depending on \(n\), \(K\), \(K^{\prime}\), and \(x^{\prime}\), such that

$$\langle K(x^{\prime},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\leqslant\mathfrak{c}_{2}\left[1+\left(\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\alpha_{j}(x^{\prime})}{j}\right)^{2}+\left(\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\beta_{j}(x^{\prime})}{j^{2}}\right)^{2}\right].$$
(A.3)

Let us show that each term of the right-hand side of inequality (A.3) is uniformly bounded in \(x^{\prime}\), \(m\), and \(m^{\prime}\). On the one hand,

$$\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\beta_{j}(x^{\prime})}{j^{2}}\right|\leqslant\max_{j\in\{1,\dots,n\}}||\beta_{j}||_{\infty}\sum_{j=1}^{n}\frac{1}{j^{2}}\leqslant\frac{1}{24}(2||s^{\prime}||_{\infty}+||s^{\prime\prime}||_{\infty}).$$

On the other hand, for every \(x\in]0,\pi[\) such that \([\pi/x]+1\leqslant\mathfrak{m}_{1}(m,m^{\prime})\) (without loss of generality), by Lemma A.1,

$$\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\sin(jx)}{j}\right|\leqslant\left|\sum_{j=1}^{[\pi/x]}\frac{\sin(jx)}{j}\right|+\left|\sum_{j=[\pi/x]+1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\sin(jx)}{j}\right|$$

$${}\leqslant x\left[\frac{\pi}{x}\right]+\frac{2}{(1+[\pi/x])\sin(x/2)}\leqslant\pi+2.$$
(A.4)

Since the map \(x\mapsto\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\sin(jx)/j\) is continuous, odd, and \(2\pi\)-periodic, inequality (A.4) holds true for every \(x\in\mathbb{R}\). So,

$$\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\alpha_{j}(x^{\prime})}{j}\right|\leqslant\pi+2.$$

Therefore,

$$\mathbb{E}\left[\sup_{K,K^{\prime}\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})}\langle K(X_{1},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\right]\leqslant\mathfrak{c}_{2}\left(1+(\pi+2)^{2}+\frac{1}{24^{2}}(2||s^{\prime}||_{\infty}+||s^{\prime\prime}||_{\infty})^{2}\right).$$

A.4.1. Proof of Lemma A.1. For any \(x\in[0,2\pi]\) and \(q\in\mathbb{N}^{*}\), consider

$$f_{q}(x):=\sum_{j=1}^{q}\frac{\sin(jx)}{j}\textrm{, }g_{q}(x):=\sum_{j=1}^{q}\left(\frac{1}{j}-\frac{1}{j+1}\right)h_{j}(x)\quad\textrm{and}\quad h_{q}(x):=\sum_{j=1}^{q}\sin(jx).$$

On the one hand,

$$g_{q}(x)=h_{1}(x)-\frac{1}{q+1}h_{q}(x)+\sum_{j=2}^{q}\frac{1}{j}(h_{j}(x)-h_{j-1}(x)).$$

Then,

$$f_{q}(x)=g_{q}(x)+\frac{1}{q+1}h_{q}(x).$$

On the other hand,

$$h_{q}(x)=\textrm{Im}\left(\sum_{j=1}^{q}e^{\mathbf{i}jx}\right)=\textrm{Im}\left[e^{\mathbf{i}(q+1)x/2}\frac{\sin(qx/2)}{\sin(x/2)}\right]$$

$${}=\frac{\sin((q+1)x/2)\sin(qx/2)}{\sin(x/2)}=\frac{\cos(x/2)-\cos((q+1/2)x)}{2\sin(x/2)}.$$

Then,

$$\sin\left(\frac{x}{2}\right)|h_{q}(x)|\leqslant 1$$

and, for any \(p\in\mathbb{N}^{*}\) such that \(q>p\),

$$\sin\left(\frac{x}{2}\right)|g_{q}(x)-g_{p}(x)|\leqslant\frac{1}{p+1}-\frac{1}{q+1}.$$

Therefore,

$$\sin\left(\frac{x}{2}\right)|f_{q}(x)-f_{p}(x)|\leqslant\sin\left(\frac{x}{2}\right)|g_{q}(x)-g_{p}(x)|+\sin\left(\frac{x}{2}\right)\frac{|h_{q}(x)|}{q+1}+\sin\left(\frac{x}{2}\right)\frac{|h_{p}(x)|}{p+1}$$

$${}\leqslant\frac{2}{p+1}.$$

In conclusion,

$$\left|\sum_{j=p+1}^{q}\frac{\sin(jx)}{j}\right|\leqslant\frac{2}{(1+p)\sin(x/2)}.$$
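
As an illustration only, Lemma A.1 can be checked numerically on randomly drawn triples \((p,q,x)\); the ranges below are arbitrary.

    import numpy as np

    # sanity check of Lemma A.1 on randomly drawn triples (p, q, x); the ranges are arbitrary
    rng = np.random.default_rng(0)
    for _ in range(10_000):
        p = int(rng.integers(1, 50))
        q = int(rng.integers(p + 1, p + 200))
        x = rng.uniform(1e-3, 2 * np.pi - 1e-3)
        j = np.arange(p + 1, q + 1)
        lhs = abs(np.sum(np.sin(j * x) / j))
        rhs = 2 / ((1 + p) * np.sin(x / 2))
        assert lhs <= rhs + 1e-12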

B. PROOFS OF RISK BOUNDS

B.1. Preliminary Results

This subsection provides three lemmas used several times in the sequel.

Lemma B.1. Consider

$$U_{K,K^{\prime},\ell}(n):=\sum_{i\not=j}\langle K(X_{i},.)\ell(Y_{i})-s_{K,\ell},K^{\prime}(X_{j},.)\ell(Y_{j})-s_{K^{\prime},\ell}\rangle_{2};\quad\forall K,K^{\prime}\in\mathcal{K}_{n}.$$
(B.1)

Under Assumptions 2.1.(1)–2.1.(3), if \(s\in\mathbb{L}^{2}(\mathbb{R}^{d})\) and if there exists \(\alpha>0\) such that \(\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty\), then there exists a deterministic constant \(\mathfrak{c}_{B.1}>0\), not depending on \(n\), such that for every \(\theta\in]0,1[\),

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\left\{\frac{|U_{K,K^{\prime},\ell}(n)|}{n^{2}}-\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}\right\}\right)\leqslant\mathfrak{c}_{B.1}\frac{\log(n)^{5}}{\theta n}.$$

Lemma B.2. Consider

$$V_{K,\ell}(n):=\frac{1}{n}\sum_{i=1}^{n}||K(X_{i},.)\ell(Y_{i})-s_{K,\ell}||_{2}^{2};\quad\forall K\in\mathcal{K}_{n}.$$

Under Assumptions 2.1.(1), 2.1.(2), if \(s\in\mathbb{L}^{2}(\mathbb{R}^{d})\) and if there exists \(\alpha>0\) such that \(\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty\), then there exists a deterministic constant \(\mathfrak{c}_{B.2}>0\), not depending on \(n\), such that for every \(\theta\in]0,1[\),

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\frac{1}{n}|V_{K,\ell}(n)-\overline{s}_{K,\ell}|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant\mathfrak{c}_{B.2}\frac{\log(n)^{3}}{\theta n}.$$

Lemma B.3. Consider

$$W_{K,K^{\prime},\ell}(n):=\langle\widehat{s}_{K,\ell}(n;.)-s_{K,\ell},s_{K^{\prime},\ell}-s\rangle_{2};\quad\forall K,K^{\prime}\in\mathcal{K}_{n}.$$
(B.2)

Under Assumptions 2.1.(1), 2.1.(2), 2.1.(4), if \(s\in\mathbb{L}^{2}(\mathbb{R}^{d})\) and if there exists \(\alpha>0\) such that \(\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty\), then there exists a deterministic constant \(\mathfrak{c}_{B.3}>0\), not depending on \(n\), such that for every \(\theta\in]0,1[\),

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\right)\leqslant\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}.$$

B.1.1. Proof of Lemma B.1. The proof of Lemma B.1 relies on the following concentration inequality for U-statistics, first proved in dimension \(1\) by Houdré and Reynaud-Bouret [11], and then extended to the infinite-dimensional framework by Giné and Nickl [9].

Lemma B.4. Let \(\xi_{1},\dots,\xi_{n}\) be i.i.d. random variables on a Polish space \(\Xi\) equipped with its Borel \(\sigma\)-algebra. Let \(f_{i,j}\), \(1\leqslant i\not=j\leqslant n\), be some bounded and symmetric measurable maps from \(\Xi^{2}\) into \(\mathbb{R}\) such that, for every \(i\not=j\),

$$f_{i,j}=f_{j,i}\quad\textrm{and}\quad\mathbb{E}(f_{i,j}(z,\xi_{1}))=0\textrm{ for }dz\textrm{-almost every }z\in\Xi.$$

Consider the totally degenerate second order U-statistic

$$U_{n}:=\sum_{i\not=j}f_{i,j}(\xi_{i},\xi_{j}).$$

There exists a universal constant \(\mathfrak{m}>0\) such that for every \(\lambda>0\),

$$\mathbb{P}(U_{n}\leqslant\mathfrak{m}(\mathfrak{c}_{n}\lambda^{1/2}+\mathfrak{d}_{n}\lambda+\mathfrak{b}_{n}\lambda^{3/2}+\mathfrak{a}_{n}\lambda^{2}))\geqslant 1-2.7e^{-\lambda},$$

where

$$\mathfrak{a}_{n}=\sup_{i,j=1,\dots,n}\left\{\sup_{z,z^{\prime}\in\Xi}|f_{i,j}(z,z^{\prime})|\right\},$$

$$\mathfrak{b}_{n}^{2}=\max\left\{\sup_{i,z}\sum_{j=1}^{i-1}\mathbb{E}(f_{i,j}(z,\xi_{j})^{2});\ \sup_{j,z^{\prime}}\sum_{i=j+1}^{n}\mathbb{E}(f_{i,j}(\xi_{i},z^{\prime})^{2})\right\},$$

$$\mathfrak{c}_{n}^{2}=\sum_{i\not=j}\mathbb{E}(f_{i,j}(\xi_{i},\xi_{j})^{2})\quad\textrm{and}$$

$$\mathfrak{d}_{n}=\sup_{(a,b)\in\mathcal{A}}\mathbb{E}\left[\sum_{i<j}f_{i,j}(\xi_{i},\xi_{j})a_{i}(\xi_{i})b_{j}(\xi_{j})\right]$$

with

$$\mathcal{A}=\left\{(a,b):\mathbb{E}\left(\sum_{i=1}^{n-1}a_{i}(\xi_{i})^{2}\right)\leqslant 1\quad\textrm{and}\quad\mathbb{E}\left(\sum_{j=2}^{n}b_{j}(\xi_{j})^{2}\right)\leqslant 1\right\}.$$

See Giné and Nickl [9], Theorem 3.4.8 for a proof.

Consider \(\mathfrak{m}(n):=8\log(n)/\alpha\). For any \(K,K^{\prime}\in\mathcal{K}_{n}\),

$$U_{K,K^{\prime},\ell}(n)=U_{K,K^{\prime},\ell}^{1}(n)+U_{K,K^{\prime},\ell}^{2}(n)+U_{K,K^{\prime},\ell}^{3}(n)+U_{K,K^{\prime},\ell}^{4}(n),$$

where

$$U_{K,K^{\prime},\ell}^{l}(n):=\sum_{i\not=j}g_{K,K^{\prime},\ell}^{l}(n;X_{i},Y_{i},X_{j},Y_{j}),\quad l=1,2,3,4,$$

with, for every \((x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E=\mathbb{R}^{d}\times\mathbb{R}\),

$$g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}-s_{K,\ell}^{+}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|\leqslant\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{+}(n;.)\rangle_{2},$$

$$g_{K,K^{\prime},\ell}^{2}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}-s_{K,\ell}^{-}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|\leqslant\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{+}(n;.)\rangle_{2},$$

$$g_{K,K^{\prime},\ell}^{3}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}-s_{K,\ell}^{+}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|>\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{-}(n;.)\rangle_{2},$$

$$g_{K,K^{\prime},\ell}^{4}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}-s_{K,\ell}^{-}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|>\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{-}(n;.)\rangle_{2}$$

and, for every \(k\in\mathcal{K}_{n}\),

$$s_{k,\ell}^{+}(n;.):=\mathbb{E}(k(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)})\quad\textrm{and}\quad s_{k,\ell}^{-}(n;.):=\mathbb{E}(k(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)}).$$

On the one hand, since \(\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,X_{1},Y_{1}))=0\) for every \((x^{\prime},y)\in E\), by Lemma B.4, there exists a universal constant \(\mathfrak{m}\geqslant 1\) such that for any \(\lambda>0\), with probability larger than \(1-5.4e^{-\lambda}\),

$$\frac{|U_{K,K^{\prime},\ell}^{1}(n)|}{n^{2}}\leqslant\frac{\mathfrak{m}}{n^{2}}(\mathfrak{c}_{K,K^{\prime},\ell}(n)\lambda^{1/2}+\mathfrak{d}_{K,K^{\prime},\ell}(n)\lambda+\mathfrak{b}_{K,K^{\prime},\ell}(n)\lambda^{3/2}+\mathfrak{a}_{K,K^{\prime},\ell}(n)\lambda^{2}),$$

where the constants \(\mathfrak{a}_{K,K^{\prime},\ell}(n)\), \(\mathfrak{b}_{K,K^{\prime},\ell}(n)\), \(\mathfrak{c}_{K,K^{\prime},\ell}(n)\), and \(\mathfrak{d}_{K,K^{\prime},\ell}(n)\) are defined and controlled later. First, note that

$$U_{K,K^{\prime},\ell}^{1}(n)=\sum_{i\not=j}(\varphi_{K,K^{\prime},\ell}(n;X_{i},Y_{i},X_{j},Y_{j})$$

$${}-\psi_{K,K^{\prime},\ell}(n;X_{i},Y_{i})-\psi_{K^{\prime},K,\ell}(n;X_{j},Y_{j})+\mathbb{E}(\varphi_{K,K^{\prime},\ell}(n;X_{i},Y_{i},X_{j},Y_{j}))),$$
(B.3)

where

$$\varphi_{K,K^{\prime},\ell}(n;x^{\prime},y,x^{\prime\prime},y^{\prime}):=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|\leqslant\mathfrak{m}(n)}\rangle_{2}$$

and

$$\psi_{k,k^{\prime},\ell}(n;x^{\prime},y):=\langle k(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},s_{k^{\prime},\ell}^{+}(n;.)\rangle_{2}=\mathbb{E}(\varphi_{k,k^{\prime},\ell}(n;x^{\prime},y,X_{1},Y_{1}))$$

for every \(k,k^{\prime}\in\mathcal{K}_{n}\) and \((x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E\). Let us now control \(\mathfrak{a}_{K,K^{\prime},\ell}(n)\), \(\mathfrak{b}_{K,K^{\prime},\ell}(n)\), \(\mathfrak{c}_{K,K^{\prime},\ell}(n)\), and \(\mathfrak{d}_{K,K^{\prime},\ell}(n)\).

  • The constant \(\mathfrak{a}_{K,K^{\prime},\ell}(n)\). Consider

    $$\mathfrak{a}_{K,K^{\prime},\ell}(n):=\sup_{(x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E}|g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})|.$$

    By (B.3), Cauchy–Schwarz’s inequality and Assumption 2.1.(1),

    $$\mathfrak{a}_{K,K^{\prime},\ell}(n)\leqslant 4\sup_{(x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E}|\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|\leqslant\mathfrak{m}(n)}\rangle_{2}|$$
    $${}\leqslant 4\mathfrak{m}(n)^{2}\left(\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}\right)\left(\sup_{x^{\prime\prime}\in\mathbb{R}^{d}}||K^{\prime}(x^{\prime\prime},.)||_{2}\right)\leqslant 4\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}n.$$

    So,

    $$\frac{1}{n^{2}}\mathfrak{a}_{K,K^{\prime},\ell}(n)\lambda^{2}\leqslant\frac{4}{n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}\lambda^{2}.$$
  • The constant \(\mathfrak{b}_{K,K^{\prime},\ell}(n)\). Consider

    $$\mathfrak{b}_{K,K^{\prime},\ell}(n)^{2}:=n\sup_{(x^{\prime},y)\in E}\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,X_{1},Y_{1})^{2}).$$

    By (B.3), Jensen’s inequality, Cauchy–Schwarz’s inequality and Assumption 2.1.(1),

    $$\mathfrak{b}_{K,K^{\prime},\ell}(n)^{2}\leqslant 16n\sup_{(x^{\prime},y)\in E}\mathbb{E}(\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},K^{\prime}(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)}\rangle_{2}^{2})$$
    $${}\leqslant 16n\mathfrak{m}(n)^{2}\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}^{2}\mathbb{E}(||K^{\prime}(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)}||_{2}^{2})\leqslant 16\mathfrak{m}_{\mathcal{K},\ell}n^{2}\mathfrak{m}(n)^{2}\overline{s}_{K^{\prime},\ell}.$$

    So, for any \(\theta\in]0,1[\),

    $$\frac{1}{n^{2}}\mathfrak{b}_{K,K^{\prime},\ell}(n)\lambda^{3/2}\leqslant 2\left(\frac{3\mathfrak{m}}{\theta}\right)^{1/2}\frac{2}{n^{1/2}}\mathfrak{m}_{\mathcal{K},\ell}^{1/2}\mathfrak{m}(n)\lambda^{3/2}\times\left(\frac{\theta}{3\mathfrak{m}}\right)^{1/2}\frac{1}{n^{1/2}}\overline{s}_{K^{\prime},\ell}^{1/2}$$
    $${}\leqslant\frac{\theta}{3\mathfrak{m}n}\overline{s}_{K^{\prime},\ell}+\frac{12\mathfrak{m}\lambda^{3}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$
  • The constant \(\mathfrak{c}_{K,K^{\prime},\ell}(n)\). Consider

    $$\mathfrak{c}_{K,K^{\prime},\ell}(n)^{2}:=n^{2}\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;X_{1},Y_{1},X_{2},Y_{2})^{2}).$$

    By (B.3), Jensen’s inequality and Assumption 2.1.(3),

    $$\mathfrak{c}_{K,K^{\prime},\ell}(n)^{2}\leqslant 16n^{2}\mathbb{E}(\langle K(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)},K^{\prime}(X_{2},.)\ell(Y_{2})\mathbf{1}_{|\ell(Y_{2})|\leqslant\mathfrak{m}(n)}\rangle_{2}^{2})$$
    $${}\leqslant 16n^{2}\mathfrak{m}(n)^{2}\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})\leqslant 16\mathfrak{m}_{\mathcal{K},\ell}n^{2}\mathfrak{m}(n)^{2}\overline{s}_{K^{\prime},\ell}.$$

    So,

    $$\frac{1}{n^{2}}\mathfrak{c}_{K,K^{\prime},\ell}(n)\lambda^{1/2}\leqslant\frac{\theta}{3\mathfrak{m}n}\overline{s}_{K^{\prime},\ell}+\frac{12\mathfrak{m}\lambda}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$
  • The constant \(\mathfrak{d}_{K,K^{\prime},\ell}(n)\). Consider

    $$\mathfrak{d}_{K,K^{\prime},\ell}(n):=\sup_{(a,b)\in\mathcal{A}}\mathbb{E}\left[\sum_{i<j}a_{i}(X_{i},Y_{i})b_{j}(X_{j},Y_{j})g_{K,K^{\prime},\ell}^{1}(n;X_{i},Y_{i},X_{j},Y_{j})\right],$$

    where

    $$\mathcal{A}:=\left\{(a,b):\sum_{i=1}^{n-1}\mathbb{E}(a_{i}(X_{i},Y_{i})^{2})\leqslant 1\quad\textrm{and}\quad\sum_{j=2}^{n}\mathbb{E}(b_{j}(X_{j},Y_{j})^{2})\leqslant 1\right\}.$$

    By (B.3), Jensen’s inequality, Cauchy-Schwarz’s inequality and Assumption 2.1.(3),

    $$\mathfrak{d}_{K,K^{\prime},\ell}(n)\leqslant 4\sup_{(a,b)\in\mathcal{A}}\mathbb{E}\left[\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}|a_{i}(X_{i},Y_{i})b_{j}(X_{j},Y_{j})\varphi_{K,K^{\prime},\ell}(n;X_{i},Y_{i},X_{j},Y_{j})|\right]$$
    $${}\leqslant 4n\mathfrak{m}(n)\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})^{1/2}\leqslant 4\mathfrak{m}_{\mathcal{K},\ell}^{1/2}n\mathfrak{m}(n)\overline{s}_{K^{\prime},\ell}^{1/2}.$$

    So,

    $$\frac{1}{n^{2}}\mathfrak{d}_{K,K^{\prime},\ell}(n)\lambda\leqslant\frac{\theta}{3\mathfrak{m}n}\overline{s}_{K^{\prime},\ell}+\frac{12\mathfrak{m}\lambda^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

Then, since \(\mathfrak{m}\geqslant 1\) and \(\lambda>0\), with probability larger than \(1-5.4e^{-\lambda}\),

$$\frac{|U_{K,K^{\prime},\ell}^{1}(n)|}{n^{2}}\leqslant\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}+\frac{40\mathfrak{m}^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}(1+\lambda)^{3}.$$

So, with probability larger than \(1-5.4|\mathcal{K}_{n}|^{2}e^{-\lambda}\),

$$S_{\mathcal{K},\ell}(n,\theta):=\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\left\{\frac{|U_{K,K^{\prime},\ell}^{1}(n)|}{n^{2}}-\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}\right\}\leqslant\frac{40\mathfrak{m}^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}(1+\lambda)^{3}.$$

For every \(t\in\mathbb{R}_{+}\), consider

$$\lambda_{\mathcal{K},\ell}(n,\theta,t):=-1+\left(\frac{t}{\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)^{1/3}\textrm{\ with\ }\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=\frac{40\mathfrak{m}^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

Then, for any \(T>0\), since \((1+\lambda_{\mathcal{K},\ell}(n,\theta,t))^{3}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=t\) for every \(t\geqslant 0\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant T+\int\limits_{T}^{\infty}\mathbb{P}(S_{\mathcal{K},\ell}(n,\theta)\geqslant(1+\lambda_{\mathcal{K},\ell}(n,\theta,t))^{3}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta))dt$$

$${}\leqslant T+5.4|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp(-\lambda_{\mathcal{K},\ell}(n,\theta,t))dt$$

$${}=T+5.4|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp\left(-\frac{t^{1/3}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/3}}\right)\exp\left(1-\frac{t^{1/3}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/3}}\right)dt$$

$${}\leqslant T+5.4\mathfrak{c}_{1}|\mathcal{K}_{n}|^{2}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\exp\left(-\frac{T^{1/3}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/3}}\right)\textrm{\ with\ }\mathfrak{c}_{1}=\int\limits_{0}^{\infty}e^{1-r^{1/3}/2}dr.$$

Moreover,

$$\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\leqslant\mathfrak{c}_{2}\frac{\log(n)^{2}}{\theta n}\textrm{\ with\ }\mathfrak{c}_{2}=\frac{40\times 8^{2}\mathfrak{m}^{2}}{\alpha^{2}}\mathfrak{m}_{\mathcal{K},\ell}.$$

So, by taking

$$T=2^{4}\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n},$$

and since \(|\mathcal{K}_{n}|\leqslant n\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant 2^{4}\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n}+5.4\mathfrak{c}_{1}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\frac{|\mathcal{K}_{n}|^{2}}{n^{2}}\leqslant(2^{4}+5.4\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n}.$$

On the other hand, by Assumption 2.1.(1), Cauchy–Schwarz’s inequality and Markov’s inequality,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|g_{K,K^{\prime},\ell}^{2}(n;X_{1},Y_{1},X_{2},Y_{2})|\right)$$

$${}\leqslant 4\mathfrak{m}(n)\sum_{K,K^{\prime}\in\mathcal{K}_{n}}\mathbb{E}(|\ell(Y_{1})|\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)}|\langle K(X_{1},.),K^{\prime}(X_{2},.)\rangle_{2}|)$$

$${}\leqslant 4\mathfrak{m}(n)\mathfrak{m}_{\mathcal{K},\ell}n|\mathcal{K}_{n}|^{2}\mathbb{E}(\ell(Y_{1})^{2})^{1/2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/2}\leqslant\mathfrak{c}_{3}\frac{\log(n)}{n}$$

with

$$\mathfrak{c}_{3}=\frac{32}{\alpha}\mathfrak{m}_{\mathcal{K},\ell}\mathbb{E}(\ell(Y_{1})^{2})^{1/2}\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))^{1/2}.$$

So,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\frac{|U_{K,K^{\prime},\ell}^{2}(n)|}{n^{2}}\right)\leqslant\mathfrak{c}_{3}\frac{\log(n)}{n}$$

and, symmetrically,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\frac{|U_{K,K^{\prime},\ell}^{3}(n)|}{n^{2}}\right)\leqslant\mathfrak{c}_{3}\frac{\log(n)}{n}.$$

By Assumption 2.1.(1), Cauchy–Schwarz’s inequality and Markov’s inequality,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|g_{K,K^{\prime},\ell}^{4}(n;X_{1},Y_{1},X_{2},Y_{2})|\right)$$

$${}\leqslant 4\sum_{K,K^{\prime}\in\mathcal{K}_{n}}\mathbb{E}(|\ell(Y_{1})\ell(Y_{2})|\mathbf{1}_{|\ell(Y_{1})|,|\ell(Y_{2})|>\mathfrak{m}(n)}|\langle K(X_{1},.),K^{\prime}(X_{2},.)\rangle_{2}|)$$

$${}\leqslant 4\mathfrak{m}_{\mathcal{K},\ell}n|\mathcal{K}_{n}|^{2}\mathbb{E}(\ell(Y_{1})^{2})\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))\leqslant\frac{\mathfrak{c}_{4}}{n^{5}}$$

with

$$\mathfrak{c}_{4}=4\mathfrak{m}_{\mathcal{K},\ell}\mathbb{E}(\ell(Y_{1})^{2})\mathbb{E}(\exp(\alpha|\ell(Y_{1})|)).$$

So,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\frac{|U_{K,K^{\prime},\ell}^{4}(n)|}{n^{2}}\right)\leqslant\frac{\mathfrak{c}_{4}}{n^{5}}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\left\{\frac{|U_{K,K^{\prime},\ell}(n)|}{n^{2}}-\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}\right\}\right)\leqslant(2^{4}+5.4\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n}+2\mathfrak{c}_{3}\frac{\log(n)}{n}+\frac{\mathfrak{c}_{4}}{n^{5}}.$$

B.1.2. Proof of Lemma B.2. First, the two following results are used several times in the sequel:

$$||s_{K,\ell}||_{2}^{2}\leqslant\mathbb{E}(\ell(Y_{1})^{2})\int\limits_{\mathbb{R}^{d}}f(x^{\prime})\int\limits_{\mathbb{R}^{d}}K(x^{\prime},x)^{2}\lambda_{d}(dx)\lambda_{d}(dx^{\prime})\leqslant\mathbb{E}(\ell(Y_{1})^{2})\mathfrak{m}_{\mathcal{K},\ell}n$$
(B.4)

and

$$\mathbb{E}(V_{K,\ell}(n))=\mathbb{E}(||K(X_{1},.)\ell(Y_{1})-s_{K,\ell}||_{2}^{2})$$

$${}=\mathbb{E}(||K(X_{1},.)\ell(Y_{1})||_{2}^{2})+||s_{K,\ell}||_{2}^{2}-2\int\limits_{\mathbb{R}^{d}}s_{K,\ell}(x)\mathbb{E}(K(X_{1},x)\ell(Y_{1}))\lambda_{d}(dx)=\overline{s}_{K,\ell}-||s_{K,\ell}||_{2}^{2}.$$
(B.5)

Consider \(\mathfrak{m}(n):=2\log(n)/\alpha\) and

$$v_{K,\ell}(n):=V_{K,\ell}(n)-\mathbb{E}(V_{K,\ell}(n))=v_{K,\ell}^{1}(n)+v_{K,\ell}^{2}(n),$$

where

$$v_{K,\ell}^{j}(n)=\frac{1}{n}\sum_{i=1}^{n}(g_{K,\ell}^{j}(n;X_{i},Y_{i})-\mathbb{E}(g_{K,\ell}^{j}(n;X_{i},Y_{i})));\quad j=1,2,$$

with, for every \((x^{\prime},y)\in E\),

$$g_{K,\ell}^{1}(n;x^{\prime},y):=||K(x^{\prime},.)\ell(y)-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

and

$$g_{K,\ell}^{2}(n;x^{\prime},y):=||K(x^{\prime},.)\ell(y)-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}.$$

On the one hand, by Bernstein’s inequality, for any \(\lambda>0\), with probability larger than \(1-2e^{-\lambda}\),

$$|v_{K,\ell}^{1}(n)|\leqslant\sqrt{\frac{2\lambda}{n}\mathfrak{v}_{K,\ell}(n)}+\frac{\lambda}{n}\mathfrak{c}_{K,\ell}(n),$$

where

$$\mathfrak{c}_{K,\ell}(n)=\frac{||g_{K,\ell}^{1}(n;.)||_{\infty}}{3}\quad\textrm{and}\quad\mathfrak{v}_{K,\ell}(n)=\mathbb{E}(g_{K,\ell}^{1}(n;X_{1},Y_{1})^{2}).$$
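
For the reader's convenience, the version of Bernstein's inequality used here (see, e.g., Massart [14]) can be stated as follows: if \(Z_{1},\dots,Z_{n}\) are i.i.d. centered random variables such that \(Z_{1}\leqslant b\) almost surely and \(\mathbb{E}(Z_{1}^{2})\leqslant v\), then for every \(\lambda>0\),

$$\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^{n}Z_{i}\geqslant\sqrt{\frac{2v\lambda}{n}}+\frac{b\lambda}{3n}\right)\leqslant e^{-\lambda}.$$

It is applied to \(\pm(g_{K,\ell}^{1}(n;X_{i},Y_{i})-\mathbb{E}(g_{K,\ell}^{1}(n;X_{i},Y_{i})))\), both bounded above by \(||g_{K,\ell}^{1}(n;.)||_{\infty}\) because \(0\leqslant g_{K,\ell}^{1}\leqslant||g_{K,\ell}^{1}(n;.)||_{\infty}\), which yields the two-sided bound above.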

Moreover,

$$\mathfrak{c}_{K,\ell}(n)=\frac{1}{3}\sup_{(x^{\prime},y)\in E}||K(x^{\prime},.)\ell(y)-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

$${}\leqslant\frac{2}{3}\left(\mathfrak{m}(n)^{2}\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}^{2}+||s_{K,\ell}||_{2}^{2}\right)\leqslant\frac{2}{3}(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}n$$

by inequality (B.4), and

$$\mathfrak{v}_{K,\ell}(n)\leqslant||g_{K,\ell}^{1}(n;.)||_{\infty}\mathbb{E}(V_{K,\ell}(n))$$

$${}\leqslant 2(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}n(\overline{s}_{K,\ell}-||s_{K,\ell}||_{2}^{2})$$

by inequality (B.4) and equality (B.5). Then, for any \(\theta\in]0,1[\),

$$|v_{K,\ell}^{1}(n)|\leqslant 2\sqrt{\lambda(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}(\overline{s}_{K,\ell}-||s_{K,\ell}||_{2}^{2})}+\frac{2\lambda}{3}(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}$$

$${}\leqslant\theta\overline{s}_{K,\ell}+\frac{5\lambda}{3\theta}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}$$

with probability larger than \(1-2e^{-\lambda}\). So, with probability larger than \(1-2|\mathcal{K}_{n}|e^{-\lambda}\),

$$S_{\mathcal{K},\ell}(n,\theta):=\sup_{K\in\mathcal{K}_{n}}\left\{\frac{|v_{K,\ell}^{1}(n)|}{n}-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\leqslant\frac{5\lambda}{3\theta n}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

For every \(t\in\mathbb{R}_{+}\), consider

$$\lambda_{\mathcal{K},\ell}(n,\theta,t):=\frac{t}{\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\textrm{\ with\ }\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=\frac{5}{3\theta n}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

Then, for any \(T>0\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant T+\int\limits_{T}^{\infty}\mathbb{P}(S_{\mathcal{K},\ell}(n,\theta)\geqslant\lambda_{\mathcal{K},\ell}(n,\theta,t)\mathfrak{m}_{\mathcal{K},\ell}(n,\theta))dt$$

$${}\leqslant T+2|\mathcal{K}_{n}|\int\limits_{T}^{\infty}\exp(-\lambda_{\mathcal{K},\ell}(n,\theta,t))dt$$

$${}=T+2|\mathcal{K}_{n}|\int\limits_{T}^{\infty}\exp\left(-\frac{t}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)\exp\left(-\frac{t}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)dt$$

$${}\leqslant T+2\mathfrak{c}_{1}|\mathcal{K}_{n}|\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\exp\left(-\frac{T}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)\textrm{\ with\ }\mathfrak{c}_{1}=\int\limits_{0}^{\infty}e^{-r/2}dr=2.$$
(B.6)

Moreover,

$$\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\leqslant\mathfrak{c}_{2}\frac{\log(n)^{2}}{\theta n}\textrm{\ with\ }\mathfrak{c}_{2}=\frac{20}{3\alpha^{2}}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}.$$

So, by taking

$$T=2\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n},$$

and since \(|\mathcal{K}_{n}|\leqslant n\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant 2\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}+4\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\frac{|\mathcal{K}_{n}|}{n}\leqslant 6\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}.$$

On the other hand, by inequality (B.4) and Markov’s inequality,

$$\mathbb{E}\left[\sup_{K\in\mathcal{K}_{n}}\frac{|v_{K,\ell}^{2}(n)|}{n}\right]\leqslant\frac{2}{n}\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}||K(X_{1},.)\ell(Y_{1})-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)}\right)$$

$${}\leqslant\frac{4}{n}\mathbb{E}\left[\left|\ell(Y_{1})^{2}\sup_{K\in\mathcal{K}_{n}}||K(X_{1},.)||_{2}^{2}+\sup_{K\in\mathcal{K}_{n}}||s_{K,\ell}||_{2}^{2}\right|^{2}\right]^{1/2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/2}\leqslant\frac{\mathfrak{c}_{3}}{n}$$

with

$$\mathfrak{c}_{3}=8\mathfrak{m}_{\mathcal{K},\ell}\mathbb{E}(\ell(Y_{1})^{4})^{1/2}\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))^{1/2}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\frac{|v_{K,\ell}(n)|}{n}-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant 6\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}+\frac{\mathfrak{c}_{3}}{n}$$

and, by equality (B.5), the definition of \(v_{K,\ell}(n)\) and Assumption 2.1.(2),

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\frac{1}{n}|V_{K,\ell}(n)-\overline{s}_{K,\ell}|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant 6\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}+\frac{\mathfrak{c}_{3}+\mathfrak{m}_{\mathcal{K},\ell}}{n}.$$

Remark B.5. As mentioned in Remark 2.10, replacing the exponential moment condition by the weaker \(q\)-th moment condition with \(q=(12-4\varepsilon)/\beta\), \(\varepsilon\in]0,1[\) and \(0<\beta<\varepsilon/2\), makes it possible to obtain a rate of convergence of order \(1/n^{1-\varepsilon}\). Indeed, by inequality (B.6), with \(\mathfrak{m}(n)=n^{\beta}\) and

$$T=\frac{2\mathfrak{c}_{1}}{\theta n^{1-\varepsilon}}\textrm{\ with\ }\mathfrak{c}_{1}=\frac{5}{3}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell},$$

and by letting \(\alpha=1+2\beta-\varepsilon\), there exist \(n_{\varepsilon,\alpha}\in\mathbb{N}^{*}\) and \(\mathfrak{c}_{\varepsilon,\alpha}>0\) not depending on \(n\), such that for any \(n\geqslant n_{\varepsilon,\alpha}\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant\frac{2\mathfrak{c}_{1}}{\theta n^{1-\varepsilon}}+4\mathfrak{c}_{1}|\mathcal{K}_{n}|\frac{n^{2\beta-1}}{\theta}\exp(-n^{\varepsilon-2\beta})$$

$${}\leqslant\frac{2\mathfrak{c}_{1}}{\theta n^{1-\varepsilon}}+4\mathfrak{c}_{1}\mathfrak{c}_{\varepsilon,\alpha}\frac{n^{2\beta}}{\theta n^{\alpha}}=\frac{2\mathfrak{c}_{1}(1+2\mathfrak{c}_{\varepsilon,\alpha})}{\theta n^{1-\varepsilon}}.$$

Furthermore, by Markov’s inequality,

$$\mathbb{P}(|\ell(Y_{1})|>n^{\beta})\leqslant\frac{\mathbb{E}(|\ell(Y_{1})|^{(12-4\varepsilon)/\beta})}{n^{12-4\varepsilon}}.$$

So, as previously, there exists a deterministic constant \(\mathfrak{c}_{2}>0\) such that

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|W_{K,K^{\prime},\ell}^{2}(n)|\right)\leqslant\mathfrak{c}_{2}|\mathcal{K}_{n}|^{2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/4}\leqslant\frac{\mathfrak{c}_{3}\mathbb{E}(|\ell(Y_{1})|^{(12-4\varepsilon)/\beta})^{1/4}}{n^{1-\varepsilon}},$$

and then

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\right)$$

$${}\leqslant\frac{\mathfrak{c}_{3}}{\theta n^{1-\varepsilon}}\textrm{\ with\ }\mathfrak{c}_{3}=2\mathfrak{c}_{1}(1+2\mathfrak{c}_{\varepsilon,\alpha})+\mathfrak{c}_{2}\mathbb{E}(|\ell(Y_{1})|^{(12-4\varepsilon)/\beta})^{1/4}.$$

B.1.3. Proof of Lemma B.3. Consider \(\mathfrak{m}(n)=12\log(n)/\alpha\). For any \(K,K^{\prime}\in\mathcal{K}_{n}\),

$$W_{K,K^{\prime},\ell}(n)=W_{K,K^{\prime},\ell}^{1}(n)+W_{K,K^{\prime},\ell}^{2}(n),$$

where

$$W_{K,K^{\prime},\ell}^{j}(n):=\frac{1}{n}\sum_{i=1}^{n}(g_{K,K^{\prime},\ell}^{j}(n;X_{i},Y_{i})-\mathbb{E}(g_{K,K^{\prime},\ell}^{j}(n;X_{i},Y_{i})));\quad j=1,2,$$

with, for every \((x^{\prime},y)\in E\),

$$g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y):=\langle K(x^{\prime},.)\ell(y),s_{K^{\prime},\ell}-s\rangle_{2}\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

and

$$g_{K,K^{\prime},\ell}^{2}(n;x^{\prime},y):=\langle K(x^{\prime},.)\ell(y),s_{K^{\prime},\ell}-s\rangle_{2}\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}.$$

On the one hand, by Bernstein’s inequality, for any \(\lambda>0\), with probability larger than \(1-2e^{-\lambda}\),

$$|W_{K,K^{\prime},\ell}^{1}(n)|\leqslant\sqrt{\frac{2\lambda}{n}\mathfrak{v}_{K,K^{\prime},\ell}(n)}+\frac{\lambda}{n}\mathfrak{c}_{K,K^{\prime},\ell}(n),$$

where

$$\mathfrak{c}_{K,K^{\prime},\ell}(n)=\frac{||g_{K,K^{\prime},\ell}^{1}(n;.)||_{\infty}}{3}\quad\textrm{and}\quad\mathfrak{v}_{K,K^{\prime},\ell}(n)=\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;X_{1},Y_{1})^{2}).$$

Moreover,

$$\mathfrak{c}_{K,K^{\prime},\ell}(n)=\frac{1}{3}\sup_{(x^{\prime},y)\in E}|\langle K(x^{\prime},.)\ell(y),s_{K^{\prime},\ell}-s\rangle_{2}|\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

$${}\leqslant\frac{1}{3}\mathfrak{m}(n)||s_{K^{\prime},\ell}-s||_{2}\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}\leqslant\frac{1}{3}\mathfrak{m}_{\mathcal{K},\ell}^{1/2}n^{1/2}\mathfrak{m}(n)||s_{K^{\prime},\ell}-s||_{2}$$

by Assumption 2.1.(1) and

$$\mathfrak{v}_{K,K^{\prime},\ell}(n)\leqslant\mathbb{E}(\langle K(X_{1},.)\ell(Y_{1}),s_{K^{\prime},\ell}-s\rangle_{2}^{2}\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)})$$

$${}\leqslant\mathfrak{m}(n)^{2}\mathfrak{m}_{\mathcal{K},\ell}||s_{K^{\prime},\ell}-s||_{2}^{2}$$

by Assumption 2.1.(4). Then, since \(\lambda>0\), for any \(\theta\in]0,1[\),

$$|W_{K,K^{\prime},\ell}^{1}(n)|\leqslant\sqrt{\frac{2\lambda}{n}\mathfrak{m}(n)^{2}\mathfrak{m}_{\mathcal{K},\ell}||s_{K^{\prime},\ell}-s||_{2}^{2}}+\frac{\lambda}{3n^{1/2}}\mathfrak{m}_{\mathcal{K},\ell}^{1/2}\mathfrak{m}(n)||s_{K^{\prime},\ell}-s||_{2}$$

$${}\leqslant\theta||s_{K^{\prime},\ell}-s||_{2}^{2}+\frac{\mathfrak{m}_{\mathcal{K},\ell}}{2\theta n}\mathfrak{m}(n)^{2}(1+\lambda)^{2}$$

with probability larger than \(1-2e^{-\lambda}\). So, with probability larger than \(1-2|\mathcal{K}_{n}|^{2}e^{-\lambda}\),

$$S_{\mathcal{K},\ell}(n,\theta):=\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}^{1}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\leqslant\frac{\mathfrak{m}_{\mathcal{K},\ell}}{2\theta n}\mathfrak{m}(n)^{2}(1+\lambda)^{2}.$$

For every \(t\in\mathbb{R}_{+}\), consider

$$\lambda_{\mathcal{K},\ell}(n,\theta,t):=-1+\left(\frac{t}{\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)^{1/2}\textrm{\ with\ }\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=\frac{\mathfrak{m}_{\mathcal{K},\ell}}{2\theta n}\mathfrak{m}(n)^{2}.$$

Then, for any \(T>0\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant T+\int\limits_{T}^{\infty}\mathbb{P}(S_{\mathcal{K},\ell}(n,\theta)\geqslant(1+\lambda_{\mathcal{K},\ell}(n,\theta,t))^{2}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta))dt$$

$${}\leqslant T+2|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp(-\lambda_{\mathcal{K},\ell}(n,\theta,t))dt$$

$${}=T+2|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp\left(-\frac{t^{1/2}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/2}}\right)\exp\left(1-\frac{t^{1/2}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/2}}\right)dt$$

$${}\leqslant T+2\mathfrak{c}_{1}|\mathcal{K}_{n}|^{2}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\exp\left(-\frac{T^{1/2}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/2}}\right)\textrm{\ with\ }\mathfrak{c}_{1}=\int\limits_{0}^{\infty}e^{1-r^{1/2}/2}dr.$$

Moreover,

$$\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\leqslant\mathfrak{c}_{2}\frac{\log(n)^{2}}{\theta n}\textrm{\ with\ }\mathfrak{c}_{2}=\frac{12^{2}}{2\alpha^{2}}\mathfrak{m}_{\mathcal{K},\ell}.$$

So, by taking

$$T=2^{3}\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n},$$

and since \(|\mathcal{K}_{n}|\leqslant n\),

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant 2^{3}\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n}+2\mathfrak{c}_{1}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\frac{|\mathcal{K}_{n}|^{2}}{n^{2}}\leqslant(2^{3}+2\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n}.$$

On the other hand, by Assumption 2.1.(2), 2.1.(4), Cauchy–Schwarz’s inequality and Markov’s inequality,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|W_{K,K^{\prime},\ell}^{2}(n)|\right)\leqslant 2\mathbb{E}(\ell(Y_{1})^{2}\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)})^{1/2}\sum_{K,K^{\prime}\in\mathcal{K}_{n}}\mathbb{E}(\langle K(X_{1},.),s_{K^{\prime},\ell}-s\rangle_{2}^{2})^{1/2}$$

$${}\leqslant 2\mathfrak{m}_{\mathcal{K},\ell}^{1/2}||s_{K^{\prime},\ell}-s||_{2}\mathbb{E}(\ell(Y_{1})^{4})^{1/4}|\mathcal{K}_{n}|^{2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/4}\leqslant\frac{\mathfrak{c}_{3}}{n}$$

with

$$\mathfrak{c}_{3}=2\mathfrak{m}_{\mathcal{K},\ell}^{1/2}(\mathfrak{m}_{\mathcal{K},\ell}^{1/2}+||s||_{2})\mathbb{E}(\ell(Y_{1})^{4})^{1/4}\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))^{1/4}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\right)\leqslant(2^{3}+2\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n}+\frac{\mathfrak{c}_{3}}{n}\leqslant\mathfrak{c}_{4}\frac{\log(n)^{4}}{\theta n}$$

with \(\mathfrak{c}_{4}=(2^{3}+2\mathfrak{c}_{1})\mathfrak{c}_{2}+\mathfrak{c}_{3}\).

B.2. Proof of Proposition 2.7

For any \(K\in\mathcal{K}_{n}\),

$$||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}=\frac{U_{K,\ell}(n)}{n^{2}}+\frac{V_{K,\ell}(n)}{n}$$
(B.7)

with \(U_{K,\ell}(n):=U_{K,K,\ell}(n)\) (see (B.1)) and \(V_{K,\ell}(n)\) as in Lemma B.2. Then, by Lemmas B.1 and B.2,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\left|||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-\frac{\overline{s}_{K,\ell}}{n}\right|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant\mathfrak{c}_{2.7}\frac{\log(n)^{5}}{\theta n}$$

with \(\mathfrak{c}_{2.7}=\mathfrak{c}_{B.1}+\mathfrak{c}_{B.2}\).
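
As a quick illustration (not part of the proof), the purely algebraic identity behind (B.7) can be checked numerically, with arbitrary placeholder vectors standing for a discretization of the functions \(K(X_{i},.)\ell(Y_{i})\) and \(s_{K,\ell}\):

    import numpy as np

    # check of the identity ||(1/n) sum_i a_i - s_K||^2 = U / n^2 + V / n, with U as in (B.1) (K' = K)
    # and V as in Lemma B.2, for arbitrary placeholder vectors a_i and s_K
    rng = np.random.default_rng(0)
    n, p = 20, 50
    a = rng.standard_normal((n, p))               # a_i stands for (a discretization of) K(X_i, .) l(Y_i)
    s_K = rng.standard_normal(p)                  # stands for s_{K, l}
    c = a - s_K                                   # centred vectors a_i - s_K
    G = c @ c.T                                   # matrix of inner products <a_i - s_K, a_j - s_K>
    U = G.sum() - np.trace(G)                     # sum over i != j
    V = np.trace(G) / n                           # (1/n) sum_i ||a_i - s_K||^2
    lhs = np.sum((a.mean(axis=0) - s_K) ** 2)
    print(np.isclose(lhs, U / n ** 2 + V / n))    # True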

B.3. Proof of Theorem 2.8

On the one hand, for every \(K\in\mathcal{K}_{n}\),

$$||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-(1+\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)$$

can be written

$$||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-(1+\theta)\frac{\overline{s}_{K,\ell}}{n}+2W_{K,\ell}(n)-\theta||s_{K,\ell}-s||_{2}^{2},$$

where \(W_{K,\ell}(n):=W_{K,K,\ell}(n)\) (see (14)). Then, by Proposition 2.7 and Lemma B.3,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-(1+\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)\right\}\right)\leqslant\mathfrak{c}_{2.8}\frac{\log(n)^{5}}{\theta n}$$

with \(\mathfrak{c}_{2.8}=\mathfrak{c}_{2.7}+\mathfrak{c}_{B.3}\). On the other hand, for any \(K\in\mathcal{K}_{n}\),

$$||s_{K,\ell}-s||_{2}^{2}=||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-W_{K,\ell}(n).$$

Then,

$$(1-\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)-||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}\leqslant|W_{K,\ell}(n)|-\theta||s_{K,\ell}-s||_{2}^{2}+\Lambda_{K,\ell}(n)-\theta\frac{\overline{s}_{K,\ell}}{n},$$

where

$$\Lambda_{K,\ell}(n):=\left|||\widehat{s}_{K,\ell}-s_{K,\ell}||_{2}^{2}-\frac{\overline{s}_{K,\ell}}{n}\right|.$$

By equalities (B.5) and (B.7),

$$\Lambda_{K,\ell}(n)=\left|\frac{U_{K,\ell}(n)}{n^{2}}+\frac{v_{K,\ell}(n)}{n}-\frac{||s_{K,\ell}||_{2}^{2}}{n}\right|$$

with \(U_{K,\ell}(n)=U_{K,K,\ell}(n)\) (see (B.1)). By Lemmas B.1 and B.2, there exists a deterministic constant \(\mathfrak{c}_{1}>0\), not depending on \(n\) and \(\theta\), such that

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\Lambda_{K,\ell}(n)-\theta\frac{\overline{s}_{K,\ell}}{n}\right\}\right)\leqslant\mathfrak{c}_{1}\frac{\log(n)^{5}}{\theta n}.$$

By Lemma B.3,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\{|W_{K,\ell}(n)|-\theta||s_{K,\ell}-s||_{2}^{2}\}\right)\leqslant\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}-\frac{1}{1-\theta}||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}\right\}\right)\leqslant\overline{\mathfrak{c}}_{2.8}\frac{\log(n)^{5}}{\theta(1-\theta)n}$$

with \(\overline{\mathfrak{c}}_{2.8}=\mathfrak{c}_{B.3}+\mathfrak{c}_{1}\).

B.4. Proof of Theorem 3.2

The proof of Theorem 3.2 is divided into three steps.

Step 1. This first step provides a suitable decomposition of

$$||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}.$$

First,

$$||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}=||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-\widehat{s}_{K_{0},\ell}(n;\cdot)||_{2}^{2}$$

$${}+||\widehat{s}_{K_{0},\ell}(n;\cdot)-s||_{2}^{2}-2\langle\widehat{s}_{K_{0},\ell}(n;\cdot)-\widehat{s}_{\widehat{K},\ell}(n;\cdot),\widehat{s}_{K_{0},\ell}(n;\cdot)-s\rangle_{2}.$$

From (8), it follows that for any \(K\in\mathcal{K}_{n}\),

$$||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}\leqslant||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2}+\textrm{pen}_{\ell}(K)-\textrm{pen}_{\ell}(\widehat{K})+||\widehat{s}_{K_{0},\ell}(n;\cdot)-s||_{2}^{2}$$

$${}-2\langle\widehat{s}_{K,\ell}(n;\cdot)-\widehat{s}_{\widehat{K},\ell}(n;\cdot),\widehat{s}_{K_{0},\ell}(n;\cdot)-s\rangle_{2}=||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2}+\psi_{n}(K)-\psi_{n}(\widehat{K}),$$
(B.8)

where

$$\psi_{n}(K):=2\langle\widehat{s}_{K,\ell}(n;\cdot)-s,\widehat{s}_{K_{0},\ell}(n;\cdot)-s\rangle_{2}-\textrm{pen}_{\ell}(K).$$

Let us complete the decomposition of \(||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}\) by writing

$$\psi_{n}(K)=2(\psi_{1,n}(K)+\psi_{2,n}(K)+\psi_{3,n}(K)),$$

where

$$\psi_{1,n}(K):=\dfrac{U_{K,K_{0},\ell}(n)}{n^{2}},$$

$$\psi_{2,n}(K):=-\dfrac{1}{n^{2}}\left(\displaystyle\sum_{i=1}^{n}\ell(Y_{i})\langle K_{0}(X_{i},.),s_{K,\ell}\rangle_{2}+\sum_{i=1}^{n}\ell(Y_{i})\langle K(X_{i},.),s_{K_{0},\ell}\rangle_{2}\right)+\dfrac{1}{n}\langle s_{K_{0},\ell},s_{K,\ell}\rangle_{2},\textrm{ and}$$

$$\psi_{3,n}(K):=W_{K,K_{0},\ell}(n)+W_{K_{0},K,\ell}(n)+\langle s_{K,\ell}-s,s_{K_{0},\ell}-s\rangle_{2}.$$

Step 2. In this step, we control the quantities

$$\mathbb{E}(\psi_{i,n}(K))\quad\text{and}\quad\mathbb{E}(\psi_{i,n}(\widehat{K}));\quad i=1,2,3.$$

  • By Lemma B.1, for any \(\theta\in]0,1[\),

    $$\mathbb{E}(|\psi_{1,n}(K)|)\leqslant\frac{\theta}{n}\overline{s}_{K,\ell}+\mathfrak{c}_{B.1}\frac{\log(n)^{5}}{\theta n}$$

    and

    $$\mathbb{E}(|\psi_{1,n}(\widehat{K})|)\leqslant\frac{\theta}{n}\mathbb{E}(\overline{s}_{\widehat{K},\ell})+\mathfrak{c}_{B.1}\frac{\log(n)^{5}}{\theta n}.$$
  • On the one hand, for any \(K,K^{\prime}\in\mathcal{K}_{n}\), consider

    $$\Psi_{2,n}(K,K^{\prime}):=\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\ell(Y_{i})\langle K(X_{i},.),s_{K^{\prime},\ell}\rangle_{2}.$$

    Then, by Assumption 3.1,

    $$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|\Psi_{2,n}(K,K^{\prime})|\right)\leqslant\mathbb{E}(\ell(Y_{1})^{2})^{1/2}\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\langle K(X_{1},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\right)^{1/2}$$
    $${}\leqslant\overline{\mathfrak{m}}_{\mathcal{K},\ell}^{1/2}\mathbb{E}(\ell(Y_{1})^{2})^{1/2}.$$

    On the other hand, by Assumption 2.1.(2),

    $$|\langle s_{K,\ell},s_{K_{0},\ell}\rangle_{2}|\leqslant\mathfrak{m}_{\mathcal{K},\ell}.$$

    Then, there exists a deterministic constant \(\mathfrak{c}_{1}>0\), not depending on \(n\) and \(K\), such that

    $$\mathbb{E}(|\psi_{2,n}(K)|)\leqslant\frac{\mathfrak{c}_{1}}{n}\quad\textrm{and}\quad\mathbb{E}(|\psi_{2,n}(\widehat{K})|)\leqslant\frac{\mathfrak{c}_{1}}{n}.$$
  • By Lemma B.3,

    $$\mathbb{E}(|\psi_{3,n}(K)|)\leqslant\dfrac{\theta}{4}(||s_{K,\ell}-s||_{2}^{2}+||s_{K_{0},\ell}-s||_{2}^{2})+8\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}$$
    $${}+\left(\dfrac{\theta}{2}\right)^{1/2}||s_{K,\ell}-s||_{2}\times\left(\dfrac{2}{\theta}\right)^{1/2}||s_{K_{0},\ell}-s||_{2}$$
    $${}\leqslant\dfrac{\theta}{2}||s_{K,\ell}-s||_{2}^{2}+\left(\dfrac{\theta}{4}+\dfrac{1}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+8\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}$$

    and

    $$\mathbb{E}(|\psi_{3,n}(\widehat{K})|)\leqslant\frac{\theta}{2}\mathbb{E}(||s_{\widehat{K},\ell}-s||_{2}^{2})+\left(\dfrac{\theta}{4}+\dfrac{1}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+8\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}.$$

Step 3. By the previous step, there exists a deterministic constant \(\mathfrak{c}_{2}>0\), not depending on \(n\), \(\theta\), \(K\), and \(K_{0}\), such that

$$\mathbb{E}(|\psi_{n}(K)|)\leqslant\theta\left(||s_{K,\ell}-s||_{2}^{2}+\dfrac{\overline{s}_{K,\ell}}{n}\right)+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\mathfrak{c}_{2}\dfrac{\log(n)^{5}}{\theta n}$$

and

$$\mathbb{E}(|\psi_{n}(\widehat{K})|)\leqslant\theta\mathbb{E}\left(||s_{\widehat{K},\ell}-s||_{2}^{2}+\dfrac{\overline{s}_{\widehat{K},\ell}}{n}\right)+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\mathfrak{c}_{2}\dfrac{\log(n)^{5}}{\theta n}.$$

Then, by Theorem 2.8,

$$\mathbb{E}(|\psi_{n}(K)|)\leqslant\dfrac{\theta}{1-\theta}\mathbb{E}(||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2})+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\left(\dfrac{\mathfrak{c}_{2}}{\theta}+\dfrac{\mathfrak{c}_{2.8}}{1-\theta}\right)\dfrac{\log(n)^{5}}{n}$$

and

$$\mathbb{E}(|\psi_{n}(\widehat{K})|)\leqslant\dfrac{\theta}{1-\theta}\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;.)-s||_{2}^{2})+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\left(\dfrac{\mathfrak{c}_{2}}{\theta}+\dfrac{\mathfrak{c}_{2.8}}{1-\theta}\right)\dfrac{\log(n)^{5}}{n}.$$

By decomposition (B.8), there exist two deterministic constants \(\mathfrak{c}_{3},\mathfrak{c}_{4}>0\), not depending on \(n\), \(\theta\), \(K\), and \(K_{0}\), such that

$$\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2})\leqslant\mathbb{E}(||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2})+\mathbb{E}(|\psi_{n}(K)|)+\mathbb{E}(|\psi_{n}(\widehat{K})|)$$

$${}\leqslant\left(1+\dfrac{\theta}{1-\theta}\right)\mathbb{E}(||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2})+\dfrac{\theta}{1-\theta}\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;.)-s||_{2}^{2})$$

$${}+\dfrac{\mathfrak{c}_{3}}{\theta}||s_{K_{0},\ell}-s||_{2}^{2}+\dfrac{\mathfrak{c}_{4}}{\theta(1-\theta)}\dfrac{\log(n)^{5}}{n}.$$

This concludes the proof.

ACKNOWLEDGMENTS

The authors also want to thank Fabienne Comte for her careful reading and advice.

FUNDING

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 811017.

Cite this article

Halconruy, H., Marie, N. Kernel Selection in Nonparametric Regression. Math. Meth. Stat. 29, 32–56 (2020). https://doi.org/10.3103/S1066530720010044
