Abstract
In this paper, we consider the problem of estimating the \(d\)th order derivative \(f^{(d)}\) of a density \(f\), relying on a sample of \(n\) i.i.d. observations \(X_{1},\dots,X_{n}\) with density \(f\) supported on \({\mathbb{R}}\) or \({\mathbb{R}}^{+}\). We propose projection estimators defined in the orthonormal Hermite or Laguerre bases and study their integrated \({\mathbb{L}}^{2}\)-risk. For the density \(f\) belonging to regularity spaces and for a projection space of adequately chosen dimension, we obtain rates of convergence for our estimators which are optimal in the minimax sense. The optimal choice of the projection space depends on unknown parameters, so a general data-driven procedure is proposed to reach the bias-variance compromise automatically. We discuss the assumptions, and the estimator is compared to the one obtained by simply differentiating a density estimator. Simulations are finally performed; they illustrate the good performance of the procedure and provide a numerical comparison of projection and kernel estimators.
REFERENCES
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Vol. 55 of National Bureau of Standards Applied Mathematics Series (U.S. Government Printing Office, Washington, D.C., 1964).
R. Askey and S. Wainger, ‘‘Mean convergence of expansions in Laguerre and Hermite series,’’ Amer. J. Math. 87, 695–708 (1965).
J.-P. Baudry, C. Maugis, and B. Michel, ‘‘Slope heuristics: Overview and implementation,’’ Stat. Comput. 22 (2), 455–470 (2012).
D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Nonparametric Laguerre estimation in the multiplicative censoring model,’’ Electron. J. Stat. 10 (2), 3114–3152 (2016).
D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Correction to: Nonparametric Laguerre estimation in the multiplicative censoring model,’’ Electron. J. Stat. 11 (2), 4845–4850 (2017).
D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Sobolev-Hermite versus Sobolev nonparametric density estimation on \(\mathbb{R}\),’’ Ann. Inst. Statist. Math. 71 (1), 29–62 (2019).
B. Bercu, S. Capderou, and G. Durrieu, ‘‘Nonparametric recursive estimation of the derivative of the regression function with application to sea shores water quality,’’ Stat. Inference Stoch. Process. 22 (1), 17–40 (2019).
P. Bhattacharya, ‘‘Estimation of a probability density function and its derivatives,’’ Sankhyā: The Indian Journal of Statistics, Series A, 373–382 (1967).
B. Bongioanni and J. L. Torrea, ‘‘What is a Sobolev space for the Laguerre function systems?’’ Studia Math. 192 (2), 147–172 (2009).
J. E. Chacón and T. Duong, ‘‘Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting,’’ Electronic Journal of Statistics 7, 499–532 (2013).
J. E. Chacón, T. Duong, and M. Wand, ‘‘Asymptotics for general multivariate kernel density derivative estimators,’’ Statistica Sinica, 807–840 (2011).
Y. Cheng, ‘‘Mean shift, mode seeking, and clustering,’’ IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (8), 790–799 (1995).
F. Comte and V. Genon-Catalot, ‘‘Laguerre and Hermite bases for inverse problems,’’ J. Korean Statist. Soc. 47 (3), 273–296 (2018).
F. Comte and N. Marie, ‘‘Bandwidth selection for the Wolverton–Wagner estimator,’’ J. Statist. Plann. Inference 207, 198–214 (2020).
S. Efromovich, ‘‘Simultaneous sharp estimation of functions and their derivatives,’’ Ann. Statist. 26 (1), 273–278 (1998).
S. Efromovich, Nonparametric Curve Estimation: Methods, Theory, and Applications, Springer Series in Statistics (Springer, New York, 1999).
C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman, ‘‘Non-parametric inference for density modes,’’ J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 (1), 99–126 (2016).
E. Giné and R. Nickl, Mathematical Foundations of Infinite-Dimensional Statistical Models, Vol. 40 (Cambridge University Press, 2016).
W. Härdle, J. Hart, J. S. Marron, and A. B. Tsybakov, ‘‘Bandwidth choice for average derivative estimation,’’ Journal of the American Statistical Association 87 (417), 218–226 (1992).
W. Härdle, W. Hildenbrand, and M. Jerison, ‘‘Empirical evidence on the law of demand,’’ Econometrica, 1525–1549 (1991).
W. Härdle and T. M. Stoker, ‘‘Investigating smooth multiple regression by the method of average derivatives,’’ Journal of the American Statistical Association 84 (408), 986–995 (1989).
J. Indritz, ‘‘An inequality for Hermite polynomials,’’ Proc. Amer. Math. Soc. 12, 981–983 (1961).
T. Klein and E. Rio, ‘‘Concentration around the mean for maxima of empirical processes,’’ Ann. Probab. 33 (3), 1060–1077 (2005).
R. Koekoek, ‘‘Generalizations of Laguerre polynomials,’’ Journal of Mathematical Analysis and Applications 153 (2), 576–590 (1990).
C. Lacour, P. Massart, and V. Rivoirard, ‘‘Estimator selection: A new method with applications to kernel density estimation,’’ Sankhya A 79 (2), 298–335 (2017).
M. Ledoux, ‘‘On Talagrand’s deviation inequalities for product measures,’’ ESAIM Probab. Statist. 1, 63–87 (1995/1997).
O. V. Lepski, ‘‘A new approach to estimator selection,’’ Bernoulli 24 (4A), 2776–2810 (2018).
L. Markovich, ‘‘Gamma kernel estimation of the density derivative on the positive semi-axis by dependent data,’’ REVSTAT–Statistical Journal 14 (3), 327–348 (2016).
P. Massart, Concentration Inequalities and Model Selection, Vol. 1896 of Lecture Notes in Mathematics (Springer, Berlin, 2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003; with a foreword by Jean Picard.
C. Park and K.-H. Kang, ‘‘SiZer analysis for the comparison of regression curves,’’ Computational Statistics and Data Analysis 52 (8), 3954–3970 (2008).
S. Plancade, ‘‘Estimation of the density of regression errors by pointwise model selection,’’ Math. Methods Statist. 18 (4), 341–374 (2009).
B. L. S. P. Rao, ‘‘Nonparametric estimation of the derivatives of a density by the method of wavelets,’’ Bull. Inform. Cybernet. 28 (1), 91–100 (1996).
H. Sasaki, Y.-K. Noh, G. Niu, and M. Sugiyama, ‘‘Direct density derivative estimation,’’ Neural Comput. 28 (6), 1101–1140 (2016).
E. Schmisser, ‘‘Nonparametric estimation of the derivatives of the stationary density for stationary processes,’’ ESAIM Probab. Stat. 17, 33–69 (2013).
E. F. Schuster, ‘‘Estimation of a probability density function and its derivatives,’’ The Annals of Mathematical Statistics 40 (4), 1187–1195 (1969).
W. Shen and S. Ghosal, ‘‘Posterior contraction rates of density derivative estimation,’’ Sankhya A 79 (2), 336–354 (2017).
B. W. Silverman, ‘‘Weak and strong uniform consistency of the kernel estimate of a density and its derivatives,’’ The Annals of Statistics, 177–184 (1978).
R. Singh, ‘‘Mean squared errors of estimates of a density and its derivatives,’’ Biometrika 66 (1), 177–180 (1979).
R. S. Singh, ‘‘Applications of estimators of a density and its derivatives to certain statistical problems,’’ J. Roy. Statist. Soc. Ser. B 39 (3), 357–363 (1977).
G. Szegö, Orthogonal Polynomials, Vol. 23 of American Mathematical Society Colloquium Publications (Revised ed., American Mathematical Society, Providence, RI, 1959).
M. Talagrand, ‘‘New concentration inequalities in product spaces,’’ Invent. Math. 126 (3), 505–563 (1996).
A. B. Tsybakov, Introduction to Nonparametric Estimation. Springer Series in Statistics (Springer, New York. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats, 2009).
APPENDIX A
PROOFS OF AUXILIARY RESULTS
A.1. Proof of Lemma 2.1
In the Hermite case (\(\varphi_{j}=h_{j}\) and \(f:\mathbb{R}\to[0,\infty)\)), performing \(d\) successive integrations by parts yields
By definition, for all \(j\geqslant 0\), \(h_{j}(x)=c_{j}H_{j}(x)e^{-\frac{x^{2}}{2}}\), where \(H_{j}\) is a polynomial. Then its \(k\)th derivative, \(0\leqslant k\leqslant d-1\), is a polynomial multiplied by \(e^{-{x^{2}}/{2}}\), so \(\lim_{|x|\to+\infty}h_{j}^{(k)}(x)=0\). Together with (A2), this gives that the bracket in (A.1) vanishes, and the result follows.
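For reference, the integration-by-parts identity at the heart of the proof can be spelled out as follows (a reconstruction consistent with the surrounding text; the equation labels (A.1), (A2) are those of the paper):
\[
\int_{\mathbb{R}}f^{(d)}(x)h_{j}(x)\,dx=\Big[\sum_{k=0}^{d-1}(-1)^{k}f^{(d-1-k)}(x)h_{j}^{(k)}(x)\Big]_{-\infty}^{+\infty}+(-1)^{d}\int_{\mathbb{R}}f(x)h_{j}^{(d)}(x)\,dx,
\]
so that once the bracket vanishes, \(a_{j}(f^{(d)})=\langle f^{(d)},h_{j}\rangle=(-1)^{d}\mathbb{E}\big[h_{j}^{(d)}(X_{1})\big]\), which is the identity behind the coefficient estimators.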
Similarly, in the Laguerre case, (A.1) holds with integration over \([0,\infty)\) instead of \(\mathbb{R}\) and with \(h_{j}\) replaced by \(\ell_{j}\). The bracketed term vanishes at 0 by (A3). It also vanishes at infinity: by (A2), together with the fact that the \(\ell_{j}\) are polynomials multiplied by \(e^{-x}\), we get \(\lim_{x\to\infty}f^{(d-1-k)}(x)\ell^{(k)}_{j}(x)=0\) for \(0\leqslant k\leqslant d-1\), \(j\geqslant 0\). The result follows.
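To make the construction behind Lemma 2.1 concrete, here is a minimal numerical sketch of the Hermite projection estimator of \(f^{(d)}\). The function names are ours, the snippet assumes the standard recurrences for the Hermite functions, and it is an illustration rather than the code used in the paper's simulations:

```python
import numpy as np

def hermite_fn(j, x):
    """Hermite function h_j(x) = c_j H_j(x) exp(-x^2/2), c_j = (2^j j! sqrt(pi))^{-1/2},
    computed via the stable three-term recurrence on the normalized functions."""
    x = np.asarray(x, dtype=float)
    h_prev = np.pi ** -0.25 * np.exp(-x ** 2 / 2)   # h_0
    if j == 0:
        return h_prev
    h_curr = np.sqrt(2.0) * x * h_prev              # h_1
    for k in range(1, j):
        h_prev, h_curr = (h_curr,
                          x * np.sqrt(2.0 / (k + 1)) * h_curr
                          - np.sqrt(k / (k + 1.0)) * h_prev)
    return h_curr

def hermite_fn_deriv(j, x, d):
    """d-th derivative of h_j, expanded over Hermite functions by iterating
    h_k' = sqrt(k/2) h_{k-1} - sqrt((k+1)/2) h_{k+1}."""
    coeffs = {j: 1.0}
    for _ in range(d):
        new = {}
        for k, c in coeffs.items():
            if k >= 1:
                new[k - 1] = new.get(k - 1, 0.0) + c * np.sqrt(k / 2.0)
            new[k + 1] = new.get(k + 1, 0.0) - c * np.sqrt((k + 1) / 2.0)
        coeffs = new
    return sum(c * hermite_fn(k, x) for k, c in coeffs.items())

def projection_estimator(X, m, d):
    """Projection estimator of f^{(d)}: hat a_j = (-1)^d n^{-1} sum_i h_j^{(d)}(X_i)
    (the identity of Lemma 2.1), then x -> sum_{j<m} hat a_j h_j(x)."""
    a_hat = [(-1) ** d * np.mean(hermite_fn_deriv(j, X, d)) for j in range(m)]
    return lambda x: sum(a * hermite_fn(j, x) for j, a in enumerate(a_hat))

# Illustration: estimate f' for f the N(0,1) density.
rng = np.random.default_rng(0)
X = rng.standard_normal(40000)
f1_hat = projection_estimator(X, m=8, d=1)
# f1_hat(1.0) is close to f'(1) = -exp(-1/2)/sqrt(2*pi) up to sampling error
```

In practice the dimension \(m\) is of course not fixed by hand but selected by the data-driven procedure studied in the paper.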
A.2. Proof of Lemma 2.2
We control the quantity
The first term is a constant depending only on \(d\). For the second term, using Lemma 5.2, we obtain
Inserting this in (59), we obtain the announced result.
A.3. Proof of Lemma 2.3
We establish the result for \(d=1\); the general case is an immediate consequence. It follows from the definition of \(\widetilde{W}_{L}^{s}(D)\) that \((\theta^{\prime})^{(j)}\), \(0\leqslant j\leqslant s-1\), belong to \(C([0,\infty))\). Moreover, \(x\mapsto x^{k/2}(\theta^{\prime})^{(j)}(x)\in\mathbb{L}^{2}(\mathbb{R}^{+})\) for all \(0\leqslant j<k\leqslant s-1\). The case \(k=j\) follows from the continuity of \(\theta^{(j)}\) on \([0,\infty)\) and from \(x\mapsto x^{(j+1)/2}(\theta^{\prime})^{(j)}(x)\in\mathbb{L}^{2}(\mathbb{R}^{+}).\) It follows that
where \(C\) depends on \(D\). Finally, using the equivalence of the norms \(|.|_{s}\) and \(|||.|||_{s}\), the value of \(D^{\prime}\) follows from the latter inequality.
A.4. Proof of Lemma 5.1
Consider the decomposition
where for \(\nu=4j-2k+2\), \(j\geqslant k\), we used the decomposition \((0,\infty)=(0,\frac{1}{\nu}]\cup(\frac{1}{\nu},\frac{\nu}{2}]\cup(\frac{\nu}{2},\nu-\nu^{1/3}]\cup(\nu-\nu^{1/3},\nu+\nu^{1/3}]\cup(\nu+\nu^{1/3},3\nu/2]\cup(3\nu/2,\infty).\) Using [2] (see Appendix B.1) and straightforward inequalities gives
Gathering these inequalities gives the announced result.
A.5. Proof of Lemma 5.2
The result is obtained by induction on \(d\). If \(d=1\), \(h_{j}^{\prime}\) is given by (5), with \(b^{(1)}_{-1,j-1}=j^{1/2}/\sqrt{2}\), \(b_{0,j}=0\), and \(b^{(1)}_{1,j}=(j+1)^{1/2}/\sqrt{2}\) for all \(j\geqslant 1\). Thus \(b_{k,j}^{(1)}=\mathcal{O}(j^{1/2})\), and (37) is satisfied for \(d=1\). Let \(\text{P}(d)\) be the proposition given by Eq. (37); we assume \(\text{P}(d)\) holds and establish \(\text{P}(d+1)\). Using successively \(\text{P}(d)\) and (5), it holds that
where \(b^{(d)}_{k,j}=\mathcal{O}(j^{d/2})\), \(\forall j\geqslant d\geqslant|k|\) and \(b_{k,j}^{(d+1)}=b_{k+1,j}^{(d)}\frac{\sqrt{j+k+1}}{\sqrt{2}}\mathbf{1}_{|k|\leqslant d-1}-b_{k-1,j}^{(d)}\frac{\sqrt{j+k}}{\sqrt{2}}\mathbf{1}_{|k|\leqslant d+1}\). It follows that \(|b_{k,j}^{(d+1)}|\leqslant 2\sqrt{({j+d+1})/{2}}j^{\frac{d}{2}}\leqslant C_{d}j^{\frac{d+1}{2}},\) \(|k|\leqslant d\leqslant j\), which completes the proof.
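In other words, Eq. (37) asserts an expansion of the following form (sketched here from the recursion used in the proof; up to the exact signs of the coefficients):
\[
h_{j}^{(d)}=\sum_{k=-d}^{d}b_{k,j}^{(d)}\,h_{j+k},\qquad b_{k,j}^{(d)}=\mathcal{O}\big(j^{d/2}\big),
\]
obtained by iterating the first-derivative relation \(h_{j}^{\prime}=\sqrt{j/2}\,h_{j-1}-\sqrt{(j+1)/2}\,h_{j+1}\).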
A.6. Proof of Lemma 5.4
A.6.1. Proof of part (i). First, it holds that
which we bound by applying a Talagrand inequality (see Appendix B.2). Following the notation of Appendix B.2, we have three terms \(H^{2}\), \(v\), and \(M_{1}\) to compute. Denote \(m^{*}=m\vee m^{\prime}\); for \(t\in S_{m}+S_{{m^{\prime}}}\) with \(||t||=1\), it holds
Computing \(\boldsymbol{H}^{\mathbf{2}}\). By the linearity of \(\nu_{n,d}\) and the Cauchy–Schwarz inequality, we have
One can check that the latter is an equality for \(a_{j}=\nu_{n,d}(\varphi_{j}).\) Therefore, taking expectation, it follows
Computing \(\boldsymbol{v}\). It holds for \(t\in S_{m}+S_{{m^{\prime}}}\), \(||t||=1\),
The first term of the previous inequality is a constant depending only on \(d\). For the second term, we consider separately the Laguerre and Hermite cases.
The Laguerre case (\(\varphi_{j}=\ell_{j}\)). Using (36) and the Cauchy–Schwarz inequality, it holds that
where we used the orthonormality of \((\ell_{j,(k)})_{j\geqslant 0}\), and where \(C(d)\) is a constant depending on \(d\) and on \(\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\).
The Hermite case (\(\varphi_{j}=h_{j}\)). Similarly, using Lemma 5.2 and the orthonormality of \(h_{j}\), it follows
Plugging (A.5) or (A.6) in (A.4), we set in the two cases \(v:=c_{1}(m^{*})^{d}\) where \(c_{1}\) depends on \(d\) and either on \(\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\) (Laguerre case) or \(||f||_{\infty}\) (Hermite case).
Computing \(\boldsymbol{M}_{\mathbf{1}}\). The Cauchy–Schwarz inequality and \(||t||=1\) give
The Laguerre case. We use the following lemma, whose proof is a consequence of (2) and an induction on \(d\).
Lemma A.1. For \(\ell_{j}\) given in (1), the \(d\)th derivative of \(\ell_{j}\) satisfies \(||\ell_{j}^{(d)}||_{\infty}\leqslant C_{d}(j+1)^{d}\) for all \(j\geqslant 0\), where \(C_{d}\) is a positive constant depending on \(d\).
Using Lemma A.1, we obtain
The Hermite case. The \(d\) first terms in the sum in (A.7) can be bounded by a constant depending only on \(d\). For the remaining terms, Lemma 5.2 and \(||h_{j}||_{\infty}\leqslant\phi_{0}\) (see (4)) give
where \(C\) is a positive constant depending on \(d\) and \(\phi_{0}\).
Injecting either (A.8) or (A.9) in (A.7), we set \(M_{1}=\mathcal{O}(m^{d+\frac{1}{2}})\) in the Laguerre case or \(M_{1}=\mathcal{O}(m^{\frac{d}{2}+\frac{1}{2}})\) in the Hermite case.
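As a quick numerical sanity check of the linear-in-\(j\) bound of Lemma A.1 for \(d=1\), one can evaluate \(\sup_{x}|\ell_{j}^{\prime}(x)|/(j+1)\) on a grid. This sketch assumes the normalization \(\ell_{j}(x)=\sqrt{2}L_{j}(2x)e^{-x}\) common in the Laguerre-basis literature; the function names and the generous bounding constant are ours:

```python
import numpy as np

def laguerre_fn(j, x):
    """Laguerre function l_j(x) = sqrt(2) L_j(2x) exp(-x) (assumed normalization),
    with L_j computed by the three-term recurrence
    (n+1) L_{n+1}(u) = (2n+1-u) L_n(u) - n L_{n-1}(u)."""
    u = 2.0 * np.asarray(x, dtype=float)
    L_prev = np.ones_like(u)   # L_0
    L_curr = 1.0 - u           # L_1
    if j == 0:
        L_curr = L_prev
    else:
        for n in range(1, j):
            L_prev, L_curr = L_curr, ((2 * n + 1 - u) * L_curr - n * L_prev) / (n + 1)
    return np.sqrt(2.0) * L_curr * np.exp(-u / 2.0)

# sup |l_j'| / (j+1) on a grid, derivative approximated by finite differences
x = np.linspace(0.0, 60.0, 200001)
ratios = [np.max(np.abs(np.gradient(laguerre_fn(j, x), x))) / (j + 1)
          for j in range(21)]
# Lemma A.1 with d = 1 predicts that these ratios stay bounded in j
```

The observed ratios grow toward a plateau rather than diverging, in line with \(||\ell_{j}^{\prime}||_{\infty}\leqslant C_{1}(j+1)\).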
Now we apply the Talagrand inequality (see Appendix B.2) with \(\varepsilon=1/2\); it follows that
The Laguerre case. We have
From (41) and the value of \(m_{n}(d)\), we obtain
Using the value \(m_{n}(d)\), it holds \((m^{*})^{d+1/2}\leqslant{n}/{\log^{3}(n)}\), which implies (recall \(m^{*}=m\vee m^{\prime}\))
where \(\Sigma_{d,2}\) is a constant depending only on \(d\). Next, it follows
The function \(m\mapsto m^{d+1}\exp(-C_{2}^{\prime}m^{\frac{1}{2}})\) is bounded and the sum over \(m^{\prime}\) is finite, so it holds that
The Hermite case. Only the second term \(V_{d}(m^{*})\) changes. Here, it is given by
where we used (46) and the value of \(m_{n}(d)\). We derive that \(\sum_{m^{\prime}\in\mathcal{M}_{n,d}}V_{d}(m^{*})\leqslant\Sigma_{d,2}\).
Gathering all terms, it follows
Plugging this in (A.3) gives the announced result.
A.6.2. Proof of part (ii). We use the Bernstein Inequality (see Appendix B.3) to prove the result. Define
We choose \(s^{2}\) and \(b\) such that \(\textrm{Var}(Z_{i}^{(m)})\leqslant s^{2}\) and \(|Z_{i}^{(m)}|\leqslant b\). By the computation of \(M_{1}\) (see the proof of part (i)), we set \(b:=C^{*}m^{\alpha}\), with \(\alpha=2d+1\) (Laguerre case) or \(\alpha=d+1\) (Hermite case), where \(C^{*}\) depends on \(d\). For \(s^{2}\), we use that \(\textrm{Var}(Z_{i}^{(m)})\leqslant\mathbb{E}[(Z_{i}^{(m)})^{2}]\leqslant b\sum_{j=0}^{m-1}\mathbb{E}\left[(\varphi_{j}^{(d)}(X_{i}))^{2}\right]=C^{*}m^{\alpha}V_{m,d}=:s^{2}\). Applying the Bernstein inequality, we have for \(S_{n}=n(\widehat{V}_{m,d}-V_{m,d})\)
Choose \(x=2\log(n)\) and define the set
Consider the decomposition,
Using \(2xy\leqslant x^{2}+y^{2}\), we have on \(\Omega\)
The constraint on \(m_{n}\) gives \(\widehat{m}^{d+1/2}\leqslant C{n}/{(\log(n))^{2}}\), which, together with (41) giving \(V_{\widehat{m},d}\geqslant c^{*}\widehat{m}^{d+1/2}\), yields for \(\alpha=2d+1\) (Laguerre case) that \(\frac{8C^{*}}{3}\frac{\widehat{m}^{\alpha}\log(n)}{n}\leqslant\frac{8CC^{*}}{3c^{*}}\frac{V_{\widehat{m},d}}{\log(n)}\leqslant\frac{V_{\widehat{m},d}}{4}\) for \(n\) large enough, and
In the Hermite case (\(\alpha=d+1\)), the computations are similar since \(\widehat{m}^{d+1}\leqslant\widehat{m}^{2d+1}\). For the control on \(\Omega^{c}\), we write, using (A.10),
Gathering (A.11) and (A.12), we get the desired result.
APPENDIX B
SOME INEQUALITIES
B.1. Asymptotic Askey and Wainger Formula
From [2], we have for \(\nu=4k+2\delta+2\), and \(k\) large enough
where \(\gamma_{1}\) and \(\gamma_{2}\) are positive and fixed constants.
B.2. A Talagrand Inequality
This type of inequality was proved in [41] and reworked in [26]; the version below is given in [23]. Let \((X_{i})_{1\leqslant i\leqslant n}\) be independent real random variables and
for \(t\) in \(\mathcal{F}\) a class of measurable functions. If there exist \(M_{1}\), \(H\), and \(v\) such that:
then, for \(\varepsilon>0\),
where \(C(\varepsilon)=(\sqrt{1+\varepsilon}-1)\wedge 1\), \(K_{1}=1/6\) and \(K_{1}^{\prime}\) a universal constant.
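A version of this bound that appears in closely related work reads as follows (a reconstruction with the constants stated above, not the paper's own display; the exact numerical constants may differ):
\[
\mathbb{E}\Big[\Big(\sup_{t\in\mathcal{F}}\nu_{n}^{2}(t)-2(1+2\varepsilon)H^{2}\Big)_{+}\Big]\leqslant\frac{4}{K_{1}}\Big(\frac{v}{n}\,e^{-K_{1}\varepsilon\frac{nH^{2}}{v}}+\frac{49M_{1}^{2}}{K_{1}C^{2}(\varepsilon)n^{2}}\,e^{-\frac{\sqrt{2}K_{1}C(\varepsilon)\sqrt{\varepsilon}}{7}\frac{nH}{M_{1}}}\Big).
\]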
B.3. Bernstein Inequality ([29])
Let \(X_{1},\dots,X_{n}\) be \(n\) independent real random variables. Assume there exist two constants \(s^{2}\) and \(b\) such that \(\textrm{Var}(X_{i})\leqslant s^{2}\) and \(|X_{i}|\leqslant b\) for all \(i\). Then, for all positive \(x\), we have
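A standard deviation form of this inequality, consistent with its use in the proof of part (ii) with \(x=2\log(n)\), is (a reconstruction, not the paper's own display):
\[
\mathbb{P}\Big(\Big|\sum_{i=1}^{n}\big(X_{i}-\mathbb{E}[X_{i}]\big)\Big|\geqslant\sqrt{2ns^{2}x}+\frac{bx}{3}\Big)\leqslant 2e^{-x}.
\]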
Comte, F., Duval, C., and Sacko, O., ‘‘Optimal Adaptive Estimation on \({\mathbb{R}}\) or \({\mathbb{R}}^{+}\) of the Derivatives of a Density,’’ Math. Meth. Stat. 29, 1–31 (2020). https://doi.org/10.3103/S1066530720010020