
Optimal Adaptive Estimation on \(\mathbb{R}\) or \(\mathbb{R}^{+}\) of the Derivatives of a Density


Abstract

In this paper, we consider the problem of estimating the \(d\)-th order derivative \(f^{(d)}\) of a density \(f\), relying on a sample of \(n\) i.i.d. observations \(X_{1},\dots,X_{n}\) with density \(f\) supported on \(\mathbb{R}\) or \(\mathbb{R}^{+}\). We propose projection estimators defined in the orthonormal Hermite or Laguerre bases and study their integrated \(\mathbb{L}^{2}\)-risk. For a density \(f\) belonging to regularity spaces, and for a projection space chosen with adequate dimension, we obtain rates of convergence for our estimators, which are optimal in the minimax sense. The optimal choice of the projection space depends on unknown parameters, so a general data-driven procedure is proposed to reach the bias-variance compromise automatically. We discuss the assumptions, and we compare our estimator to the one obtained by simply differentiating a density estimator. Simulations are finally performed. They illustrate the good performance of the procedure and provide a numerical comparison of projection and kernel estimators.


REFERENCES

1. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Vol. 55 of National Bureau of Standards Applied Mathematics Series (U.S. Government Printing Office, Washington, D.C., 1964).

2. R. Askey and S. Wainger, ‘‘Mean convergence of expansions in Laguerre and Hermite series,’’ Amer. J. Math. 87, 695–708 (1965).

3. J.-P. Baudry, C. Maugis, and B. Michel, ‘‘Slope heuristics: overview and implementation,’’ Stat. Comput. 22 (2), 455–470 (2012).

4. D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Nonparametric Laguerre estimation in the multiplicative censoring model,’’ Electron. J. Stat. 10 (2), 3114–3152 (2016).

5. D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Correction to: Nonparametric Laguerre estimation in the multiplicative censoring model,’’ Electron. J. Stat. 11 (2), 4845–4850 (2017).

6. D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Sobolev–Hermite versus Sobolev nonparametric density estimation on \(\mathbb{R}\),’’ Ann. Inst. Statist. Math. 71 (1), 29–62 (2019).

7. B. Bercu, S. Capderou, and G. Durrieu, ‘‘Nonparametric recursive estimation of the derivative of the regression function with application to sea shores water quality,’’ Stat. Inference Stoch. Process. 22 (1), 17–40 (2019).

8. P. Bhattacharya, ‘‘Estimation of a probability density function and its derivatives,’’ Sankhyā Ser. A, 373–382 (1967).

9. B. Bongioanni and J. L. Torrea, ‘‘What is a Sobolev space for the Laguerre function systems?’’ Studia Math. 192 (2), 147–172 (2009).

10. J. E. Chacón and T. Duong, ‘‘Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting,’’ Electron. J. Stat. 7, 499–532 (2013).

11. J. E. Chacón, T. Duong, and M. Wand, ‘‘Asymptotics for general multivariate kernel density derivative estimators,’’ Statist. Sinica, 807–840 (2011).

12. Y. Cheng, ‘‘Mean shift, mode seeking, and clustering,’’ IEEE Trans. Pattern Anal. Mach. Intell. 17 (8), 790–799 (1995).

13. F. Comte and V. Genon-Catalot, ‘‘Laguerre and Hermite bases for inverse problems,’’ J. Korean Statist. Soc. 47 (3), 273–296 (2018).

14. F. Comte and N. Marie, ‘‘Bandwidth selection for the Wolverton–Wagner estimator,’’ J. Statist. Plann. Inference 207, 198–214 (2020).

15. S. Efromovich, ‘‘Simultaneous sharp estimation of functions and their derivatives,’’ Ann. Statist. 26 (1), 273–278 (1998).

16. S. Efromovich, Nonparametric Curve Estimation: Methods, Theory, and Applications, Springer Series in Statistics (Springer, New York, 1999).

17. C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman, ‘‘Non-parametric inference for density modes,’’ J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 (1), 99–126 (2016).

18. E. Giné and R. Nickl, Mathematical Foundations of Infinite-Dimensional Statistical Models, Vol. 40 (Cambridge University Press, 2016).

19. W. Härdle, J. Hart, J. S. Marron, and A. B. Tsybakov, ‘‘Bandwidth choice for average derivative estimation,’’ J. Amer. Statist. Assoc. 87 (417), 218–226 (1992).

20. W. Härdle, W. Hildenbrand, and M. Jerison, ‘‘Empirical evidence on the law of demand,’’ Econometrica, 1525–1549 (1991).

21. W. Härdle and T. M. Stoker, ‘‘Investigating smooth multiple regression by the method of average derivatives,’’ J. Amer. Statist. Assoc. 84 (408), 986–995 (1989).

22. J. Indritz, ‘‘An inequality for Hermite polynomials,’’ Proc. Amer. Math. Soc. 12, 981–983 (1961).

23. T. Klein and E. Rio, ‘‘Concentration around the mean for maxima of empirical processes,’’ Ann. Probab. 33 (3), 1060–1077 (2005).

24. R. Koekoek, ‘‘Generalizations of Laguerre polynomials,’’ J. Math. Anal. Appl. 153 (2), 576–590 (1990).

25. C. Lacour, P. Massart, and V. Rivoirard, ‘‘Estimator selection: a new method with applications to kernel density estimation,’’ Sankhya A 79 (2), 298–335 (2017).

26. M. Ledoux, ‘‘On Talagrand’s deviation inequalities for product measures,’’ ESAIM Probab. Statist. 1, 63–87 (1995/1997).

27. O. V. Lepski, ‘‘A new approach to estimator selection,’’ Bernoulli 24 (4A), 2776–2810 (2018).

28. L. Markovich, ‘‘Gamma kernel estimation of the density derivative on the positive semi-axis by dependent data,’’ REVSTAT Statist. J. 14 (3), 327–348 (2016).

29. P. Massart, Concentration Inequalities and Model Selection, Vol. 1896 of Lecture Notes in Mathematics (Springer, Berlin, 2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003; with a foreword by Jean Picard.

30. C. Park and K.-H. Kang, ‘‘SiZer analysis for the comparison of regression curves,’’ Comput. Statist. Data Anal. 52 (8), 3954–3970 (2008).

31. S. Plancade, ‘‘Estimation of the density of regression errors by pointwise model selection,’’ Math. Methods Statist. 18 (4), 341–374 (2009).

32. B. L. S. P. Rao, ‘‘Nonparametric estimation of the derivatives of a density by the method of wavelets,’’ Bull. Inform. Cybernet. 28 (1), 91–100 (1996).

33. H. Sasaki, Y.-K. Noh, G. Niu, and M. Sugiyama, ‘‘Direct density derivative estimation,’’ Neural Comput. 28 (6), 1101–1140 (2016).

34. E. Schmisser, ‘‘Nonparametric estimation of the derivatives of the stationary density for stationary processes,’’ ESAIM Probab. Stat. 17, 33–69 (2013).

35. E. F. Schuster, ‘‘Estimation of a probability density function and its derivatives,’’ Ann. Math. Statist. 40 (4), 1187–1195 (1969).

36. W. Shen and S. Ghosal, ‘‘Posterior contraction rates of density derivative estimation,’’ Sankhya A 79 (2), 336–354 (2017).

37. B. W. Silverman, ‘‘Weak and strong uniform consistency of the kernel estimate of a density and its derivatives,’’ Ann. Statist., 177–184 (1978).

38. R. Singh, ‘‘Mean squared errors of estimates of a density and its derivatives,’’ Biometrika 66 (1), 177–180 (1979).

39. R. S. Singh, ‘‘Applications of estimators of a density and its derivatives to certain statistical problems,’’ J. Roy. Statist. Soc. Ser. B 39 (3), 357–363 (1977).

40. G. Szegö, Orthogonal Polynomials, Vol. 23 of American Mathematical Society Colloquium Publications (American Mathematical Society, Providence, R.I., rev. ed., 1959).

41. M. Talagrand, ‘‘New concentration inequalities in product spaces,’’ Invent. Math. 126 (3), 505–563 (1996).

42. A. B. Tsybakov, Introduction to Nonparametric Estimation, Springer Series in Statistics (Springer, New York, 2009). Revised and extended from the 2004 French original; translated by Vladimir Zaiats.

Corresponding author

Correspondence to F. Comte.

APPENDIX A

PROOFS OF AUXILIARY RESULTS

A.1. Proof of Lemma 2.1

In the Hermite case, \(\varphi_{j}=h_{j}\) and \(f:\mathbb{R}\to[0,\infty)\). Performing \(d\) successive integrations by parts, it holds that

$$a_{j}(f^{(d)})=\int\limits_{\mathbb{R}}f^{(d)}(x)h_{j}(x)dx=\left[\sum_{k=0}^{d-1}(-1)^{k}f^{(d-1-k)}(x)h_{j}^{(k)}(x)\right]^{+\infty}_{-\infty}+(-1)^{d}\int\limits_{\mathbb{R}}h_{j}^{(d)}(x)f(x)dx.$$
(A.1)

By definition, for all \(j\geqslant 0\), \(h_{j}(x)=c_{j}H_{j}(x)e^{-\frac{x^{2}}{2}}\), where \(H_{j}\) is a polynomial. Then its \(k\)th derivative, \(0\leqslant k\leqslant d-1\), is a polynomial multiplied by \(e^{-{x^{2}}/{2}}\), so that \(\lim_{|x|\to+\infty}h_{j}^{(k)}(x)=0\). Together with (A2), this shows that the bracket in (A.1) vanishes, and the result follows.

Similarly, in the Laguerre case, (A.1) holds with integration over \([0,\infty)\) instead of \(\mathbb{R}\) and with \(h_{j}\) replaced by \(\ell_{j}\). The bracketed term vanishes at 0 by (A3). It also vanishes at infinity by (A2), combined with the fact that the \(\ell_{j}\) are polynomials multiplied by \(e^{-x}\), which similarly yields \(\lim_{x\to\infty}f^{(d-1-k)}(x)\ell^{(k)}_{j}(x)=0\), \(0\leqslant k\leqslant d-1\), \(j\geqslant 0\). The result follows.
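To fix ideas, here is a minimal numerical sketch (our illustration, not the authors' code) of the Hermite projection estimator that this identity enables: Lemma 2.1 gives \(a_{j}(f^{(d)})=(-1)^{d}\mathbb{E}[h_{j}^{(d)}(X_{1})]\), so each coefficient can be estimated by an empirical mean. The dimension \(m\) is fixed by hand below, whereas the paper selects it in a data-driven way; all function names are ours, and the derivatives \(h_{j}^{(d)}\) are evaluated through relation (5).

```python
import numpy as np

def hermite_functions(K, x):
    """h_0, ..., h_{K-1} at the points x, via the stable three-term recurrence."""
    h = np.empty((K, x.size))
    h[0] = np.pi ** (-0.25) * np.exp(-x ** 2 / 2)
    if K > 1:
        h[1] = np.sqrt(2.0) * x * h[0]
    for j in range(1, K - 1):
        h[j + 1] = x * np.sqrt(2.0 / (j + 1)) * h[j] - np.sqrt(j / (j + 1.0)) * h[j - 1]
    return h

def hermite_derivatives(h, d):
    """Apply h_j' = sqrt(j/2) h_{j-1} - sqrt((j+1)/2) h_{j+1} (relation (5))
    d times; each pass drops the top row, so start from K = m + d rows."""
    for _ in range(d):
        j = np.arange(h.shape[0] - 1)[:, None]
        below = np.vstack([np.zeros((1, h.shape[1])), h[:-2]])
        h = np.sqrt(j / 2.0) * below - np.sqrt((j + 1) / 2.0) * h[1:]
    return h

def hermite_deriv_estimator(X, d, m, grid):
    """Projection estimate of f^(d) with a_j^ = (-1)^d n^{-1} sum_i h_j^{(d)}(X_i)."""
    hX = hermite_derivatives(hermite_functions(m + d, X), d)   # shape (m, n)
    a_hat = (-1) ** d * hX.mean(axis=1)
    return a_hat @ hermite_functions(m, grid)

# Example: d = 1, standard Gaussian sample, where f'(x) = -x phi(x).
rng = np.random.default_rng(0)
X = rng.standard_normal(2000)
grid = np.linspace(-4.0, 4.0, 201)
f1_hat = hermite_deriv_estimator(X, d=1, m=15, grid=grid)
f1_true = -grid * np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)
print(np.max(np.abs(f1_hat - f1_true)))   # small for this smooth example
```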

A.2. Proof of Lemma 2.2

We control the quantity

$$\sum_{j\geqslant 0}j^{s-d}\langle f^{(d)},h_{j}\rangle^{2}=\sum_{j=0}^{d-1}j^{s-d}\langle f^{(d)},h_{j}\rangle^{2}+\sum_{j\geqslant d}j^{s-d}\langle f^{(d)},h_{j}\rangle^{2}.$$
(A.2)

The first term is a constant depending only on \(d\). For the second term, using Lemma 5.2, we obtain

$$\sum_{j\geqslant d}j^{s-d}\langle f^{(d)},h_{j}\rangle^{2}=\sum_{j\geqslant d}j^{s-d}\left(\sum_{k=-d}^{d}b_{k,j}^{(d)}\int h_{j+k}(x)f(x)dx\right)^{2}$$

$${}\leqslant C_{d}\sum_{j\geqslant d}j^{s}\sum_{k=-d}^{d}\left(\int h_{j+k}(x)f(x)dx\right)^{2}=C_{d}\sum_{k=-d}^{d}\sum_{j\geqslant d}j^{s}\langle h_{j+k},f\rangle^{2}$$

$${}=C_{d}\sum_{k=-d}^{d}\left(\sum_{j\geqslant d+k}|j-k|^{s}\langle h_{j},f\rangle^{2}\right)\leqslant C_{d}\sum_{k=-d}^{d}\left(\sum_{j\geqslant 0}2^{s}j^{s}\langle h_{j},f\rangle^{2}\right)=(2d+1)2^{s}DC_{d}.$$

Inserting this in (59), we obtain the announced result.

A.3. Proof of Lemma 2.3

We establish the result for \(d=1\); the general case is an immediate consequence. It follows from the definition of \(\widetilde{W}_{L}^{s}(D)\) that the \((\theta^{\prime})^{(j)}\), \(0\leqslant j\leqslant s-1\), belong to \(C([0,\infty))\). Moreover, it holds that \(x\mapsto x^{k/2}(\theta^{\prime})^{(j)}(x)\in\mathbb{L}^{2}(\mathbb{R}^{+})\) for all \(0\leqslant j<k\leqslant s-1\). The case \(k=j\) is obtained using that \(\theta^{(j)}\) is continuous on \([0,\infty)\) and that \(x\mapsto x^{(j+1)/2}(\theta^{\prime})^{(j)}(x)\in\mathbb{L}^{2}(\mathbb{R}^{+}).\) It follows that

$$|||\theta^{\prime}|||_{s}^{2}=\sum_{j=0}^{s-1}\Big{|}\Big{|}x^{j/2}\sum_{k=0}^{j}\binom{j}{k}(\theta^{\prime})^{(k)}\Big{|}\Big{|}^{2}\leqslant 2\sum_{j=0}^{s-1}\Big{|}\Big{|}x^{j/2}\sum_{k=0}^{j-1}\binom{j}{k}(\theta^{\prime})^{(k)}\Big{|}\Big{|}^{2}+2\sum_{j=0}^{s-1}\Big{|}\Big{|}x^{j/2}(\theta^{\prime})^{(j)}\Big{|}\Big{|}^{2}$$

$${}\leqslant C+2\sum_{j=0}^{s-1}||x^{(j+1)/2}(\theta^{\prime})^{(j)}(x)||^{2}<\infty,$$

where \(C\) depends on \(D\). Finally, using the equivalence of the norms \(|\cdot|_{s}\) and \(|||\cdot|||_{s}\), the value of \(D^{\prime}\) follows from the latter inequality.

A.4. Proof of Lemma 5.1

Consider the decomposition

$$\int\limits_{0}^{+\infty}x^{-k}(\ell_{j-k,(k)}(x/2))^{2}f(x/2)dx=\sum_{i=1}^{6}I_{i},$$

where, for \(\nu=4j-2k+2\), \(j\geqslant k\), we used the decomposition \((0,\infty)=(0,\frac{1}{\nu}]\cup(\frac{1}{\nu},\frac{\nu}{2}]\cup(\frac{\nu}{2},\nu-\nu^{1/3}]\cup(\nu-\nu^{1/3},\nu+\nu^{1/3}]\cup(\nu+\nu^{1/3},3\nu/2]\cup(3\nu/2,\infty).\) Using [2] (see Appendix B.1) and straightforward inequalities, we get

$$I_{1}\lesssim\int\limits_{0}^{\frac{1}{\nu}}x^{-k}(x\nu)^{k}f(x/2)dx\leqslant\int\limits_{0}^{\frac{1}{\nu}}x^{-k}(x\nu)^{-1/2}f(x/2)dx\lesssim\nu^{-1/2}\mathbb{E}[X^{-k-1/2}],$$

$$I_{2}\lesssim\int\limits_{1/\nu}^{\frac{\nu}{2}}x^{-k}((x\nu)^{-1/4})^{2}f(x/2)dx=\nu^{-1/2}\int\limits_{1/\nu}^{\frac{\nu}{2}}x^{-k-1/2}f(x/2)dx\leqslant\nu^{-1/2}\mathbb{E}[X^{-k-1/2}],$$

$$I_{3}\lesssim\int\limits_{\frac{\nu}{2}}^{\nu-\nu^{1/3}}x^{-k}(\nu^{-1/4}(\nu-x)^{-1/4})^{2}f(x/2)dx=\nu^{-1/2}\int\limits_{\frac{\nu}{2}}^{\nu-\nu^{1/3}}x^{-k}(\nu-x)^{-1/2}f(x/2)dx\lesssim\nu^{-1/2},$$

$$I_{4}\lesssim\int\limits_{\nu-\nu^{1/3}}^{\nu+\nu^{1/3}}x^{-k}(\nu^{-1/3})^{2}f(x/2)dx\leqslant\nu^{-2/3}\int\limits_{\frac{\nu}{2}}^{\nu+\nu^{1/3}}x^{-k}f(x/2)dx\lesssim\nu^{-1/2}\nu^{-k}\leqslant\nu^{-1/2},$$

$$I_{5}\lesssim\int\limits_{\nu+\nu^{1/3}}^{3\nu/2}x^{-k}\nu^{-1/2}(x-\nu)^{-1/2}e^{-2\gamma_{1}\nu^{-1/2}(x-\nu)^{3/2}}f(x/2)dx\lesssim\nu^{-1/2}\nu^{-1/6}\nu^{-k}\int f(x/2)dx\lesssim\nu^{-1/2},$$

$$I_{6}\lesssim\int\limits_{3\nu/2}^{+\infty}x^{-k}e^{-2\gamma_{2}x}f(x/2)dx\lesssim e^{-3\gamma_{2}\nu/2}=\mathcal{O}(\nu^{-1/2}).$$

Gathering these inequalities gives the announced result.

A.5. Proof of Lemma 5.2

The result is obtained by induction on \(d\). If \(d=1\), \(h_{j}^{\prime}\) is given by (5), with \(b^{(1)}_{-1,j}=j^{1/2}/\sqrt{2}\), \(b^{(1)}_{0,j}=0\), and \(b^{(1)}_{1,j}=(j+1)^{1/2}/\sqrt{2}\) for all \(j\geqslant 1\). Thus \(b_{k,j}^{(1)}=\mathcal{O}(j^{1/2})\) and (37) is satisfied for \(d=1\). Let \(\mathrm{P}(d)\) denote the proposition given by Eq. (37); we assume that \(\mathrm{P}(d)\) holds and establish \(\mathrm{P}(d+1)\). Using successively \(\mathrm{P}(d)\) and (5), it holds that

$$h_{j}^{(d+1)}(x)=\sum_{k=-d}^{d}b_{k,j}^{(d)}\left[\frac{\sqrt{j+k}}{\sqrt{2}}h_{j+k-1}-\frac{\sqrt{j+k+1}}{\sqrt{2}}h_{j+k+1}\right]$$

$${}=\sum_{k^{\prime}=-d-1}^{d-1}b_{k^{\prime}+1,j}^{(d)}\frac{\sqrt{j+k^{\prime}+1}}{\sqrt{2}}h_{j+k^{\prime}}-\sum_{k^{\prime}=-d+1}^{d+1}b_{k^{\prime}-1,j}^{(d)}\frac{\sqrt{j+k^{\prime}}}{\sqrt{2}}h_{j+k^{\prime}}=:\sum_{k=-d-1}^{d+1}b_{k,j}^{(d+1)}h_{j+k},$$

where \(b^{(d)}_{k,j}=\mathcal{O}(j^{d/2})\) for all \(j\geqslant d\geqslant|k|\), and \(b_{k,j}^{(d+1)}=b_{k+1,j}^{(d)}\frac{\sqrt{j+k+1}}{\sqrt{2}}\mathbf{1}_{|k|\leqslant d-1}-b_{k-1,j}^{(d)}\frac{\sqrt{j+k}}{\sqrt{2}}\mathbf{1}_{|k|\leqslant d+1}\). It follows that \(|b_{k,j}^{(d+1)}|\leqslant 2\sqrt{({j+d+1})/{2}}\,j^{\frac{d}{2}}\leqslant C_{d}j^{\frac{d+1}{2}}\) for \(|k|\leqslant d+1\leqslant j\), which completes the proof.
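The induction can also be probed numerically: the coefficients \(b_{k,j}^{(d)}\) are entries of the \(d\)-th power of the banded matrix encoding relation (5), and their \(j^{d/2}\) growth is visible directly. A small sketch (ours, under the stated form of (5)):

```python
import numpy as np

# Column j of A holds the coefficients of h_j' in the basis (h_i)_i,
# following h_j' = sqrt(j/2) h_{j-1} - sqrt((j+1)/2) h_{j+1}.
K = 400
A = np.zeros((K, K))
i = np.arange(1, K)
A[i - 1, i] = np.sqrt(i / 2.0)    # h_i'     contains  sqrt(i/2) h_{i-1}
A[i, i - 1] = -np.sqrt(i / 2.0)   # h_{i-1}' contains -sqrt(i/2) h_i

d = 3
B = np.linalg.matrix_power(A, d)  # column j of B holds the b_{k,j}^{(d)}, |k| <= d
for jj in (50, 100, 200, 350):    # away from the boundary, entries are exact
    band = B[jj - d: jj + d + 1, jj]
    print(jj, np.abs(band).max() / jj ** (d / 2.0))  # ratio stays bounded in j
```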

A.6. Proof of Lemma 5.4

A.6.1. Proof of part (i). First, it holds that

$$\mathbb{E}\left[\left(\sup_{t\in S_{m}+S_{\widehat{m}},||t||=1}|\nu_{n,d}(t)|^{2}-p(m,\widehat{m}_{n})\right)_{+}\right]$$

$${}\leqslant\sum_{m^{\prime}\in\mathcal{M}_{n,d}}\mathbb{E}\left[\left(\sup_{t\in S_{m}+S_{{m^{\prime}}},||t||=1}|\nu_{n,d}(t)|^{2}-p(m,{m^{\prime}})\right)_{+}\right],$$
(A.3)

which we bound by applying a Talagrand inequality (see Appendix B.2). Following the notation of Appendix B.2, we have three terms \(H^{2}\), \(v\), and \(M_{1}\) to compute. Denote \(m^{*}=m\vee m^{\prime}\); for \(t\in S_{m}+S_{{m^{\prime}}}\), \(||t||=1\), it holds

$$||t||^{2}=\left|\left|\sum_{j=0}^{m^{*}-1}a_{j}\varphi_{j}\right|\right|^{2}=\sum_{j=0}^{m^{*}-1}a_{j}^{2}=1.$$

Computing \(\boldsymbol{H}^{\mathbf{2}}\). By the linearity of \(\nu_{n,d}\) and the Cauchy–Schwarz inequality, we have

$$\nu_{n,d}(t)^{2}=\left(\sum_{j=0}^{m^{*}-1}a_{j}\nu_{n,d}(\varphi_{j})\right)^{2}\leqslant\sum_{j=0}^{m^{*}-1}a_{j}^{2}\sum_{j=0}^{m^{*}-1}\nu_{n,d}^{2}(\varphi_{j})=\sum_{j=0}^{m^{*}-1}\nu_{n,d}^{2}(\varphi_{j}).$$

One can check that the latter is an equality when the \(a_{j}\) are proportional to \(\nu_{n,d}(\varphi_{j})\). Therefore, taking expectations, it follows that

$$\mathbb{E}\left[\sup_{t\in S_{m^{*}},||t||=1}\nu^{2}_{n,d}(t)\right]=\sum_{j=0}^{m^{*}-1}\textrm{Var}(\nu_{n,d}(\varphi_{j}))=\frac{1}{n}\sum_{j=0}^{m^{*}-1}\textrm{Var}(\varphi_{j}^{(d)}(X_{1}))$$

$${}\leqslant\frac{1}{n}\sum_{j=0}^{m^{*}-1}\mathbb{E}\left[\varphi_{j}^{(d)}(X_{1})^{2}\right]=\frac{V_{m^{*},d}}{n}=:H^{2}.$$

Computing \(\boldsymbol{v}\). It holds for \(t\in S_{m}+S_{{m^{\prime}}}\), \(||t||=1\),

$$\textrm{Var}\left((-1)^{d}t^{(d)}(X_{1})\right)\leqslant\int t^{(d)}(x)^{2}f(x)dx=\int\left(\sum_{j=0}^{m^{*}-1}a_{j}\varphi_{j}^{(d)}(x)\right)^{2}f(x)dx$$

$${}\leqslant 2\int\left(\sum_{j=0}^{d-1}a_{j}\varphi_{j}^{(d)}(x)\right)^{2}f(x)dx+2\int\left(\sum_{j=d}^{m^{*}-1}a_{j}\varphi_{j}^{(d)}(x)\right)^{2}f(x)dx.$$
(A.4)

The first term of the previous inequality is a constant depending only on \(d\). For the second term, we consider separately the Laguerre and Hermite cases.

The Laguerre case (\(\varphi_{j}=\ell_{j}\)). Using (36) and the Cauchy–Schwarz inequality, it holds that

$$\int\left(\sum_{j=d}^{m^{*}-1}a_{j}\ell_{j}^{(d)}(x)\right)^{2}f(x)dx\leqslant 3^{d}\sum_{k=0}^{d}\binom{d}{k}\int\left(\sum_{j=d}^{m^{*}-1}a_{j}\left(\frac{j!}{(j-k)!}\right)^{\frac{1}{2}}x^{-\frac{k}{2}}\ell_{j-k,(k)}(x)\right)^{2}f(x)dx$$

$${}\leqslant 3^{d}\sum_{k=0}^{d}\binom{d}{k}\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\sum_{j=d}^{m^{*}-1}a_{j}^{2}\frac{j!}{(j-k)!}\leqslant C(d)(m^{*})^{d},$$
(A.5)

where we used the orthonormality of \((\ell_{j,(k)})_{j\geqslant 0}\) and where \(C(d)\) is a constant depending only on \(d\) and \(\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\).

The Hermite case (\(\varphi_{j}=h_{j}\)). Similarly, using Lemma 5.2 and the orthonormality of \(h_{j}\), it follows

$$\int\left(\sum_{j=d}^{m^{*}-1}a_{j}h_{j}^{(d)}(x)\right)^{2}f(x)dx\leqslant(2d+1)\sum_{k=-d}^{d}\int\left(\sum_{j=d}^{m^{*}-1}a_{j}b_{k,j}^{(d)}h_{j+k}(x)\right)^{2}f(x)dx$$

$${}\leqslant C(d)||f||_{\infty}(m^{*})^{d}.$$
(A.6)

Plugging (A.5) or (A.6) into (A.4), we set in both cases \(v:=c_{1}(m^{*})^{d}\), where \(c_{1}\) depends on \(d\) and either on \(\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\) (Laguerre case) or on \(||f||_{\infty}\) (Hermite case).

Computing \(\boldsymbol{M}_{\mathbf{1}}\). The Cauchy–Schwarz inequality and \(||t||=1\) give

$$||(-1)^{d}t^{(d)}||_{\infty}=\left|\left|\sum_{j=0}^{m^{*}-1}(-1)^{d}a_{j}\varphi_{j}^{(d)}\right|\right|_{\infty}\leqslant\sup_{x\in\mathbb{R}}\sqrt{\sum_{j=0}^{m^{*}-1}\varphi_{j}^{(d)}(x)^{2}}.$$
(A.7)

The Laguerre case. We use the following lemma, whose proof is a consequence of (2) and an induction on \(d\).

Lemma A.1. For \(\ell_{j}\) given in (1), the \(d\)th derivative of \(\ell_{j}\) satisfies \(||\ell_{j}^{(d)}||_{\infty}\leqslant C_{d}(j+1)^{d}\) for all \(j\geqslant 0\), where \(C_{d}\) is a positive constant depending on \(d\).

Using Lemma A.1, we obtain

$$\sum_{j=0}^{m^{*}-1}\ell_{j}^{(d)}(x)^{2}\leqslant C_{d}^{2}(m^{*})^{2d+1}.$$
(A.8)

The Hermite case. The first \(d\) terms of the sum in (A.7) can be bounded by a constant depending only on \(d\). For the remaining terms, Lemma 5.2 and \(||h_{j}||_{\infty}\leqslant\phi_{0}\) (see (4)) give

$$\sum_{j=d}^{m^{*}-1}[h_{j}^{(d)}(x)]^{2}\leqslant C_{d}^{2}\phi_{0}^{2}\sum_{k=-d}^{d}\sum_{j=d}^{m^{*}-1}j^{d}\leqslant C(m^{*})^{d+1},$$
(A.9)

where \(C\) is a positive constant depending on \(d\) and \(\phi_{0}\).

Plugging either (A.8) or (A.9) into (A.7), we set \(M_{1}=\mathcal{O}((m^{*})^{d+\frac{1}{2}})\) in the Laguerre case and \(M_{1}=\mathcal{O}((m^{*})^{\frac{d}{2}+\frac{1}{2}})\) in the Hermite case.
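The Hermite bound (A.9) can be probed numerically. The following sketch (ours, reusing the recurrence and relation (5) stated above) evaluates \(\sup_{x}\sum_{j<m}(h_{j}^{(d)}(x))^{2}\) on a grid and compares it to \(m^{d+1}\); the ratio stays bounded, consistent with (A.9) being an upper bound.

```python
import numpy as np

def hermite(K, x):
    h = np.empty((K, x.size))
    h[0] = np.pi ** (-0.25) * np.exp(-x ** 2 / 2)
    if K > 1:
        h[1] = np.sqrt(2.0) * x * h[0]
    for j in range(1, K - 1):
        h[j + 1] = x * np.sqrt(2.0 / (j + 1)) * h[j] - np.sqrt(j / (j + 1.0)) * h[j - 1]
    return h

def deriv(h):
    j = np.arange(h.shape[0] - 1)[:, None]
    below = np.vstack([np.zeros((1, h.shape[1])), h[:-2]])
    return np.sqrt(j / 2.0) * below - np.sqrt((j + 1) / 2.0) * h[1:]

d = 2
x = np.linspace(-60.0, 60.0, 4001)   # covers the oscillatory zone |x| < sqrt(2m)
for m in (50, 100, 200, 400):
    h = hermite(m + d, x)
    for _ in range(d):
        h = deriv(h)                  # rows 0, ..., m-1 now hold h_j^{(d)}(x)
    ratio = (h ** 2).sum(axis=0).max() / m ** (d + 1)
    print(m, ratio)                   # bounded in m, as (A.9) predicts
```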

Now we apply the Talagrand inequality of Appendix B.2 with \(\varepsilon=1/2\), so that \(2(1+2\varepsilon)H^{2}=4H^{2}\); it follows that

$$\mathbb{E}\left[\left(\sup_{t\in S_{m}+S_{{m^{\prime}}},||t||=1}|\nu_{n,d}(t)|^{2}-4H^{2}\right)_{+}\right]\leqslant\frac{C_{1}}{n}\left(v\exp\left(-C_{2}\frac{nH^{2}}{v}\right)+C_{3}\frac{M_{1}^{2}}{n}\exp\left(-C_{4}\frac{nH}{M_{1}}\right)\right)$$

$${}:=\frac{C_{1}}{n}\left(U_{d}(m^{*})+V_{d}(m^{*})\right).$$

The Laguerre case. We have

$$U_{d}(m^{*})=c_{1}(m^{*})^{d}\exp\left(-C_{2}\frac{V_{m^{*},d}}{c_{1}(m^{*})^{d}}\right)$$

$$\text{and}\quad V_{d}(m^{*})=C_{3}c_{2}\frac{(m^{*})^{2d+1}}{n}\exp\left(-C_{4}\sqrt{n}\frac{\sqrt{V_{m^{*},d}}}{c_{2}(m^{*})^{d+\frac{1}{2}}}\right).$$

From (41) and the value of \(m_{n}(d)\), we obtain

$$U_{d}(m^{*})\leqslant c_{1}{(m^{*})^{d}}\exp(-C_{2}^{\prime}(m^{*})^{\frac{1}{2}})\quad\textrm{and}\quad V_{d}(m^{*})\leqslant C_{3}c_{2}(m^{*})^{{d+\frac{1}{2}}}\exp(-C_{4}^{\prime}{\sqrt{n}}{(m^{*})^{-\frac{d}{2}-\frac{1}{4}}}).$$

Using the value of \(m_{n}(d)\), it holds that \((m^{*})^{d+1/2}\leqslant{n}/{\log^{3}(n)}\), which implies (recall that \(m^{*}=m\vee m^{\prime}\))

$$\sum_{m^{\prime}\in\mathcal{M}_{n,d}}V_{d}(m^{*})\leqslant C\sum_{m^{\prime}\in\mathcal{M}_{n,d}}(m^{*})^{{d+\frac{1}{2}}}\exp\left(-{C_{4}}\log^{2}(n)\right)\leqslant\Sigma_{d,2},$$

where \(\Sigma_{d,2}\) is a constant depending only on \(d\). Next, it follows

$$\sum_{m^{\prime}=1}^{n}U_{d}(m^{*})=\sum_{m^{\prime}=1}^{m}U_{d}(m^{*})+\sum_{m^{\prime}=m}^{n}U_{d}(m^{*})=c_{1}m^{d+1}\exp(-C_{2}^{\prime}m^{\frac{1}{2}})+\sum_{m^{\prime}=m}^{n}c_{1}(m^{\prime})^{d}\exp(-C_{2}^{\prime}(m^{\prime})^{\frac{1}{2}}).$$

The function \(m\mapsto m^{d+1}\exp(-C_{2}^{\prime}m^{\frac{1}{2}})\) is bounded and the sum over \(m^{\prime}\) is finite, so it holds that

$$C_{1}\sum_{m^{\prime}=1}^{n}U_{d}(m^{*})\leqslant\Sigma_{d,1},\text{ where }\Sigma_{d,1}\text{ depends only on }d.$$

The Hermite case. Only the second term \(V_{d}(m^{*})\) changes. Here, it is given by

$$V_{d}(m^{*})=C_{3}c_{2}\frac{(m^{*})^{d+1}}{n}\exp\left(-C_{4}\sqrt{n}\frac{\sqrt{V_{m^{*},d}}}{c_{2}(m^{*})^{\frac{d}{2}+\frac{1}{2}}}\right)\leqslant C_{3}c_{2}(m^{*})^{1/2}\exp(-C_{4}^{\prime}{\sqrt{n}(m^{*})^{-\frac{1}{4}}})$$

$${}\leqslant C_{3}c_{2}(m^{*})^{1/2}\exp(-C_{4}^{\prime}{(m^{*})^{\frac{d}{2}}}),$$

where we used (46) and the value of \(m_{n}(d)\). We derive that \(\sum_{m^{\prime}\in\mathcal{M}_{n,d}}V_{d}(m^{*})\leqslant\Sigma_{d,2}\).

Gathering all terms, it follows

$$\mathbb{E}\left[\left(\sup_{t\in S_{m}+S_{{m^{\prime}}},||t||=1}|\nu_{n,d}(t)|^{2}-4H^{2}\right)_{+}\right]\leqslant\frac{\Sigma}{n},\text{ where }\Sigma=\Sigma_{d,1}+\Sigma_{d,2}.$$

Plugging this in (A.3) gives the announced result.

A.6.2. Proof of part (ii). We use the Bernstein Inequality (see Appendix B.3) to prove the result. Define

$$Z_{i}^{(m)}=\sum_{j=0}^{m-1}(\varphi_{j}^{(d)}(X_{i}))^{2},\quad\textrm{so that}\quad\widehat{V}_{m,d}=\frac{1}{n}\sum_{i=1}^{n}Z_{i}^{(m)}.$$

We select \(s^{2}\) and \(b\) such that \(\textrm{Var}(Z_{i}^{(m)})\leqslant s^{2}\) and \(|Z_{i}^{(m)}|\leqslant b\). By the computation of \(M_{1}\) (see the proof of part (i)), we may set \(b:=C^{*}m^{\alpha}\), with \(\alpha=2d+1\) (Laguerre case) or \(\alpha=d+1\) (Hermite case), where \(C^{*}\) depends on \(d\). For \(s^{2}\), we use \(\textrm{Var}(Z_{i}^{(m)})\leqslant\mathbb{E}[(Z_{i}^{(m)})^{2}]\leqslant b\sum_{j=0}^{m-1}\mathbb{E}\left[(\varphi_{j}^{(d)}(X_{i}))^{2}\right]=C^{*}m^{\alpha}V_{m,d}=:s^{2}\). Applying the Bernstein inequality, we have for \(S_{n}=n(\widehat{V}_{m,d}-V_{m,d})\)

$$\mathbb{P}\left(\left|\frac{S_{n}}{n}\right|\geqslant\sqrt{\frac{2xC^{*}m^{\alpha}V_{m,d}}{n}}+\frac{C^{*}m^{\alpha}x}{3n}\right)\leqslant 2e^{-x},\quad\forall x>0.$$
(A.10)

Choose \(x=2\log(n)\) and define the set

$$\Omega:=\bigcap_{m\in\mathcal{M}_{n,d}}\left\{\frac{1}{n}|S_{n}|\leqslant 2\sqrt{\frac{C^{*}m^{\alpha}\log(n)V_{m,d}}{n}}+\frac{2C^{*}m^{\alpha}\log(n)}{3n}\right\}.$$

Consider the decomposition,

$$\mathbb{E}\left[\left({\textrm{pen}}_{d}(\widehat{m}_{n})-\widehat{{\textrm{pen}}}_{d}(\widehat{m}_{n})\right)_{+}\right]\leqslant\mathbb{E}\left[\left({\textrm{pen}}_{d}(\widehat{m}_{n})-\widehat{{\textrm{pen}}}_{d}(\widehat{m}_{n})\right)_{+}\mathbf{1}_{\Omega}\right]$$

$${}+\mathbb{E}\left[\left({\textrm{pen}}_{d}(\widehat{m}_{n})-\widehat{{\textrm{pen}}}_{d}(\widehat{m}_{n})\right)_{+}\mathbf{1}_{\Omega^{c}}\right].$$

Using \(2xy\leqslant x^{2}+y^{2}\), we have on \(\Omega\)

$$|\widehat{V}_{\widehat{m},d}-V_{\widehat{m},d}|\leqslant\frac{V_{\widehat{m},d}}{2}+\frac{2C^{*}\widehat{m}^{\alpha}\log(n)}{n}+\frac{2C^{*}\widehat{m}^{\alpha}\log(n)}{3n}=\frac{V_{\widehat{m},d}}{2}+\frac{8}{3}\frac{C^{*}\widehat{m}^{\alpha}\log(n)}{n}.$$

The constraint on \(m_{n}\) gives \(\widehat{m}^{d+1/2}\leqslant C{n}/{(\log(n))^{2}}\), which, combined with (41) giving \(V_{\widehat{m},d}\geqslant c^{*}\widehat{m}^{d+1/2}\), yields for \(\alpha=2d+1\) (Laguerre case) that \(\frac{8C^{*}}{3}\frac{\widehat{m}^{\alpha}\log(n)}{n}\leqslant\frac{8CC^{*}}{3c^{*}}\frac{V_{\widehat{m},d}}{\log(n)}\leqslant\frac{V_{\widehat{m},d}}{4}\) for \(n\) large enough, and

$$\mathbb{E}\left[\left({\textrm{pen}}_{d}(\widehat{m}_{n})-\widehat{{\textrm{pen}}}_{d}(\widehat{m}_{n})\right)_{+}\mathbf{1}_{\Omega}\right]\leqslant\frac{3}{4}\mathbb{E}[{\textrm{pen}}_{d}(\widehat{m}_{n})].$$
(A.11)

In the Hermite case (\(\alpha=d+1\)), the computations are similar since \(\widehat{m}^{d+1}\leqslant\widehat{m}^{2d+1}\). For the control on \(\Omega^{c}\), we write, using (A.10),

$$\mathbb{E}\left[\left({\textrm{pen}}_{d}(\widehat{m}_{n})-\widehat{{\textrm{pen}}}_{d}(\widehat{m}_{n})\right)_{+}\mathbf{1}_{\Omega^{c}}\right]\leqslant 2\kappa\mathbb{P}(\Omega^{c})\leqslant 2\kappa\sum_{m\in\mathcal{M}_{n,d}}2e^{-2\log(n)}:=\frac{\Sigma_{2}}{n}.$$
(A.12)

Gathering (A.11) and (A.12), we get the desired result.
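For concreteness, the following self-contained sketch (ours) computes the empirical quantity \(\widehat{V}_{m,d}=n^{-1}\sum_{i}Z_{i}^{(m)}\) of part (ii) in the Hermite case with \(d=1\), using relation (5) to evaluate the \(h_{j}^{\prime}\). The constant \(C^{*}\) and the choice \(x=2\log(n)\) enter only the deviation threshold defining \(\Omega\), not \(\widehat{V}_{m,d}\) itself.

```python
import numpy as np

def hermite(K, x):
    h = np.empty((K, x.size))
    h[0] = np.pi ** (-0.25) * np.exp(-x ** 2 / 2)
    if K > 1:
        h[1] = np.sqrt(2.0) * x * h[0]
    for j in range(1, K - 1):
        h[j + 1] = x * np.sqrt(2.0 / (j + 1)) * h[j] - np.sqrt(j / (j + 1.0)) * h[j - 1]
    return h

rng = np.random.default_rng(1)
n, m, d = 5000, 20, 1
X = rng.standard_normal(n)
h = hermite(m + d, X)                 # need indices up to m to form h_j', j < m
j = np.arange(m)[:, None]
hp = np.sqrt(j / 2.0) * np.vstack([np.zeros((1, n)), h[:-2]]) \
     - np.sqrt((j + 1) / 2.0) * h[1:]  # h_j'(X_i) via relation (5)
Z = (hp ** 2).sum(axis=0)              # Z_i^{(m)}
V_hat = Z.mean()                       # \hat V_{m,d}, concentrates around V_{m,d}
print(V_hat, 2.0 * np.log(n))          # the deviation level x = 2 log(n)
```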


APPENDIX B

SOME INEQUALITIES

B.1. Asymptotic Askey and Wainger Formula

From [2], for \(\nu=4k+2\delta+2\) and \(k\) large enough, we have

$$|\ell_{k,(\delta)}(x/2)|\leqslant C\begin{cases}\textrm{a) }(x\nu)^{\delta/2}&\textrm{if}\quad 0\leqslant x\leqslant 1/\nu,\\ \textrm{b) }(x\nu)^{-1/4}&\textrm{if}\quad 1/\nu\leqslant x\leqslant\nu/2,\\ \textrm{c) }\nu^{-1/4}(\nu-x)^{-1/4}&\textrm{if}\quad\nu/2\leqslant x\leqslant\nu-\nu^{1/3},\\ \textrm{d) }\nu^{-1/3}&\textrm{if}\quad\nu-\nu^{1/3}\leqslant x\leqslant\nu+\nu^{1/3},\\ \textrm{e) }\nu^{-1/4}(x-\nu)^{-1/4}e^{-\gamma_{1}\nu^{-1/2}(x-\nu)^{3/2}}&\textrm{if}\quad\nu+\nu^{1/3}\leqslant x\leqslant 3\nu/2,\\ \textrm{f) }e^{-\gamma_{2}x}&\textrm{if}\quad x\geqslant 3\nu/2,\end{cases}$$

where \(\gamma_{1}\) and \(\gamma_{2}\) are positive and fixed constants.

B.2. A Talagrand Inequality

Talagrand's inequality was proved in [41] and reworked in [26]; the version stated here is taken from [23]. Let \((X_{i})_{1\leqslant i\leqslant n}\) be independent real random variables and

$$\nu_{n}(t)=\frac{1}{n}\sum_{i=1}^{n}(t(X_{i})-\mathbb{E}[t(X_{i})])$$

for \(t\) in a class \(\mathcal{F}\) of measurable functions. If there exist \(M_{1}\), \(H\), and \(v\) such that:

$$\sup_{t\in\mathcal{F}}||t||_{\infty}\leqslant M_{1},\quad\mathbb{E}\Big[\sup_{t\in\mathcal{F}}|\nu_{n}(t)|\Big]\leqslant H,\quad\sup_{t\in\mathcal{F}}\dfrac{1}{n}\sum_{i=1}^{n}\textrm{Var}(t(X_{i}))\leqslant v,$$

then, for \(\varepsilon>0\),

$$\mathbb{E}\left[\left(\sup_{t\in\mathcal{F}}|{\nu^{2}_{n}(t)}|-2(1+2\varepsilon)H^{2}\right)_{+}\right]\leqslant\frac{4}{K_{1}}\left(\frac{v}{n}\exp\left(-K_{1}\varepsilon\frac{nH^{2}}{v}\right){}\right.$$
$${}\left.+\frac{49M_{1}^{2}}{K_{1}C^{2}(\varepsilon)n^{2}}\exp\left(-K_{1}^{\prime}C(\varepsilon)\sqrt{\varepsilon}\dfrac{nH}{M_{1}}\right)\right),$$

where \(C(\varepsilon)=(\sqrt{1+\varepsilon}-1)\wedge 1\), \(K_{1}=1/6\), and \(K_{1}^{\prime}\) is a universal constant.
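The bound is straightforward to transcribe; the helper below (our sketch) evaluates the right-hand side as a function of \((n,H,v,M_{1},\varepsilon)\). Since \(K_{1}^{\prime}\) is only known to be a universal constant, it is left as a parameter.

```python
import math

def talagrand_bound(n, H, v, M1, eps, K1p=1.0):
    """Right-hand side of the displayed inequality; K1p stands for K_1'."""
    K1 = 1.0 / 6.0
    C = min(math.sqrt(1.0 + eps) - 1.0, 1.0)
    term1 = (v / n) * math.exp(-K1 * eps * n * H ** 2 / v)
    term2 = 49.0 * M1 ** 2 / (K1 * C ** 2 * n ** 2) \
            * math.exp(-K1p * C * math.sqrt(eps) * n * H / M1)
    return 4.0 / K1 * (term1 + term2)

# With eps = 1/2, as in the proof of Lemma 5.4, 2(1 + 2 eps) H^2 = 4 H^2.
print(talagrand_bound(n=1000, H=0.1, v=1.0, M1=5.0, eps=0.5))
```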

B.3. Bernstein Inequality ([29])

Let \(X_{1},\dots,X_{n}\) be \(n\) independent real random variables. Assume there exist two constants \(s^{2}\) and \(b\) such that \(\textrm{Var}(X_{i})\leqslant s^{2}\) and \(|X_{i}|\leqslant b\) for all \(i\). Then, for all positive \(x\), we have

$$\mathbb{P}\left(|{S_{n}}|\geqslant\sqrt{2ns^{2}x}+\frac{bx}{3}\right)\leqslant 2e^{-x}\quad\text{with}\quad S_{n}=\sum_{i=1}^{n}(X_{i}-\mathbb{E}[X_{i}]).$$
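As a sanity check, the inequality can be tested by Monte Carlo for bounded variables; the sketch below (ours) takes \(X_{i}\) uniform on \([-1,1]\), for which \(s^{2}=1/3\) and \(b=1\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, x = 200, 20000, 3.0
s2, b = 1.0 / 3.0, 1.0
S = rng.uniform(-1.0, 1.0, size=(reps, n)).sum(axis=1)   # E[X_i] = 0, so S = S_n
thresh = np.sqrt(2.0 * n * s2 * x) + b * x / 3.0
# Empirical exceedance frequency vs. the Bernstein bound 2 e^{-x}:
print((np.abs(S) >= thresh).mean(), "<=", 2.0 * np.exp(-x))
```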

About this article


Cite this article

Comte, F., Duval, C. & Sacko, O. Optimal Adaptive Estimation on \(\mathbb{R}\) or \(\mathbb{R}^{+}\) of the Derivatives of a Density. Math. Meth. Stat. 29, 1–31 (2020). https://doi.org/10.3103/S1066530720010020
