Abstract
In this paper, we consider the problem of estimating the \(d\)th order derivative \(f^{(d)}\) of a density \(f\), relying on a sample of \(n\) i.i.d. observations \(X_{1},\dots,X_{n}\) with density \(f\) supported on \({\mathbb{R}}\) or \({\mathbb{R}}^{+}\). We propose projection estimators defined in the orthonormal Hermite or Laguerre bases and study their integrated \({\mathbb{L}}^{2}\)-risk. For the density \(f\) belonging to regularity spaces and for a projection space of adequately chosen dimension, we obtain rates of convergence for our estimators which are optimal in the minimax sense. The optimal choice of the projection space depends on unknown parameters, so a general data-driven procedure is proposed to reach the bias-variance compromise automatically. We discuss the assumptions, and the estimator is compared to the one obtained by simply differentiating a density estimator. Simulations are finally performed; they illustrate the good performance of the procedure and provide a numerical comparison of projection and kernel estimators.
REFERENCES
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Vol. 55 of National Bureau of Standards Applied Mathematics Series (U.S. Government Printing Office, Washington, D.C., 1964).
R. Askey and S. Wainger, ‘‘Mean convergence of expansions in Laguerre and Hermite series,’’ Amer. J. Math. 87, 695–708 (1965).
J.-P. Baudry, C. Maugis, and B. Michel, ‘‘Slope heuristics: Overview and implementation,’’ Stat. Comput. 22 (2), 455–470 (2012).
D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Nonparametric Laguerre estimation in the multiplicative censoring model,’’ Electron. J. Stat. 10 (2), 3114–3152 (2016).
D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Correction to: Nonparametric Laguerre estimation in the multiplicative censoring model,’’ Electron. J. Stat. 11 (2), 4845–4850 (2017).
D. Belomestny, F. Comte, and V. Genon-Catalot, ‘‘Sobolev-Hermite versus Sobolev nonparametric density estimation on \(\mathbb{R}\),’’ Ann. Inst. Statist. Math. 71 (1), 29–62 (2019).
B. Bercu, S. Capderou, and G. Durrieu, ‘‘Nonparametric recursive estimation of the derivative of the regression function with application to sea shores water quality,’’ Stat. Inference Stoch. Process. 22 (1), 17–40 (2019).
P. Bhattacharya, ‘‘Estimation of a probability density function and its derivatives,’’ Sankhyā: The Indian Journal of Statistics, Series A, 373–382 (1967).
B. Bongioanni and J. L. Torrea, ‘‘What is a Sobolev space for the Laguerre function systems?’’ Studia Math. 192 (2), 147–172 (2009).
J. E. Chacón and T. Duong, ‘‘Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting,’’ Electronic Journal of Statistics 7, 499–532 (2013).
J. E. Chacón, T. Duong, and M. Wand, ‘‘Asymptotics for general multivariate kernel density derivative estimators,’’ Statistica Sinica, 807–840 (2011).
Y. Cheng, ‘‘Mean shift, mode seeking, and clustering,’’ IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (8), 790–799 (1995).
F. Comte and V. Genon-Catalot, ‘‘Laguerre and Hermite bases for inverse problems,’’ J. Korean Statist. Soc. 47 (3), 273–296 (2018).
F. Comte and N. Marie, ‘‘Bandwidth selection for the Wolverton–Wagner estimator,’’ J. Statist. Plann. Inference 207, 198–214 (2020).
S. Efromovich, ‘‘Simultaneous sharp estimation of functions and their derivatives,’’ Ann. Statist. 26 (1), 273–278 (1998).
S. Efromovich, Nonparametric Curve Estimation: Methods, Theory, and Applications, Springer Series in Statistics (Springer, New York, 1999).
C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman, ‘‘Non-parametric inference for density modes,’’ J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 (1), 99–126 (2016).
E. Giné and R. Nickl, Mathematical Foundations of Infinite-Dimensional Statistical Models, Vol. 40 (Cambridge University Press, 2016).
W. Härdle, J. Hart, J. S. Marron, and A. B. Tsybakov, ‘‘Bandwidth choice for average derivative estimation,’’ Journal of the American Statistical Association 87 (417), 218–226 (1992).
W. Härdle, W. Hildenbrand, and M. Jerison, ‘‘Empirical evidence on the law of demand,’’ Econometrica, 1525–1549 (1991).
W. Härdle and T. M. Stoker, ‘‘Investigating smooth multiple regression by the method of average derivatives,’’ Journal of the American Statistical Association 84 (408), 986–995 (1989).
J. Indritz, ‘‘An inequality for Hermite polynomials,’’ Proc. Amer. Math. Soc. 12, 981–983 (1961).
T. Klein and E. Rio, ‘‘Concentration around the mean for maxima of empirical processes,’’ Ann. Probab. 33 (3), 1060–1077 (2005).
R. Koekoek, ‘‘Generalizations of Laguerre polynomials,’’ Journal of Mathematical Analysis and Applications 153 (2), 576–590 (1990).
C. Lacour, P. Massart, and V. Rivoirard, ‘‘Estimator selection: A new method with applications to kernel density estimation,’’ Sankhya A 79 (2), 298–335 (2017).
M. Ledoux, ‘‘On Talagrand’s deviation inequalities for product measures,’’ ESAIM Probab. Statist. 1, 63–87 (1995/1997).
O. V. Lepski, ‘‘A new approach to estimator selection,’’ Bernoulli 24 (4A), 2776–2810 (2018).
L. Markovich, ‘‘Gamma kernel estimation of the density derivative on the positive semi-axis by dependent data,’’ REVSTAT–Statistical Journal 14 (3), 327–348 (2016).
P. Massart, Concentration Inequalities and Model Selection, Vol. 1896 of Lecture Notes in Mathematics (Springer, Berlin, 2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003; with a foreword by Jean Picard.
C. Park and K.-H. Kang, ‘‘SiZer analysis for the comparison of regression curves,’’ Computational Statistics and Data Analysis 52 (8), 3954–3970 (2008).
S. Plancade, ‘‘Estimation of the density of regression errors by pointwise model selection,’’ Math. Methods Statist. 18 (4), 341–374 (2009).
B. L. S. P. Rao, ‘‘Nonparametric estimation of the derivatives of a density by the method of wavelets,’’ Bull. Inform. Cybernet. 28 (1), 91–100 (1996).
H. Sasaki, Y.-K. Noh, G. Niu, and M. Sugiyama, ‘‘Direct density derivative estimation,’’ Neural Comput. 28 (6), 1101–1140 (2016).
E. Schmisser, ‘‘Nonparametric estimation of the derivatives of the stationary density for stationary processes,’’ ESAIM Probab. Stat. 17, 33–69 (2013).
E. F. Schuster, ‘‘Estimation of a probability density function and its derivatives,’’ The Annals of Mathematical Statistics 40 (4), 1187–1195 (1969).
W. Shen and S. Ghosal, ‘‘Posterior contraction rates of density derivative estimation,’’ Sankhya A 79 (2), 336–354 (2017).
B. W. Silverman, ‘‘Weak and strong uniform consistency of the kernel estimate of a density and its derivatives,’’ The Annals of Statistics, 177–184 (1978).
R. Singh, ‘‘Mean squared errors of estimates of a density and its derivatives,’’ Biometrika 66 (1), 177–180 (1979).
R. S. Singh, ‘‘Applications of estimators of a density and its derivatives to certain statistical problems,’’ J. Roy. Statist. Soc. Ser. B 39 (3), 357–363 (1977).
G. Szegö, Orthogonal Polynomials, Vol. 23 of American Mathematical Society Colloquium Publications (Revised ed., American Mathematical Society, Providence, RI, 1959).
M. Talagrand, ‘‘New concentration inequalities in product spaces,’’ Invent. Math. 126 (3), 505–563 (1996).
A. B. Tsybakov, Introduction to Nonparametric Estimation. Springer Series in Statistics (Springer, New York. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats, 2009).
APPENDIX A
PROOFS OF AUXILIARY RESULTS
A.1. Proof of Lemma 2.1
In the Hermite case (\(\varphi_{j}=h_{j}\) and \(f:\mathbb{R}\to[0,\infty)\)), performing \(d\) successive integrations by parts yields
By definition, for all \(j\geqslant 0\), \(h_{j}(x)=c_{j}H_{j}(x)e^{-\frac{x^{2}}{2}}\), where \(H_{j}\) is a polynomial. Then its \(k\)th derivative, \(0\leqslant k\leqslant d-1\), is a polynomial multiplied by \(e^{-{x^{2}}/{2}}\), so \(\lim_{|x|\to+\infty}h_{j}^{(k)}(x)=0\). Together with (A2), this gives that the bracket in (A.1) vanishes, and the result follows.
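For reference, the integration-by-parts identity at the heart of the proof can be spelled out as follows (a reconstruction consistent with the surrounding text; the equation labels (A.1), (A2) are those of the paper):
\[
\int_{\mathbb{R}}f^{(d)}(x)h_{j}(x)\,dx=\Big[\sum_{k=0}^{d-1}(-1)^{k}f^{(d-1-k)}(x)h_{j}^{(k)}(x)\Big]_{-\infty}^{+\infty}+(-1)^{d}\int_{\mathbb{R}}f(x)h_{j}^{(d)}(x)\,dx,
\]
so that once the bracket vanishes, \(a_{j}(f^{(d)})=\langle f^{(d)},h_{j}\rangle=(-1)^{d}\mathbb{E}\big[h_{j}^{(d)}(X_{1})\big]\), which is the identity behind the coefficient estimators.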
Similarly, in the Laguerre case, (A.1) holds with integration over \([0,\infty)\) instead of \(\mathbb{R}\) and with \(h_{j}\) replaced by \(\ell_{j}\). The bracketed term vanishes at 0 by (A3). It also vanishes at infinity: by (A2), together with the fact that the \(\ell_{j}\) are polynomials multiplied by \(e^{-x}\), we get \(\lim_{x\to\infty}f^{(d-1-k)}(x)\ell^{(k)}_{j}(x)=0\) for \(0\leqslant k\leqslant d-1\), \(j\geqslant 0\). The result follows.
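To make the construction behind Lemma 2.1 concrete, here is a minimal numerical sketch of the Hermite projection estimator of \(f^{(d)}\). The function names are ours, the snippet assumes the standard recurrences for the Hermite functions, and it is an illustration rather than the code used in the paper's simulations:

```python
import numpy as np

def hermite_fn(j, x):
    """Hermite function h_j(x) = c_j H_j(x) exp(-x^2/2), c_j = (2^j j! sqrt(pi))^{-1/2},
    computed via the stable three-term recurrence on the normalized functions."""
    x = np.asarray(x, dtype=float)
    h_prev = np.pi ** -0.25 * np.exp(-x ** 2 / 2)   # h_0
    if j == 0:
        return h_prev
    h_curr = np.sqrt(2.0) * x * h_prev              # h_1
    for k in range(1, j):
        h_prev, h_curr = (h_curr,
                          x * np.sqrt(2.0 / (k + 1)) * h_curr
                          - np.sqrt(k / (k + 1.0)) * h_prev)
    return h_curr

def hermite_fn_deriv(j, x, d):
    """d-th derivative of h_j, expanded over Hermite functions by iterating
    h_k' = sqrt(k/2) h_{k-1} - sqrt((k+1)/2) h_{k+1}."""
    coeffs = {j: 1.0}
    for _ in range(d):
        new = {}
        for k, c in coeffs.items():
            if k >= 1:
                new[k - 1] = new.get(k - 1, 0.0) + c * np.sqrt(k / 2.0)
            new[k + 1] = new.get(k + 1, 0.0) - c * np.sqrt((k + 1) / 2.0)
        coeffs = new
    return sum(c * hermite_fn(k, x) for k, c in coeffs.items())

def projection_estimator(X, m, d):
    """Projection estimator of f^{(d)}: hat a_j = (-1)^d n^{-1} sum_i h_j^{(d)}(X_i)
    (the identity of Lemma 2.1), then x -> sum_{j<m} hat a_j h_j(x)."""
    a_hat = [(-1) ** d * np.mean(hermite_fn_deriv(j, X, d)) for j in range(m)]
    return lambda x: sum(a * hermite_fn(j, x) for j, a in enumerate(a_hat))

# Illustration: estimate f' for f the N(0,1) density.
rng = np.random.default_rng(0)
X = rng.standard_normal(40000)
f1_hat = projection_estimator(X, m=8, d=1)
# f1_hat(1.0) is close to f'(1) = -exp(-1/2)/sqrt(2*pi) up to sampling error
```

In practice the dimension \(m\) is of course not fixed by hand but selected by the data-driven procedure studied in the paper.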
A.2. Proof of Lemma 2.2
We control the quantity
The first term is a constant depending only on \(d\). For the second term, using Lemma 5.2, we obtain
Inserting this in (59), we obtain the announced result.
A.3. Proof of Lemma 2.3
We establish the result for \(d=1\); the general case is an immediate consequence. It follows from the definition of \(\widetilde{W}_{L}^{s}(D)\) that \((\theta^{\prime})^{(j)}\), \(0\leqslant j\leqslant s-1\), belong to \(C([0,\infty))\). Moreover, \(x\mapsto x^{k/2}(\theta^{\prime})^{(j)}(x)\in\mathbb{L}^{2}(\mathbb{R}^{+})\) for all \(0\leqslant j<k\leqslant s-1\). The case \(k=j\) follows from the continuity of \(\theta^{(j)}\) on \([0,\infty)\) and from \(x\mapsto x^{(j+1)/2}(\theta^{\prime})^{(j)}(x)\in\mathbb{L}^{2}(\mathbb{R}^{+}).\) It follows that
where \(C\) depends on \(D\). Finally, using the equivalence of the norms \(|.|_{s}\) and \(|||.|||_{s}\), the value of \(D^{\prime}\) follows from the latter inequality.
A.4. Proof of Lemma 5.1
Consider the decomposition
where for \(\nu=4j-2k+2\), \(j\geqslant k\), we used the decomposition \((0,\infty)=(0,\frac{1}{\nu}]\cup(\frac{1}{\nu},\frac{\nu}{2}]\cup(\frac{\nu}{2},\nu-\nu^{1/3}]\cup(\nu-\nu^{1/3},\nu+\nu^{1/3}]\cup(\nu+\nu^{1/3},3\nu/2]\cup(3\nu/2,\infty).\) Using [2] (see Appendix B.1) and straightforward inequalities gives
Gathering these inequalities gives the announced result.
A.5. Proof of Lemma 5.2
The result is obtained by induction on \(d\). If \(d=1\), \(h_{j}^{\prime}\) is given by (5), with \(b^{(1)}_{-1,j-1}=j^{1/2}/\sqrt{2}\), \(b_{0,j}=0\), and \(b^{(1)}_{1,j}=(j+1)^{1/2}/\sqrt{2}\) for all \(j\geqslant 1\). Thus \(b_{k,j}^{(1)}=\mathcal{O}(j^{1/2})\), and (37) is satisfied for \(d=1\). Let \(\text{P}(d)\) be the proposition given by Eq. (37); we assume \(\text{P}(d)\) holds and establish \(\text{P}(d+1)\). Using successively \(\text{P}(d)\) and (5), it holds that
where \(b^{(d)}_{k,j}=\mathcal{O}(j^{d/2})\), \(\forall j\geqslant d\geqslant|k|\) and \(b_{k,j}^{(d+1)}=b_{k+1,j}^{(d)}\frac{\sqrt{j+k+1}}{\sqrt{2}}\mathbf{1}_{|k|\leqslant d-1}-b_{k-1,j}^{(d)}\frac{\sqrt{j+k}}{\sqrt{2}}\mathbf{1}_{|k|\leqslant d+1}\). It follows that \(|b_{k,j}^{(d+1)}|\leqslant 2\sqrt{({j+d+1})/{2}}j^{\frac{d}{2}}\leqslant C_{d}j^{\frac{d+1}{2}},\) \(|k|\leqslant d\leqslant j\), which completes the proof.
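In other words, Eq. (37) asserts an expansion of the following form (sketched here from the recursion used in the proof; up to the exact signs of the coefficients):
\[
h_{j}^{(d)}=\sum_{k=-d}^{d}b_{k,j}^{(d)}\,h_{j+k},\qquad b_{k,j}^{(d)}=\mathcal{O}\big(j^{d/2}\big),
\]
obtained by iterating the first-derivative relation \(h_{j}^{\prime}=\sqrt{j/2}\,h_{j-1}-\sqrt{(j+1)/2}\,h_{j+1}\).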
A.6. Proof of Lemma 5.4
A.6.1. Proof of part (i). First, it holds that
which we bound by applying a Talagrand inequality (see Appendix B.2). Following the notation of Appendix B.2, we have three terms \(H^{2}\), \(v\), and \(M_{1}\) to compute. Denote \(m^{*}=m\vee m^{\prime}\); for \(t\in S_{m}+S_{{m^{\prime}}}\) with \(||t||=1\), it holds
Computing \(\boldsymbol{H}^{\mathbf{2}}\). By the linearity of \(\nu_{n,d}\) and the Cauchy–Schwarz inequality, we have
One can check that the latter is an equality for \(a_{j}=\nu_{n,d}(\varphi_{j}).\) Therefore, taking expectation, it follows
Computing \(\boldsymbol{v}\). It holds for \(t\in S_{m}+S_{{m^{\prime}}}\), \(||t||=1\),
The first term of the previous inequality is a constant depending only on \(d\). For the second term, we consider separately the Laguerre and Hermite cases.
The Laguerre case (\(\varphi_{j}=\ell_{j}\)). Using (36) and the Cauchy–Schwarz inequality, it holds that
where we used the orthonormality of \((\ell_{j,(k)})_{j\geqslant 0}\), and where \(C(d)\) is a constant depending on \(d\) and on \(\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\).
The Hermite case (\(\varphi_{j}=h_{j}\)). Similarly, using Lemma 5.2 and the orthonormality of \(h_{j}\), it follows
Plugging (A.5) or (A.6) in (A.4), we set in the two cases \(v:=c_{1}(m^{*})^{d}\) where \(c_{1}\) depends on \(d\) and either on \(\sup_{x\in\mathbb{R}^{+}}\frac{f(x)}{x^{k}}\) (Laguerre case) or \(||f||_{\infty}\) (Hermite case).
Computing \(\boldsymbol{M}_{\mathbf{1}}\). The Cauchy–Schwarz inequality and \(||t||=1\) give
The Laguerre case. We use the following lemma, whose proof is a consequence of (2) and an induction on \(d\).
Lemma A.1. For \(\ell_{j}\) given in (1), the \(d\)th derivative of \(\ell_{j}\) satisfies \(||\ell_{j}^{(d)}||_{\infty}\leqslant C_{d}(j+1)^{d}\) for all \(j\geqslant 0\), where \(C_{d}\) is a positive constant depending on \(d\).
Using Lemma A.1, we obtain
The Hermite case. The \(d\) first terms in the sum in (A.7) can be bounded by a constant depending only on \(d\). For the remaining terms, Lemma 5.2 and \(||h_{j}||_{\infty}\leqslant\phi_{0}\) (see (4)) give
where \(C\) is a positive constant depending on \(d\) and \(\phi_{0}\).
Injecting either (A.8) or (A.9) in (A.7), we set \(M_{1}=\mathcal{O}(m^{d+\frac{1}{2}})\) in the Laguerre case or \(M_{1}=\mathcal{O}(m^{\frac{d}{2}+\frac{1}{2}})\) in the Hermite case.
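As a quick numerical sanity check of the linear-in-\(j\) bound of Lemma A.1 for \(d=1\), one can evaluate \(\sup_{x}|\ell_{j}^{\prime}(x)|/(j+1)\) on a grid. This sketch assumes the normalization \(\ell_{j}(x)=\sqrt{2}L_{j}(2x)e^{-x}\) common in the Laguerre-basis literature; the function names and the generous bounding constant are ours:

```python
import numpy as np

def laguerre_fn(j, x):
    """Laguerre function l_j(x) = sqrt(2) L_j(2x) exp(-x) (assumed normalization),
    with L_j computed by the three-term recurrence
    (n+1) L_{n+1}(u) = (2n+1-u) L_n(u) - n L_{n-1}(u)."""
    u = 2.0 * np.asarray(x, dtype=float)
    L_prev = np.ones_like(u)   # L_0
    L_curr = 1.0 - u           # L_1
    if j == 0:
        L_curr = L_prev
    else:
        for n in range(1, j):
            L_prev, L_curr = L_curr, ((2 * n + 1 - u) * L_curr - n * L_prev) / (n + 1)
    return np.sqrt(2.0) * L_curr * np.exp(-u / 2.0)

# sup |l_j'| / (j+1) on a grid, derivative approximated by finite differences
x = np.linspace(0.0, 60.0, 200001)
ratios = [np.max(np.abs(np.gradient(laguerre_fn(j, x), x))) / (j + 1)
          for j in range(21)]
# Lemma A.1 with d = 1 predicts that these ratios stay bounded in j
```

The observed ratios grow toward a plateau rather than diverging, in line with \(||\ell_{j}^{\prime}||_{\infty}\leqslant C_{1}(j+1)\).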
Now we apply the Talagrand inequality (see Appendix B.2) with \(\varepsilon=1/2\); it follows that
The Laguerre case. We have
From (41) and the value of \(m_{n}(d)\), we obtain
Using the value \(m_{n}(d)\), it holds \((m^{*})^{d+1/2}\leqslant{n}/{\log^{3}(n)}\), which implies (recall \(m^{*}=m\vee m^{\prime}\))
where \(\Sigma_{d,2}\) is a constant depending only on \(d\). Next, it follows
The function \(m\mapsto m^{d+1}\exp(-C_{2}^{\prime}m^{\frac{1}{2}})\) is bounded and the sum over \(m^{\prime}\) is finite, so it holds that
The Hermite case. Only the second term \(V_{d}(m^{*})\) changes. Here, it is given by
where we used (46) and the value of \(m_{n}(d)\). We derive that \(\sum_{m^{\prime}\in\mathcal{M}_{n,d}}V_{d}(m^{*})\leqslant\Sigma_{d,2}\).
Gathering all terms, it follows
Plugging this in (A.3) gives the announced result.
A.6.2. Proof of part (ii). We use the Bernstein Inequality (see Appendix B.3) to prove the result. Define
We choose \(s^{2}\) and \(b\) such that \(\textrm{Var}(Z_{i}^{(m)})\leqslant s^{2}\) and \(|Z_{i}^{(m)}|\leqslant b\). By the computation of \(M_{1}\) (see the proof of part (i)), we set \(b:=C^{*}m^{\alpha}\), with \(\alpha=2d+1\) (Laguerre case) or \(\alpha=d+1\) (Hermite case), where \(C^{*}\) depends on \(d\). For \(s^{2}\), we use that \(\textrm{Var}(Z_{i}^{(m)})\leqslant\mathbb{E}[(Z_{i}^{(m)})^{2}]\leqslant b\sum_{j=0}^{m-1}\mathbb{E}\left[(\varphi_{j}^{(d)}(X_{i}))^{2}\right]=C^{*}m^{\alpha}V_{m,d}=:s^{2}\). Applying the Bernstein inequality, we have for \(S_{n}=n(\widehat{V}_{m,d}-V_{m,d})\)
Choose \(x=2\log(n)\) and define the set
Consider the decomposition,
Using \(2xy\leqslant x^{2}+y^{2}\), we have on \(\Omega\)
The constraint on \(m_{n}\) gives \(\widehat{m}^{d+1/2}\leqslant C{n}/{(\log(n))^{2}}\), which, together with (41) giving \(V_{\widehat{m},d}\geqslant c^{*}\widehat{m}^{d+1/2}\), yields for \(\alpha=2d+1\) (Laguerre case) that \(\frac{8C^{*}}{3}\frac{\widehat{m}^{\alpha}\log(n)}{n}\leqslant\frac{8CC^{*}}{3c^{*}}\frac{V_{\widehat{m},d}}{\log(n)}\leqslant\frac{V_{\widehat{m},d}}{4}\) for \(n\) large enough, and
In the Hermite case (\(\alpha=d+1\)), the computations are similar since \(\widehat{m}^{d+1}\leqslant\widehat{m}^{2d+1}\). For the control on \(\Omega^{c}\), we write, using (A.10),
Gathering (A.11) and (A.12), we get the desired result.
APPENDIX B
SOME INEQUALITIES
B.1. Asymptotic Askey and Wainger Formula
From [2], we have for \(\nu=4k+2\delta+2\), and \(k\) large enough
where \(\gamma_{1}\) and \(\gamma_{2}\) are positive and fixed constants.
B.2. A Talagrand Inequality
This type of inequality was proved in [41] and reworked in [26]; the version below is given in [23]. Let \((X_{i})_{1\leqslant i\leqslant n}\) be independent real random variables and
for \(t\) in \(\mathcal{F}\) a class of measurable functions. If there exist \(M_{1}\), \(H\), and \(v\) such that:
then, for \(\varepsilon>0\),
where \(C(\varepsilon)=(\sqrt{1+\varepsilon}-1)\wedge 1\), \(K_{1}=1/6\) and \(K_{1}^{\prime}\) a universal constant.
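A version of this bound that appears in closely related work reads as follows (a reconstruction with the constants stated above, not the paper's own display; the exact numerical constants may differ):
\[
\mathbb{E}\Big[\Big(\sup_{t\in\mathcal{F}}\nu_{n}^{2}(t)-2(1+2\varepsilon)H^{2}\Big)_{+}\Big]\leqslant\frac{4}{K_{1}}\Big(\frac{v}{n}\,e^{-K_{1}\varepsilon\frac{nH^{2}}{v}}+\frac{49M_{1}^{2}}{K_{1}C^{2}(\varepsilon)n^{2}}\,e^{-\frac{\sqrt{2}K_{1}C(\varepsilon)\sqrt{\varepsilon}}{7}\frac{nH}{M_{1}}}\Big).
\]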
B.3. Bernstein Inequality ([29])
Let \(X_{1},\dots,X_{n}\) be \(n\) independent real random variables. Assume there exist two constants \(s^{2}\) and \(b\) such that \(\textrm{Var}(X_{i})\leqslant s^{2}\) and \(|X_{i}|\leqslant b\) for all \(i\). Then, for all positive \(x\), we have
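A standard deviation form of this inequality, consistent with its use in the proof of part (ii) with \(x=2\log(n)\), is (a reconstruction, not the paper's own display):
\[
\mathbb{P}\Big(\Big|\sum_{i=1}^{n}\big(X_{i}-\mathbb{E}[X_{i}]\big)\Big|\geqslant\sqrt{2ns^{2}x}+\frac{bx}{3}\Big)\leqslant 2e^{-x}.
\]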
Comte, F., Duval, C., and Sacko, O., ‘‘Optimal Adaptive Estimation on \({\mathbb{R}}\) or \({\mathbb{R}}^{+}\) of the Derivatives of a Density,’’ Math. Meth. Stat. 29, 1–31 (2020). https://doi.org/10.3103/S1066530720010020