A full proof of Berry-Esseen inequality in the Central Limit Theorem

April 21, 2018

This proof is based on W. Feller’s book An Introduction to Probability Theory and Its Applications, Volume II, and covers only independent and identically distributed summands. I won’t prove the non-identical case because this post would become too long. Please find its proof in Feller’s book.

The central limit theorem says that the distribution of the normalized sum tends to the normal distribution as the sample size goes to infinity. But the question you may raise is, “What is the rate of convergence of the distribution of the normalized sum to the standard normal distribution?” Let’s answer this question for identically distributed samples. To be more precise, let’s state it like this:

Let $x_{k}$ be independent random variables with a common distribution $F$ such that $E(x_{k})=0$, $E(x_{k}^{2})=\sigma^{2}>0$, $E(|x_{k}|^{3})=\rho<\infty$, and let $F_{n}$ stand for the distribution of the normalized sum $\frac{x_{1}+x_{2}+\ldots+x_{n}}{\sigma\sqrt{n}}$. Then for all $x$ and $n$, the distance between $F_{n}(x)$ and the standard normal distribution $\phi(x)$ satisfies $|F_{n}(x)-\phi(x)|\leq \frac{3\rho}{\sigma^{3}\sqrt{n}}$.
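To get a feeling for the statement, here is a minimal numerical sketch. It assumes the $x_{k}$ are fair $\pm 1$ coin flips (my choice for illustration, not an example from Feller), so that $\sigma = 1$ and $\rho = 1$. The helper names are hypothetical. Because $F_{n}$ is then a step function, the largest gap to $\phi$ can be computed exactly at the jump points.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance_coin_flips(n: int) -> float:
    """Exact sup over x of |F_n(x) - phi(x)| for n fair +/-1 summands.

    F_n is constant between jumps and phi is increasing, so the supremum
    is reached at a jump point, either just before or right at the jump.
    """
    worst = 0.0
    cdf_before = 0.0
    for k in range(n + 1):
        # The sum equals 2k - n with probability C(n, k) / 2^n; sigma = 1.
        x = (2 * k - n) / math.sqrt(n)
        cdf_at = cdf_before + math.comb(n, k) / 2 ** n
        target = normal_cdf(x)
        worst = max(worst, abs(cdf_before - target), abs(cdf_at - target))
        cdf_before = cdf_at
    return worst

def berry_esseen_bound(n: int, sigma: float = 1.0, rho: float = 1.0) -> float:
    """Right-hand side of the theorem: 3 * rho / (sigma^3 * sqrt(n))."""
    return 3.0 * rho / (sigma ** 3 * math.sqrt(n))

for n in (10, 100, 1000):
    print(n, sup_distance_coin_flips(n), berry_esseen_bound(n))
```

In this example the observed distance stays well below the bound and both shrink like $\frac{1}{\sqrt{n}}$, which is the rate the theorem promises.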

Looks very boring, right? Okay, let’s start with the history of the central limit theorem (CLT).

The first proof of the CLT was given by the French mathematician Pierre-Simon Laplace in 1810. Fourteen years later, the French mathematician Siméon-Denis Poisson improved it and provided us with a more general form of the proof. Laplace and his contemporaries were very interested in this theorem because they saw its importance for repeated measurements of the same quantity. They realized that if the individual measurements can be viewed as approximately independent and identically distributed, then their mean can be approximated by a normal distribution. In other words, for a sufficiently large sample from a population with finite variance, the distribution of the sample mean is approximately normal and centered at the population mean, regardless of the shape of the underlying distribution.

Then, the first convergence rate for the CLT was estimated by the Russian mathematician Aleksandr M. Lyapunov. A more refined version was discovered independently by two mathematicians, Andrew C. Berry (in 1941) and Carl-Gustav Esseen (in 1942), who then kept refining the convergence estimate, and hence the result is named the “Berry-Esseen theorem”. The best thing about this theorem is that it only uses the first three moments.

Now, I think you are very eager to know about the proof of this theorem. Right? Let’s get started without any further delay!

From Feller’s book (Lemma 1 equation 3.13 on page 538), the upper bound on the distance between $F_{n}(x)$ and $\phi(x)$ is

$$|F_{n}(x)-\phi(x)| \leq \frac{1}{\pi}\int_{-T}^{T}\left|\frac{\psi(\xi) - \gamma(\xi)}{\xi}\right| d\xi + \frac{24m}{\pi T} \tag{1}$$

where,

$\psi(\xi)$ = characteristic function of $F_{n}(x)$, which equals $\psi^{n}(\frac{\xi}{\sigma\sqrt{n}})$ (from here on, $\psi$ with the argument $\frac{\xi}{\sigma\sqrt{n}}$ or $t$ denotes the characteristic function of the common distribution $F$),

$\gamma(\xi)$ = characteristic function of $\phi(x)$, which equals $e^{\frac{-\xi^{2}}{2}}$,

$m$ = bound on the growth rate of $\phi$, such that $|\phi^{\prime}(x)| \leq m < \infty$.

The above expression can be found by starting with Fourier’s methods. Our proof is based on this smoothing inequality (refer to section 3.5 in Feller) with the choice

$$T = \frac{4}{3}\frac{\sigma^{3} \sqrt{n}}{\rho} \leq \frac{4 \sqrt{n}}{3}.$$

The last inequality is the result of the moment inequality $\sigma^{3} \leq \rho$. The derivative $\phi^{\prime}$ is the standard normal density, whose maximum is $\frac{1}{\sqrt{2\pi}} \approx 0.399$, so we may take $m = \frac{2}{5}$, a convenient round number just above that maximum.
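The bound $\frac{2}{5}$ and the moment inequality $\sigma^{3} \leq \rho$ can both be checked with a few lines of arithmetic. This is only a sketch: the uniform distribution on $[-a, a]$ is my own illustrative choice, and `moments_uniform` is a hypothetical helper.

```python
import math

# The standard normal density is largest at x = 0, where it equals 1/sqrt(2*pi).
density_max = 1.0 / math.sqrt(2.0 * math.pi)
print(density_max)          # about 0.3989
print(density_max < 2 / 5)  # True

def moments_uniform(a: float):
    """sigma and rho for the uniform distribution on [-a, a] (illustrative)."""
    sigma = a / math.sqrt(3.0)  # E(x^2) = a^2 / 3
    rho = a ** 3 / 4.0          # E(|x|^3) = a^3 / 4
    return sigma, rho

def smoothing_T(n: int, sigma: float, rho: float) -> float:
    """T = (4/3) * sigma^3 * sqrt(n) / rho, as chosen in the proof."""
    return 4.0 * sigma ** 3 * math.sqrt(n) / (3.0 * rho)

sigma, rho = moments_uniform(2.0)
print(sigma ** 3 <= rho)                                       # moment inequality
print(smoothing_T(100, sigma, rho) <= 4.0 * math.sqrt(100) / 3.0)
```

So $m = \frac{2}{5}$ overshoots the true maximum by less than one percent, which keeps the constant $\frac{24m}{\pi T}$ tidy without costing much.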

So equation $(1)$ becomes,

$$\pi |F_{n}(x) - \phi(x)| \leq \int_{-T}^{T}|\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2}}|\frac{d\xi}{|\xi|} + \frac{24\times 2}{T\times 5}$$

$$=\int_{-T}^{T}|\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2}}|\frac{d\xi}{|\xi|} + \frac{9.6}{T} \tag{2}$$

Now, let’s find $|\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2}}| = ?$

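Doesn’t it look like a difference of two $n$-th powers? For such differences we have this inequality:

$|\alpha^{n} - \beta^{n}| \leq n|\alpha - \beta|\Gamma^{n-1}$ if $|\alpha| \leq \Gamma, |\beta| \leq \Gamma$.

It follows from the factorization $\alpha^{n} - \beta^{n} = (\alpha - \beta)(\alpha^{n-1} + \alpha^{n-2}\beta + \ldots + \beta^{n-1})$, since each of the $n$ terms in the second factor is at most $\Gamma^{n-1}$ in absolute value.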
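Here is a small randomized sanity check of the inequality $|\alpha^{n} - \beta^{n}| \leq n|\alpha - \beta|\Gamma^{n-1}$ for complex numbers in the unit disk. It is a sketch, not a proof; the function names and the random sampling scheme are my own assumptions.

```python
import cmath
import random

def power_difference_bound_holds(alpha: complex, beta: complex, n: int) -> bool:
    """Check |alpha^n - beta^n| <= n * |alpha - beta| * Gamma^(n-1),
    with Gamma taken as the larger of |alpha| and |beta|."""
    gamma = max(abs(alpha), abs(beta))
    lhs = abs(alpha ** n - beta ** n)
    rhs = n * abs(alpha - beta) * gamma ** (n - 1)
    return lhs <= rhs + 1e-12  # small slack for floating-point rounding

def random_point_in_unit_disk(rng: random.Random) -> complex:
    """A complex number with modulus below 1 and a random angle."""
    return cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * cmath.pi))

rng = random.Random(0)  # fixed seed so the run is reproducible
results = [
    power_difference_bound_holds(
        random_point_in_unit_disk(rng), random_point_in_unit_disk(rng), n
    )
    for n in range(1, 30)
    for _ in range(200)
]
print(all(results))
```

Characteristic functions always satisfy $|\psi| \leq 1$, which is why sampling from the unit disk is the relevant case here.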

Thus, we can say

$$|\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - (e^{\frac{-\xi^{2}}{2n}})^{n}| \leq n |\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}|\Gamma^{n-1} \tag{3}$$

if $|\psi(\frac{\xi}{\sigma\sqrt{n}})| \leq \Gamma, |e^{\frac{-\xi^{2}}{2n}}| \leq \Gamma$.

Again, let’s make our problem much simpler by asking $|\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| = ?$

First of all, let’s suppose $t = \frac{\xi}{\sigma\sqrt{n}}$ so that $\psi(\frac{\xi}{\sigma\sqrt{n}}) = \psi(t)$. Thus, $\xi = t\sigma\sqrt{n}$ so that $e^{\frac{-\xi^{2}}{2n}} = e^{\frac{-t^{2}\sigma^{2}n}{2n}} = e^{\frac{-t^{2}\sigma^{2}}{2}}$.

Look how beautiful this is:

$$|\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| = |\psi(t) - e^{\frac{-t^{2}\sigma^{2}}{2}}|$$

$$= |\psi(t) - (1 - \frac{t^{2}\sigma^{2}}{2} + \ldots)|,$$

putting in the series of $e^{\frac{-t^{2}\sigma^{2}}{2}}$,

$$\approx |\psi(t) - 1 + \frac{t^{2}\sigma^{2}}{2}| \tag{4}$$

neglecting the higher-order terms, because for large $n$ we have $t \to 0$.

The characteristic function $\psi(t)$ of the common distribution $F$ is

$$\psi(t) = \int_{-\infty}^{\infty} e^{i t x}\, dF(x).$$

From the very start, I said that as the sample size increases, the shape of the distribution of the normalized sum tends to match the normal curve.

Doesn’t this look like Taylor’s theorem? Exactly! Taylor’s theorem lets us approximate a smooth function by the leading terms of a series while controlling the remainder. Likewise, we can compare the characteristic function of our distribution with that of the standard normal distribution by expanding both and controlling the higher-order terms. So, we will need to go like this

$$e^{i t x} = 1 + i t x - \frac{t^{2} x^{2}}{2!} + \sum_{d = 3}^{\infty} \frac{(i t x)^{d}}{d!}$$

or,

$$\sum_{d = 3}^{\infty} \frac{(i t x)^{d}}{d!} =e^{i t x} - 1 - i t x + \frac{t^{2} x^{2}}{2!}.$$

Now, integrate both sides with respect to the common distribution $F$ from $-\infty$ to $\infty$, i.e.

$$\int_{-\infty}^{\infty} \left(\sum_{d = 3}^{\infty} \frac{(i t x)^{d}}{d!}\right) dF(x) =\int_{-\infty}^{\infty} \left(e^{i t x} - 1 - i t x + \frac{t^{2} x^{2}}{2!}\right) dF(x). \tag{5}$$

By the definition of $\psi(t)$ and the moment conditions $E(x_{k}) = 0$ and $E(x_{k}^{2}) = \sigma^{2}$, the right-hand side of equation $(5)$ can be integrated term by term: $\int e^{itx}\, dF(x) = \psi(t)$, $\int dF(x) = 1$, $\int x\, dF(x) = 0$ and $\int x^{2}\, dF(x) = \sigma^{2}$. This is our trick:

$$|\psi(t) - 1 + \frac{t^{2}\sigma^{2}}{2}| = \left|\int_{-\infty}^{\infty} \left(e^{i t x} - 1 - i t x + \frac{t^{2}x^{2}}{2}\right) dF(x)\right| \tag{6}$$

where the integrand equals $\sum_{d = 3}^{\infty} \frac{(i t x)^{d}}{d!}$, as in equation $(5)$.

Also, another inequality we can use is this:

$$\left|e^{i t x} - 1 - i tx + \frac{t^{2}x^{2}}{2!} - \ldots - \frac{(i t x)^{n-1}}{(n-1)!}\right| \leq \frac{|x t|^{n}}{n!}.$$

For $n = 3$,

$$\left|e^{i t x} - 1 - i tx + \frac{t^{2}x^{2}}{2!}\right| \leq \frac{|x t|^{3}}{3!}.$$
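The $n = 3$ case is easy to test numerically. The sketch below writes $u = t x$ and checks $|e^{iu} - 1 - iu + \frac{u^{2}}{2}| \leq \frac{|u|^{3}}{6}$ on a grid; the grid and the function names are arbitrary choices of mine.

```python
import cmath

def taylor_remainder(u: float) -> float:
    """|e^{iu} - 1 - iu + u^2/2|, where u stands for the product t * x."""
    return abs(cmath.exp(1j * u) - 1 - 1j * u + u * u / 2)

def cubic_bound(u: float) -> float:
    """The claimed bound |u|^3 / 3!."""
    return abs(u) ** 3 / 6

# Grid of u values from -20 to 20 in steps of 0.01.
grid = [k / 100 for k in range(-2000, 2001)]
print(all(taylor_remainder(u) <= cubic_bound(u) + 1e-12 for u in grid))
```

For small $|u|$ the two sides are close to each other, and for large $|u|$ the cubic bound is very generous, since the left-hand side only grows like $\frac{u^{2}}{2}$.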

So, equation $(6)$ becomes

$$|\psi(t) - 1 + \frac{t^{2}\sigma^{2}}{2}| \leq \int_{-\infty}^{\infty} \frac{|xt|^{3}}{6}\, dF(x),$$

because the absolute value of an integral is at most the integral of the absolute value. Pulling out the constant factor, the right-hand side is

$$\frac{|t|^{3}}{6} \int_{-\infty}^{\infty} |x|^{3}\, dF(x)$$

$$= \frac{|t|^{3}}{6} \times E(|x_{k}|^{3})$$

$$= \frac{|t|^{3}}{6} \times \rho$$

where $\rho < \infty$, and $F$ is the common distribution of the $x_{k}$.

On the left-hand side, we apply the triangle inequality as

$$|\psi(t)| \leq |1 - \frac{t^{2}\sigma^{2}}{2}| + |\psi(t) - 1 + \frac{t^{2}\sigma^{2}}{2}|.$$

If $\frac{t^{2}\sigma^{2}}{2} \leq 1$, the first absolute value equals $1 - \frac{t^{2}\sigma^{2}}{2}$, and combining the two displays gives

$\therefore$ $|\psi(t)| \leq 1 - \frac{t^{2}\sigma^{2}}{2} + \frac{1}{6} \rho |t|^{3}$.

Substituting back the value $t = \frac{\xi}{\sigma\sqrt{n}}$, we get

$$|\psi(\frac{\xi}{\sigma\sqrt{n}})| \leq 1 - \frac{\xi^{2}}{2n} + \frac{\rho}{6} \times \frac{|\xi|^{3}}{\sigma^{3} n^{3/2}} \tag{7}$$

In the integral $(2)$ we only need $|\xi| \leq T = \frac{4}{3}\frac{\sigma^{3}\sqrt{n}}{\rho}$. On this range $\frac{\xi^{2}}{2n} \leq \frac{8}{9} < 1$, so equation $(7)$ applies, and we can bound one factor of $|\xi|$ by $T$. So, equation $(7)$ becomes

$$|\psi(\frac{\xi}{\sigma\sqrt{n}})| \leq 1 - \frac{\xi^{2}}{2n} + \frac{\rho}{6\sigma^{3}n^{3/2}}|\xi|^{2} |\xi|$$

$$\leq 1 - \frac{\xi^{2}}{2n} +\frac{\rho}{6\sigma^{3}n^{3/2}}|\xi|^{2} \times (\frac{4\sigma^{3}\sqrt{n}}{3\rho})$$

$$= 1- \frac{\xi^{2}}{2n} + \frac{4\xi^{2}}{18n}$$

$$= 1 - \frac{5\xi^{2}}{18n}$$

$$\therefore |\psi(\frac{\xi}{\sigma\sqrt{n}})| \leq e^{\frac{-5\xi^2}{18n}} \tag{8}$$

converting into exponential form with the inequality $1 - u \leq e^{-u}$, which holds for every real $u$.
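As a concrete check of the exponential bound on $|\psi(\frac{\xi}{\sigma\sqrt{n}})|$, the sketch below again assumes fair $\pm 1$ coin flips (an illustrative choice of mine), for which $\sigma = 1$, $\rho = 1$, $\psi(t) = \cos t$ and $T = \frac{4}{3}\sqrt{n}$. The function name and grid size are hypothetical.

```python
import math

def exponential_bound_holds(n: int, points: int = 2001) -> bool:
    """Check |psi(xi / sqrt(n))| <= exp(-5 xi^2 / (18 n)) for |xi| <= T.

    Assumes fair +/-1 summands: sigma = 1, rho = 1, psi(t) = cos(t),
    hence T = (4 / 3) * sqrt(n).
    """
    T = 4.0 * math.sqrt(n) / 3.0
    for j in range(points):
        xi = -T + 2.0 * T * j / (points - 1)  # evenly spaced in [-T, T]
        lhs = abs(math.cos(xi / math.sqrt(n)))
        rhs = math.exp(-5.0 * xi * xi / (18.0 * n))
        if lhs > rhs + 1e-12:
            return False
    return True

print(all(exponential_bound_holds(n) for n in (10, 50, 500)))
```

The restriction $|\xi| \leq T$ matters: outside that range the cosine comes back up to $1$ in absolute value while the exponential keeps decaying.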

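Since $\sigma^{3} \leq \rho$, the right-hand side of the theorem, $\frac{3\rho}{\sigma^{3}\sqrt{n}}$, is at least $1$ whenever $\sqrt{n} \leq 3$, while $|F_{n}(x) - \phi(x)| \leq 1$ always holds. So the assertion of the theorem is trivially true for $\sqrt{n} \leq 3$, and hence we may assume $n\geq 10$.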

We raise both sides of equation $(8)$ to the power $n-1$. We get,

$$|\psi(\frac{\xi}{\sigma\sqrt{n}})|^{n-1} \leq e^{\frac{-5\xi^2}{18n}\times (n-1)}$$

Since $\frac{n-1}{n} \geq \frac{9}{10}$ for $n \geq 10$, this gives

$$|\psi(\frac{\xi}{\sigma\sqrt{n}})|^{n-1} \leq e^{\frac{-5\xi^2}{18\times 10}\times (10-1)} = e^{\frac{-\xi^{2}}{4}} \tag{9}$$

Let me remind you of equation $(3)$:

$$|\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - (e^{\frac{-\xi^{2}}{2n}})^{n}| \leq n |\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| \Gamma^{n-1}$$

if $|\psi(\frac{\xi}{\sigma\sqrt{n}})| \leq \Gamma$ and $e^{\frac{-\xi^{2}}{2n}} \leq \Gamma$.

By equation $(8)$ we can take $\Gamma = e^{\frac{-5\xi^{2}}{18n}}$, and the second condition holds as well because $\frac{1}{2} > \frac{5}{18}$. So, the right part of equation $(9)$ may serve as the bound for $\Gamma^{n-1}$, i.e.

$$\Gamma^{n-1} = e^{\frac{-5\xi^{2}}{18n}\times (n-1)} \leq e^{\frac{-\xi^{2}}{4}}.$$

Thus, we have

$$|\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - (e^{\frac{-\xi^{2}}{2n}})^{n}| \leq n |\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}|e^{\frac{-\xi^{2}}{4}} \tag{10}$$

Also, we need one more inequality. Let’s start with this:

$e^{-x} \leq 1 - x + \frac{x^{2}}{2}$ for $x > 0$

$$\Rightarrow e^{-x} - 1 + x \leq \frac{x^{2}}{2} \tag{11}$$

Since $e^{-x} \geq 1 - x$, the left-hand side is nonnegative, so $(11)$ also bounds its absolute value.
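A quick numerical look at $0 \leq e^{-x} - 1 + x \leq \frac{x^{2}}{2}$ for $x > 0$ follows. It is a sketch with an arbitrary grid; `math.expm1` is used to compute $e^{-x} - 1$ without losing digits for small $x$, and the function name is hypothetical.

```python
import math

def exp_remainder(x: float) -> float:
    """e^{-x} - 1 + x, computed via expm1 for accuracy near x = 0."""
    return math.expm1(-x) + x

# Grid of x values from 0.01 to 20 in steps of 0.01.
grid = [k / 100 for k in range(1, 2001)]
lower_ok = all(exp_remainder(x) >= 0.0 for x in grid)
upper_ok = all(exp_remainder(x) <= x * x / 2 for x in grid)
print(lower_ok, upper_ok)
```

In the proof this is applied with $x = \frac{\xi^{2}}{2n}$, which turns the right-hand side into $\frac{\xi^{4}}{8n^{2}}$.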

Oh! I almost forgot. We need to construct something very useful, i.e.

$$n |\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| \leq n|\psi(\frac{\xi}{\sigma\sqrt{n}}) - 1 + \frac{\xi^{2}}{2n}| + n|1 - \frac{\xi^{2}}{2n} - e^{\frac{-\xi^{2}}{2n}}|,$$

where I have added and subtracted $1 - \frac{\xi^{2}}{2n}$ and applied the triangle inequality.

$$= \text{First term} + \text{Second term} \tag{12}$$

which means,

First term = $n|\psi(\frac{\xi}{\sigma\sqrt{n}}) - 1 + \frac{\xi^{2}}{2n}| \leq \frac{\rho |\xi|^{3}}{6 \sigma^{3}n^{3/2}}\times n = \frac{\rho |\xi|^{3}}{6 \sigma^{3}n^{1/2}}$, from the bound $|\psi(t) - 1 + \frac{t^{2}\sigma^{2}}{2}| \leq \frac{\rho |t|^{3}}{6}$ behind equation $(7)$ with $t = \frac{\xi}{\sigma\sqrt{n}}$,

and,

Second term = $n|1 - \frac{\xi^{2}}{2n} - e^{\frac{-\xi^{2}}{2n}}| \leq n \times \frac{1}{2} \times \frac{\xi^{4}}{(2n)^{2}} = \frac{1}{8n}\xi^{4}$, from equation $(11)$ with $x = \frac{\xi^{2}}{2n}$.

Putting the above results into equation $(12)$, we get

$$n |\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| \leq\frac{\rho |\xi|^{3}}{6 \sigma^{3}n^{1/2}} +\frac{1}{8n}\xi^{4}.$$
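The combined bound can be tried out on a concrete case. The sketch below assumes fair $\pm 1$ coin flips (my illustrative choice), so $\sigma = 1$, $\rho = 1$ and $\psi(t) = \cos t$, and checks $n|\cos(\frac{\xi}{\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| \leq \frac{|\xi|^{3}}{6\sqrt{n}} + \frac{\xi^{4}}{8n}$ on $|\xi| \leq T$. Names and grid size are hypothetical.

```python
import math

def combined_bound_holds(n: int, points: int = 1001) -> bool:
    """Check n * |psi - exp(-xi^2 / (2n))| <= |xi|^3 / (6 sqrt(n)) + xi^4 / (8n).

    Assumes fair +/-1 summands: sigma = 1, rho = 1, psi(t) = cos(t),
    and scans |xi| <= T = (4 / 3) * sqrt(n).
    """
    root_n = math.sqrt(n)
    T = 4.0 * root_n / 3.0
    for j in range(points):
        xi = -T + 2.0 * T * j / (points - 1)
        lhs = n * abs(math.cos(xi / root_n) - math.exp(-xi * xi / (2.0 * n)))
        rhs = abs(xi) ** 3 / (6.0 * root_n) + xi ** 4 / (8.0 * n)
        if lhs > rhs + 1e-12:
            return False
    return True

print(all(combined_bound_holds(n) for n in (10, 100, 1000)))
```

For this symmetric example the third moment term is far from tight, because the cubic Taylor term of $\cos$ vanishes; the bound only has to hold, not to be sharp.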

Since $\sqrt{n} > 3$, equation $(10)$ applies, and we can insert the above inequality into the integrand of $(2)$, which means

$$\pi |F_{n}(x) - \phi(x)| \leq \int_{-T}^{T} |\psi^{n}(\frac{\xi}{\sigma\sqrt{n}}) - (e^{\frac{-\xi^{2}}{2n}})^{n}| \frac{d\xi}{|\xi|} + \frac{9.6}{T}$$

$$\leq \int_{-T}^{T}n |\psi(\frac{\xi}{\sigma\sqrt{n}}) - e^{\frac{-\xi^{2}}{2n}}| e^{\frac{-\xi^{2}}{4}} \frac{d\xi}{|\xi|} + \frac{9.6}{T}$$

from equation $(10)$,

$$\leq \int_{-T}^{T} (\frac{\rho |\xi|^{3}}{6 \sigma^{3}n^{1/2}} +\frac{1}{8n}\xi^{4})\times \frac{1}{|\xi|} \times e^{\frac{-\xi^{2}}{4}} d\xi + \frac{9.6}{T}$$

$$= \int_{-T}^{T} \frac{\rho}{6\sigma^{3}\sqrt{n}} |\xi|^{2} e^{\frac{-\xi^{2}}{4}} d\xi + \int_{-T}^{T} \frac{|\xi|^{3}}{8n} e^{\frac{-\xi^{2}}{4}} d\xi + \frac{9.6}{T} \tag{13}$$

Also, we know $T = \frac{4\sigma^{3}\sqrt{n}}{3\rho}$. So,

$$\frac{9.6}{T} =\frac{9.6\times 3\rho}{4\sigma^{3}\sqrt{n}} =\frac{36\rho}{5\sigma^{3}\sqrt{n}}.$$

Both integrands in equation $(13)$ are nonnegative, so extending the range of integration from $[-T, T]$ to $(-\infty, \infty)$ can only increase the integrals. Thus we can use

$$I_{1} = \int_{-\infty}^{\infty} |\xi|^{2} e^{\frac{-\xi^{2}}{4}} d\xi = 4\sqrt{\pi}$$

and,

$$I_{2} = \int_{-\infty}^{\infty} |\xi|^{3} e^{\frac{-\xi^{2}}{4}} d\xi = 16.$$

Note the absolute value in $I_{2}$: the integrand comes from $\frac{\xi^{4}}{|\xi|} = |\xi|^{3}$, which is an even function, so this integral does not vanish.

These integrations can be done by parts or with the Gamma function. Over a finite range the first integral involves the error function: integrating by parts gives $\int_{-a}^{a} |\xi|^{2} e^{\frac{-\xi^{2}}{4}} d\xi = 4\sqrt{\pi}\,\mathrm{erf}(\frac{a}{2}) - 4a e^{\frac{-a^{2}}{4}}$, but we do not need it here.

The second term of $(13)$ is therefore at most $\frac{16}{8n} = \frac{2}{n}$, and since $\sqrt{n} > 3$ and $\sigma^{3} \leq \rho$, we have $\frac{2}{n} \leq \frac{2}{3\sqrt{n}} \leq \frac{2\rho}{3\sigma^{3}\sqrt{n}}$.

So, equation $(13)$ becomes

$$\pi |F_{n}(x) - \phi(x)| \leq \frac{\rho}{6 \sigma^{3} \sqrt{n}} \times 4\sqrt{\pi} + \frac{2\rho}{3\sigma^{3}\sqrt{n}} + \frac{36\rho}{5\sigma^{3}\sqrt{n}} \leq 9.049\times\frac{\rho}{\sigma^{3} \sqrt{n}}.$$

$$\therefore |F_{n}(x) - \phi(x)| \leq 2.881\times\frac{\rho}{\sigma^{3} \sqrt{n}}$$

For simplicity, we use the below as the final form:

$$|F_{n}(x) - \phi(x)| \leq \frac{3\rho}{\sigma^{3} \sqrt{n}}$$

Q.E.D.
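The integrals and the final constant can be double-checked numerically. The sketch below uses a plain composite Simpson rule (my choice; any quadrature would do) for $\int |\xi|^{2} e^{-\xi^{2}/4} d\xi$ and $\int |\xi|^{3} e^{-\xi^{2}/4} d\xi$, compares a finite-range version with the closed form $4\sqrt{\pi}\,\mathrm{erf}(\frac{a}{2}) - 4a e^{-a^{2}/4}$, and then adds up the three contributions under the assumptions $\sqrt{n} > 3$ and $\sigma^{3} \leq \rho$. All names are hypothetical.

```python
import math

def simpson(f, a: float, b: float, intervals: int = 20000) -> float:
    """Composite Simpson rule; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for j in range(1, intervals):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def weight(xi: float) -> float:
    return math.exp(-xi * xi / 4.0)

# Beyond |xi| = 40 the weight is about exp(-400), so the tails are negligible.
I1 = simpson(lambda xi: xi * xi * weight(xi), -40.0, 40.0)
I2 = simpson(lambda xi: abs(xi) ** 3 * weight(xi), -40.0, 40.0)
print(I1, 4.0 * math.sqrt(math.pi))  # both close to 7.0898
print(I2)                            # close to 16

def I1_finite(a: float) -> float:
    """Closed form of the integral of xi^2 * exp(-xi^2 / 4) over [-a, a]."""
    return 4.0 * math.sqrt(math.pi) * math.erf(a / 2.0) - 4.0 * a * weight(a)

# Coefficients of rho / (sigma^3 sqrt(n)) on the right-hand side of pi * |F_n - phi|.
first = I1 / 6.0           # from the |xi|^2 integral
second = (I2 / 8.0) / 3.0  # 2/n <= (2/3)/sqrt(n), using sqrt(n) > 3 and sigma^3 <= rho
third = 9.6 * 3.0 / 4.0    # 9.6 / T with T = 4 sigma^3 sqrt(n) / (3 rho)
constant = (first + second + third) / math.pi
print(constant)            # about 2.88, below 3
```

The $|\xi|^{3}$ integral contributes about $0.21$ to the final constant, so the rounded bound with $3$ still has room to spare.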

Any feedback?

If you guys have some questions, comments, or suggestions, then please don't hesitate to shoot me an email at to[At]rDamodar.com.np or comment below.

Liked this post?

If you find this post helpful and want to show your appreciation, I would be grateful for a donation to arXiv , which supports open science and benefits the global scientific community.


  • Damodar Rajbhandari
    Written by Damodar Rajbhandari, who is working on a PhD in the Mathematical Physics at the School of Mathematics & Statistics, University of Melbourne, Australia.