This proof is based on W. Feller's book, An Introduction to Probability Theory and Its Applications, Volume II, and covers only independent and identically distributed summands. I won't prove the non-identical case because this post would become too long; please find its proof in Feller's book.
The central limit theorem says that the distribution of the normalized sum tends to the normal distribution as the sample size goes to infinity. But the question you may raise is, “What is the rate of convergence of the distribution of the normalized sum to the standard normal distribution?” Let’s answer this question for the case where the summands are identically distributed. To be more precise, let’s state it like this:
Let $x_k$ be independent random variables with a common distribution $F$ such that $E(x_k)=0$, $E(x_k^2)=\sigma^2>0$, $E(|x_k|^3)=\rho<\infty$, and let $F_n$ stand for the distribution of the normalized sum $\frac{x_1+x_2+\dots+x_n}{\sigma\sqrt{n}}$. Then for all $x$ and $n$, the distance between $F_n(x)$ and the standard normal distribution $\phi(x)$ satisfies $|F_n(x)-\phi(x)|\le\frac{3\rho}{\sigma^3\sqrt{n}}$.
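To get a feeling for the statement, here is a minimal sketch (not from Feller) that evaluates the exact distance for one concrete case: fair ±1 summands, for which $\sigma=1$ and $\rho=1$. The choice of distribution and the function names are my own assumptions for illustration. Since $F_n$ is a step function, the largest gap can only occur at a jump, so it is enough to check the left limit and the value at each atom.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance_fair_coin(n):
    """Exact sup_x |F_n(x) - phi(x)| for the normalized sum of n fair +/-1 steps.

    Illustrative assumption: x_k = +1 or -1 with probability 1/2 each,
    so sigma = 1 and rho = E|x_k|^3 = 1.
    """
    sup = 0.0
    cdf = 0.0
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)      # atom of (x_1 + ... + x_n) / sqrt(n)
        phi = std_normal_cdf(x)
        sup = max(sup, abs(cdf - phi))      # left limit F_n(x-)
        cdf += math.comb(n, k) / 2 ** n     # add the jump at this atom
        sup = max(sup, abs(cdf - phi))      # value F_n(x)
    return sup


def berry_esseen_bound(rho, sigma, n):
    """Right-hand side 3*rho / (sigma^3 * sqrt(n)) of the theorem."""
    return 3.0 * rho / (sigma ** 3 * math.sqrt(n))


for n in (1, 10, 100, 400):
    print(n, sup_distance_fair_coin(n), berry_esseen_bound(1.0, 1.0, n))
```

In this example the exact distance stays well below the bound, which is expected: the constant 3 is chosen for a short proof, not for sharpness.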
Looks very boring! Right? Okay, let’s start with the history of the central limit theorem (CLT).
The first proof of the CLT was given by the French mathematician Pierre-Simon Laplace in 1810. Fourteen years later, the French mathematician Siméon-Denis Poisson improved it and provided a more general form of the proof. Laplace and his contemporaries were very interested in this theorem because they saw its importance for repeated measurements of the same quantity. They realized that if the individual measurements could be viewed as approximately independent and identically distributed, then their mean could be approximated by a normal distribution. This is because the theorem states that, for a sufficiently large sample from a population with finite variance, the distribution of the sample mean is approximately normal and centered at the population mean, regardless of the shape of the underlying distribution.
The first convergence rate for the CLT was then estimated by the Russian mathematician Aleksandr M. Lyapunov. A more refined version was discovered independently by two mathematicians, Andrew C. Berry (in 1941) and Carl-Gustav Esseen (in 1942). The estimate was then refined further, and the result is now named the “Berry-Esseen theorem”. The best thing about this theorem is that it only requires the first three moments.
Now, I think you are very eager to know about the proof of this theorem. Right? Let’s get started without any further delay!
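From Feller’s book (Lemma 1 equation 3.13 on page 538), the distance between $F_n(x)$ and $\phi(x)$ is bounded by
$$|F_n(x)-\phi(x)|\le\frac{1}{\pi}\int_{-T}^{T}\left|\frac{\psi(\xi)-\gamma(\xi)}{\xi}\right|d\xi+\frac{24m}{\pi T}\tag{1}$$
where,
$\psi(\xi)$ = characteristic function of $F_n(x)$, which equals $\psi^n\!\left(\frac{\xi}{\sigma\sqrt{n}}\right)$ (here the $\psi$ on the right is the characteristic function of the common distribution $F$),
$\gamma(\xi)$ = characteristic function of $\phi(x)$, which equals $e^{-\xi^2/2}$,
$m$ = bound on the slope of $\phi$, such that $|\phi'(x)|\le m<\infty$.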
The last inequality is a consequence of the moment inequality. Also, the normal density $\phi'$ has maximum $\frac{1}{\sqrt{2\pi}}\approx 0.399$, so we can take $m<\frac{2}{5}$ (Feller presumably chose $\frac{2}{5}$ as a convenient round number just above this maximum).
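Doesn’t it look like the reverse triangle inequality with exponent “n”? I mean this:
$$|\alpha^n-\beta^n|\le n|\alpha-\beta|\,\Gamma^{n-1}\quad\text{if } |\alpha|\le\Gamma,\ |\beta|\le\Gamma.$$
Thus, we can say
$$\left|\psi^n\!\left(\frac{\xi}{\sigma\sqrt{n}}\right)-\left(e^{-\xi^2/(2n)}\right)^n\right|\le n\left|\psi\!\left(\frac{\xi}{\sigma\sqrt{n}}\right)-e^{-\xi^2/(2n)}\right|\Gamma^{n-1}\tag{3}$$
if $\left|\psi\!\left(\frac{\xi}{\sigma\sqrt{n}}\right)\right|\le\Gamma$ and $\left|e^{-\xi^2/(2n)}\right|\le\Gamma$.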
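The elementary inequality used here follows from the factorization $\alpha^n-\beta^n=(\alpha-\beta)\sum_{k=0}^{n-1}\alpha^k\beta^{n-1-k}$, where each of the $n$ terms in the sum has modulus at most $\Gamma^{n-1}$. As a quick sanity check, the sketch below tests it on random complex numbers; the helper name and the sampling scheme are assumptions made only for this illustration.

```python
import cmath
import random


def power_difference_gap(alpha, beta, n):
    """Return n*|alpha - beta|*Gamma^(n-1) - |alpha^n - beta^n|.

    Gamma is taken as max(|alpha|, |beta|), the smallest admissible choice.
    A non-negative result means the inequality holds for this triple.
    """
    gamma = max(abs(alpha), abs(beta))
    lhs = abs(alpha ** n - beta ** n)
    rhs = n * abs(alpha - beta) * gamma ** (n - 1)
    return rhs - lhs


random.seed(0)
gaps = []
for _ in range(1000):
    # Random points in the unit disc, like values of characteristic functions.
    alpha = cmath.rect(random.random(), random.uniform(0.0, 2.0 * cmath.pi))
    beta = cmath.rect(random.random(), random.uniform(0.0, 2.0 * cmath.pi))
    n = random.randint(1, 50)
    gaps.append(power_difference_gap(alpha, beta, n))

# Allow a tiny negative value for floating-point rounding.
print(min(gaps) >= -1e-12)
```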
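Again, let’s make our problem much simpler by asking: what is $\left|\psi\!\left(\frac{\xi}{\sigma\sqrt{n}}\right)-e^{-\xi^2/(2n)}\right|$?
First of all, let’s set $t=\frac{\xi}{\sigma\sqrt{n}}$ so that $\psi\!\left(\frac{\xi}{\sigma\sqrt{n}}\right)=\psi(t)$. Then $\xi=t\sigma\sqrt{n}$, so that $e^{-\xi^2/(2n)}=e^{-t^2\sigma^2 n/(2n)}=e^{-t^2\sigma^2/2}$.
Look how beautiful this is:
$$\begin{aligned}\left|\psi\!\left(\tfrac{\xi}{\sigma\sqrt{n}}\right)-e^{-\xi^2/(2n)}\right|&=\left|\psi(t)-e^{-t^2\sigma^2/2}\right|\\&=\left|\psi(t)-\left(1-\tfrac{t^2\sigma^2}{2}+\dots\right)\right|\quad\text{(inserting the series of } e^{-t^2\sigma^2/2}\text{)}\\&\approx\left|\psi(t)-1+\tfrac{t^2\sigma^2}{2}\right|\end{aligned}\tag{4}$$
neglecting the higher-order terms, because $t\to 0$ for large $n$.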
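How much is lost by dropping the higher-order terms? Writing $u=\frac{t^2\sigma^2}{2}=\frac{\xi^2}{2n}$, the dropped part is $e^{-u}-1+u$, which for $u\ge 0$ lies between $0$ and $\frac{u^2}{2}=\frac{\xi^4}{8n^2}$. The sketch below, with hypothetical helper names, prints the dropped part next to this bound for a fixed $\xi$ and growing $n$.

```python
import math


def dropped_terms(xi, n):
    """Size of exp(-u) - (1 - u) with u = xi^2 / (2n), the part neglected in (4)."""
    u = xi * xi / (2.0 * n)
    # expm1 avoids cancellation when u is small.
    return math.expm1(-u) + u


def dropped_terms_bound(xi, n):
    """Upper bound u^2 / 2 = xi^4 / (8 n^2) on the neglected part."""
    return xi ** 4 / (8.0 * n * n)


xi = 2.0
for n in (1, 10, 100, 1000, 10000):
    print(n, dropped_terms(xi, n), dropped_terms_bound(xi, n))
```

The neglected part shrinks like $1/n^2$, which is faster than the $1/\sqrt{n}$ rate of the theorem, so for fixed $\xi$ it does not affect the order of the final estimate.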
The characteristic function $\psi(t)$ of the common distribution $F$ is
$$\psi(t)=\int_{-\infty}^{\infty}e^{itx}\,dF(x).$$
As I said at the very beginning, as the sample size keeps increasing, the shape of the curve of the proposed distribution tends to match the normal curve.
Doesn’t this smoothing concept look like Taylor’s theorem? Exactly! Taylor’s theorem says that we can approximate a sufficiently smooth function by a series expansion. Likewise, we can compare our proposed distribution with the standard normal distribution by looking at the higher-order terms. So, we will need to go like this:
$$e^{itx}=1+itx-\frac{t^2x^2}{2!}+\sum_{d=3}^{\infty}\frac{(itx)^d}{d!}$$
or,
$$\sum_{d=3}^{\infty}\frac{(itx)^d}{d!}=e^{itx}-1-itx+\frac{t^2x^2}{2!}.$$
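The tail of this series is controlled by the standard remainder estimate $\left|e^{iu}-1-iu+\frac{u^2}{2}\right|\le\frac{|u|^3}{6}$ for real $u$ (here $u=tx$), which is where the third absolute moment $\rho$ enters. The following sketch checks the estimate numerically at a few points; the function names are assumptions for this illustration.

```python
import cmath


def taylor_remainder(u):
    """|e^{iu} - 1 - iu + u^2/2| for real u = t*x: the tail of the series from d = 3."""
    return abs(cmath.exp(1j * u) - 1 - 1j * u + u * u / 2)


def cubic_bound(u):
    """The estimate |u|^3 / 6 for the tail."""
    return abs(u) ** 3 / 6


for u in (0.01, 0.1, 0.5, 1.0, 3.0, -2.0):
    print(u, taylor_remainder(u), cubic_bound(u))
```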
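Now, integrate both sides with respect to the common distribution $F$ over the limits $-\infty$ to $\infty$. Since $E(x_k)=0$ and $E(x_k^2)=\sigma^2$, this gives
$$\int_{-\infty}^{\infty}\sum_{d=3}^{\infty}\frac{(itx)^d}{d!}\,dF(x)=\psi(t)-1+\frac{t^2\sigma^2}{2}.$$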
The difference of two characteristic functions is again a Fourier transform, namely that of the difference of the two distributions, and we suppose that the result can be approximated by the higher-order terms of the series. This is our trick:
From equation (13), as $n\to\infty$ we have $T\to\infty$, and thus
$$I_1=\int_{-\infty}^{\infty}|\xi|^2e^{-\xi^2/4}\,d\xi=4\sqrt{\pi}$$
and
$$I_2=\int_{-\infty}^{\infty}\xi^3e^{-\xi^2/4}\,d\xi=0.$$
These integrals can be evaluated by integration by parts or with the Gamma function. However, I ran into difficulty when solving $\int_{-a}^{a}|\xi|^2e^{-\xi^2/4}\,d\xi$. If you solve it, please share your solution in the comment box. Thanks in advance!
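One possible way to handle the finite range is integration by parts with $u=\xi$ and $dv=\xi e^{-\xi^2/4}\,d\xi$, which suggests the closed form $\int_{-a}^{a}\xi^2e^{-\xi^2/4}\,d\xi=4\sqrt{\pi}\,\operatorname{erf}\!\left(\frac{a}{2}\right)-4a\,e^{-a^2/4}$. The sketch below compares this candidate formula with a plain Simpson-rule integration; the function names and the step count are assumptions chosen for the illustration.

```python
import math


def truncated_second_moment(a):
    """Candidate closed form of the integral of xi^2 * exp(-xi^2/4) over [-a, a]."""
    return (4.0 * math.sqrt(math.pi) * math.erf(a / 2.0)
            - 4.0 * a * math.exp(-a * a / 4.0))


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def integrand(xi):
    return xi * xi * math.exp(-xi * xi / 4.0)


for a in (0.5, 1.0, 3.0, 10.0):
    print(a, truncated_second_moment(a), simpson(integrand, -a, a))
```

As $a\to\infty$ the error function tends to 1 and the second term vanishes, so the formula recovers $I_1=4\sqrt{\pi}$.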
For simplicity, we use the following as the final form:
$$|F_n(x)-\phi(x)|\le\frac{3\rho}{\sigma^3\sqrt{n}}\qquad\text{Q.E.D.}$$
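In practice the final form can be turned around to ask how many summands guarantee a given accuracy $\varepsilon$: solving $\frac{3\rho}{\sigma^3\sqrt{n}}\le\varepsilon$ gives $n\ge\left(\frac{3\rho}{\sigma^3\varepsilon}\right)^2$. Here is a small sketch of that calculation. The helper names are hypothetical, and the example values $\sigma^2=\frac{1}{3}$, $\rho=\frac{1}{4}$ are those of a uniform distribution on $[-1,1]$, chosen only for illustration.

```python
import math


def final_bound(rho, sigma, n):
    """The final form 3*rho / (sigma^3 * sqrt(n))."""
    return 3.0 * rho / (sigma ** 3 * math.sqrt(n))


def summands_needed(rho, sigma, eps):
    """An n (rounded up) for which the final form is at most eps."""
    return math.ceil((3.0 * rho / (sigma ** 3 * eps)) ** 2)


# Uniform distribution on [-1, 1]: sigma^2 = 1/3 and rho = E|x|^3 = 1/4.
sigma = math.sqrt(1.0 / 3.0)
rho = 0.25
n = summands_needed(rho, sigma, eps=0.01)
print(n, final_bound(rho, sigma, n))
```

Because the constant 3 is not sharp, the resulting $n$ is a safe but pessimistic guarantee.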
Any feedback?
If you have questions, comments, or suggestions, please don't hesitate to shoot me an email at to[At]rDamodar.com.np or comment below.
Liked this post?
If you find this post helpful and want to show your appreciation, I would be grateful for a donation to arXiv, which supports open science and benefits the global scientific community.
Written by Damodar Rajbhandari, who is working on a PhD in the Mathematical Physics at the School of Mathematics & Statistics, University of Melbourne, Australia.