An Analysis of Variance Test for Normality (Complete Samples)
S. S. Shapiro; M. B. Wilk
Biometrika, Vol. 52, No. 3/4. (Dec., 1965), pp. 591-611.
Stable URL:
http://links.jstor.org/sici?sici=0006-3444%28196512%2952%3A3%2F4%3C591%3AAAOVTF%3E2.0.CO%3B2-B
Biometrika is currently published by Biometrika Trust.
Biometrika (1965), 52, 3 and 4, p. 591
With 5 text-figures
Printed in Great Britain
An analysis of variance test for normality
(complete samples)†
BY S. S. SHAPIRO AND M. B. WILK
General Electric Co. and Bell Telephone Laboratories, Inc.
The main intent of this paper is to introduce a new statistical procedure for testing a
complete sample for normality. The test statistic is obtained by dividing the square of an
appropriate linear combination of the sample order statistics by the usual symmetric
estimate of variance. This ratio is both scale and origin invariant and hence the statistic
is appropriate for a test of the composite hypothesis of normality.
Testing for distributional assumptions in general and for normality in particular has been
a major area of continuing statistical research, both theoretically and practically. A
possible cause of such sustained interest is that many statistical procedures have been
derived based on particular distributional assumptions, especially that of normality.
Although in many cases the techniques are more robust than the assumptions underlying
them, still a knowledge that the underlying assumption is incorrect may temper the use
and application of the methods. Moreover, the study of a body of data with the stimulus
of a distributional test may encourage consideration of, for example, normalizing
transformations and the use of alternate methods such as distribution-free techniques, as well as
detection of gross peculiarities such as outliers or errors.
The test procedure developed in this paper is defined and some of its analytical properties
described in §2. Operational information and tables useful in employing the test are detailed
in §3 (which may be read independently of the rest of the paper). Some examples are given
in §4. Section 5 consists of an extract from an empirical sampling study of the comparison of
the effectiveness of various alternative tests. Discussion and concluding remarks are given
in §6.
2. THE W TEST FOR NORMALITY (COMPLETE SAMPLES)
2.1. Motivation and early work
This study was initiated, in part, in an attempt to summarize formally certain indications
of probability plots. In particular, could one condense departures from statistical linearity
of probability plots into one or a few 'degrees of freedom' in the manner of the application
of analysis of variance in regression analysis?
In a probability plot, one can consider the regression of the ordered observations on the
expected values of the order statistics from a standardized version of the hypothesized
distribution, the plot tending to be linear if the hypothesis is true. Hence a possible method
of testing the distributional assumption is by means of an analysis of variance type procedure.
Using generalized least squares (the ordered variates are correlated) linear and higher-order
models can be fitted and an F-type ratio used to evaluate the adequacy of the linear fit.
† Part of this research was supported by the Office of Naval Research while both authors were at
Rutgers University.
This approach was investigated in preliminary work. While some promising results
were obtained, the procedure is subject to the serious shortcoming that the selection of the
higher-order model is, practically speaking, arbitrary. However, research is continuing
along these lines.
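The probability-plot regression described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses ordinary (not generalized) least squares, and it replaces the exact expected order statistics by Blom's approximation $\Phi^{-1}\{(i - 0.375)/(n + 0.25)\}$, which is not part of the paper. The function names are hypothetical.

```python
from statistics import NormalDist, mean

def blom_scores(n):
    """Approximate expected standard normal order statistics.

    Assumption: Blom's formula stands in for the exact values m_i.
    """
    normal = NormalDist()
    return [normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def probability_plot_fit(sample):
    """Ordinary least-squares line of the ordered sample on the scores.

    Returns (intercept, slope); under normality these estimate the mean
    and the standard deviation respectively.
    """
    y = sorted(sample)
    m = blom_scores(len(y))
    m_bar, y_bar = mean(m), mean(y)
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (yi - y_bar) for mi, yi in zip(m, y))
    slope = sxy / sxx
    return y_bar - slope * m_bar, slope
```

A sample that lies exactly on a line in the plot, for instance `[2 + 3 * s for s in blom_scores(10)]`, returns that line's intercept and slope; departures from linearity are what the procedures discussed here try to summarize.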
Another analysis of variance viewpoint which has been investigated by the present
authors is to compare the squared slope of the probability plot regression line, which under
the normality hypothesis is an estimate of the population variance multiplied by a constant,
with the residual mean square about the regression line, which is another estimate of the
variance. This procedure can be used with incomplete samples and has been described
elsewhere (Shapiro & Wilk, 1965b).
As an alternative to the above, for complete samples, the squared slope may be compared
with the usual symmetric sample sum of squares about the mean which is independent
of the ordering and easily computable. It is this last statistic that is discussed in the
remainder of this paper.
2.2. Derivation of the W statistic
Let $m' = (m_1, m_2, \ldots, m_n)$ denote the vector of expected values of standard normal
order statistics, and let $V = (v_{ij})$ be the corresponding $n \times n$ covariance matrix. That is, if
$x_1 \le x_2 \le \ldots \le x_n$ denotes an ordered random sample of size $n$ from a normal distribution with
mean 0 and variance 1, then
$$E(x_i) = m_i \quad (i = 1, 2, \ldots, n),$$
and
$$\operatorname{cov}(x_i, x_j) = v_{ij} \quad (i, j = 1, 2, \ldots, n).$$
Let $y' = (y_1, \ldots, y_n)$ denote a vector of ordered random observations. The objective is
to derive a test for the hypothesis that this is a sample from a normal distribution with
unknown mean $\mu$ and unknown variance $\sigma^2$.
Clearly, if the $\{y_i\}$ are a normal sample then $y_i$ may be expressed as
$$y_i = \mu + \sigma x_i \quad (i = 1, 2, \ldots, n).$$
It follows from the generalized least-squares theorem (Aitken, 1938; Lloyd, 1952) that the
best linear unbiased estimates of $\mu$ and $\sigma$ are those quantities that minimize the quadratic
form $(y - \mu 1 - \sigma m)' V^{-1} (y - \mu 1 - \sigma m)$, where $1' = (1, 1, \ldots, 1)$. These estimates are,
respectively,
$$\hat\mu = \frac{m'V^{-1}(m1' - 1m')V^{-1}y}{1'V^{-1}1\; m'V^{-1}m - (1'V^{-1}m)^2}$$
and
$$\hat\sigma = \frac{1'V^{-1}(1m' - m1')V^{-1}y}{1'V^{-1}1\; m'V^{-1}m - (1'V^{-1}m)^2}.$$
For symmetric distributions, $1'V^{-1}m = 0$, and hence
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y, \quad\text{and}\quad \hat\sigma = \frac{m'V^{-1}y}{m'V^{-1}m}.$$
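As a check on these estimates, the generalized least-squares solution can be written out with the design matrix $A = (1, m)$. The following is a worked sketch using only quantities already defined; the symbols $A$ and $\theta$ are introduced here for convenience and are not the authors' notation.

```latex
\begin{aligned}
\hat\theta &= (\hat\mu, \hat\sigma)' = (A'V^{-1}A)^{-1}A'V^{-1}y, \qquad A = (1,\; m), \\
A'V^{-1}A &= \begin{pmatrix} 1'V^{-1}1 & 1'V^{-1}m \\ 1'V^{-1}m & m'V^{-1}m \end{pmatrix}, \qquad
\det(A'V^{-1}A) = 1'V^{-1}1\; m'V^{-1}m - (1'V^{-1}m)^2, \\
\hat\mu &= \frac{m'V^{-1}m \; 1'V^{-1}y - 1'V^{-1}m \; m'V^{-1}y}{\det(A'V^{-1}A)}, \qquad
\hat\sigma = \frac{1'V^{-1}1 \; m'V^{-1}y - 1'V^{-1}m \; 1'V^{-1}y}{\det(A'V^{-1}A)} .
\end{aligned}
```

When $1'V^{-1}m = 0$ the off-diagonal terms vanish, leaving $\hat\mu = 1'V^{-1}y / 1'V^{-1}1$ and $\hat\sigma = m'V^{-1}y / m'V^{-1}m$. For the normal case the rows of $V$ sum to one, so $V^{-1}1 = 1$ and the first expression reduces to $\bar y$.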
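Let
$$S^2 = \sum_{i=1}^{n} (y_i - \bar y)^2$$
denote the usual symmetric unbiased estimate of $(n - 1)\sigma^2$.
The W test statistic for normality is defined by
$$W = \frac{R^4\hat\sigma^2}{C^2 S^2} = \frac{b^2}{S^2} = \frac{(a'y)^2}{S^2} = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2},$$
where
$$R^2 = m'V^{-1}m, \qquad C^2 = m'V^{-1}V^{-1}m,$$
$$a' = (a_1, \ldots, a_n) = \frac{m'V^{-1}}{(m'V^{-1}V^{-1}m)^{1/2}}$$
and
$$b = R^2\hat\sigma / C.$$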
Thus, $b$ is, up to the normalizing constant $C$, the best linear unbiased estimate of the slope
of a linear regression of the ordered observations, $y_i$, on the expected values, $m_i$, of the
standard normal order statistics. The constant $C$ is so defined that the linear coefficients are
normalized.
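The ratio form of the statistic, $W = (\sum a_i y_i)^2 / \sum (y_i - \bar y)^2$, is straightforward to compute once a coefficient vector is available. The sketch below is illustrative only; `w_statistic` and `A3` are hypothetical names. For $n = 3$ the coefficients follow from antisymmetry ($a_1 = -a_3$, $a_2 = 0$) together with the normalization $a'a = 1$, so no tables are needed.

```python
import math

def w_statistic(sample, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2 for ordered y.

    `a` must be the normalized coefficient vector for this sample size.
    """
    y = sorted(sample)
    if len(y) != len(a):
        raise ValueError("coefficient vector and sample differ in length")
    y_bar = sum(y) / len(y)
    s2 = sum((yi - y_bar) ** 2 for yi in y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    return b * b / s2

# n = 3: antisymmetric coefficients with a'a = 1.
A3 = (-1.0 / math.sqrt(2.0), 0.0, 1.0 / math.sqrt(2.0))
```

Equally spaced observations such as `[1, 2, 3]` are proportional to `A3` after centring and give the largest possible value of W, while a sample with two tied values such as `[0, 0, 1]` gives the smallest value attainable for $n = 3$. Rescaling or shifting the sample leaves the result unchanged.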
It may be noted that if one is indeed sampling from a normal population then the
numerator, $b^2$, and denominator, $S^2$, of W are both, up to a constant, estimating the same quantity,
namely $\sigma^2$. For non-normal populations, these quantities would not in general be estimating
the same thing. Heuristic considerations augmented by some fairly extensive empirical
sampling results (Shapiro & Wilk, 1964a) using populations with a wide range of $\sqrt{\beta_1}$ and
$\beta_2$ values, suggest that the mean values of W for non-null distributions tend to shift
to the left of that for the null case. Further it appears that the variance of the null
distribution of W tends to be smaller than that of the non-null distribution. It is likely
that this is due to the positive correlation between the numerator and denominator for a
normal population being greater than that for non-normal populations.
Note that the coefficients $\{a_i\}$ are just the normalized 'best linear unbiased' coefficients
tabulated in Sarhan & Greenberg (1956).
2.3. Some analytical properties of W
LEMMA 1. W is scale and origin invariant.
Proof. This follows from the fact that for normal (more generally symmetric) distributions,
$$-a_i = a_{n-i+1}.$$
COROLLARY 1. W has a distribution which depends only on the sample size $n$, for samples
from a normal distribution.
COROLLARY 2. W is statistically independent of $S^2$ and of $\bar y$, for samples from a normal
distribution.
Proof. This follows from the fact that $\bar y$ and $S^2$ are sufficient for $\mu$ and $\sigma^2$ (Hogg & Craig,
1956).
COROLLARY 3. $E W^r = E b^{2r} / E S^{2r}$, for any $r$.
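LEMMA 2. The maximum value of W is 1.
Proof. Assume $\bar y = 0$ since W is origin invariant by Lemma 1. Hence
$$W = \left(\sum_i a_i y_i\right)^2 \Big/ \sum_i y_i^2.$$
Since
$$\left(\sum_i a_i y_i\right)^2 \le \sum_i a_i^2 \sum_i y_i^2 = \sum_i y_i^2,$$
because $\sum_i a_i^2 = a'a = 1$, by definition, then W is bounded by 1. This maximum is in fact
achieved when $y_i = \eta a_i$, for arbitrary $\eta$.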
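LEMMA 3. The minimum value of W is $na_1^2/(n - 1)$.
Proof.† (Due to C. L. Mallows.) Since W is scale and origin invariant, it suffices to
consider the maximization of $\sum_{i=1}^{n} y_i^2$ subject to the constraints $\sum y_i = 0$, $\sum a_i y_i = 1$. Since this
is a convex region and $\sum y_i^2$ is a convex function, the maximum of the latter must occur at
one of the $(n - 1)$ vertices of the region. These are
$$\left(\frac{n-1}{na_1},\; \frac{-1}{na_1},\; \ldots,\; \frac{-1}{na_1}\right),$$
$$\left(\frac{n-2}{n(a_1+a_2)},\; \frac{n-2}{n(a_1+a_2)},\; \frac{-2}{n(a_1+a_2)},\; \ldots,\; \frac{-2}{n(a_1+a_2)}\right),$$
$$\vdots$$
$$\left(\frac{1}{n(a_1+\ldots+a_{n-1})},\; \ldots,\; \frac{1}{n(a_1+\ldots+a_{n-1})},\; \frac{-(n-1)}{n(a_1+\ldots+a_{n-1})}\right).$$
It can now be checked numerically, for the values of the specific coefficients $\{a_i\}$, that the
maximum of $\sum_{i=1}^{n} y_i^2$ occurs at the first of these points and the corresponding minimum value
of W is as given in the Lemma.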
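The numerical check mentioned in the proof can be sketched as follows. The vertex formula used here is an assumption consistent with the constraints $\sum y_i = 0$, $\sum a_i y_i = 1$: the $k$-th vertex has its first $k$ coordinates equal to $(n-k)/\{n(a_1+\ldots+a_k)\}$ and the remaining ones equal to $-k/\{n(a_1+\ldots+a_k)\}$. The $n = 5$ coefficients (0.6646, 0.2413, 0) are taken from the published coefficient tables, which are not reproduced in this section, and all function names are hypothetical.

```python
def w_statistic(sample, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2 for ordered y."""
    y = sorted(sample)
    y_bar = sum(y) / len(y)
    s2 = sum((yi - y_bar) ** 2 for yi in y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    return b * b / s2

def vertex(k, a):
    """k-th vertex of the constraint region (k = 1, ..., n-1)."""
    n = len(a)
    partial = sum(a[:k])              # a_1 + ... + a_k, negative for k < n
    low = (n - k) / (n * partial)     # shared value of the first k coordinates
    high = -k / (n * partial)         # shared value of the remaining n - k
    return [low] * k + [high] * (n - k)

# Assumed n = 5 coefficients (antisymmetric, a'a approximately 1).
A5 = [-0.6646, -0.2413, 0.0, 0.2413, 0.6646]

w_at_vertices = [w_statistic(vertex(k, A5), A5) for k in range(1, len(A5))]
w_min = min(w_at_vertices)
```

For these coefficients the smallest value occurs at the first vertex (and, by symmetry, at the last) and equals $na_1^2/(n-1) = 5(0.6646)^2/4$, in agreement with the lemma.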
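LEMMA 4. The half and first moments of W are given by
$$E W^{1/2} = \frac{R^2\,\Gamma\{\tfrac12(n-1)\}}{C\,\Gamma(\tfrac12 n)\sqrt2}$$
and
$$E W = \frac{R^2(R^2+1)}{C^2(n-1)},$$
where $R^2 = m'V^{-1}m$, and $C^2 = m'V^{-1}V^{-1}m$.
Proof. Using Corollary 3 of Lemma 1,
$$E W^{1/2} = E b / E S \quad\text{and}\quad E W = E b^2 / E S^2.$$
Now,
$$E S = \sigma\sqrt2\,\Gamma(\tfrac12 n)\big/\Gamma\{\tfrac12(n-1)\} \quad\text{and}\quad E S^2 = (n-1)\sigma^2.$$
From the general least squares theorem (see e.g. Kendall & Stuart, vol. II (1961)),
$$E b = \frac{R^2}{C} E\hat\sigma = \frac{\sigma R^2}{C}$$
and
$$E b^2 = \frac{R^4}{C^2} E\hat\sigma^2 = \frac{R^4}{C^2}\{\operatorname{var}(\hat\sigma) + (E\hat\sigma)^2\} = \frac{\sigma^2 R^2(R^2+1)}{C^2},$$
since $\operatorname{var}(\hat\sigma) = \sigma^2/m'V^{-1}m = \sigma^2/R^2$, and hence the results of the lemma follow.
Values of these moments are shown in Fig. 1 for sample sizes $n = 3(1)20$.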
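LEMMA 5. A joint distribution involving W is defined by
$$h(W, \theta_2, \ldots, \theta_{n-2}) = K\,W^{-1/2}(1-W)^{(n-4)/2}\cos^{n-4}\theta_2 \cdots \cos\theta_{n-3},$$
over a region $T$ on which the $\theta_i$'s and W are not independent, and where $K$ is a constant.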
† Lemma 3 was conjectured intuitively and verified by certain numerical studies. Subsequently
the above proof was given by C. L. Mallows.
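Proof. Consider an orthogonal transformation $B$ such that $y = Bu$, where
$$u_1 = \sum_{i=1}^{n} y_i / n^{1/2} \quad\text{and}\quad u_2 = \sum_{i=1}^{n} a_i y_i = b.$$
The ordered $y_i$'s are distributed as
$$n!\,(2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right\} \quad (-\infty < y_1 < \ldots < y_n < \infty).$$
After integrating out $u_1$, the joint density for $u_2, \ldots, u_n$ is
$$K^*\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=2}^{n} u_i^2\right\}$$
over the appropriate region $T^*$. Changing to polar co-ordinates such that
$$u_2 = \rho\sin\theta_1, \text{ etc.},$$
and then integrating over $\rho$, yields the joint density of $\theta_1, \ldots, \theta_{n-2}$ as
$$K^{**}\cos^{n-3}\theta_1\cos^{n-4}\theta_2 \cdots \cos\theta_{n-3},$$
over some region $T^{**}$.
From these various transformations
$$W = \frac{b^2}{S^2} = \frac{u_2^2}{\sum_{i=2}^{n} u_i^2} = \frac{\rho^2\sin^2\theta_1}{\rho^2} = \sin^2\theta_1,$$
from which the lemma follows. The $\theta_i$'s and W are not independent; they are restricted
in the sample space $T$.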
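The step that identifies the denominator of W with the sum of squares of the remaining co-ordinates is left implicit in the proof. A short worked sketch, using only the orthogonality of $B$ and the definition of $u_1$, is:

```latex
\begin{aligned}
\sum_{i=1}^{n} y_i^2 &= y'y = u'B'Bu = \sum_{i=1}^{n} u_i^2
   && \text{($B$ orthogonal)}, \\
u_1^2 &= \Big(\sum_{i=1}^{n} y_i\Big)^2 \Big/ n = n\bar y^2, \\
S^2 &= \sum_{i=1}^{n} y_i^2 - n\bar y^2 = \sum_{i=2}^{n} u_i^2 .
\end{aligned}
```

The two defining rows of the transformation are compatible with orthogonality because $\sum a_i = 0$ (antisymmetry of the coefficients) and $a'a = 1$.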
Fig. 1. Moments of W, $E(W^r)$, $n = 3(1)20$, $r = \tfrac12, 1$. (Horizontal axis: sample size, $n$.)
COROLLARY 4. For $n = 3$, the density of W is
$$\frac{3}{\pi}(1-W)^{-1/2}W^{-1/2}, \quad \tfrac34 \le W \le 1.$$
Note that for $n = 3$, the W statistic is equivalent (up to a constant multiplier) to the
statistic (range/standard deviation) advanced by David, Hartley & Pearson (1954) and
the result of the corollary is essentially given by Pearson & Stephens (1964).
It has not been possible, for general $n$, to integrate out the $\theta_i$'s of Lemma 5 to obtain
an explicit form for the distribution of W. However, explicit results have also been given
for $n = 4$, Shapiro (1964).
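The $n = 3$ case gives a convenient numerical check on the moments. The sketch below assumes the density of Corollary 4 in the form $(3/\pi)(1-W)^{-1/2}W^{-1/2}$ on $[\tfrac34, 1]$; with the substitution $W = \sin^2\theta$ this makes $\theta$ uniform on $[\pi/3, \pi/2]$, so $E(W^r) = (6/\pi)\int \sin^{2r}\theta\, d\theta$. The function name and the choice of a midpoint rule are illustrative.

```python
import math

def w_moment_n3(r, steps=100_000):
    """E(W^r) for n = 3 by the midpoint rule.

    Uses W = sin^2(theta) with theta uniform on [pi/3, pi/2]
    (density 6/pi), an assumption based on the n = 3 density.
    """
    lo, hi = math.pi / 3.0, math.pi / 2.0
    h = (hi - lo) / steps
    total = sum(math.sin(lo + (k + 0.5) * h) ** (2.0 * r) for k in range(steps))
    return (6.0 / math.pi) * total * h

half_moment = w_moment_n3(0.5)   # analytically 3/pi
first_moment = w_moment_n3(1.0)  # analytically 1/2 + 3*sqrt(3)/(4*pi)
```

The first moment evaluates to about 0.9135, close to the first entry (0.9130) in the Monte Carlo mean column of Table 4.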
2.4. Approximations associated with the W test
The $\{a_i\}$ used in the W statistic are defined by
$$a_i = \sum_{j=1}^{n} m_j v^{ij}/C \quad (i = 1, 2, \ldots, n),$$
where $m_j$ and $C$ have been defined in §2.2 and $v^{ij}$ denotes the $(i, j)$th element of $V^{-1}$. To
determine the $a_i$ directly it appears necessary
to know both the vector of means $m$ and the covariance matrix $V$. However, to date, the
elements of $V$ are known only up to samples of size 20 (Sarhan & Greenberg, 1956). Various
approximations are presented in the remainder of this section to enable the use of W for
samples larger than 20.
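By definition,
$$a' = \frac{m'V^{-1}}{(m'V^{-1}V^{-1}m)^{1/2}} = \frac{m'V^{-1}}{C}$$
is such that $a'a = 1$. Let $a^* = m'V^{-1}$, then $C^2 = a^{*\prime}a^*$. Suggested approximations are
$$\hat a_i^* = 2m_i \quad (i = 2, 3, \ldots, n-1),$$
and
$$\hat a_1^2 = \hat a_n^2 = \begin{cases} \dfrac{\Gamma(\tfrac12 n)}{\sqrt2\,\Gamma\{\tfrac12(n+1)\}} & (n \le 20), \\[2ex] \dfrac{\Gamma\{\tfrac12(n+1)\}}{\sqrt2\,\Gamma(\tfrac12 n + 1)} & (n > 20). \end{cases}$$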
A comparison of $a_i^*$ (the exact values) and $\hat a_i^*$ for various values of $i \ne 1$ and $n = 5, 10,
15, 20$ is given in Table 1. (Note $a_i^* = -a_{n-i+1}^*$.) It will be seen that the approximation is
generally in error by less than 1%, particularly as $n$ increases. This encourages one to trust
the use of this approximation for $n > 20$. Necessary values of the $m_i$ for this approximation
are available in Harter (1961).
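The interior approximation $\hat a_i^* = 2m_i$ needs only the expected values of the standard normal order statistics. Where tables are not at hand these can be obtained by direct numerical integration of $E(x_i) = \frac{n!}{(i-1)!\,(n-i)!}\int x\,\phi(x)\,\Phi(x)^{i-1}\{1-\Phi(x)\}^{n-i}dx$. The sketch below is one way to do this; the midpoint rule, the integration limits and the function names are choices made for illustration, not the method used to build the published tables.

```python
import math
from statistics import NormalDist

def expected_order_stat(i, n, steps=16000, limit=8.0):
    """Midpoint-rule estimate of E[x_(i)] for a standard normal sample of size n."""
    normal = NormalDist()
    coef = i * math.comb(n, i)        # equals n! / ((i-1)! (n-i)!)
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps):
        x = -limit + (k + 0.5) * h
        p = normal.cdf(x)
        total += x * normal.pdf(x) * p ** (i - 1) * (1.0 - p) ** (n - i)
    return coef * total * h

def interior_coefficients(n):
    """Unnormalized approximations 2 * m_i for i = 2, ..., n - 1."""
    return {i: 2.0 * expected_order_stat(i, n) for i in range(2, n)}
```

For example, `interior_coefficients(20)[2]` is about -2.815 and `2 * expected_order_stat(2, 30)` is about -3.231, matching the second rows of the present-approximation column of Table 3 for n = 20 and n = 30.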
Table 1. Comparison of $|a_i^*|$ and $|\hat a_i^*| = |2m_i|$, for selected values of $i$ $(\ne 1)$ and $n$
(Rows: Exact and Approx. for each of $n = 5, 10, 15, 20$; the numerical entries are not reproduced.)
A comparison of $a_1^2$ and $\hat a_1^2$ for $n = 6(1)20$ is given in Table 2. While the errors of this
approximation are quite small for $n \le 20$, the approximation and true values appear to
cross over at $n = 19$. Further comparisons with other approximations, discussed below,
suggested the changed formulation of $\hat a_1^2$ for $n > 20$ given above.
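The end-coefficient approximation can be evaluated stably with log-gamma functions. The sketch below assumes the approximation has the gamma-ratio form $\Gamma(\tfrac12 n)/[\sqrt2\,\Gamma\{\tfrac12(n+1)\}]$ for $n \le 20$ and $\Gamma\{\tfrac12(n+1)\}/[\sqrt2\,\Gamma(\tfrac12 n+1)]$ for $n > 20$; this form, and the function name, should be read as assumptions rather than as a definitive statement of the authors' formula.

```python
import math

def end_coefficient_sq(n):
    """Approximate a_1^2 = a_n^2 (normalized) from a ratio of gamma functions.

    Assumption: the two-branch gamma-ratio form described in the lead-in.
    """
    if n <= 20:
        log_ratio = math.lgamma(n / 2.0) - math.lgamma((n + 1) / 2.0)
    else:
        log_ratio = math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0 + 1.0)
    return math.exp(log_ratio) / math.sqrt(2.0)

# Unnormalized end coefficient for n = 30, using C^2 = 119.77 as quoted
# in the text for the extrapolation equation.
a1_star_30 = -math.sqrt(end_coefficient_sq(30) * 119.77)
```

Under these assumptions `a1_star_30` comes out near -4.66, in line with the first entry of the present-approximation column of Table 3 for n = 30 (-4.655).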
Table 2. Comparison of $a_1^2$ and $\hat a_1^2$
(Columns: $n$, Exact, Approximate; the numerical entries are not reproduced.)
Fig. 2. Plot of $C^2 = m'V^{-1}V^{-1}m$ and $R^2 = m'V^{-1}m$ as functions of the sample size $n$.
What is required for the W test are the normalized coefficients $\{a_i\}$. Thus $\hat a_1^2$ is directly
usable but the $\hat a_i^*$ $(i = 2, \ldots, n-1)$, must be normalized by division by $C = (m'V^{-1}V^{-1}m)^{1/2}$.
A plot of the values of $C^2$ and of $R^2 = m'V^{-1}m$ as a function of $n$ is given in Fig. 2. The
linearity of these may be summarized by the following least-squares equations:
$$C^2 = -2.722 + 4.083n,$$
which gave a regression mean square of 7331.6 and a residual mean square of 0.0186, and
$$R^2 = -2.411 + 1.981n,$$
with a regression mean square of 1725.7 and a residual mean square of 0.0016.
These results encourage the use of the extrapolated equations to estimate C2 and R2
for higher values of n.
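The extrapolation itself is a pair of straight lines in $n$. In the sketch below the intercepts and slopes ($-2.722$, $4.083$ for $C^2$ and $-2.411$, $1.981$ for $R^2$) are assumptions: they are chosen to reproduce the values quoted in this section, namely $C^2 = 119.77$ at $n = 30$ and $R^2 = 37.21$ at $n = 20$. The function names are hypothetical.

```python
def c2_extrapolated(n):
    """Linear extrapolation of C^2 = m'V^-1 V^-1 m (assumed coefficients)."""
    return -2.722 + 4.083 * n

def r2_extrapolated(n):
    """Linear extrapolation of R^2 = m'V^-1 m (assumed coefficients)."""
    return -2.411 + 1.981 * n

def normalize(unnormalized, n):
    """Divide unnormalized coefficients a*_i by C, using the extrapolated C^2."""
    c = c2_extrapolated(n) ** 0.5
    return [value / c for value in unnormalized]
```

A design note: keeping the two lines as separate functions makes it easy to substitute exact values of $C^2$ and $R^2$ where they are known (sample sizes up to 20) and to fall back on the extrapolation only beyond that range.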
A comparison can now be made between values of $C^2$ from the extrapolation equation
and from using
$$C^2 = \sum_{i=1}^{n} \hat a_i^{*2}.$$
For the case $n = 30$, these give values of 119.77 and 120.47, respectively. This concordance
of the independent approximations increases faith in both.
Plackett (1958) has suggested approximations for the elements of the vector a and R2.
While his approximations are valid for a wide range of distributions and can be used with
censored samples, they are more complex, for the normal case, than those suggested above.
For the normal case his approximations are
where $F(m_j)$ = cumulative distribution evaluated at $m_j$,
$f(m_j)$ = density function evaluated at $m_j$,
and $\tilde a_1^* = -\tilde a_n^*$.
Plackett's approximation to R2 is
Plackett's $\tilde a_i^*$ approximations and the present approximations are compared with the
exact values, for sample size 20, in Table 3. In addition a consistency comparison of the
two approximations is given for sample size 30. Plackett's result for $a_1$ ($n = 20$) was the
only case where his approximation was closer to the true value than the simpler
approximations suggested above. The differences in the two approximations for $a_1$ were negligible,
being less than 0.5%. Both methods give good approximations, being off no more than
three units in the second decimal place. The comparison of the two methods for $n = 30$
shows good agreement, most of the differences being in the third decimal place. The largest
discrepancy occurred for $i = 2$; the estimates differed by six units in the second decimal
place, an error of less than 2%.
The two methods of approximating R2 were compared for n = 20. Plackett's method
gave a value of 36.09, the method suggested above gave a value of 37.21 and the true
value was 37.26.
The good practical agreement of these two approximations encourages the belief that
there is little risk in reasonable extrapolations for $n > 20$. The values of constants, for
$n > 20$, given in §3 below, were estimated from the simple approximations and
extrapolations described above.
As a further internal check the values of $a_n$, $a_{n-1}$ and $a_{n-4}$ were plotted as a function of
$n$ for $n = 3(1)50$. The plots are shown in Fig. 3 which is seen to be quite smooth for each
of the three curves at the value $n = 20$. Since values for $n \le 20$ are 'exact' the smooth
transition lends credence to the approximations for $n > 20$.
Table 3. Comparison of approximate values of $a^* = m'V^{-1}$
(The Exact column, given for n = 20 only, is not reproduced.)

n = 20
  i    Present approx.    Plackett
  1        -4.223          -4.215
  2        -2.815          -2.764
  3        -2.262          -2.237
  4        -1.842          -1.820
  5        -1.491          -1.476
  6        -1.181          -1.169
  7        -0.897          -0.887
  8        -0.630          -0.622
  9        -0.374          -0.370
 10        -0.124          -0.123

n = 30
  i    Present approx.    Plackett
  1        -4.655          -4.671
  2        -3.231          -3.170
  3        -2.730          -2.768
  4        -2.357          -2.369
  5        -2.052          -2.013
  6        -1.789          -1.760
  7        -1.553          -1.528
  8        -1.338          -1.334
  9        -1.137          -1.132
 10        -0.947          -0.941
 11        -0.765          -0.759
 12        -0.589          -0.582
 13        -0.418          -0.413
 14        -0.249          -0.249
 15        -0.083          -0.082
Fig. 3. $a_i$ plotted as a function of sample size, $n = 2(1)50$, for $i = n, n-1, n-4$ $(n > 8)$.
Fig. 5. Selected empirical percentage points of W, $n = 3(1)50$. (Axes: W against sample size, $n$.)
Table 4. Some theoretical moments ($\mu_i$) and Monte Carlo moments ($\tilde\mu_i$) of W
(Only three Monte Carlo columns are reproduced; the theoretical columns are not.)

  n    $\tilde\mu_1$    $\tilde\mu_2$    $\tilde\mu_3/\tilde\mu_2^{3/2}$
  3    0.9130    0.005698    -0.5930
  4    0.9019    0.005166    -0.8944
  5    0.9021    0.004491    -0.8176
  6    0.9082    0.003390    -1.1790
  7    0.9120    0.002995    -1.3229
  8    0.9175    0.002470    -1.3841
  9    0.9215    0.002293    -1.5987
 10    0.9260    0.001972    -1.6655
 11    0.9295    0.001717    -1.7494
 12    0.9338    0.001483    -1.7744
 13    0.9369    0.001316    -1.7581
 14    0.9399    0.001168    -1.9025
 15    0.9422    0.001023    -1.8876
 16    0.9445    0.000964    -1.7968
 17    0.9470    0.000823    -1.9468
 18    0.9492    0.000810    -2.1391
 19    0.9509    0.000711    -2.1305
 20    0.9527    0.000651    -2.2761
 21    0.9549    0.000594    -2.2827
 22    0.9558    0.000568    -2.3984
 23    0.9570    0.000504    -2.1862
 24    0.9579    0.000504    -2.3517
 25    0.9584    0.000458    -2.3448
 26    0.9598    0.000421    -2.4978
 27    0.9607    0.000404    -2.5903
 28    0.9615    0.000382    -2.6964
 29    0.9624    0.000369    -2.6090
 30    0.9626    0.000344    -2.7288
 31    0.9636    0.000336    -2.7997
 32    0.9642    0.000326    -2.6900
 33    0.9650    0.000308    -3.0181
 34    0.9654    0.000293    -3.0166
 35    0.9658    0.000265    -2.8574
 36    0.9662    0.000264    -2.7965
 37    0.9670    0.000253    -3.1566
 38    0.9677    0.000235    -3.0679
 39    0.9678    0.000239    -3.3283
 40    0.9682    0.000229    -3.1719
 41    0.9684    0.000227    -3.0740
 42    0.9691    0.000212    -3.2885
 43    0.9694    0.000196    -3.2646
 44    0.9695    0.000193    -3.0803
 45    0.9701    0.000192    -3.1645
 46    0.9703    0.000184    -3.3742
 47    0.9710    0.000170    -3.3353
 48    0.9709    0.000179    -3.2972
 49    0.9712    0.000165    -3.2810
 50    0.9714    0.000154    -3.3240
2.5. Approximation to the distribution of W
The complexity in the domain of the joint distribution of W and the angles $\{\theta_i\}$ in Lemma 5
necessitates consideration of an approximation to the null distribution of W. Since only
the first and second moments of normal order statistics are, practically, available, it follows
that only the one-half and first moments of W are known. Hence a technique such as the
Cornish-Fisher expansion cannot be used.
In the circumstance it seemed both appropriate and efficient to employ empirical
sampling to obtain an approximation for the null distribution.
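The empirical-sampling idea can be illustrated for the smallest case, $n = 3$, where the coefficients follow from antisymmetry and normalization alone. The sketch below is an assumption-laden stand-in: it draws pseudo-random normal deviates with Python's `random` module instead of tabled random numbers, and the function names and the simple order-statistic quantile rule are illustrative choices.

```python
import math
import random

A3 = (-1.0 / math.sqrt(2.0), 0.0, 1.0 / math.sqrt(2.0))

def w_statistic(sample, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2 for ordered y."""
    y = sorted(sample)
    y_bar = sum(y) / len(y)
    s2 = sum((yi - y_bar) ** 2 for yi in y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    return b * b / s2

def simulate_w(n_samples, coefficients, seed=0):
    """Sorted W values from repeated standard normal samples (pseudo-random)."""
    rng = random.Random(seed)
    n = len(coefficients)
    return sorted(
        w_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], coefficients)
        for _ in range(n_samples)
    )

def empirical_quantile(sorted_values, p):
    """Empirical percentage point: smallest value with at least a fraction p at or below it."""
    index = min(len(sorted_values) - 1, max(0, math.ceil(p * len(sorted_values)) - 1))
    return sorted_values[index]

w_values = simulate_w(5000, A3)
w_mean = sum(w_values) / len(w_values)
w_median = empirical_quantile(w_values, 0.50)
```

With 5000 samples the empirical mean should fall close to the theoretical first moment for $n = 3$ (about 0.9135), and every simulated value lies between the bounds $\tfrac34$ and 1 given by Lemmas 2 and 3.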
Accordingly, normal random samples were obtained from the Rand Tables (Rand Corp.
(1955)). Repeated values of W were computed for n = 3(1) 50 and the empirical percentage
points determined for each value of n. The number of samples, m, employed was as follows:
for n = 3(1) 20, m = 5000,
Fig. 4 gives the empirical C.D.F.'s for values of $n = 5, 10, 15, 20, 35, 50$. Fig. 5
gives a plot of the 1, 5, 10, 50, 90, 95, and 99 e