Checking whether data are normally distributed in R

Question

Can someone please help me fill in the following function in R:

# data is a single vector of decimal values
normally.distributed <- function(data) {
  if (data is normal)   # placeholder: this is the condition I don't know how to write
    return(TRUE)
  else
    return(FALSE)
}
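
For reference, here is a minimal sketch of what the question literally asks for, assuming base R's `shapiro.test` and a conventional 0.05 cutoff (the `alpha` argument is an illustrative addition, not part of the question). Note that `TRUE` here only means the test failed to reject normality, which is exactly the distinction the answer takes issue with.

```r
# Naive sketch: returns TRUE when Shapiro-Wilk fails to reject normality.
# Caution: TRUE means "not rejected", not "shown to be normal".
normally.distributed <- function(data, alpha = 0.05) {
  data <- data[!is.na(data)]
  # shapiro.test() only accepts between 3 and 5000 non-missing values
  if (length(data) < 3 || length(data) > 5000)
    stop("shapiro.test() needs between 3 and 5000 non-missing values")
  shapiro.test(data)$p.value >= alpha
}
```

For example, `normally.distributed(rnorm(50))` returns `TRUE` in roughly 95% of runs at the default `alpha`, simply because the test is calibrated that way.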

Answer

Normality tests don't do what most people think they do. Shapiro's test, Anderson-Darling, and others are null hypothesis tests AGAINST the assumption of normality. They should not be used to decide whether to use normal-theory statistical procedures. In fact, they are of virtually no value to the data analyst. Under what conditions are we interested in rejecting the null hypothesis that the data are normally distributed? I have never come across a situation where a normality test is the right thing to do. When the sample size is small, even big departures from normality go undetected, and when the sample size is large, even the smallest deviation from normality will lead to a rejected null.
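
The sample-size point can be checked with a small simulation. The sketch below (the helper name `rejection_rate`, the lognormal parameters and the number of replicates are illustrative assumptions) draws repeated samples from a clearly skewed lognormal distribution and records how often `shapiro.test` rejects at the 0.05 level, i.e. the test's power at each sample size.

```r
set.seed(100)

# Fraction of simulated samples of size n for which Shapiro-Wilk rejects
# normality at level alpha. The data are lognormal, so every rejection is
# correct and every non-rejection is a miss.
rejection_rate <- function(n, reps = 1000, alpha = 0.05) {
  p_values <- replicate(reps, shapiro.test(rlnorm(n, meanlog = 0, sdlog = 0.4))$p.value)
  mean(p_values < alpha)
}

sizes <- c(10, 20, 50, 200)
rates <- sapply(sizes, rejection_rate)
names(rates) <- sizes
print(rates)
```

The rejection rate should climb toward 1 as n grows, even though the underlying distribution is the same non-normal one at every sample size.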

For example:

> set.seed(100)
> x <- rbinom(15,5,.6)
> shapiro.test(x)

    Shapiro-Wilk normality test

data:  x
W = 0.8816, p-value = 0.0502

> x <- rlnorm(20,0,.4)
> shapiro.test(x)

    Shapiro-Wilk normality test

data:  x
W = 0.9405, p-value = 0.2453

So, in both these cases (binomial and lognormal variates) the p-value is > 0.05 causing a failure to reject the null (that the data are normal). Does this mean we are to conclude that the data are normal? (hint: the answer is no). Failure to reject is not the same thing as accepting. This is hypothesis testing 101.
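
One way to see that the non-rejection said nothing about normality is to draw a larger sample from the same binomial generator used above (the sample size of 1000 is an arbitrary illustrative choice). The values are integers between 0 and 5 by construction, so they cannot be normal at any sample size; only the test's ability to notice changes.

```r
set.seed(100)
x_big <- rbinom(1000, size = 5, prob = 0.6)

# By construction the values are integers in 0..5, i.e. at most 6 distinct values.
print(table(x_big))

# Same generator as the n = 15 example, but now the test has enough data
# to reject normality.
p_big <- shapiro.test(x_big)$p.value
print(p_big)
```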

But what about larger sample sizes? Let's take a case where the distribution is very nearly normal.

> library(nortest)
> x <- rt(500000,200)
> ad.test(x)

    Anderson-Darling normality test

data:  x
A = 1.1003, p-value = 0.006975

> qqnorm(x)

Here we are using a t-distribution with 200 degrees of freedom. The qq-plot shows the distribution is closer to normal than any distribution you are likely to see in the real world, but the test rejects normality with a very high degree of confidence.
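
To put a number on how close a t-distribution with 200 degrees of freedom is to the standard normal, one can compare the two quantile functions directly, which is what an idealized Q-Q plot shows. This sketch uses base R's `qt` and `qnorm`; the grid of probabilities is an arbitrary choice.

```r
# Probabilities from the 0.1% to the 99.9% quantile
p <- seq(0.001, 0.999, by = 0.001)

# Theoretical quantiles of t(200) minus those of the standard normal
gap <- qt(p, df = 200) - qnorm(p)

# Largest absolute difference over the grid
max_gap <- max(abs(gap))
print(max_gap)
```

For comparison, the standard deviation of t(200) is sqrt(200/198), roughly 1.005, another way to see how small the departure from normality is.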

Does the significant test against normality mean that we should not use normal theory statistics in this case? (another hint: the answer is no :) )

这篇关于查看数据是否在 R 中正态分布的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！