我有一个实验数据集,希望适合多项式。数据包括自变量,因变量和后者的测量不确定性,例如

2000  0.2084272   0.002067834
2500  0.207078125 0.001037248
3000  0.2054202   0.001959138
3500  0.203488075 0.000328942
4000  0.2013152   0.000646088
4500  0.198933825 0.001375657
5000  0.196375    0.000908696
5500  0.193668575 0.00014721
6000  0.1908432   0.000526976
6500  0.187926325 0.001217318
7000  0.1849442   0.000556495
7500  0.181921875 0.000401883
8000  0.1788832   0.001446992
8500  0.175850825 0.001235017
9000  0.1728462   0.001676249
9500  0.169889575 0.001011735
10000 0.167       0.000326678


(x,y,+-y列)。

我可以使用上面的例子进行多项式拟合

mydata = read.table("example.txt")
model <- lm(V2~V1+I(V1^2)+I(V1^3)+I(V1^4), data = mydata)


但这没有利用不确定性值。我如何告知R数据集的第三列是不确定性,因此应在回归分析中使用它?

最佳答案

由于因变量中的测量误差与自变量不相关,因此估计系数没有偏见,但标准误差太小。这是我使用的参考(第1和2页):
http://people.stfx.ca/tleo/econ370term2lec4.pdf

我认为您只需要调整lm()计算出的标准误差即可。这就是我在下面的代码中尝试做的。我不是统计人员,所以您可能想发布经过交叉验证并要求更好的直觉。

对于下面的示例,我假设“不确定性”列为标准偏差(或标准误差)。为简单起见,我将模型更改为:y〜x。

# initialize dataset
df <- data.frame(
        x = c(2000,2500,3000,3500,4000,4500,5000,5500,6000,6500,7000,7500,8000,8500,9000,9500,10000),
        y = c(0.2084272,0.207078125,0.2054202,0.203488075,0.2013152,0.198933825, 0.196375,0.193668575, 0.1908432, 0.187926325,   0.1849442,   0.181921875, 0.1788832, 0.175850825, 0.1728462,0.169889575,  0.167),
        y.err = c(0.002067834, 0.001037248,  0.001959138, 0.000328942, 0.000646088, 0.001375657, 0.000908696, 0.00014721, 0.000526976, 0.001217318, 0.000556495, 0.000401883, 0.001446992, 0.001235017, 0.001676249, 0.001011735, 0.000326678)
)

df

#  model regression
model <- lm(y~x, data = df)
summary(model)

#  get the variance of the measurement error
#  thanks to: http://schools-wikipedia.org/wp/v/Variance.htm
#  law of total variance
pooled.var.y.err <- mean((df$y.err)^2) + var((df$y.err)^2)

# get variance of beta from model
#   thanks to: http://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression
X <- cbind(1, df$x)
#     (if you add more variables to the model you need to modify the following line)
var.betaHat <- anova(model)[[3]][2] * solve(t(X) %*% X)

# add betaHat variance to measurement error variance
var.betaHat.adj <- var.betaHat + pooled.var.y.err

# calculate adjusted standard errors
sqrt(diag(var.betaHat.adj))

# compare to un-adjusted standard errors
sqrt(diag(var.betaHat))

08-24 12:15