问题描述
在很多夏天,我都在努力应对苍蝇随着时间的推移(不定期地采摘)而出现的累积现象(尽管首先,我只是想工作一年).累积出现遵循S型模式,我想创建3参数Weibull累积分布函数的最大似然估计.我一直尝试在fitdistrplus
包中使用的三参数模型一直给我一个错误.我认为这一定与我的数据的结构有关,但我无法弄清楚.显然,我希望它将每个点都读取为x
(度数天)和y
(紧急度)值,但是它似乎无法读取两列.我得到的主要错误是数学函数的非数字参数"或(数据必须稍有不同)"数据必须是长度大于1的数字矢量".下面是我的代码,包括在df_dd_em
数据框中添加的列,用于在必要时累积出现和出现百分比.
I am working with the cumulative emergence of flies over time (taken at irregular intervals) over many summers (though first I am just trying to make one year work). The cumulative emergence follows a sigmoid pattern and I want to create a maximum likelihood estimation of a 3-parameter Weibull cumulative distribution function. The three-parameter models I've been trying to use in the fitdistrplus
package keep giving me an error. I think this must have something to do with how my data is structured, but I cannot figure it out. Obviously I want it to read each point as an x
(degree days) and a y
(emergence) value, but it seems to be unable to read two columns. The main error I'm getting says "Non-numeric argument to mathematical function" or (with slightly different code) "data must be a numeric vector of length greater than 1". Below is my code including added columns in the df_dd_em
dataframe for cumulative emergence and percent emergence in case that is useful.
degree_days <- c(998.08,1039.66,1111.29,1165.89,1236.53,1293.71,
1347.66,1387.76,1445.47,1493.44,1553.23,1601.97,
1670.28,1737.29,1791.94,1849.20,1920.91,1967.25,
2036.64,2091.85,2152.89,2199.13,2199.13,2263.09,
2297.94,2352.39,2384.03,2442.44,2541.28,2663.90,
2707.36,2773.82,2816.39,2863.94)
emergence <- c(0,0,0,1,1,0,2,3,17,10,0,0,0,2,0,3,0,0,1,5,0,0,0,0,
0,0,0,0,1,0,0,0,0,0)
cum_em <- cumsum(emergence)
df_dd_em <- data.frame (degree_days, emergence, cum_em)
df_dd_em$percent <- ave(df_dd_em$emergence, FUN = function(df_dd_em) 100*(df_dd_em)/46)
df_dd_em$cum_per <- ave(df_dd_em$cum_em, FUN = function(df_dd_em) 100*(df_dd_em)/46)
x <- pweibull(df_dd_em[c(1,3)],shape=5)
dframe2.mle <- fitdist(x, "weibull",method='mle')
推荐答案
这是我对您追求的最好的猜测:
Here's my best guess at what you're after:
设置数据:
dd <- data.frame(degree_days=c(998.08,1039.66,1111.29,1165.89,1236.53,1293.71,
1347.66,1387.76,1445.47,1493.44,1553.23,1601.97,
1670.28,1737.29,1791.94,1849.20,1920.91,1967.25,
2036.64,2091.85,2152.89,2199.13,2199.13,2263.09,
2297.94,2352.39,2384.03,2442.44,2541.28,2663.90,
2707.36,2773.82,2816.39,2863.94),
emergence=c(0,0,0,1,1,0,2,3,17,10,0,0,0,2,0,3,0,0,1,5,0,0,0,0,
0,0,0,0,1,0,0,0,0,0))
dd <- transform(dd,cum_em=cumsum(emergence))
我们实际上将适合间隔删减"的分布(即,连续度日观测值之间出现的概率:此版本假设第一个观测值指的是第一度观测值之前天观测值,您可以将其更改为引用 最后一个观测值之后的观测值.
We're actually going to fit to an "interval-censored" distribution (i.e. probability of emergence between successive degree day observations: this version assumes that the first observation refers to observations before the first degree-day observation, you could change it to refer to observations after the last observation).
library(bbmle)
## y*log(p) allowing for 0/0 occurrences:
y_log_p <- function(y,p) ifelse(y==0 & p==0,0,y*log(p))
NLLfun <- function(scale,shape,x=dd$degree_days,y=dd$emergence) {
prob <- pmax(diff(pweibull(c(-Inf,x), ## or (c(x,Inf))
shape=shape,scale=scale)),1e-6)
## multinomial probability
-sum(y_log_p(y,prob))
}
library(bbmle)
我可能应该使用更系统的方法,例如矩量法(即,将Weibull分布的均值和方差与数据的均值和方差进行匹配),但我只是四处寻找以寻找合理的起始值:
I should probably have used something more systematic like the method of moments (i.e. matching the mean and variance of a Weibull distribution with the mean and variance of the data), but I just hacked around a bit to find plausible starting values:
## preliminary look (method of moments would be better)
scvec <- 10^(seq(0,4,length=101))
plot(scvec,sapply(scvec,NLLfun,shape=1))
使用parscale
让R知道参数的比例非常不同很重要:
It's important to use parscale
to let R know that the parameters are on very different scales:
startvals <- list(scale=1000,shape=1)
m1 <- mle2(NLLfun,start=startvals,
control=list(parscale=unlist(startvals)))
现在尝试使用三参数Weibull(按照最初的要求)-只需对我们已经拥有的产品进行一些修改:
Now try with a three-parameter Weibull (as originally requested) -- requires only a slight modification of what we already have:
library(FAdist)
NLLfun2 <- function(scale,shape,thres,
x=dd$degree_days,y=dd$emergence) {
prob <- pmax(diff(pweibull3(c(-Inf,x),shape=shape,scale=scale,thres)),
1e-6)
## multinomial probability
-sum(y_log_p(y,prob))
}
startvals2 <- list(scale=1000,shape=1,thres=100)
m2 <- mle2(NLLfun2,start=startvals2,
control=list(parscale=unlist(startvals2)))
看起来三参数拟合更好:
Looks like the three-parameter fit is much better:
library(emdbook)
AICtab(m1,m2)
## dAIC df
## m2 0.0 3
## m1 21.7 2
这是图形摘要:
with(dd,plot(cum_em~degree_days,cex=3))
with(as.list(coef(m1)),curve(sum(dd$emergence)*
pweibull(x,shape=shape,scale=scale),col=2,
add=TRUE))
with(as.list(coef(m2)),curve(sum(dd$emergence)*
pweibull3(x,shape=shape,
scale=scale,thres=thres),col=4,
add=TRUE))
(也可以通过ggplot2
...更加优雅地完成此操作)
(could also do this more elegantly with ggplot2
...)
- 这些看起来似乎不太合适,但它们很理智. (原则上,您可以根据每个时间间隔的预期出现次数进行卡方拟合优度检验,并考虑到已经安装了三参数模型这一事实,尽管该值可能会偏低...)
- 拟合度上的置信区间有点麻烦;您的选择是(1)自举; (2)参数自举(假设数据为多元正态分布的重采样参数); (3)增量法.
- 使用
bbmle::mle2
可以轻松完成获取配置文件置信区间的操作:
- These don't seem like spectacularly good fits, but they're sane. (You could in principle do a chi-squared goodness-of-fit test based on the expected number of emergences per interval, and accounting for the fact that you've fitted a three-parameter model, although the values might be a bit low ...)
- Confidence intervals on the fit are a bit of a nuisance; your choices are (1) bootstrapping; (2) parametric bootstrapping (resample parameters assuming a multivariate normal distribution of the data); (3) delta method.
- Using
bbmle::mle2
makes it easy to do things like get profile confidence intervals:
confint(m1)
## 2.5 % 97.5 %
## scale 1576.685652 1777.437283
## shape 4.223867 6.318481
这篇关于在三个参数Weibull cdf上运行最大似然估计的错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!