ggplot2: How to plot small Gaussian densities on a regression line?

Question

I want to graphically show the assumptions of linear (and later other types of) regression. How can I add small Gaussian densities (or any other type of density) to my plot along a regression line, just like in this figure:

Recommended answer

You can compute the empirical densities of the residuals for sections along the fitted line. Then it is just a matter of drawing the curves at positions of your choosing in each interval using geom_path. To add a theoretical distribution, generate densities over the range of the residuals in each section (here using the normal density). For the normal densities below, the standard deviation of each curve is estimated from the residuals of its own section, but you could instead choose a single standard deviation and use it for all of them.
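
The last variation mentioned above, one standard deviation shared by every curve, can be sketched on its own as follows. This is only an illustration under stated assumptions, not the answer's code: the names `sim`, `x0`, `scale_factor`, and `curves` are made up for this example, the curves are centered on the fitted value at hand-picked x positions, and the shared standard deviation is taken to be the residual standard error returned by `sigma()`.

```r
library(ggplot2)  # assumed to be installed

## Simulated data in the same spirit as the sample data (names are illustrative)
set.seed(0)
x <- runif(100, 0, 50)
sim <- data.frame(x = x, y = rnorm(100, 10 * x, 100))

fit <- lm(y ~ x, data = sim)
sigma_hat <- sigma(fit)  # one residual standard error shared by all curves

## x positions at which to draw a curve (an arbitrary choice)
x0 <- c(12.5, 25, 37.5, 50)
y0 <- predict(fit, newdata = data.frame(x = x0))

## Assumption: the peak of each curve should reach at most 8 x-units to the left
scale_factor <- 8 / dnorm(0, 0, sigma_hat)

## Residual grid covering +/- 3 standard deviations
r <- seq(-3 * sigma_hat, 3 * sigma_hat, length.out = 50)

## Flip the axes: density height goes on x, residual value goes on y
curves <- do.call(rbind, lapply(seq_along(x0), function(i) {
  data.frame(x = x0[i] - scale_factor * dnorm(r, 0, sigma_hat),
             y = y0[i] + r,
             pos = i)
}))

p <- ggplot(sim, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_path(data = curves, aes(x, y, group = pos), color = "salmon") +
  theme_bw()
print(p)
```

Because every curve uses the same `sigma_hat`, all four have the same shape, which is the constant-variance assumption of ordinary linear regression drawn literally.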

## Sample data
library(ggplot2)

set.seed(0)
x <- runif(100, 0, 50)
dat <- data.frame(x=x, y=rnorm(100, 10*x, 100))

## breaks: where you want to compute densities
breaks <- seq(0, max(dat$x), len=5)
dat$section <- cut(dat$x, breaks)

## Get the residuals
dat$res <- residuals(lm(y ~ x, data=dat))

## Compute densities for each section, and flip the axes, and add means of sections
## Note: the densities need to be scaled in relation to the section size (2000 here)
dens <- do.call(rbind, lapply(split(dat, dat$section), function(x) {
    d <- density(x$res, n=50)
    res <- data.frame(x=max(x$x)- d$y*2000, y=d$x+mean(x$y))
    res <- res[order(res$y), ]
    ## Get some data for normal lines as well
    xs <- seq(min(x$res), max(x$res), len=50)
    res <- rbind(res, data.frame(y=xs + mean(x$y),
                                 x=max(x$x) - 2000*dnorm(xs, 0, sd(x$res))))
    res$type <- rep(c("empirical", "normal"), each=50)
    res
}))
dens$section <- rep(levels(dat$section), each=100)

## Plot both empirical and theoretical
ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_smooth(method="lm", fill=NA, lwd=2) +
  geom_path(data=dens, aes(x, y, group=interaction(section,type), color=type), lwd=1.1) +
  theme_bw() +
  geom_vline(xintercept=breaks, lty=2)

Or, just the Gaussian curves:

## Just normal
ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_smooth(method="lm", fill=NA, lwd=2) +
  geom_path(data=dens[dens$type=="normal",], aes(x, y, group=section), color="salmon", lwd=1.1) +
  theme_bw() +
  geom_vline(xintercept=breaks, lty=2)
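
The density code above hard-codes a scaling factor of 2000, with a comment that it has to be chosen in relation to the section size. One possible way to derive such a factor from the data is sketched below. The helper `density_scale()` and its `frac` argument are hypothetical names introduced for this example, and the data are regenerated with the same seed so the block runs on its own.

```r
## Hypothetical helper: factor that makes the tallest density peak
## span a fraction `frac` of a section's width on the x axis
density_scale <- function(dens_y, width, frac = 0.8) {
  frac * width / max(dens_y)
}

## Same simulated data and breaks as in the answer
set.seed(0)
x <- runif(100, 0, 50)
dat <- data.frame(x = x, y = rnorm(100, 10 * x, 100))
breaks <- seq(0, max(dat$x), length.out = 5)
dat$section <- cut(dat$x, breaks)
dat$res <- residuals(lm(y ~ x, data = dat))

## Highest empirical density value within each section
peaks <- sapply(split(dat$res, dat$section),
                function(r) max(density(r, n = 50)$y))

## All sections have equal width here, so the first one is representative
width <- diff(breaks)[1]
scale_factor <- density_scale(max(peaks), width)
scale_factor
```

The resulting `scale_factor` could be used in place of 2000 for both the empirical and the normal curves, so that no curve extends further than 80% of a section's width past its section boundary.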

