This article looks at a weight equivalent for geom_density2d in ggplot2; the question and answer below may serve as a useful reference.
Consider the following data:

```
   contesto       x       y perc
1       M01  81.370 255.659   22
2       M02  85.814 242.688   16
3       M03  73.204 240.526   33
4       M04  66.478 227.916   46
5      M04a  67.679 218.668   15
6       M05  59.632 239.325   35
7       M06  64.316 252.777   23
8       M08  90.258 227.676   45
9       M09 100.707 217.828   58
10      M10  89.829 205.278   53
11      M11 114.998 216.747   15
12      M12 119.922 235.482   18
13      M13 129.170 239.205   36
14      M14 142.501 229.717   24
15      M15  76.206 213.144   24
16      M16  30.090 166.785   33
17      M17 130.731 219.989   56
18      M18  74.885 192.336   36
19      M19  48.823 142.645   32
20      M20  48.463 186.361   24
21      M21  74.765 205.698   16
```

I would like to create a 2d density plot for the points x and y, weighted by perc.
I can do this (though I don't think properly) by using rep:

```r
library(ggplot2)
dataset2 <- with(dataset, dataset[rep(1:nrow(dataset), perc), ])
ggplot(dataset2, aes(x, y)) +
  stat_density2d(aes(alpha = ..level.., fill = ..level..),
                 size = 2, bins = 10, geom = "polygon") +
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_alpha(range = c(0.00, 0.5), guide = FALSE) +
  geom_density2d(colour = "black", bins = 10) +
  geom_point(data = dataset) +
  guides(alpha = FALSE) +
  xlim(c(10, 160)) +
  ylim(c(120, 280))
```

This does not seem like the correct approach, as other geoms allow for weighting, as in:

```r
dat <- as.data.frame(ftable(mtcars$cyl))
ggplot(dat, aes(x = Var1)) + geom_bar(aes(weight = Freq))
```

However, if I try using weight here, the plot doesn't match the data (perc is ignored):

```r
ggplot(dataset, aes(x, y)) +
  stat_density2d(aes(alpha = ..level.., fill = ..level.., weight = perc),
                 size = 2, bins = 10, geom = "polygon") +
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_alpha(range = c(0.00, 0.5), guide = FALSE) +
  geom_density2d(colour = "black", bins = 10, aes(weight = perc)) +
  geom_point(data = dataset) +
  guides(alpha = FALSE) +
  xlim(c(10, 160)) +
  ylim(c(120, 280))
```

Is this use of rep the correct way to weight the density, or is there a better approach akin to the weight argument for geom_bar?

The rep approach looks like the kernel density made with base R, so I assume this is how it should look:

```r
dataset <- structure(list(contesto = structure(1:21, .Label = c("M01",
"M02", "M03", "M04", "M04a", "M05", "M06", "M08", "M09", "M10",
"M11", "M12", "M13", "M14", "M15", "M16", "M17", "M18", "M19",
"M20", "M21"), class = "factor"), x = c(81.37, 85.814, 73.204,
66.478, 67.679, 59.632, 64.316, 90.258, 100.707, 89.829, 114.998,
119.922, 129.17, 142.501, 76.206, 30.09, 130.731, 74.885, 48.823,
48.463, 74.765), y = c(255.659, 242.688, 240.526, 227.916, 218.668,
239.325, 252.777, 227.676, 217.828, 205.278, 216.747, 235.482,
239.205, 229.717, 213.144, 166.785, 219.989, 192.336, 142.645,
186.361, 205.698), perc = c(22, 16, 33, 46, 15, 35, 23, 45, 58,
53, 15, 18, 36, 24, 24, 33, 56, 36, 32, 24, 16)), .Names = c("contesto",
"x", "y", "perc"), row.names = c(NA, -21L), class = "data.frame")
```

Solution

I think you're doing it right, if your weights are the number of observations at each co-ordinate (or in proportion to it). The function seems to expect all the observations, and there's no way to dynamically update the ggplot object if you call it on your original dataset, because it has already modelled the density and contains derived plot data.

You might want to use data.table instead of with() if your real data set is large; it's about 70 times faster. For example, see below for 1m co-ords, with 1-20 repeats (>10m observations in this example). No performance relevance for 660 observations, though (and the plot will probably be your performance bottleneck with a large data set anyway).

```r
library(data.table)

bigtable <- data.frame(x = runif(10e5), y = runif(10e5), perc = sample(1:20, 10e5, TRUE))

system.time(rep.with.by <- with(bigtable, bigtable[rep(1:nrow(bigtable), perc), ]))
#  user  system elapsed
# 11.67    0.18   11.92

system.time(rep.with.dt <- data.table(bigtable)[, list(x = rep(x, perc), y = rep(y, perc))])
#  user  system elapsed
#  0.12    0.05    0.18

# CHECK THEY'RE THE SAME
sum(rep.with.dt$x) == sum(rep.with.by$x)
# [1] TRUE

# OUTPUT ROWS
nrow(rep.with.dt)
# [1] 10497966
```
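The answer's claim that the rep() expansion is a valid way to weight the density can be checked numerically. The sketch below is in Python purely for illustration (the maths is language-agnostic; the function name `kde` and the fixed bandwidth `h` are my own choices, not part of the original answer). It implements a plain Gaussian KDE by hand and shows that weighting each kernel by perc gives exactly the same estimate as repeating each observation perc times, which is what the question's rep() approach does. A 1-D case is shown; the same per-kernel argument applies in 2-D.

```python
import numpy as np

def kde(x_obs, x_eval, h, w=None):
    """Gaussian KDE at x_eval with optional per-observation weights w."""
    if w is None:
        w = np.ones_like(x_obs)
    # K((x - xi)/h) for every (eval point, observation) pair
    z = (x_eval[:, None] - x_obs[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # weighted sum of kernels, normalised by total weight and bandwidth
    return (k * w).sum(axis=1) / (w.sum() * h)

x = np.array([81.37, 85.814, 73.204, 66.478])   # a few x values from the question
perc = np.array([22, 16, 33, 46])               # their weights
grid = np.linspace(50, 100, 201)

weighted = kde(x, grid, h=5.0, w=perc)          # weight each kernel by perc
repeated = kde(np.repeat(x, perc), grid, h=5.0) # rep()-style row expansion

print(np.allclose(weighted, repeated))  # True: the two estimates are identical
```

Each repeated observation contributes an identical kernel term, so summing w_i copies of a kernel equals multiplying one kernel by w_i; with the same fixed bandwidth the two estimates agree exactly. (Note that automatic bandwidth selection can still differ between the two forms, since the apparent sample size changes.)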
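The row-expansion trick the answer benchmarks with data.table also has a one-call analogue outside R. As an illustrative sketch (not part of the original answer), NumPy's repeat performs the same expansion, and the same sum-equality sanity check from the answer carries over:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.random(n)
perc = rng.integers(1, 21, size=n)  # 1-20 repeats each, as in the benchmark

# vectorised expansion, analogous to data.table's rep(x, perc)
fast = np.repeat(x, perc)

# row-index expansion, analogous to bigtable[rep(1:nrow(bigtable), perc), ]
idx = np.repeat(np.arange(n), perc)
slow = x[idx]

# check they're the same, mirroring the answer's sum() comparison
print(fast.size == perc.sum(), np.isclose(fast.sum(), slow.sum()))
```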