5-D kernel density estimation in R using the "kde" function

Question

I want to perform kernel density estimation on 5-dimensional data (x, y, z, time, size) using the "kde" function in the "ks" package for R. Its manual says it can do kernel density estimation for 1- to 6-dimensional data (page 24 of the manual: http://cran.r-project.org/web/packages/ks/ks.pdf).

My problem is that, for more than 3 dimensions, it says I need to specify eval.points. I don't know how to specify the evaluation points because there is no example for more than 3 dimensions. For example, if I want to generate regular 3D sequences in the space of the problem and use them as the evaluation points, what should I do?
Here is my data:

422.697323  164.19886   2.457419    8.083796636  0.83367586
423.008236  163.32434   0.5551326   37.58477455  0.893893903
204.733908  218.36365   1.9397874   37.88324312  0.912809449
203.963056  218.4808    0.3723791   43.21775903  0.926406005
100.727581  46.60876    1.4022341   49.41510519  0.782807523
453.335182  244.25521   1.6292517   51.73779175  0.903910803
134.909462  210.96333   2.2389119   53.13433521  0.896529401
135.300562  212.02055   0.6739541   67.55073745  0.748783521
258.237117  134.29735   2.1205291   76.34032587  0.735699304
341.305271  149.26953   3.718958    94.33975483  0.849509216
307.138925  59.60571    0.6311074   106.9636715  0.987923188
307.76875   58.91453    2.6496741   113.8515307  0.802115718
415.025535  217.17398   1.7155688   115.7464603  0.875580325
414.977687  216.73327   1.7107369   115.9776948  0.767143582
311.006135  173.24378   2.7819572   120.8079566  0.925380118
310.116929  174.28122   4.3318722   129.2648401  0.776528535
347.260911  37.34946    3.5155427   136.7851291  0.851787115
351.317624  33.65703    0.5806926   138.7349284  0.909723017
4.471892    59.42068    1.4062959   139.0543783  0.967270976
5.480223    59.72857    2.7326106   139.2114277  0.987787428
199.513023  21.53302    2.5163259   143.5895625  0.864164659
198.718031  23.50163    0.4801849   147.2280466  0.741587333
26.650517   35.2019     0.8246514   150.4876506  0.744788202
25.089379   90.47825    0.8700944   152.1944046  0.777252476
26.307439   88.41552    2.4422487   155.9090026  0.952215177
234.282901  236.11422   1.8115261   155.9658144  0.776284654
235.052948  236.77437   1.9644963   156.6900297  0.944285448
23.048202   98.6261     3.4573048   159.7700912  0.773057491
21.516695   98.05431    2.5029284   160.8202997  0.978779087
213.936324  151.87013   3.1042192   161.0612489  0.80499513
277.887935  197.25753   1.3659279   163.673142   0.758978575
277.239746  197.54001   2.2109361   166.2629868  0.775325157

This is the code that I am using:

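library(ks)   # provides kde()
library(rgl)  # 3D plotting
kern <- read.table(file.choose(), sep=",")  # pick the data file interactively
hat <- kde(kern)  # fails when kern has more than 3 columns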

It works for up to 3 dimensions, but for 4 and 5 dimensions it says: need to specify eval.points for more than 3 dimensions.

Also, I'd like to know how I can plot these kernels. For example, use z as the conditioning variable, plot x, y, and time in a 3D scatterplot, and use different colors for different ranges of size.

Recommended answer

Like you, I wasn't initially able to find a worked example, and the documentation doesn't really describe what sort of object is expected. For your 5-D data set I tried setting up a 5-D grid of points constructed from the 10th, 25th, 50th, 75th and 90th percentiles of each dimension. My data set was named "dat":

evpts <- do.call(expand.grid, lapply(dat, quantile, probs=c(0.1, 0.25, 0.5, 0.75, 0.9)))

I then passed that to the kde function, and it seemed to satisfy the algorithm. Whether this is "correct" does need checking. No guarantees.

> hat <- kde(dat, eval.points= evpts)
> str(hat)
List of 8
 $ x          : num [1:31, 1:5] 423 423 205 204 101 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "V1" "V2" "V3" "V4" ...
 $ eval.points:'data.frame':    3125 obs. of  5 variables:
  ..$ V1: Named num [1:3125] 23 118 234 326 415 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
  ..$ V2: Named num [1:3125] 35.2 35.2 35.2 35.2 35.2 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..$ V3: Named num [1:3125] 0.581 0.581 0.581 0.581 0.581 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..$ V4: Named num [1:3125] 43.2 43.2 43.2 43.2 43.2 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..$ V5: Named num [1:3125] 0.749 0.749 0.749 0.749 0.749 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..- attr(*, "out.attrs")=List of 2
  .. ..$ dim     : Named int [1:5] 5 5 5 5 5
  .. .. ..- attr(*, "names")= chr [1:5] "V1" "V2" "V3" "V4" ...
  .. ..$ dimnames:List of 5
  .. .. ..$ V1: chr [1:5] "V1= 23.0482" "V1=117.8185" "V1=234.2829" "V1=326.1557" ...
  .. .. ..$ V2: chr [1:5] "V2= 35.20190" "V2= 59.51319" "V2=149.26953" "V2=211.49194" ...
  .. .. ..$ V3: chr [1:5] "V3=0.5806926" "V3=1.1180112" "V3=1.9397874" "V3=2.5830000" ...
  .. .. ..$ V4: chr [1:5] "V4= 43.21776" "V4= 71.94553" "V4=129.26484" "V4=151.34103" ...
  .. .. ..$ V5: chr [1:5] "V5=0.7487835" "V5=0.7764066" "V5=0.8517871" "V5=0.9190948" ...
 $ estimate   : Named num [1:3125] 3.23e-08 5.70e-08 1.01e-08 4.07e-10 6.20e-12 ...
  ..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
 $ H          : num [1:5, 1:5] 5073.879 1010.815 1.211 -651.089 -0.223 ...
 $ gridded    : logi FALSE
 $ binned     : logi FALSE
 $ names      : chr [1:5] "V1" "V2" "V3" "V4" ...
 $ w          : num [1:31] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "class")= chr "kde"
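
The question also asks about using regular sequences as evaluation points. A minimal sketch of that variant, assuming the same expand.grid approach as above: build one evenly spaced sequence per column over its observed range, then cross them. The simulated `dat`, the `n_per_dim` resolution and the `axes` name are illustrative assumptions, not part of the original data or of the ks package. The kde call is skipped if ks is not installed, and whether such a grid is a sensible choice for a given data set still needs checking.

```r
# Sketch only: simulated stand-in for the 5-column data (x, y, z, time, size).
set.seed(1)
dat <- data.frame(V1 = runif(100, 0, 450),
                  V2 = runif(100, 20, 250),
                  V3 = runif(100, 0, 4.5),
                  V4 = runif(100, 8, 170),
                  V5 = runif(100, 0.7, 1))

n_per_dim <- 5  # hypothetical resolution; the grid has n_per_dim^5 rows

# One regular sequence per column, spanning that column's observed range.
axes <- lapply(dat, function(v) seq(min(v), max(v), length.out = n_per_dim))

# All combinations of the per-column sequences: one row per grid point.
evpts <- do.call(expand.grid, axes)
dim(evpts)  # 3125 rows, 5 columns

if (requireNamespace("ks", quietly = TRUE)) {
  # Matrices are passed explicitly to avoid relying on data-frame coercion.
  hat <- ks::kde(x = as.matrix(dat), eval.points = as.matrix(evpts))
  length(hat$estimate)  # one density value per row of evpts
}
```

With 5 points per axis the grid has 5^5 = 3125 rows, the same size as the percentile grid above. The row count grows with the fifth power of the per-axis resolution, so fine grids become large quickly.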

I did find an earlier version of the package documentation that offered this as a worked example of a 4-D execution, so I think my effort is essentially the same, modulo different dimensions:

data(iris)
ir <- iris[,1:4][iris[,5]=="setosa",]
H.scv <- Hscv(ir)
fhat <- kde(ir, H.scv, eval.points=ir)
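
For the plotting part of the question, one possible approach (a sketch, not the 3D scatterplot that was asked for) is to hold some variables fixed and look at a 2-D slice of the density. The example below fixes z, time and size at their medians and evaluates the estimate on an x-y grid. The simulated `dat`, the grid size of 25 and the names `gx`, `gy`, `slice` and `dens` are assumptions for illustration; base graphics are used in place of rgl to keep the example small.

```r
# Sketch only: simulated stand-in for the 5-column data (x, y, z, time, size).
set.seed(1)
dat <- data.frame(V1 = runif(100, 0, 450),
                  V2 = runif(100, 20, 250),
                  V3 = runif(100, 0, 4.5),
                  V4 = runif(100, 8, 170),
                  V5 = runif(100, 0.7, 1))

# Regular grid over x and y only.
gx <- seq(min(dat$V1), max(dat$V1), length.out = 25)
gy <- seq(min(dat$V2), max(dat$V2), length.out = 25)
slice <- expand.grid(V1 = gx, V2 = gy)

# Conditioning values: z, time and size held at their medians.
slice$V3 <- median(dat$V3)
slice$V4 <- median(dat$V4)
slice$V5 <- median(dat$V5)

if (requireNamespace("ks", quietly = TRUE)) {
  hat <- ks::kde(x = as.matrix(dat), eval.points = as.matrix(slice))
  # expand.grid varies V1 fastest, so filling column-wise gives
  # dens[i, j] = estimate at (gx[i], gy[j]).
  dens <- matrix(hat$estimate, nrow = length(gx), ncol = length(gy))
  image(gx, gy, dens, xlab = "x", ylab = "y")
  contour(gx, gy, dens, add = TRUE)
}
```

Repeating the slice for several fixed values of z (or time, or size) gives a series of panels that show how the density changes with the conditioning variable.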
