Subsetting an atomic vector in place

Question

Continuing on from "Subsetting large vector uses unnecessarily large amounts of memory":

Given an atomic vector, for example

x <- rep_len(1:10, 1e7)

How can I modify x in place to remove elements by numeric index using Rcpp? In R, one can do this, but not in place (that is, not without duplicating x):

idrops <- c(5, 4, 9)
x <- x[-idrops]

A reasonably efficient way to do this would be the following:

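#include <Rcpp.h>
using namespace Rcpp;

// Returns a copy of x without the elements at the 1-based positions in inds.
// Assumes inds holds unique, in-range indices (otherwise `out` is mis-sized).
// [[Rcpp::export]]
IntegerVector dropElements(IntegerVector x, IntegerVector inds) {
  R_xlen_t n = x.length();
  R_xlen_t ndrops = inds.length();
  IntegerVector out = no_init(n - ndrops);
  R_xlen_t k = 0; // index of out
  for (R_xlen_t i = 0; i < n; ++i) {
    bool drop = false;
    for (R_xlen_t j = 0; j < ndrops; ++j) {
      if (i + 1 == inds[j]) { // inds holds 1-based R indices; i is 0-based
        drop = true;
        break;
      }
    }
    if (drop) {
      continue;
    }
    out[k] = x[i];
    ++k;
  }
  return out;
}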

This is hardly in place, though (it is also not very safe, but that is beside the point). I am aware of the STL's .erase(), but it appears that Rcpp, by design, makes a copy when converting to an STL container.
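
For comparison, here is a minimal sketch of what in-place removal looks like on a plain `std::vector<int>` rather than on an R vector. The name `drop_in_place` and its parameters are hypothetical, it is not part of Rcpp, and it does not settle how to shrink an R vector without a copy; it only illustrates the read/write-cursor compaction that avoids a second data buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Rcpp API): removes the elements at the given
// 1-based positions from v, reusing v's own storage.
// Out-of-range and duplicate positions are ignored.
void drop_in_place(std::vector<int>& v, const std::vector<int>& drops_1based) {
    // One flag per element marks the positions to drop.
    std::vector<bool> drop(v.size(), false);
    for (int d : drops_1based) {
        if (d >= 1 && static_cast<std::size_t>(d) <= v.size()) {
            drop[static_cast<std::size_t>(d) - 1] = true;
        }
    }

    // Slide every kept element down to the next free slot.
    std::size_t write = 0;
    for (std::size_t read = 0; read < v.size(); ++read) {
        if (!drop[read]) {
            v[write++] = v[read];
        }
    }

    // Shrink the logical length; the underlying buffer is kept.
    v.resize(write);
}
```

Shrinking with `resize` keeps the existing buffer, so the surviving elements are never moved to a new allocation; the only extra memory is one flag per element.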

Recommended answer

The question you linked to is a bit simpler and is a one-liner in Rcpp. For this case, you can implement efficient negative indexing by sorting the negative index vector, looping over it, and copying the ranges of data that lie between the dropped positions. For example:

#include <Rcpp.h>
#include <algorithm> // std::sort, std::copy
using namespace Rcpp;

// solution for the original question
// [[Rcpp::export]]
IntegerVector popBeginningOfVector(IntegerVector x, int npop) {
  return IntegerVector(x.begin() + npop, x.end());
}

// [[Rcpp::export]]
IntegerVector efficientNegativeIndexing(IntegerVector x, IntegerVector neg_idx) {
  neg_idx = clone(neg_idx); // sort a copy so the caller's index vector is never reordered
  std::sort(neg_idx.begin(), neg_idx.end());
  size_t ni_size = neg_idx.size();
  size_t xsize = x.size();
  int * xptr = INTEGER(x);
  int * niptr = INTEGER(neg_idx);
  size_t xtposition = 0;
  IntegerVector xt(xsize - ni_size); // allocate new vector of the correct size
  int * xtptr = INTEGER(xt);
  // range_end = -1 makes the final copy start at element 0 when neg_idx is empty
  int range_begin = 0, range_end = -1;
  for(size_t i=0; i < ni_size; ++i) {
    if(i == 0) {
      range_begin = 0;
    } else {
      range_begin = neg_idx[i-1];
    }
    range_end = neg_idx[i] - 1;
    // std::cout << range_begin << " " << range_end << std::endl;
    std::copy(xptr+range_begin, xptr+range_end, xtptr+xtposition);
    xtposition += range_end - range_begin;
  }
  std::copy(xptr+range_end+1, xptr + xsize, xtptr+xtposition);
  return xt;
}

Usage:

library(Rcpp)
sourceCpp("~/Desktop/temp.cpp")

x <- rep_len(1:10, 1e7)
idrops <- c(5, 4, 9)
outputR <- x[-idrops]
outputRcpp <- efficientNegativeIndexing(x, idrops)
identical(outputRcpp, outputR)

library(microbenchmark)
microbenchmark(efficientNegativeIndexing(x, idrops), x[-idrops], times=10)
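
The same range-copy idea can be sketched in standard C++ without Rcpp, which makes it easy to see how duplicate or out-of-range indices might be handled. This is an illustrative assumption, not the answer's implementation: `drop_by_ranges` and its parameter names are hypothetical, and silently ignoring invalid indices is one possible policy among several.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical standalone analogue of the range-copy approach:
// returns a copy of x without the elements at the given 1-based positions.
// The index vector is taken by value, so the caller's copy is not reordered.
std::vector<int> drop_by_ranges(const std::vector<int>& x,
                                std::vector<int> drops_1based) {
    // Sort, then discard duplicates and out-of-range positions.
    std::sort(drops_1based.begin(), drops_1based.end());
    drops_1based.erase(std::unique(drops_1based.begin(), drops_1based.end()),
                       drops_1based.end());
    drops_1based.erase(
        std::remove_if(drops_1based.begin(), drops_1based.end(),
                       [&x](int d) {
                           return d < 1 || static_cast<std::size_t>(d) > x.size();
                       }),
        drops_1based.end());

    std::vector<int> out;
    out.reserve(x.size() - drops_1based.size());

    // Copy each run of kept elements that precedes a dropped position.
    std::ptrdiff_t begin = 0;
    for (int d : drops_1based) {
        const std::ptrdiff_t end = static_cast<std::ptrdiff_t>(d) - 1; // 0-based dropped slot
        out.insert(out.end(), x.begin() + begin, x.begin() + end);
        begin = end + 1;
    }
    // Copy whatever follows the last dropped position.
    out.insert(out.end(), x.begin() + begin, x.end());
    return out;
}
```

Because whole ranges are copied at once, the work is proportional to the length of `x` plus the cost of sorting the (usually short) index vector, rather than to the product of the two lengths.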

