Optimizing a loop over an array using parallel processing in R

Question

I have an array data = array[1:50,1:50,1:50]; the values inside are real numbers between -1 and 1.

"data" can be treated as a 50x50x50 cube.

I need to create a correlation matrix (removing all zeros) based on this equation:

value = (x + y) - |x - y|. The number of possible pairs is (50x50x50)*((50x50x50)-1)/2 = 7,812,437,500, and the full correlation matrix is twice that size.
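As a side note, (x + y) - |x - y| is the same as 2 * min(x, y) for any real x and y, which is cheap to compute in a vectorized way. The sketch below checks this numerically and recomputes the pair count; the variable names are illustrative and do not come from the original post.

```r
# Illustrative check (names here are hypothetical, not from the post)
set.seed(1)
x <- runif(1000, -1, 1)
y <- runif(1000, -1, 1)

lhs <- (x + y) - abs(x - y)
rhs <- 2 * pmin(x, y)                     # element-wise minimum
identity_holds <- isTRUE(all.equal(lhs, rhs))

# Number of unordered voxel pairs for a 50x50x50 cube
n <- 50^3
n_pairs <- n * (n - 1) / 2
format(n_pairs, big.mark = ",")
```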

This is what I do. Let's say we have a 3x3x3 array:

library(parallel)

# Toy 3x3x3 array (rnorm(10) is recycled to fill all 27 cells)
arr = array(rnorm(10), dim=c(3,3,3))
data = data.frame(array(arr))
data$voxel <- rownames(data)

# Remove zeros
data <- data[!(data[,1]==0),]
rownames(data) = data$voxel
data$voxel = NULL

# Create cluster and make data and the comparison function available on the workers
no_cores <- detectCores() #- 1
clus <- makeCluster(no_cores)
clusterExport(clus, list("data") , envir=environment())
clusterEvalQ(clus,
             compare_strings <- function(j,i) {
               value <- (data[i,]+data[j,])-abs(data[i,]- data[j,])
               pair <- rbind(rownames(data)[j],rownames(data)[i],value)
               return(pair)
             })

i = 0 # start 0
kk = 1
table <- data.frame()

ptm <- proc.time()

while(kk<nrow(data)) {
  out <- NULL
  i = i+1 # fix row
  j = c((kk+1):nrow(data)) # rows to be compared

  # Apply the declared function to every (i, j) pair for the fixed row i
  out = matrix(unlist(parRapply(clus,expand.grid(i,j), function(x,y) compare_strings(x[1],x[2]))),ncol=3, byrow = T)
  table <- rbind(table,out)
  kk = kk +1
}

proc.time() - ptm

# Release the worker processes
stopCluster(clus)

The result is a data.frame:

v1  v2  v3
1   2   2.70430114250358
1   3   0.199941717684129
... up to 351 rows

But this will take days...

Also, I would like to create a matrix for this correlation:

   1                         2              3...
1  1                  2.70430114250358
2  2.70430114250358          1
3...

Is there a faster way to do it?

Thanks

Recommended answer

There are a number of performance mistakes in your code:

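1. You loop when you should rely on vectorization.

2. You grow an object in a loop.

3. You parallelize each iteration of the loop instead of parallelizing the outer loop.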

You can avoid all these problems if you avoid the first problem.
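To make the first two mistakes concrete, here is a minimal sketch on toy data. It assumes a plain numeric vector x as a stand-in for the one-column data frame in the question, and it compares a loop that grows its result with a single vectorized computation over precomputed index pairs. The names are illustrative only.

```r
# Toy data: a plain numeric vector stands in for the one-column data frame
set.seed(42)
x <- runif(200, -1, 1)
n <- length(x)

# Slow pattern: explicit loops that grow a vector one element at a time
slow <- numeric(0)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    slow <- c(slow, (x[i] + x[j]) - abs(x[i] - x[j]))
  }
}

# Vectorized pattern: build all index pairs once, then compute in one call
idx <- t(combn(n, 2))
fast <- (x[idx[, 1]] + x[idx[, 2]]) - abs(x[idx[, 1]] - x[idx[, 2]])

all.equal(slow, fast)
```

Both versions visit the pairs in the same order, so the results can be compared directly. Wrapping each version in system.time() should show the gap widening as n grows.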

Apparently, you want to compare each combination of rows. For this you should first get all combinations of row indices:

combs <- t(combn(1:27, 2))
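For readers unfamiliar with combn, a small sketch of what the transposed result looks like may help. The object name small is hypothetical and used only for this illustration.

```r
# Each row of the result is one unordered pair (i, j) with i < j
small <- t(combn(1:4, 2))
small

nrow(small)      # number of pairs among 4 items, i.e. choose(4, 2)
choose(27, 2)    # number of pairs for the 27-row example: 351
```

The 351 pairs match the row count reported in the question for the 3x3x3 case.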

Then you can apply the comparison function to these:

compare <- function(j,i, data) {
  as.vector((data[i,]+data[j,])-abs(data[i,]- data[j,]))
}

res <- data.frame(V1 = combs[,1], V2 = combs[,2],
                  V3 = compare(combs[,1], combs[,2], data))

Now, if we want to check if this gives the same result as your code, we first need to fix your output. By combining characters (the rownames) with numerics in a matrix, you get a character matrix and the columns of your final data.frame are all characters. We can use type.convert to fix that afterwards (although it should be avoided from the beginning):

table[] <- lapply(table, function(x) type.convert(as.character(x), as.is = TRUE))

Now we can see that results are the same:

all.equal(res, table)
#[1] TRUE

If you like, you can turn the result into a sparse matrix:

library(Matrix)
m <- sparseMatrix(i = res$V1, j = res$V2, x = res$V3,
                  dims = c(27, 27), symmetric = TRUE)
diag(m) <- 1
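For small inputs, a dense matrix can also be built directly with outer(), without the pair table. The sketch below is an alternative to the answer's approach, not part of it, and assumes a plain numeric vector x in place of the data column. It also estimates the memory needed for the pair values alone at the full 50x50x50 size, which suggests the full problem will not fit in memory on most machines.

```r
set.seed(1)
x <- runif(27, -1, 1)          # stand-in for the single data column

# Dense alternative for small n: outer() evaluates the formula for all pairs
m_dense <- outer(x, x, "+") - abs(outer(x, x, "-"))
diag(m_dense) <- 1             # same diagonal convention as above

# Rough memory for the pair values alone at full size (8 bytes per double)
n <- 50^3
pairs_gb <- n * (n - 1) / 2 * 8 / 1e9
pairs_gb                       # roughly 62.5 GB
```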
