Optimizing a loop over an array with parallel processing in R

Problem description

I have an array data = array[1:50,1:50,1:50]; the values inside are real numbers between -1 and 1.

"data" can be treated as a 50x50x50 cube.

I need to create a correlation matrix (removing all zeros) based on this equation =>

value = (x+y)-|x-y|, and the matrix size is twice the number of possible combinations: (50x50x50)*((50x50x50)-1)/2 = 7,812,437,500 combinations, and 2 times this gives the correlation matrix.
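
As a side note on the formula: because x+y = max(x,y) + min(x,y) and |x-y| = max(x,y) - min(x,y), the expression (x+y)-|x-y| reduces to 2*min(x,y). The short sketch below checks this numerically and recomputes the pair count. The variable names and the random test values are illustrative only and are not part of the original question.

```r
# Illustrative check: (x + y) - |x - y| equals 2 * min(x, y) element-wise.
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- runif(1000, min = -1, max = 1)  # made-up values in the question's range
y <- runif(1000, min = -1, max = 1)

lhs <- (x + y) - abs(x - y)
rhs <- 2 * pmin(x, y)
identity_holds <- isTRUE(all.equal(lhs, rhs))

# Number of unordered pairs among 50 * 50 * 50 = 125000 voxels.
n_voxels <- 50^3
n_pairs  <- choose(n_voxels, 2)
```

Here n_pairs matches the 7,812,437,500 combinations quoted in the question.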

This is what I do:

Let's say we have a 3x3x3 array:

arr = array(rnorm(10), dim=c(3,3,3))

data = data.frame(array(arr))


data$voxel <- rownames(data)

#remove zeros
data<-data[!(data[,1]==0),]

rownames(data) = data$voxel

data$voxel = NULL


#######################################################################################
#Create cluster
library(parallel) # provides detectCores, makeCluster, clusterExport, clusterEvalQ, parRapply

no_cores <- detectCores() #- 1

clus <- makeCluster(no_cores)

clusterExport(clus, list("data") , envir=environment())

clusterEvalQ(clus,
             compare_strings <- function(j,i) {
               value <- (data[i,]+data[j,])-abs(data[i,]- data[j,])
               pair <- rbind(rownames(data)[j],rownames(data)[i],value)
               return(pair)
             })

i = 0 # start 0
kk = 1
table <- data.frame()

ptm <- proc.time()

while(kk<nrow(data)) {

  out <-NULL
  i = i+1 # fix row
  j = c((kk+1):nrow(data)) # rows to be compared

  #Apply the declared function
  out = matrix(unlist(parRapply(clus,expand.grid(i,j), function(x,y) compare_strings(x[1],x[2]))),ncol=3, byrow = T)

  table <- rbind(table,out)

  kk = kk +1

}

proc.time() - ptm

stopCluster(clus) # release the worker processes

The result is a data.frame:

v1  v2  v3
1   2   2.70430114250358
1   3   0.199941717684129
... up to 351 rows

But this will take days...

I would also like to create a matrix for this correlation:

   1                         2              3...
1  1                  2.70430114250358
2  2.70430114250358          1
3...

Is there a faster way to do it?

Thanks

Recommended answer

There are a number of performance mistakes in your code:


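  1. You loop when you should rely on vectorization.

  2. You grow an object in a loop.

  3. You parallelize each iteration of the loop instead of parallelizing the outer loop.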

You can avoid all these problems if you avoid the first problem.

Apparently, you want to compare each combination of rows. For this you should first get all combinations of row indices:

combs <- t(combn(1:27, 2))

Then you can apply the comparison function to these:

compare <- function(j,i, data) {
  as.vector((data[i,]+data[j,])-abs(data[i,]- data[j,]))
}

res <- data.frame(V1 = combs[,1], V2 = combs[,2],
                  V3 = compare(combs[,1], combs[,2], data))
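
The difference between the two patterns can be reproduced on a toy input. The sketch below is a simplified illustration, not the original code: it assumes a plain numeric vector v (a hypothetical stand-in for the one-column data frame) and compares a loop that grows a data frame with rbind() against a single vectorized computation over all index pairs.

```r
# Hypothetical toy input: one value per row, standing in for `data`.
set.seed(42)
v <- rnorm(27)
n <- length(v)

# Slow pattern: the result is grown with rbind() on every pass of the loop.
grown <- data.frame()
for (i in seq_len(n - 1)) {
  j <- (i + 1):n                       # rows to compare with row i
  grown <- rbind(grown,
                 data.frame(V1 = i, V2 = j,
                            V3 = (v[i] + v[j]) - abs(v[i] - v[j])))
}

# Vectorized pattern: build all index pairs once, then compute in one step.
pairs <- t(combn(n, 2))
a <- v[pairs[, 1]]
b <- v[pairs[, 2]]
vectorized <- data.frame(V1 = pairs[, 1], V2 = pairs[, 2],
                         V3 = (a + b) - abs(a - b))

# Both approaches produce the same values in the same order.
same_values <- isTRUE(all.equal(grown$V3, vectorized$V3))
```

On 27 values the timing difference is negligible, but the loop version copies the growing data frame on every iteration, so its cost rises much faster than the vectorized version as the number of rows increases.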

Now, if we want to check if this gives the same result as your code, we first need to fix your output. By combining characters (the rownames) with numerics in a matrix, you get a character matrix and the columns of your final data.frame are all characters. We can use type.convert to fix that afterwards (although it should be avoided from the beginning):

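# as.is = TRUE keeps character columns as character and avoids the warning in recent R versions
table[] <- lapply(table, function(x) type.convert(as.character(x), as.is = TRUE))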

Now we can see that the results are the same:

all.equal(res, table)
#[1] TRUE

If you like, you can turn the result into a sparse matrix:

library(Matrix)
m <- sparseMatrix(i = res$V1, j = res$V2, x = res$V3,
                  dims = c(27, 27), symmetric = TRUE)
diag(m) <- 1
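
For small inputs, a dense version of the same matrix can also be built directly, without the intermediate table of pairs. This is only a sketch under stated assumptions: v is a made-up numeric vector standing in for the non-zero voxel values, and the diagonal is set to 1 to follow the convention used for the sparse matrix.

```r
# Illustrative values; in the question these would be the non-zero voxels.
set.seed(7)
v <- runif(27, min = -1, max = 1)

# outer() applies the formula to every (row, column) pair at once.
dense <- outer(v, v, function(x, y) (x + y) - abs(x - y))
diag(dense) <- 1   # same diagonal convention as the sparse matrix

# Spot check one off-diagonal entry against the formula.
entry_ok <- isTRUE(all.equal(dense[1, 2], (v[1] + v[2]) - abs(v[1] - v[2])))
```

This does not scale to the full problem: a dense 125000 x 125000 matrix of doubles would need roughly 125 GB of memory, so for the 50x50x50 cube the result would have to be computed in blocks or stored in another form.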
