In R, how do I plot a similarity matrix (like a block graph) after clustering the data?


Question

I want to produce a graph that shows the correlation between clustered data and a similarity matrix. How can I do this in R? Is there any function in R that creates a graph like the picture at this link? http://bp0.blogger.com/_VCI4AaOLs-A/SG5H_jm-f8I/AAAAAAAAAJQ/TeLzUEWbb08/s400/Similarity.gif (I just googled and found this link, which shows the kind of graph I want to produce.)

Thanks.

Recommended answer

The general solutions suggested in the comments by @Chase and @bill_080 need a little bit of enhancement to (partially) fulfil the needs of the OP.

A reproducible example:

require(MASS)
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2,6,3), 
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))

Compute the dissimilarity matrix of the standardised data using Euclidean distances

dij <- dist(scale(dat, center = TRUE, scale = TRUE))

and then calculate a hierarchical clustering of these data using the group average method

clust <- hclust(dij, method = "average")

Next we compute the ordering of the samples on the basis of forming 3 ('k') groups from the dendrogram, but we could have chosen something else here.

ord <- order(cutree(clust, k = 3))
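As one illustration of "something else", the sketch below compares the group-based ordering with the leaf order stored in the hclust object (`clust$order`). This alternative is an assumption added for illustration, not part of the original answer; the names `grp`, `ord_grp`, `ord_leaf`, `runs_grp` and `runs_leaf` are hypothetical. For group average linkage each group from `cutree()` is a subtree of the dendrogram, so the leaf order should also keep every group contiguous while additionally placing similar samples next to each other within a group.

```r
library(MASS)  # for mvrnorm(), as in the reproducible example
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp      <- unname(cutree(clust, k = 3))  # group label for each sample
ord_grp  <- order(grp)                    # ordering used in the answer
ord_leaf <- clust$order                   # alternative: dendrogram leaf order

table(grp)                                # group sizes

# Count contiguous runs of group labels under each ordering;
# a count equal to k means every group forms a single block
runs_grp  <- length(rle(grp[ord_grp])$lengths)
runs_leaf <- length(rle(grp[ord_leaf])$lengths)
c(group_order = runs_grp, leaf_order = runs_leaf)
```

Either vector can be used in place of `ord` when indexing the distance matrices.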

Next, compute the dissimilarities between samples based on the dendrogram, the cophenetic distances:

coph <- cophenetic(clust)
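To make the idea of a cophenetic distance concrete, here is a small toy example that is separate from the data above. The four values in `x` and the names `d_toy`, `hc_toy` and `coph_toy` are made up for illustration. The cophenetic distance between two samples is the dendrogram height at which they are first joined into the same cluster, so every pair split across the final merge gets the same value regardless of its original distance.

```r
# Four points on a line: two tight pairs that are far apart
x <- c(0, 1, 5, 6)

d_toy    <- dist(x)                          # original pairwise distances
hc_toy   <- hclust(d_toy, method = "average")
coph_toy <- cophenetic(hc_toy)               # distances implied by the tree

hc_toy$height               # merge heights: 1, 1, then 5
as.matrix(d_toy)[1, 3:4]    # original distances from point 1: 5 and 6
as.matrix(coph_toy)[1, 3:4] # cophenetic distances from point 1: 5 and 5
```

Points 1 and 2 merge at height 1, as do points 3 and 4; the two pairs then merge at the average between-pair distance, (5 + 6 + 4 + 5) / 4 = 5, which becomes the cophenetic distance for every cross-pair comparison.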

Here are 3 image plots of:

The original dissimilarity matrix, sorted on the basis of the cluster analysis groupings,
The cophenetic distances, again sorted as above
The difference between the original dissimilarities and the cophenetic distances

plus a Shepard-like plot comparing the original and cophenetic distances; the better the clustering is at capturing the original distances, the closer to the 1:1 line the points will lie

Here is the code that produces the above plots

layout(matrix(1:4, ncol = 2))
image(as.matrix(dij)[ord, ord], main = "Original distances")
image(as.matrix(coph)[ord, ord], main = "Cophenetic distances")
image((as.matrix(coph) - as.matrix(dij))[ord, ord], 
      main = "Cophenetic - Original")
plot(coph ~ dij, ylab = "Cophenetic distances", xlab = "Original distances",
     main = "Shepard Plot")
abline(0,1, col = "red")
box()
layout(1)

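Which produces this on the active device:

[Figure: 2 x 2 layout with panels "Original distances", "Cophenetic distances", "Cophenetic - Original" and "Shepard Plot"]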
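The image plots do not mark where one group ends and the next begins, which is what gives the linked picture its block appearance. The sketch below is an optional addition, not part of the original answer, that overlays dashed lines at the group boundaries. It relies on the default behaviour of `image()` when given only a matrix: the n cell centres are placed at `seq(0, 1, length.out = n)`. The names `grp`, `sizes` and `breaks` are hypothetical.

```r
library(MASS)  # for mvrnorm(), as in the reproducible example
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp <- cutree(clust, k = 3)
ord <- order(grp)
n   <- length(grp)

# Cell centres sit at (i - 1) / (n - 1), so the edge between cell i and
# cell i + 1 is at (i - 0.5) / (n - 1); take i at the end of each group
sizes  <- as.vector(table(grp))
breaks <- (cumsum(sizes)[-length(sizes)] - 0.5) / (n - 1)

image(as.matrix(dij)[ord, ord], main = "Original distances")
abline(v = breaks, h = breaks, lty = 2)  # dashed group boundaries
box()
```

The same `abline()` call can follow each of the three `image()` calls in the layout above.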

Having said all that, however, only the Shepard plot shows the "correlation between clustered data and [dis]similarity matrix", and that is not an image plot (levelplot). How would you propose to compute the correlation between two numbers for all pairwise comparisons of cophenetic and original [dis]similarities?
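A correlation cannot be computed for a single pair of numbers, but it can be computed once across all pairwise comparisons, which gives a one-number summary of the Shepard plot. The sketch below is an addition rather than part of the original answer: it calculates the Pearson correlation between the original and cophenetic distances, often called the cophenetic correlation. The name `coph_cor` is hypothetical.

```r
library(MASS)  # for mvrnorm(), as in the reproducible example
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph  <- cophenetic(clust)

# Both objects hold the lower triangle in the same sample order, so the
# pairwise values line up element by element: 100 * 99 / 2 = 4950 pairs
length(as.vector(dij))

# One correlation over all pairs of samples
coph_cor <- cor(as.vector(dij), as.vector(coph))
coph_cor
```

Values closer to 1 indicate that the dendrogram preserves the original pairwise distances more faithfully, which corresponds to points lying closer to the 1:1 line in the Shepard plot.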
