Question
I want to produce a graph that shows the correlation between clustered data and a similarity matrix. How can I do this in R? Is there a function in R that creates a graph like the picture at this link? http://bp0.blogger.com/_VCI4AaOLs-A/SG5H_jm-f8I/AAAAAAAAAJQ/TeLzUEWbb08/s400/Similarity.gif (I just googled and found this link, which shows the kind of graph I want to produce.)
Thanks.
Recommended answer
The general solutions suggested in the comments by @Chase and @bill_080 need a little bit of enhancement to (partially) fulfil the needs of the OP.
A reproducible example:
library(MASS)  # provides mvrnorm()
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
Compute the dissimilarity matrix of the standardised data using Euclidean distances
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
and then calculate a hierarchical clustering of these data using the group average method
clust <- hclust(dij, method = "average")
Next we compute the ordering of the samples on the basis of forming 3 ('k') groups from the dendrogram, but we could have chosen a different number of groups here.
ord <- order(cutree(clust, k = 3))
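The ordering above groups samples by cluster membership, but within each group the samples stay in their original row order. As a minimal sketch (the `ord_leaf` alternative is an assumption on my part, not part of the original answer), one could instead use the dendrogram's own leaf order, stored in the `order` component of the `hclust` object, which should also keep the members of each group together:

```r
library(MASS)  # provides mvrnorm()

# Rebuild the example data and clustering from the answer
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp <- cutree(clust, k = 3)  # group membership of each sample
ord <- order(grp)            # ordering used in the answer: samples sorted by group
print(table(grp))            # how many samples fall in each group

# Hypothetical alternative: order samples as they appear along the dendrogram
ord_leaf <- clust$order
```

Either `ord` or `ord_leaf` can be used to index the rows and columns of the distance matrices before plotting; the leaf order additionally places similar samples next to each other within a group.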
Next compute the dissimilarities between samples based on the dendrogram, the cophenetic distances:
coph <- cophenetic(clust)
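If a single number summarising the agreement between the two sets of distances is useful, one option is the correlation between the original and cophenetic distances (often called the cophenetic correlation). This is only a sketch; the variable name `cc` is mine and the step is not part of the original answer:

```r
library(MASS)  # provides mvrnorm()

# Rebuild the example data, clustering and cophenetic distances from the answer
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph  <- cophenetic(clust)

# Both objects are of class "dist" and hold the same sample pairs in the
# same order, so they can be compared element-wise as plain vectors.
cc <- cor(as.vector(dij), as.vector(coph))
print(cc)
```

Values closer to 1 suggest the dendrogram preserves the original pairwise distances more faithfully.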
Here are three image plots and a Shepard-like plot:
- The original dissimilarity matrix, sorted on the basis of the cluster analysis groupings
- The cophenetic distances, again sorted as above
- The difference between the original dissimilarities and the cophenetic distances
- A Shepard-like plot comparing the original and cophenetic distances; the better the clustering is at capturing the original distances, the closer the points will lie to the 1:1 line
Here is the code that produces the above plots:
layout(matrix(1:4, ncol = 2))
image(as.matrix(dij)[ord, ord], main = "Original distances")
image(as.matrix(coph)[ord, ord], main = "Cophenetic distances")
image((as.matrix(coph) - as.matrix(dij))[ord, ord],
      main = "Cophenetic - Original")
plot(coph ~ dij, ylab = "Cophenetic distances", xlab = "Original distances",
     main = "Shepard Plot")
abline(0,1, col = "red")
box()
layout(1)
Which produces this on the active device:
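One possible refinement, offered as a sketch rather than as part of the original answer: by default each `image()` call scales its colours to the range of its own matrix, so the same colour can mean different distances in the two panels. Passing a shared `zlim` (the name `zl` below is mine) puts both panels on one colour scale. The lower limit is set to 0 so that the zero diagonal is still drawn.

```r
library(MASS)  # provides mvrnorm()

# Rebuild the objects used in the answer
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
ord   <- order(cutree(clust, k = 3))
coph  <- cophenetic(clust)

# Shared colour limits covering both sets of distances
zl <- c(0, max(as.vector(dij), as.vector(coph)))

op <- par(mfrow = c(1, 2))  # two panels side by side
image(as.matrix(dij)[ord, ord],  zlim = zl, main = "Original distances")
image(as.matrix(coph)[ord, ord], zlim = zl, main = "Cophenetic distances")
par(op)                     # restore the previous graphics settings
```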
Having said all that, however, only the Shepard plot shows the "correlation between clustered data and [dis]similarity matrix", and that is not an image plot (levelplot). How would you propose to compute the correlation between two numbers for all pairwise comparisons of cophenetic and original [dis]similarities?