Creating a scatter plot from interval data in R

Problem description

The answer to this question is probably more than obvious, but I just cannot get my head around it (or rather, I think I know a solution, but it appears too complicated to me), so I thought I should ask for help.

My data looks like this:

    MyItem  Measurement  First  Last
    Item1   10           267.4  263.2
    Item2   15           263.2  254.8
    Item3   3            250.5  250.5
    Item4   20           266.9  253.2
    Item5   16           260.0  250.0

My measurement for the first item is valid for the time 267.5 to 263.2 (arbitrary time units; could be seconds, years, ...). The measurement for the second item is valid from 263.2 to 254.8, and so on.

I would like to create a plot in R where the x-axis represents time and the y-axis represents our measurements. Time should be divided into intervals of length 1. If the interval of a measurement overlaps with a time interval on the x-axis, a data point should appear in the plot, in the middle of that time interval.

To give an example, let's assume that our x-axis starts at 269 and ends at 249.

Our first time interval on the x-axis goes from 269 to 268. None of our measurements falls into this time interval, therefore no data point is plotted.

Our second time interval on the x-axis goes from 268 to 267. A measurement for Item1 has been recorded for this time interval. Therefore a data point is plotted in the time interval 268-267, with y=10 (our measurement) and x=267.5 (the midpoint of the time interval 268-267).

Our third time interval goes from 267 to 266. Two of our measurements fall into this time interval, namely Item1 and Item4. Therefore two data points should be plotted, with the coordinates y=10, x=266.5 (Item1) and y=20, x=266.5 (Item4).

We proceed like this for the rest of our data.

Unfortunately I haven't found a smart function/package to do this in R - usually you can only supply one value for the y-axis (which makes sense, as otherwise the mapping of your x-value becomes ambiguous) - but I'm sure there must be something. I thought that by using seq() I could create dummy values for every single time step (e.g., dummy values for Item1 would be 267.5, 266.5, 265.5, 264.5, 263.5, all of them associated with y=10) and add those values to my data. But this appears to me to be a very complicated solution, far from elegant.

I'm sure there must be an easy and elegant way of doing this, but I can't come up with it. I don't even know what I should look for - I thought you would see this issue come up in time series analyses, but that does not appear to be the case.

What I do NOT want to do is take the mean time between the beginning and the end of the time interval (e.g., for Item1, (267.5 + 263.2)/2 = 265.35).

If possible I would like to plot the scatter plot with ggplot2 (but I'll take any solution) and then fit a line through my plotted data points.

Thanks in advance for any help!

Solution

I'm at a loss for a solution that does not involve transforming your data to "long" data. But I also don't think it is particularly inelegant as a tactic - but maybe we disagree on that point. Here's a quick, short solution using lapply() and rbind to generate a long version of your data:

    library(ggplot2)

    # Convert the data.frame to a list, split on MyItem
    dl <- split(df, df$MyItem)

    # For each item, create a data frame with the measurement and
    # the midpoints of the unit intervals it covers
    lapply_output <- lapply(dl, function(item) {
      out_df <- data.frame(
        MyItem      = item$MyItem,
        Measurement = item$Measurement,
        Interval    = seq(floor(item$First), floor(item$Last)) + 0.5
      )
      return(out_df)
    })

    # Take the list of data frames and bind them together
    long_data <- do.call(rbind, lapply_output)

    # Plot using ggplot, with the measurement on the y-axis as the question asks
    p <- ggplot(long_data, aes(Interval, Measurement)) + geom_point()

Perhaps someone else has a quicker solution using one of the many packages made for reformatting data frames.
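
For anyone who wants to run the idea end to end, here is a minimal, self-contained sketch of the same "long data" approach. It is not the answerer's exact code. The data frame is rebuilt from the table in the question, `interval_midpoints` is a hypothetical helper name introduced only for this example, and the straight-line `lm` fit is just one possible reading of "fit a line through my plotted data points". The plot itself is only built if ggplot2 happens to be installed.

```r
# Example data copied from the question; the name `df` matches the answer's code.
df <- data.frame(
  MyItem      = paste0("Item", 1:5),
  Measurement = c(10, 15, 3, 20, 16),
  First       = c(267.4, 263.2, 250.5, 266.9, 260.0),
  Last        = c(263.2, 254.8, 250.5, 253.2, 250.0)
)

# Hypothetical helper (not from the original post): midpoints of the
# unit-length intervals between floor(first) and floor(last).
interval_midpoints <- function(first, last) {
  seq(floor(first), floor(last)) + 0.5
}

# One vector of midpoints per item, then repeat each row once per midpoint.
mids <- Map(interval_midpoints, df$First, df$Last)
n    <- lengths(mids)
long_data <- data.frame(
  MyItem      = rep(df$MyItem, times = n),
  Measurement = rep(df$Measurement, times = n),
  Interval    = unlist(mids)
)

# Assumption: "fit a line" is taken to mean an ordinary least-squares line.
fit <- lm(Measurement ~ Interval, data = long_data)

# Build the scatter plot with the fitted line only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long_data, aes(Interval, Measurement)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
}
```

One thing to watch with the `floor()` rule: an item whose First value sits exactly on an integer, such as Item5 with 260.0, still gets a point at 260.5 even though it only touches the interval 260-261 at its boundary. If that is not wanted, the start of the sequence would need separate handling.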