Parsing data in R: is there an alternative to rbind() that can be put in a "for" loop to write rows to a new data table?

Problem description

Let's say I have a data table called YC that looks like this:

Categories:           colsums:   tillTF:
ID: cat               NA         0 
  MA                  NA         0 
    spayed            NA         0
      declawed        NA         0 
        black         NA         0
          3           NA         0
            no        57         1
        claws         NA         0
          calico      NA         0
            4         NA         0
              no      42         1
           striped    NA         0
              0.5     NA         0
                yes   84         1
      not fixed       NA         0
         declawed     NA         0 
            black     NA         0 
              0.2     NA         0
                yes   19         1
              0.2     NA         0
                yes   104        1
  NH                  NA         0
    spayed            NA         0 
       claws          NA         0
          striped     NA         0
             12       NA         0 
               no     17         1
           black      NA         0
              4       NA         0
               yes    65         1
ID: DOG               NA         0 
 MA                   NA         0
...           

Only 1) it's not actually a pivot table, it's just inconsistently formatted to look like one, and 2) the data is much more complicated and was entered inconsistently over the course of a few decades. The only assumption that can safely be made about the data is that there are 12 variables associated with each record, and they are always entered in the same order.

My goal is to parse this data so that each attribute and its associated numeric record are in the appropriate columns of a single row, like this:

Cat  MA  spayed    declawed  black    3    no  57
Cat  MA  spayed    claws     calico   0.5  no  42
Cat  MA  not fixed declawed  black    0.2  yes 19
Cat  MA  not fixed declawed  black    0.2  yes 104
Cat  NH  spayed    claws     striped  12   no  17
Cat  NH  spayed    claws     black    4    yes 65
Dog  MA ....

I've written a for loop which identifies a "record" and then re-writes values in an array by reading backwards up the column in the data table until another "record" is reached. I'm new to R, and so wrote out my ideal loop without knowing whether it was possible.

array<-rep(0, length(7))
    for (i in 1:7)
      if(YC$tillTF[i]==1){
        array[7]<-(YC$colsums[i])
        array[6]<-(YC$Categories[i])
        array[5]<-(YC$Categories[i-1])
        array[4]<-(YC$Categories[i-2])
        array[3]<-(YC$Categories[i-3])
        array[2]<-(YC$Categories[i-4])
        array[1]<-(YC$Categories[i-5])
      }

    YC_NT<-rbind(array)

Once array is filled in, I want to loop through YC and create a new row in YC_NT for each unique record:

for (i in 8:length(YC$tillTF))
  if (YC$tillTF[i]==1){
    array[8]<-(YC$colsums[i])
    array[7]<-(YC$Categories[i])
    if (YC$tillTF[i-1]==0){
      array[6]<-YC$Categories[i-1]
            }else{ 
              rbind(array, YC_NT)}
    if (YC$tillTF[i-2]==0){
      array[5]<-YC$Categories[i-2]
          }else{
            rbind(array, YC_NT)}
    if(YC$tillTF[i-3]==0){
      array[4]<-YC$Categories[i-3]
          }else{
            rbind(array, YC_NT)}
    if(YC$tillTF[i-4]==0){
      array[3]<-YC$Categories[i-4]
          }else{
            rbind(array, YC_NT)}
    if(YC$tillTF[i-5]==0){
      array[2]<-YC$Categories[i-5]
          }else{
            rbind(array, YC_NT)}
    if(YC$tillTF[i-6]==0){
      array[1]<-YC$Categories[i-6]
          }else{
            rbind(array, YC_NT)}
}else{ 
  array<-array}

When I run this loop within a function on my data, I get my YC_NT data table back containing a single row. After spending a few days searching, I haven't found an R function that can add the vector array as the last row of a data table without giving it a unique name every time. My questions:

1) Is there a function that would add a vector called array to the end of a data table without re-writing a previous row called array?

2) If no such function exists, how could I create a new name for array every time my for loop reaches a new numeric record?

Thanks for any help,

Recommended answer

So I'm going to assume a record ends every time tillTF=1, and that the n variables specified for the next record are just the last n variables, with the earlier values all remaining the same. I'm also assuming that all records are "complete" in that the last line has tillTF=1. (To make that last statement true, I removed the last two lines from your sample.)

Here's how I might read the data in:

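# Read the fixed-width file, skipping the header row
dog <- read.fwf("dog.txt", widths=c(22,11,7), skip=1, stringsAsFactors=FALSE)
# Drop indentation and padding (runs of two or more whitespace characters)
dog$V1 <- gsub("\\s{2,}","",dog$V1)
# Remove all whitespace from the colsums column
dog$V2 <- gsub("\\s","",dog$V2)
# Convert the tillTF flag to numeric
dog$V3 <- as.numeric(gsub("\\s","",dog$V3))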

So I read in the data here and strip off the extra spaces. Now I will add an ID column giving each record a unique ID, incrementing that value after every line where tillTF=1. Then I'll split the data on that ID value and keep, for each record, its category values followed by the colsums value from its last line:

dog$ID<-c(0, cumsum(dog$V3[-nrow(dog)]))
dv <- lapply(split(dog, dog$ID), function(x) {
    c(x$V1, x$V2[nrow(x)])}
)
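To see why the running total is shifted by one position, here is a minimal sketch on a made-up flag vector (the values in `till` are hypothetical and not taken from the question's data). Without the shift, the line carrying the 1 would be grouped with the following record instead of closing its own.

```r
# Toy flag vector: 1 marks the final line of a record (hypothetical values).
till <- c(0, 0, 1, 0, 1, 1, 0, 0, 1)

# Shift the running total down by one so the flagged line stays with
# the record it closes rather than opening the next one.
id <- c(0, cumsum(till[-length(till)]))
print(id)

# split() then groups the line positions by record ID.
groups <- split(seq_along(till), id)
print(groups)
```

Each element of `groups` holds the line numbers of one record, with the flagged line last.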

Now I'll go through the list with Reduce and, for each ID, overwrite the last n values of the running record with the n values given for that ID.

trans <- Reduce(function(a,b) {
    a[(length(a)-length(b)+1):length(a)] <- b
    a
}, dv, accumulate=TRUE)
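The same fill-forward step can be tried on a small hand-made list. This is only an illustrative sketch: `recs` and `fill_tail` are hypothetical names, and the records are shortened to four fields.

```r
# Hypothetical records: the first is complete, later ones list only
# the trailing fields that changed.
recs <- list(
  c("cat", "MA", "black", "3"),
  c("calico", "4"),
  c("NH", "striped", "12")
)

# Overwrite the last length(b) slots of the running record a with b.
fill_tail <- function(a, b) {
  a[(length(a) - length(b) + 1):length(a)] <- b
  a
}

# accumulate = TRUE keeps every intermediate record, not just the last.
full <- Reduce(fill_tail, recs, accumulate = TRUE)
print(full)
```

Because each step starts from the previous completed record, a short entry such as `c("calico", "4")` inherits `"cat"` and `"MA"` from the record before it.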

Now I'll paste each record together with tabs and then use read.table to parse the result, which does all the proper type conversions and creates a data frame:

dd<-read.table(text=sapply(trans, paste0, collapse="\t"), sep="\t")

That gives

# print(dd)
       V1 V2        V3       V4      V5   V6  V7  V8
1 ID: cat MA    spayed declawed   black  3.0  no  57
2 ID: cat MA    spayed    claws  calico  4.0  no  42
3 ID: cat MA    spayed    claws striped  0.5 yes  84
4 ID: cat MA not fixed declawed   black  0.2 yes  19
5 ID: cat MA not fixed declawed   black  0.2 yes 104
6 ID: cat NH    spayed    claws striped 12.0  no  17
7 ID: cat NH    spayed    claws   black  4.0 yes  65

So as you can see, I left the "ID:" on, but it should be easy enough to strip that off. These commands do the basic reshaping for you. There are fewer arrays, if statements, and rbind calls in this solution, which is nice, but I encourage you to make sure you understand each line if you want to use it.
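As a rough sketch of that clean-up step, the prefix could be removed with `sub()`. The small `dd` below is a hand-built stand-in for the real result, and the column names `species` and `state` are assumptions chosen for illustration.

```r
# Stand-in for the first two columns of the parsed result (hypothetical rows).
dd <- data.frame(
  V1 = c("ID: cat", "ID: cat", "ID: DOG"),
  V2 = c("MA", "NH", "MA"),
  stringsAsFactors = FALSE
)

# Remove a leading "ID:" and any whitespace that follows it.
dd$V1 <- sub("^ID:\\s*", "", dd$V1)

# Give the columns descriptive names (assumed, not from the question).
names(dd) <- c("species", "state")
print(dd)
```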

Also note that my output is slightly different from your expected output: you are missing the "84" value and have the calico with "42" listed as "0.5" rather than "4.0". So let me know if I was wrong in how I interpreted the data, or perhaps correct the example output.
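On the original question about appending rows inside a loop: rbind() returns a new object rather than modifying its arguments, so a bare `rbind(array, YC_NT)` call discards its result. The sketch below, using made-up records, shows two common ways around that. It is an illustration under those assumptions, not a replacement for the approach above.

```r
# Hypothetical input: three parsed records as character vectors.
records <- list(
  c("cat", "MA", "57"),
  c("cat", "NH", "17"),
  c("dog", "MA", "8")
)

# Option 1: assign the result of rbind() back on every pass.
out <- NULL
for (rec in records) {
  out <- rbind(out, rec)
}

# Option 2: collect the rows in a list and bind once at the end,
# which avoids copying the growing table on every pass.
rows <- vector("list", length(records))
for (i in seq_along(records)) {
  rows[[i]] <- records[[i]]
}
out2 <- do.call(rbind, rows)

print(dim(out))
print(dim(out2))
```

Neither option needs a new variable name per row. The second usually scales better for long inputs.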

