This article covers how to subset a data frame by a complex pattern of column names.

Problem description

I have a dataset that looks like the following:

- two rounds of data (.t0 and .t1)
- multiple scales (this and that)
- several items per scale (1, 22, 22a)
- several variables to ignore (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1)

    dat <- data.frame(
      id = seq(from = 1, to = 10, by = 1),
      v2 = rnorm(10),
      v3 = rnorm(10),
      ignore.t0 = rnorm(10),
      this.t0 = rnorm(10),
      this1.t0 = rnorm(10),
      this22.t0 = rnorm(10),
      this22a.t0 = rnorm(10),
      that.t0 = rnorm(10),
      that1.t0 = rnorm(10),
      that22.t0 = rnorm(10),
      that22a.t0 = rnorm(10),
      ignore.t1 = rnorm(10),
      this.t1 = rnorm(10),
      this1.t1 = rnorm(10),
      this22.t1 = rnorm(10),
      this22a.t1 = rnorm(10),
      that.t1 = rnorm(10),
      that1.t1 = rnorm(10),
      that22.t1 = rnorm(10),
      that22a.t1 = rnorm(10)
    )

I want to subset the data frame to include id and only the columns that have:

- the scale name (this or that), AND
- a number (1.) OR a number and a letter (22a.) before the period

So in the end, the data frame should look like this:

    dat2 <- data.frame(
      id = seq(from = 1, to = 10, by = 1),
      # v2 = rnorm(10),
      # v3 = rnorm(10),
      # ignore.t0 = rnorm(10),
      # this.t0 = rnorm(10),
      this1.t0 = rnorm(10),
      this22.t0 = rnorm(10),
      this22a.t0 = rnorm(10),
      # that.t0 = rnorm(10),
      that1.t0 = rnorm(10),
      that22.t0 = rnorm(10),
      that22a.t0 = rnorm(10),
      # ignore.t1 = rnorm(10),
      # this.t1 = rnorm(10),
      this1.t1 = rnorm(10),
      this22.t1 = rnorm(10),
      this22a.t1 = rnorm(10),
      # that.t1 = rnorm(10),
      that1.t1 = rnorm(10),
      that22.t1 = rnorm(10),
      that22a.t1 = rnorm(10)
    )

The real data frame is much bigger than what is shown here, so typing the column indices by hand is not feasible. Matching on the scale names alone does not work either, because this.t0, this.t1, that.t0, and that.t1 would also be caught.

    # not quite right
    dat2$id <- dat$id
    scales <- c("this", "that")
    keep.index <- grep(paste(scales, collapse = "|"), names(dat))
    temp <- dat[keep.index]
    dat2 <- cbind(dat2, temp)

How can I modify the grep pattern to look for a number OR (a number and a character) before the period? Or is there a better approach altogether?

Solution

This works for your example:

    dat[c("id", grep("(this|that)\\d+[a-z]?\\.", names(dat), value = TRUE))]

where:

- \\d+ matches one or more digits
- [a-z]? matches zero or one lowercase letter
- \\. matches the literal dot

If you want to build the pattern dynamically for various scales, you can do:

    scales <- c("this", "that")
    pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))
    dat[c("id", grep(pattern, names(dat), value = TRUE))]
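The pattern can be checked against the column names alone, without building the data frame. The sketch below uses a slightly stricter variant that is an assumption and not part of the original answer: it anchors the scale name to the start of the column name with ^ and requires the name to end in a round suffix such as .t0 or .t1. The vector cols and the names strict_pattern, kept, and dropped are illustrative only.

```r
# Column names mirroring the question's example data frame.
cols <- c("id", "v2", "v3",
          "ignore.t0", "this.t0", "this1.t0", "this22.t0", "this22a.t0",
          "that.t0", "that1.t0", "that22.t0", "that22a.t0",
          "ignore.t1", "this.t1", "this1.t1", "this22.t1", "this22a.t1",
          "that.t1", "that1.t1", "that22.t1", "that22a.t1")

scales <- c("this", "that")

# Anchored variant (an assumption, not the answer's exact pattern):
#   ^         scale name must start the column name
#   \\d+      one or more digits
#   [a-z]?    optional lowercase letter
#   \\.t\\d+$ literal dot, then the round suffix ending the name
strict_pattern <- sprintf("^(%s)\\d+[a-z]?\\.t\\d+$",
                          paste(scales, collapse = "|"))

# Item columns that the pattern keeps.
kept <- grep(strict_pattern, cols, value = TRUE)

# Columns that mention a scale name but are rejected by the pattern.
scale_only <- grep(paste(scales, collapse = "|"), cols, value = TRUE)
dropped <- setdiff(scale_only, kept)

print(kept)
print(dropped)
```

With a real data frame, the selection would then be dat[c("id", kept)] after computing kept from names(dat). If a scale name could contain regex metacharacters such as a dot, it would need to be escaped before being pasted into the pattern.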