This article explains how to apply regular expressions to one data frame based on a column in another data frame.

Problem description

I have two data frames: table A is the pattern table, and table B is the name table. I want to subset table B to the rows that match any of the patterns in table A.

A <- data.frame(pattern = c("aa", "bb", "cc", "dd"))
B <- data.frame(name = "aa1", "bb1", "abc", "def" ,"ddd")

I'm trying to do this with a for loop that looks like:

for (i in 1:nrow(A)){
  for (j in 1:nrow(B)){
    DT <- data.frame(grep(A$pattern[i], B$name[j], ignore.case = T, value = T))
  }
}

I want my resulting table DT to contain only aa1, bb1, and ddd.

But it's super slow. I'm wondering whether there is a more efficient way to do it. Many thanks!

Recommended answer

It appears there's a slight error in your sample input data: B$name is not properly declared (the c() around the values is missing), and both data.frame objects need stringsAsFactors = F:

> A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = F)
> B <- data.frame(name = c("aa1", "bb1", "abc", "def" ,"ddd"), stringsAsFactors = F)

Code

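# using sapply with grepl: test every name in B against each pattern in A
# (hits is a logical matrix with one row per name and one column per pattern)
> hits <- sapply(A$pattern, function(p) grepl(p, B$name, ignore.case = TRUE))

# keep a name if it matches at least one pattern
> indices <- rowSums(hits) > 0
> indices
[1]  TRUE  TRUE FALSE FALSE  TRUE

> B[indices, ]
[1] "aa1" "bb1" "ddd"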
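
As a possible alternative to the sapply call above, the patterns can be joined into a single regular expression so that grepl runs only once over B$name. This is a minimal sketch, not part of the original answer; the names combined and keep are chosen here for illustration, and it assumes the patterns stay valid when joined with "|" (for example, that none of them rely on numbered backreferences).

```r
# Sample data as declared in the answer above
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# Join all patterns into one alternation: "aa|bb|cc|dd"
combined <- paste(A$pattern, collapse = "|")

# One vectorised grepl() call tests every name against the combined pattern
keep <- grepl(combined, B$name, ignore.case = TRUE)

# drop = FALSE keeps the result as a data frame instead of a bare vector
DT <- B[keep, , drop = FALSE]
DT$name
```

With the sample data this should leave aa1, bb1, and ddd in DT. For a very large pattern table the combined expression can get long, in which case the sapply version may be easier to debug because it shows which pattern matched which name.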


09-05 17:54