Given a data frame like this:
df_in <- data.frame(x = c('x1','x2','x3','x4'),
col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234','NA','https://bbcnews.com/search'),
col4 = c('NA', 'https://www.youtube/com','NA', 'www.youtube.com/searcht'))
In col1, col2 and col3, how can I keep only the cells that contain "google", "youtube" or "bbc", and set every other cell to NA?
Example of the expected output:
x col1 col2 col3 col4
1 x1 http://youtube.com/something https://google.com http://www.bbcnews.com?id=321 NA
2 x2 NA http://www.bbcnews2.com?id=321 http://google.com?id=1234 https://www.youtube/com
3 x3 NA NA NA NA
4 x4 NA https://google.com/text https://bbcnews.com/search www.youtube.com/searcht
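Best answer
We can use mutate_at to modify columns 'col1' through 'col4', checking with str_detect whether each value contains 'google', 'youtube' or 'bbc', and replacing all other elements with NA.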
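library(dplyr)
library(stringr)
# funs() is deprecated since dplyr 0.8.0; a formula-style lambda (~) does the same job
df_in %>%
  mutate_at(vars(col1:col4),
            ~ ifelse(str_detect(., "google|youtube|bbc"),  # TRUE where a keyword is found
                     as.character(.), NA_character_))      # keep the match, otherwise NA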
Output:
# x col1 col2 col3 col4
# 1 x1 http://youtube.com/something https://google.com http://www.bbcnews.com?id=321 <NA>
# 2 x2 <NA> http://www.bbcnews2.com?id=321 http://google.com?id=1234 https://www.youtube/com
# 3 x3 <NA> <NA> <NA> <NA>
# 4 x4 <NA> https://google.com/text https://bbcnews.com/search www.youtube.com/searcht
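For comparison, the same filtering can be sketched in base R without dplyr or stringr. This is only an illustrative alternative, not part of the original answer: the names pattern, url_cols and df_out are made up for the example, and stringsAsFactors = FALSE is added so the columns are plain character vectors on any R version.

```r
# Rebuild the example data from the question; the 'NA' entries are literal strings.
df_in <- data.frame(
  x    = c("x1", "x2", "x3", "x4"),
  col1 = c("http://youtube.com/something", "NA",
           "https://www.yahooexample.com", "https://www.yahooexample2.com"),
  col2 = c("https://google.com", "http://www.bbcnews2.com?id=321",
           "NA", "https://google.com/text"),
  col3 = c("http://www.bbcnews.com?id=321", "http://google.com?id=1234",
           "NA", "https://bbcnews.com/search"),
  col4 = c("NA", "https://www.youtube/com", "NA", "www.youtube.com/searcht"),
  stringsAsFactors = FALSE
)

pattern  <- "google|youtube|bbc"   # keywords to keep (hypothetical variable name)
url_cols <- paste0("col", 1:4)     # columns to filter (hypothetical variable name)

df_out <- df_in
df_out[url_cols] <- lapply(df_out[url_cols], function(v) {
  v <- as.character(v)
  v[!grepl(pattern, v)] <- NA_character_  # blank out everything without a keyword
  v
})

print(df_out)
```

Because grepl returns FALSE for the literal string "NA" as well, those cells become real missing values just like the non-matching URLs.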
Regarding "r - keep only the values in strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48907934/