Problem description

I am trying to use the SAS PRX function to extract a substring from my dataset. But it only returns the exact matches, whereas I need it to be more flexible and extract those that match a variety of conditions.

I have copied my data below. As you can see, one of the variables in my data is "brandmodel" which contains both the brand name and the model# of a particular camera. I need to have a separate column just for the model#s. So I am using the PRX function to extract them as they usually follow one of the following patterns:

For example: JX100, JX10 or JX1 (i.e., 1-2 letters followed immediately by 1-3 digits). My program (copied below the data) can handle this. Where I run into problems is: how do I extract the model#s where the letters are separated from the digits by a space or a hyphen, and how do I put those into the same column, "Model", as the ones written without a separator? Also, some of the observations do not have model#s; how can I get them set to missing instead of being dropped altogether?

Brandmodel|Price
iTwist F124 Digital Camera -red|49.00
Vivitar IF045 Digital Camera -Blue|72.83
Liquid Image Underwater Camera Mask|128.00
Impact Series Video Camera MX Goggles™|188.00
Olympus VR 340  Silver|148.00
Olympus TG820 Digital Camera Black|278.00
Olympus VR 340 16MP 10x 3.0 LCD Red|148.00
Vivitar VX137-Pur Digital Camera|39.00
Olympus SZ-12 Digital Camera -Black|198.00
Olympus VG160 Digital Camera Red|98.00
Olympus VR340   Purple|148.00
Olympus TG820 Digital Camera Silver|298.00
Olympus TG820 Digital Camera Blue|278.00
Olympus VG160 Digital Camera    Orange|98.00
Olympus TG820 Digital Camera Red|298.00
Fujifilm FinePix AX500 Red|78.63
Canon A2300 Silver|98.63
Canon A810 Red|75.00
Nikon Coolpix S2600 Digital Camera - Red|88.00
Nikon Coolpix L25 Digital Camera - Silver|82.00
Casio Exilim ZS10BK|128.00
Olympus TG-310 14 MP blue Digital Camera|148.00
Hipstreet Kidz Digital Camera - Blue|14.93
Casio Exilim ZS10PK|128.00
Olympus TG-310 14 MP Digital Camera orange|148.00

SAS program

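data walnov21p2;
 length brandmodel $ 80 model $ 20;
 infile "G:\File2\data\store_nov21\storenv21p2.csv" firstobs=2 dlm="|" dsd;
 input brandmodel price;
 /* Compile the pattern once and keep its ID for later iterations */
 retain re;
 if _n_ = 1 then re = prxparse('/[[:alpha:]]{1,3} \d{1,4}/');
 if prxmatch(re, brandmodel) then
 do;
   model = prxposn(re, 0, brandmodel); /* buffer 0 = the entire match */
   output; /* explicit OUTPUT: only rows that match are written */
 end;
 drop re;
run;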

Recommended answer

For your very last question (setting the variable to missing rather than dropping the observation), remove the output statement from the conditional do block at the end. Just change it to:

if prxmatch(re, brandmodel) then model=prxposn(re, 0, brandmodel);

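This will cause all observations to be output, regardless of whether model is defined: with no explicit OUTPUT statement in the step, SAS writes every observation at the end of each DATA step iteration, and model is simply left missing when the pattern does not match.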
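A minimal sketch of that change that runs on its own. It assumes the rows come from DATALINES instead of the CSV file, and the dataset name `models` is made up for the example; the pattern is the one from the question, left unchanged. The first row has no match and is kept with `model` missing, while the second yields `VR 340`.

```sas
/* Sketch: the question's pattern, with the OUTPUT statement removed.
   DATALINES replaces the CSV file; dataset name MODELS is hypothetical. */
data models;
  length brandmodel $ 80 model $ 20;
  retain re;
  if _n_ = 1 then re = prxparse('/[[:alpha:]]{1,3} \d{1,4}/');
  infile datalines dlm="|" dsd;
  input brandmodel price;
  /* No explicit OUTPUT: every row is written at the end of the step,
     and MODEL stays missing when the pattern does not match */
  if prxmatch(re, brandmodel) then model = prxposn(re, 0, brandmodel);
  drop re;
datalines;
iTwist F124 Digital Camera -red|49.00
Olympus VR 340  Silver|148.00
;
run;
```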

For the rest of your question, it is really about pattern matching with Perl regular expressions and is not specific to SAS. It's also tricky because some models have spaces in them. Try posting a separate question (with the relevant tags) asking for a Perl regular expression that matches what you want.
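One possible pattern, offered only as a sketch under stated assumptions and not as the answer's own solution: a model number is assumed to start at a word boundary with 1-3 uppercase letters, optionally followed by a single space or hyphen, and to end with 1-4 digits (the quantifiers come from the question's code rather than its prose). Lowercase model numbers would be missed. Under these assumptions `SZ-12` and `VR 340` are kept as written in `model`, `ZS10BK` yields `ZS10`, and `model_norm` (a hypothetical extra column, like the dataset name `models2`) removes the separator so that `VR 340` and `VR340` compare equal.

```sas
/* Sketch only. Assumed model-number shape: word boundary, 1-3 uppercase
   letters, optional space or hyphen, 1-4 digits.
   MODELS2 and MODEL_NORM are hypothetical names. */
data models2;
  length brandmodel $ 80 model $ 20 model_norm $ 20;
  retain re;
  if _n_ = 1 then re = prxparse('/\b[A-Z]{1,3}[ -]?\d{1,4}/');
  infile datalines dlm="|" dsd;
  input brandmodel price;
  if prxmatch(re, brandmodel) then do;
    /* Buffer 0 = the whole match, kept exactly as written (e.g. "SZ-12") */
    model = prxposn(re, 0, brandmodel);
    /* Separators removed so "VR 340" and "VR340" compare equal */
    model_norm = compress(model, ' -');
  end;
  /* Rows with no match are still output, with MODEL missing */
  drop re;
datalines;
iTwist F124 Digital Camera -red|49.00
Olympus VR 340  Silver|148.00
Olympus VR340   Purple|148.00
Olympus SZ-12 Digital Camera -Black|198.00
Vivitar VX137-Pur Digital Camera|39.00
Canon A2300 Silver|98.63
Casio Exilim ZS10BK|128.00
Hipstreet Kidz Digital Camera - Blue|14.93
;
run;
```

Because PRXMATCH stops at the leftmost match, a row such as `Olympus VR 340 16MP 10x 3.0 LCD Red` would give `VR 340`. Whether that result, or the treatment of `ZS10BK`, is correct depends on the expected output the answer asks for.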

Also, post some examples of what you want the output to be. For example, what do you expect for input like this:

Olympus VR 340 16MP 10x 3.0 LCD Red|148.00 
Vivitar VX137-Pur Digital Camera|39.00

09-22 13:40