I want to use Pig to find the player who scored the most runs against each team.

Input: the input looks like this

Sachin 100 KXIP Hyderabad 1991
sehwag 150 KXIP Hyderabad 1991
Sehwag 100 MI Mumbai 2011
Kohli 0 CSK Chennai 2014
Dhoni 150 MI Hyderabad 1991
Sachin 32 PW Chennai 2014
Dhoni 150 MI Mumbai 2011

My implementation:

record1= LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
record2 = GROUP record1 by team as team;
record3 = FOREACH record2 GENERATE group,MAX(record1.runs) as mx;
record4= ORDER record3 by mx ASC;
DUMP record4;

Output:
(PW,32)
(KXIP,150)
(MI,150)

But I expect the result in the following form, with the full record for each maximum:
Sachin PW 32 Chennai 2014

Best answer

record1 = LOAD 'ipl.txt' USING PigStorage(' ') AS (name:chararray, runs:int, team:chararray, loc:chararray, year:int);
-- Group by team and compute each team's highest score.
record2 = GROUP record1 BY team;
record3 = FOREACH record2 GENERATE group, MAX(record1.runs) AS mx;
-- Join the per-team maximum back to the original rows to recover the other columns.
record4 = JOIN record3 BY (mx, group) LEFT OUTER, record1 BY (runs, team);
record5 = FOREACH record4 GENERATE record1::name AS name, record1::team AS team, record3::mx AS mx, record1::year AS year;
record6 = ORDER record5 BY mx ASC;
DUMP record6;

This produces the following result:
(Kohli,CSK,0,2014)
(Sachin,PW,32,2014)
(sehwag,KXIP,150,1991)
(Dhoni,MI,150,1991)
(Dhoni,MI,150,2011)

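Note that Dhoni appears twice because he scored 150 for MI in two different years, so both rows match the team maximum in the join. To keep only one of them, you have to decide whether you want the earliest or the most recent year.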
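One way to drop the duplicate is sketched below. It is not part of the original answer: it swaps the MAX-plus-JOIN approach for a nested FOREACH that sorts each team's rows and keeps a single one. The aliases by_team, ranked, best, and best_per_team are made up for illustration, and the tie-break (earliest year first) is an assumption; use year DESC to keep the most recent row instead.

```pig
record1 = LOAD 'ipl.txt' USING PigStorage(' ') AS (name:chararray, runs:int, team:chararray, loc:chararray, year:int);

-- One bag of rows per team.
by_team = GROUP record1 BY team;

-- Inside each team's bag: highest runs first, ties broken by earliest year
-- (assumed tie-break), then keep only the first row.
best_per_team = FOREACH by_team {
    ranked = ORDER record1 BY runs DESC, year ASC;
    best = LIMIT ranked 1;
    GENERATE FLATTEN(best);
};

-- Field names are unambiguous after FLATTEN, so the short names can be used.
result = FOREACH best_per_team GENERATE name, team, runs, loc, year;
DUMP result;
```

With the sample input, this should keep only the 1991 row for MI. Because the whole row is carried through, the loc column is available as well, which matches the form the question asked for. If two different players tie on both runs and year, LIMIT 1 still returns just one of them, and which one is not guaranteed.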

Regarding "hadoop - finding the maximum record among records using Pig", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23925337/
