This article covers how to handle a native Impala UDF (C++) that randomly returns NULL for multiple invocations in the same query on the same table.

Problem description

I have a native Impala UDF (C++) with two functions. Both functions are complementary to each other.

String myUDF(BigInt)
BigInt myUDFReverso(String)

myUDF("myInput") gives some output, and myUDFReverso(myUDF("myInput")) should give back myInput.

When I run an Impala query like this on a Parquet table:

SELECT column1, myUDF(column1), length(myUDF(column1)), myUDFreverso(myUDF(column1)) FROM my_parquet_table ORDER BY column1 LIMIT 10;

The last column, myUDFReverso(myUDF(column1)), is NULL at random.

For example, the output on the first run is:

+------------+----------------------+------------------------+-------------------------------------+
| column1    | myDB.myUDF(column1)  | length(myUDF(column1)) | myDB.myUDFReverso(myUDF(column1))   |
+------------+----------------------+------------------------+-------------------------------------+
| 27011991   | 1.0.128.9            | 9                      | 27011991                            |
| 27011991   | 1.0.128.9            | 9                      | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | 14022013                            |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
+------------+----------------------+------------------------+-------------------------------------+

and on a second run it might be:

+------------+----------------------+------------------------+-------------------------------------+
| column1    | myDB.myUDF(column1)  | length(myUDF(column1)) | myDB.myUDFReverso(myUDF(column1))   |
+------------+----------------------+------------------------+-------------------------------------+
| 27011991   | 1.0.128.9            | 9                      | 27011991                            |
| 27011991   | 1.0.128.9            | 9                      | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | 14022013                            |
| 14022013   | 1.0.131.239          | 11                     | 14022013                            |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
| 14022013   | 1.0.131.239          | 11                     | 14022013                            |
| 14022013   | 1.0.131.239          | 11                     | NULL                                |
+------------+----------------------+------------------------+-------------------------------------+

Sometimes it also gives the correct value for all rows.

I have tested this on Impala v1.2.4 as well as v2.1. What is the cause of this? Is it some memory issue?

Edit 1:

BigIntVal myUDF(FunctionContext* context, const StringVal& myInput)
{
  if (myInput.is_null) return BigIntVal::null();

  unsigned int temp_op= 0;
  unsigned long result= 0;
  uint8_t *p;
  char c= '.';

  p=myInput.ptr;

  while (*p != '\0')
  {
    c= *p++;
    int digit= c*2;

    if (digit >= 22 && digit <= 31)
    {
      if ((temp_op= temp_op * 10 - digit) > 493)
      {
        return BigIntVal::null();
      }
    }
    else if (c == '.')
    {
      result= (result << 8) + (unsigned long) temp_op;
      temp_op= 0;
    }
    else
    {
      return BigIntVal::null();
    }
  }

  return BigIntVal((result << 8) + (unsigned long) temp_op);
}

In the .h file, the macro lowerbytify is defined as:

#define lowerbytify(T,A)        { *(T)= (char)((A));\
                                  *((T)+1)= (char)(((A) >> 8));\
                                  *((T)+2)= (char)(((A) >> 16));\
                                  *((T)+3)= (char)(((A) >> 24)); }

StringVal myUDFReverso(FunctionContext* context, const BigIntVal& origMyInput)
{
  if (origMyInput.is_null)
   return StringVal::null();

  int64_t myInput=origMyInput.val;
  char myInputArr[16];
  unsigned int l=0;

  unsigned char temp[8];
  lowerbytify(temp, myInput);

  char calc[4];
  calc[3]= '.';

  for (unsigned char *p= temp + 4; p-- > temp;)
  {
    unsigned int c= *p;
    unsigned int n1, n2;
    n1= c / 100;
    c-= n1 * 100;
    n2= c / 10;
    c-= n2 * 10;
    calc[0]= (char) n1 + '0';
    calc[1]= (char) n2 + '0';
    calc[2]= (char) c + '0';
    unsigned int length= (n1 ? 4 : (n2 ? 3 : 2));
    unsigned int point= (p <= temp) ? 1 : 0;

    char * begin = &calc[4-length];

    for(int step = length - point;step>0;step--,l++,begin++)
    {
        myInputArr[l]=*begin;
    }
  }

  myInputArr[l]='\0';

  StringVal result(context,l);
  memcpy(result.ptr, myInputArr,l);

  return result;
}
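The lowerbytify macro stores the four low-order bytes of A into T[0] to T[3], least significant byte first, and myUDFReverso then walks those bytes from temp[3] down to temp[0] so that the most significant component is printed first. A minimal type-safe sketch of the same byte extraction as an inline function follows; the name LowerBytify is hypothetical and not part of Impala.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical equivalent of the lowerbytify(T, A) macro:
// out[0] receives bits 0-7 of value, out[1] bits 8-15,
// out[2] bits 16-23 and out[3] bits 24-31. Higher bits are ignored.
inline void LowerBytify(unsigned char* out, std::uint64_t value) {
  assert(out != nullptr);
  for (int i = 0; i < 4; ++i) {
    // Mask to one byte before narrowing, so no signed-char conversion occurs.
    out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
  }
}
```

For instance, 16809993 (0x01008009) is split into the bytes 0x09, 0x80, 0x00, 0x01, which the loop in myUDFReverso prints as 1.0.128.9. Masking with 0xFF and casting to unsigned char avoids the intermediate (possibly signed) char that the macro relies on.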


Recommended answer

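I don't think you can assume the string is null-terminated. You should use StringVal::len to iterate over the chars rather than while (*p != '\0'). A StringVal is only a pointer plus a length, so scanning for '\0' can run past the end of the value into whatever bytes happen to follow it in memory; since those bytes vary between runs, some rows come back NULL and others correct, which would explain the random pattern. Also, I'd recommend writing some unit tests using the UDF test framework in the impala-udf-samples GitHub repository; see this example.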
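A minimal standalone sketch of the suggested fix follows. It assumes a hypothetical FakeStringVal struct that mirrors the pointer-plus-length shape of Impala's StringVal, so it compiles without the Impala headers, and it substitutes a plain dotted-decimal parse (each component 0 to 255) for the question's own digit arithmetic. Only the loop bound is the point being illustrated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for impala_udf's StringVal: a pointer and a length.
// As with the real type, nothing guarantees that ptr[len] is '\0'.
struct FakeStringVal {
  const std::uint8_t* ptr;
  int len;
  bool is_null;
};

// Parses dotted decimal text such as "1.0.128.9" into a 64-bit value.
// Returns false where the UDF would return BigIntVal::null().
bool ParseDotted(const FakeStringVal& s, std::uint64_t* out) {
  assert(out != nullptr);
  if (s.is_null) return false;
  std::uint64_t result = 0;
  unsigned int part = 0;
  for (int i = 0; i < s.len; ++i) {  // stop at len; never look for '\0'
    const std::uint8_t c = s.ptr[i];
    if (c >= '0' && c <= '9') {
      part = part * 10 + static_cast<unsigned int>(c - '0');
      if (part > 255) return false;  // component out of range
    } else if (c == '.') {
      result = (result << 8) + part;  // close the current component
      part = 0;
    } else {
      return false;  // unexpected character
    }
  }
  *out = (result << 8) + part;  // append the final component
  return true;
}
```

In the UDF itself, the loop would read myInput.ptr[i] for i < myInput.len, and the question's own digit handling could stay inside it. For example, with two values stored back to back in one buffer ("1.0.128.91.0.131.239"), a length of 9 yields 16809993 (0x01008009), whereas letting the scan run to the end of the buffer also consumes the neighbouring value's bytes and produces a different number.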

06-11 15:04