Question
I would like to create a JSON from a Spark v.1.6 (using Scala) dataframe. I know that there is the simple solution of doing df.toJSON.
However, my problem looks a bit different. Consider for instance a dataframe with the following columns:
| A | B      | C1 | C2 | C3    |
--------------------------------
| 1 | test   | ab | 22 | TRUE  |
| 2 | mytest | gh | 17 | FALSE |
I would like to end up with a dataframe that has the following columns:
| A | B      | C                                        |
----------------------------------------------------------
| 1 | test   | { "c1" : "ab", "c2" : 22, "c3" : TRUE }  |
| 2 | mytest | { "c1" : "gh", "c2" : 17, "c3" : FALSE } |
where C is a JSON containing C1, C2, C3. Unfortunately, at compile time I do not know what the dataframe looks like (except for the columns A and B, which are always "fixed").
As for the reason why I need this: I am using Protobuf to send the results around. Unfortunately, my dataframe sometimes has more columns than expected, and I would still like to send those via Protobuf, but I do not want to specify all columns in the definition.
How can I achieve this?
Answer
Spark 2.1 should have native support for this use case (see #15354).
import org.apache.spark.sql.functions.{struct, to_json}

// the $"..." column syntax also requires: import spark.implicits._
df.select(to_json(struct($"c1", $"c2", $"c3")))
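Since the extra columns are not known at compile time, the struct can also be built from whatever columns the dataframe actually has. A minimal sketch, assuming Spark 2.1+ and that the fixed columns are named exactly A and B:

import org.apache.spark.sql.functions.{col, struct, to_json}

// everything except the fixed columns goes into the JSON struct
val dynamicCols = df.columns.filterNot(Set("A", "B")).map(col)

df.select(col("A"), col("B"), to_json(struct(dynamicCols: _*)).alias("C"))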
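On Spark 1.6 itself, where to_json is not yet available, one possible workaround is a UDF that receives the dynamic columns packed into a struct (Spark passes a struct to a UDF as a Row) and serializes it with json4s, which ships with Spark. This is only a sketch under those assumptions; toJValue and rowToJson are hypothetical helpers, and only the primitive types appearing in the example are handled:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, struct, udf}
import org.json4s.JsonAST._
import org.json4s.jackson.JsonMethods.{compact, render}

// hypothetical helper: map the primitive values from the example to json4s values
def toJValue(v: Any): JValue = v match {
  case null       => JNull
  case b: Boolean => JBool(b)
  case i: Int     => JInt(i)
  case l: Long    => JInt(l)
  case d: Double  => JDouble(d)
  case s: String  => JString(s)
  case other      => JString(other.toString) // fallback: stringify anything else
}

val names = df.columns.filterNot(Set("A", "B")).toList

// the struct's fields arrive in the Row in the same order as `names`
val rowToJson = udf((r: Row) =>
  compact(render(JObject(names.zip(r.toSeq.map(toJValue))))))

df.select(col("A"), col("B"), rowToJson(struct(names.map(col): _*)).alias("C"))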