Syntax for setting a schema for Pyspark.sql using StructType

Question

I am new to Spark and was playing around with Pyspark.sql. According to the pyspark.sql documentation, one can set up a Spark DataFrame and its schema like this:

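from pyspark.sql import SparkSession
from pyspark.sql.types import (StringType, IntegerType, TimestampType,
                               StructType, StructField)

spark = SparkSession.builder.getOrCreate()

# textFile yields one raw string per line; each line would still need to be
# parsed into (Name, DateTime, Age) values before the schema below can apply
rdd = spark.sparkContext.textFile('./some csv_to_play_around.csv')

schema = StructType([StructField('Name', StringType(), True),
                     StructField('DateTime', TimestampType(), True),
                     StructField('Age', IntegerType(), True)])

# create dataframe
df3 = spark.createDataFrame(rdd, schema)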

My question is: what does the True stand for in each StructField of the schema above? I can't seem to find it in the documentation. Thanks in advance.

Recommended answer

The third argument of StructField is the nullable flag. It states whether the column allows null values: True means the column is nullable, and False means it is not.

Refer to the Spark SQL and DataFrame Guide for more information.

