Question
I am new to Spark and was playing around with pyspark.sql. According to the pyspark.sql documentation, one can set up a Spark DataFrame and its schema like this:
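from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, IntegerType

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
# textFile yields raw strings; split and convert each line so it matches the schema
# (assumes no header row and lines like "Alice,2017-01-01 12:00:00,30")
rdd = (sc.textFile('./some csv_to_play_around.csv')
       .map(lambda line: line.split(','))
       .map(lambda p: (p[0], datetime.strptime(p[1], '%Y-%m-%d %H:%M:%S'), int(p[2]))))
schema = StructType([StructField('Name', StringType(), True),
                     StructField('DateTime', TimestampType(), True),
                     StructField('Age', IntegerType(), True)])
# create dataframe
df3 = spark.createDataFrame(rdd, schema)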
My question is, what does the True stand for in the schema list above? I can't seem to find it in the documentation. Thanks in advance.
Answer
It is the third argument of StructField, nullable, and it indicates whether the column allows null values: True means the column is nullable, and False means it is not nullable.
Refer to the Spark SQL and DataFrame Guide for more information.